Re: MERGE ... RETURNING
On Thu, Feb 29, 2024 at 1:49 PM Jeff Davis wrote:
> Can we get some input on whether the current MERGE ... RETURNING patch
> is the right approach from a language standpoint?

MERGE_CLAUSE_NUMBER() seems really out of place to me; identifying the output set by number rather than by some kind of name feels wrong. I did not see a lot of support for that position, though.

merlin
Re: Re: How to solve the problem of one backend process crashing and causing other processes to restart?
On Mon, Nov 13, 2023 at 3:14 AM yuansong wrote:
> Enhancing the overall fault tolerance of the entire system for this
> feature is quite important. No one can avoid bugs, and I don't believe that
> this approach is a more advanced one. It might be worth considering adding
> it to the roadmap so that interested parties can conduct relevant research.
>
> The current major issue is that when one process crashes, resetting all
> connections has a significant impact on other connections. Is it possible
> to only disconnect the crashed connection and have the other connections
> simply roll back the current transaction without reconnecting? Perhaps this
> problem can be mitigated through the use of a connection pool.
>
> If we want to implement this feature, would it be sufficient to clean up
> or restore the shared memory and disk changes caused by the crashed
> backend? Besides clearing various known locks, what else needs to be
> changed? Does anyone have any insights? Please help me. Thank you.

One thing that's really key to understand about postgres is that there is a different set of rules regarding what is the database's job to solve vs. what belongs to supporting libraries and frameworks. It isn't that hard to wait and retry a query in most applications, and it is up to you to do that. There are also various connection poolers that might implement retry logic, and not having to work through those concerns keeps the code lean and has other benefits. While postgres might someday implement things like a built-in connection pooler, O_DIRECT-style memory management, and things like that, there are long-term costs to doing them.

There's another side to this. Suppose I had to choose between a hypothetical postgres that had some kind of process-local crash recovery and the current implementation.
I might still choose the current implementation because, in general, crashes are good, and the full reset has a much better chance of clearing the underlying issue that caused the problem vs managing the symptoms of it. merlin
Re: MERGE ... RETURNING
On Wed, Nov 1, 2023 at 5:12 AM Dean Rasheed wrote: > On Tue, 31 Oct 2023 at 23:19, Vik Fearing wrote: > > > > On 10/31/23 19:28, Jeff Davis wrote: > > > > > Assuming we have one RETURNING clause at the end, then it creates the > > > problem of how to communicate which WHEN clause a tuple came from, > > > whether it's the old or the new version, and/or which action was > > > performed on that tuple. > > > > > > How do we communicate any of those things? We need to get that > > > information into the result table somehow, so it should probably be > > > some kind of expression that can exist in the RETURNING clause. But > > > what kind of expression? > > > > > > (a) It could be a totally new expression kind with a new keyword (or > > > recycling some existing keywords for the same effect, or something that > > > looks superficially like a function call but isn't) that's only valid > > > in the RETURNING clause of a MERGE statement. If you use it in another > > > expression (say the targetlist of a SELECT statement), then you'd get a > > > failure at parse analysis time. > > > > This would be my choice, the same as how the standard GROUPING() > > "function" for grouping sets is implemented by GroupingFunc. > > > > Something I'm wondering about is to what extent this discussion is > driven by concerns about aspects of the implementation (specifically, > references to function OIDs in code), versus a desire for a different > user-visible syntax. To a large extent, those are orthogonal > questions. > > (As an aside, I would note that there are already around a dozen > references to specific function OIDs in the parse analysis code, and a > lot more if you grep more widely across the whole of the backend > code.) > > At one point, as I was writing this patch, I went part-way down the > route of adding a new node type (I think I called it MergeFunc), for > these merge support functions, somewhat inspired by GroupingFunc. 
> In the end, I backed out of that approach, because it seemed to be
> introducing a lot of unnecessary additional complexity, and I decided
> that a regular FuncExpr would suffice.
>
> If pg_merge_action() and pg_merge_when_clause_number() were
> implemented using a MergeFunc node, it would reduce the number of
> places that refer to specific function OIDs. Basically, a MergeFunc
> node would be very much like a FuncExpr node, except that it would
> have a "levels up" field, set during parse analysis, at the point
> where we check that it is being used in a merge returning clause, and
> this field would be used during subselect planning. Note, however,
> that that doesn't entirely eliminate references to specific function
> OIDs -- the parse analysis code would still do that. Also, additional
> special-case code in the executor would be required to handle
> MergeFunc nodes. Also, code like IncrementVarSublevelsUp() would need
> adjusting, and anything else like that.
>
> A separate question is what the syntax should be. We could invent a
> new syntax, like GROUPING(). Perhaps:
>
> MERGE(ACTION) instead of pg_merge_action()
> MERGE(CLAUSE NUMBER) instead of pg_merge_when_clause_number()

Hm, still struggling with this merge action and (especially) number stuff. Currently we have:

WHEN MATCHED [ AND *condition* ] THEN { *merge_update* | *merge_delete* | DO NOTHING }
WHEN NOT MATCHED [ AND *condition* ] THEN { *merge_insert* | DO NOTHING }

What about extending to something like:

WHEN MATCHED [ AND *condition* ] [ AS *merge_clause_name* ] THEN ...

for example:

WHEN MATCHED AND tid > 2 AS giraffes THEN
  UPDATE SET balance = t.balance + delta

...and have pg_merge_clause() return 'giraffes' (of name type). If a merge clause is not identified, maybe don't return any data for that clause through RETURNING, or return NULL.

Maybe the RETURNING clause doesn't have to be extended or molested in any way; it would follow the same mechanics as for UPDATE, and could not refer to identified merge_clauses, but would allow pg_merge_clause() to function. You wouldn't need to identify action or number. For example:

with r as
(
  merge into x using y on x.a = y.a
  when matched and x.c > 0 as good then do nothing
  when matched and x.c <= 0 as bad then do nothing
  returning pg_merge_clause(), x.*
) ...

...yielding:

 pg_merge_clause | a | c
-----------------+---+---
 good            | 1 | 5
 good            | 2 | 7
 bad             | 3 | 0

...and maybe allow pg_merge_clause() to optionally take a column name:

  returning pg_merge_clause('result'), x.*

...yielding:

 result | a | c
--------+---+---
 good   | 1 | 5
 good   | 2 | 7
 bad    | 3 | 0

Food for thought -- may have missed some finer details upthread.

merlin
Re: MERGE ... RETURNING
On Tue, Oct 24, 2023 at 2:11 PM Jeff Davis wrote:
> On Wed, 2023-08-23 at 11:58 +0100, Dean Rasheed wrote:
> > Updated version attached, fixing an uninitialized-variable warning
> > from the cfbot.
>
> I took another look and I'm still not comfortable with the special
> IsMergeSupportFunction() functions. I don't object necessarily -- if
> someone else wants to commit it, they can -- but I don't plan to commit
> it in this form.
>
> Can we revisit the idea of a per-WHEN RETURNING clause? The returning
> clauses could be treated kind of like a UNION, which makes sense
> because it really is a union of different results (the returned tuples
> from an INSERT are different than the returned tuples from a DELETE).
> You can just add constants to the target lists to distinguish which
> WHEN clause they came from.
>
> I know you rejected that approach early on, but perhaps it's worth
> discussing further?

Yeah. As a side benefit, the 'action_number' felt really out of place, and this might neatly solve that. It doesn't match tg_op, for example -- with the current approach, why not return a text, or an enum? Why doesn't it match concepts that are pretty well established elsewhere? SQL has a pretty good track record of not inventing weird numbers with no real meaning (sadly, not so much the developers).

Having said that, pg_merge_action() doesn't feel too bad if the syntax issues can be worked out. Very supportive of the overall goal.

merlin
Memory knob testing (was Re: Let's make PostgreSQL multi-threaded)
On Fri, Aug 25, 2023 at 8:35 AM Stephen Frost wrote:
> Greetings,
>
> This is getting a bit far afield in terms of this specific thread, but
> there's an ongoing effort to give PG administrators knobs to be able to
> control how much actual memory is used rather than depending on the
> kernel to actually tell us when we're "out" of memory. There'll be new
> patches for the September commitfest posted soon. If you're interested
> in this issue, it'd be great to get more folks involved in review and
> testing.

Noticed I missed this. I'm interested. Test #1 would be to set memory to about the max there is, maybe a hair under, turn off swap, and see what happens in various dynamic load situations. Disabling overcommit is not a practical solution in my experience; it moves instability from one place to another and seems to make problems appear in a broader set of situations. For zero-downtime platforms it has a place, but I would tend to roll the dice on a reboot even for direct user-facing applications, given that it can provide relief for systemic conditions.

My unsophisticated hunch is that postgres and the kernel are not on the same page about memory somehow, and that the multi-process architecture might be contributing to that issue. Of course, regarding rearchitecture skeptically and realistically is a good idea given the effort and risks. I guess, in summary, I would personally rate things like better management of resource tradeoffs, better handling of transient demands, predictable failure modes, and stability in dynamic workloads over things like better performance in extremely high concurrency situations. Others might think differently for objectively good reasons.

merlin
Re: Request for comment on setting binary format output per session
On Wed, Oct 4, 2023 at 9:17 AM Peter Eisentraut wrote:
> I think intuitively, this facility ought to work like client_encoding.
> There, the client declares its capabilities, and the server has to
> format the output according to the client's capabilities. That works,
> and it also works through connection poolers. (It is a GUC.) If we can
> model it like that as closely as possible, then we have a chance of
> getting it working reliably. Notably, the value space for
> client_encoding is a globally known fixed list of strings. We need to
> figure out what is the right way to globally identify types, like either
> by fully-qualified name, by base name, some combination, how does it
> work with extensions, or do we need a new mechanism like UUIDs. I think
> that is something we need to work out, no matter which protocol
> mechanism we end up using.

Fantastic write-up.

> globally known fixed list of strings

Are you suggesting that we would have a client/server negotiation such as 'jdbc', 'all', etc., where that would identify which types are done which way? If you did that, why would we need to promote names/UUIDs to a permanent global space?

merlin
Re: dubious warning: FORMAT JSON has no effect for json and jsonb types
On Wed, Aug 16, 2023 at 8:55 AM Peter Eisentraut wrote:
> This warning comes from parse_expr.c transformJsonValueExpr() and is
> triggered for example by the following test case:
>
> SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
> WARNING: FORMAT JSON has no effect for json and jsonb types
>
> But I don't see anything in the SQL standard that would require this
> warning. It seems pretty clear that FORMAT JSON in this case is
> implicit and otherwise without effect.
>
> Also, we don't have that warning in the output case (RETURNING json
> FORMAT JSON).
>
> Anyone remember why this is here? Should we remove it?

+1 for removing, on the basis that the behavior is not surprising and the warning would pollute logs for most configurations.

merlin
Re: Let's make PostgreSQL multi-threaded
On Thu, Jul 27, 2023 at 8:28 AM David Geier wrote:
> Hi,
>
> On 6/7/23 23:37, Andres Freund wrote:
> > I think we're starting to hit quite a few limits related to the process model,
> > particularly on bigger machines. The overhead of cross-process context
> > switches is inherently higher than switching between threads in the same
> > process - and my suspicion is that that overhead will continue to
> > increase. Once you have a significant number of connections we end up spending
> > a *lot* of time in TLB misses, and that's inherent to the process model,
> > because you can't share the TLB across processes.
>
> Another problem I haven't seen mentioned yet is the excessive kernel
> memory usage because every process has its own set of page table entries
> (PTEs). Without huge pages the amount of wasted memory can be huge if
> shared buffers are big.

Hm, I noted this upthread, but asking again: does this improve interactions with the operating system enough to make OOM-kill situations less likely? These things are the bane of my existence, and I'm having a hard time finding a solution that prevents them other than running pgbouncer and lowering max_connections, which adds complexity. I suspect I'm not the only one dealing with this. What's really scary about these situations is they come without warning. Here's a pretty typical example per sar -r:

          kbmemfree kbmemused %memused kbbuffers  kbcached kbcommit %commit kbactive  kbinact kbdirty
14:20:02     461612  15803476    97.16         0  11120280 12346980   60.35 10017820  4806356     220
14:30:01     378244  15886844    97.67         0  11239012 12296276   60.10 10003540  4909180     240
14:40:01     308632  15956456    98.10         0  11329516 12295892   60.10 10015044  4981784     200
14:50:01     458956  15806132    97.18         0  11383484 12101652   59.15  9853612  5019916     112
15:00:01   10592736   5672352    34.87         0   4446852  8378324   40.95  1602532  3473020     264  <-- reboot!
15:10:01    9151160   7113928    43.74         0   5298184  8968316   43.83  2714936  3725092     124
15:20:01    8629464   7635624    46.94         0   6016936  8777028   42.90  2881044  4102888     148
15:30:01    8467884   7797204    47.94         0   6285856  8653908   42.30  2830572  4323292     436
15:40:02    8077480   8187608    50.34         0   6828240  8482972   41.46  2885416  4671620     320
15:50:01    7683504   8581584    52.76         0   7226132  8511932   41.60  2998752  4958880     308
16:00:01    7239068   9026020    55.49         0   7649948  8496764   41.53  3032140  5358388     232
16:10:01    7030208   9234880    56.78         0   7899512  8461588   41.36  3108692  5492296     216

The triggering query was heavy (maybe even runaway), but server load was minimal otherwise:

          CPU  %user %nice %system %iowait %steal %idle
14:30:01  all   9.55  0.00    0.63    0.02   0.00 89.81
14:40:01  all   9.95  0.00    0.69    0.02   0.00 89.33
14:50:01  all  10.22  0.00    0.83    0.02   0.00 88.93
15:00:01  all  10.62  0.00    1.63    0.76   0.00 86.99
15:10:01  all   8.55  0.00    0.72    0.12   0.00 90.61

The conjecture here is that lots of idle connections make the server appear to have less memory available than it actually has, and sudden transient demands can cause it to destabilize. Just throwing it out there: if this can be shown to help, it may be supportive of moving forward with something like it, either instead of, or along with, O_DIRECT or other internalized database memory management strategies. Lowering context switches, faster page access, etc. are of course nice, but would not be a game changer for the workloads we see, which are pretty varied (OLTP, analytics), although we don't see extremely high transaction rates.

merlin
Re: Let's make PostgreSQL multi-threaded
On Mon, Jun 5, 2023 at 12:25 PM Heikki Linnakangas wrote:
> We currently bend over backwards to make all allocations fixed-sized in
> shared memory. You learn to live with that, but a lot of things would be
> simpler if you could allocate and free in shared memory more freely.
> It's no panacea, you still need to be careful with locking and
> concurrency. But a lot simpler.

Would this help with the OOM killer in Linux? Isn't it true that pgbouncer provides a lot of the same benefits?

merlin
Re: Request for comment on setting binary format output per session
On Thu, Apr 20, 2023 at 2:52 PM Dave Cramer wrote:
>
> As promised here is a patch with defines for all of the protocol messages.
> I created a protocol.h file and put it in src/includes
> I'm fairly sure that some of the names I used may need to be changed but
> the grunt work of finding and replacing everything is done.

In many cases, converting an inline character literal to a macro eliminates the need for the inline comment, e.g.:

+ case SIMPLE_QUERY: /* simple query */

...that's more work obviously; do you agree, and if so would you like some help going through that?

merlin
Re: Request for comment on setting binary format output per session
On Mon, Apr 3, 2023 at 11:29 AM Dave Cramer wrote:
> > > participating clients to receive GUC configured format. I do not
> > > think that libpq's result format being able to be overridden by GUC
> > > is a good idea at all, the library has to participate, and I
> > > think can be made to do so without adjusting the interface (say, by
> > > resultFormat = 3).
> >
> > Interesting idea. I suppose you'd need to specify 3 for all result
> > columns? That is a protocol change, but wouldn't "break" older clients.
> > The newer clients would need to make sure that they're connecting to
> > v16+, so using the protocol version alone wouldn't be enough. Hmm.
>
> So this only works with extended protocol and not simple queries.
> Again, as Peter mentioned it's already easy enough to confuse psql using
> binary cursors so it makes sense to fix psql either way.
>
> If you use resultFormat (3) I think you'd still end up doing the Describe
> (which we are trying to avoid) to make sure you could receive all the
> columns in binary.

Can you elaborate on why Describe would have to be passed? Agreed that would be a dealbreaker if so. If you pass a 3 sending it in, then you'd be checking PQfformat on data coming back as 0/1 -- or at least that'd be smart, since you're indicating the server is able to choose the format. This addresses the concern that libpq clients currently passing resultformat zero could not have that decision overridden by the server, which I also think is a dealbreaker. There might be other reasons why a Describe message may be forced, however.

merlin
Re: Request for comment on setting binary format output per session
On Wed, Mar 29, 2023 at 11:04 AM Jeff Davis wrote:
> I'm not clear on what proposal you are making and/or endorsing?

ha -- was just backing up dave's GUC idea.

> 1. Fix our own clients, like psql, to check for binary data they can't
> process.

This ought to be impossible IMO. All libpq routines except PQexec have an explicit expectation on format (via the resultformat parameter) that should not be overridden. PQexec ought to be explicitly documented and wired to only request text format data. resultformat can be extended now or later to allow participating clients to receive the GUC-configured format. I do not think that libpq's result format being able to be overridden by GUC is a good idea at all; the library has to participate, and I think it can be made to do so without adjusting the interface (say, by resultFormat = 3). Similarly, in the JDBC world, it ought to be up to the driver to determine when it wants the server to flex wire formats, but it must be able to override the server's decision.

merlin
Re: Request for comment on setting binary format output per session
On Tue, Mar 21, 2023 at 4:47 PM Jeff Davis wrote:
> On Tue, 2023-03-21 at 09:22 -0400, Dave Cramer wrote:
> > As Jeff mentioned there is a visibility problem if the search path is
> > changed. The simplest solution IMO is to look up the OID at the time
> > the format is requested and use the OID going forward to format the
> > output as binary. If the search path changes and a type with the same
> > name is now first in the search path then the data would be returned
> > in text.
>
> Am I over-thinking this?

I think so. Dave's idea puts a lot of flexibility into the client side, and that's good. Search path mechanics are really well understood and well integrated with extensions already (CREATE EXTENSION ... SCHEMA), assuming the precise time at which UDTs are looked up in an unqualified way is made very clear to, or invoked via, the client code. I'll say it again though: OIDs really ought to be considered a transient cache of type information rather than a permanent identifier.

Regarding UDTs, lots of common and useful scenarios (containers, enums, ranges, etc.) do not require special knowledge to parse beyond the kind of type it is. Automatic type creation from tables is one of the most genius things about postgres, and directly wiring client structures to them through binary is really nifty. This undermines the case that binary parsing requires special knowledge IMO; UDTs might in fact be the main long-term target... I could see scenarios where java classes might be glued directly to postgres tables by the driver... this would be a lot more efficient than using json, which is how everyone does it today.

Then again, maybe *I* might be overthinking this.

merlin
Re: Request for comment on setting binary format output per session
On Tue, Mar 21, 2023 at 8:22 AM Dave Cramer wrote: > > On Tue, 21 Mar 2023 at 07:35, Merlin Moncure wrote: > >> >> >> On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer wrote: >> >>> >>> >>> >>> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure wrote: >>> >>>> >>>> >>>> On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer >>>> wrote: >>>> >>>>> >>>>> OIDs are a pain to deal with IMO. They will not survive a dump style >>>> restore, and are hard to keep synchronized between databases...type names >>>> don't have this problem. OIDs are an implementation artifact that ought >>>> not need any extra dependency. >>>> >>> AFAIK, OID's for built-in types don't change. >>> Clearly we need more thought on how to deal with UDT's >>> >> >> Yeah. Not having a solution that handles arrays and composites though >> would feel pretty incomplete since they would be the one of the main >> beneficiaries from a performance standpoint. >> > I don't think arrays of built-in types are a problem; drivers already know > how to deal with these. > > >> I guess minimally you'd need to expose some mechanic to look up oids, but >> being able to specify "foo"."bar", in the GUC would be pretty nice (albeit >> a lot more work). >> > > As Jeff mentioned there is a visibility problem if the search path is > changed. > Only if the name is not fully qualified. By allowing OID to bypass visibility, it stands to reason visibility ought to be bypassed for type requests as well, or at least be able to be. If we are setting things in GUC, that suggests we can establish things in postgresql.conf, and oids feel out of place there. Yep. Your rationale is starting to click. How would this interact with >> existing code bases? >> > Actually JDBC wasn't the first to ask for this. 
> Default result formats should be settable per session ·
> postgresql-interfaces/enhancement-ideas · Discussion #5 (github.com)
> <https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5>
> I've tested it with JDBC and it requires no code changes on our end. Jack tested
> it and it required no code changes on his end either. He did some
> performance tests and found "At 100 rows the text format takes 48% longer
> than the binary format."
> https://github.com/postgresql-interfaces/enhancement-ideas/discussions/5#discussioncomment-3188599

Yeah, the general need is very clear IMO.

> > I get that JDBC is the main target, but how does this interact with libpq
> > code that explicitly sets resultformat?
>
> Honestly I have no idea how it would function with libpq. I presume if the
> client did not request binary format then things would work as they do
> today.

I see your argument here, but IMO this is another nudge away from a GUC, unless you're willing to establish that behavior. The thinking here is that the GUC wouldn't do anything for libpq use cases, and couldn't, since resultformat would be overriding the behavior in all interesting cases... it seems odd to implement server-side-specified behavior that the client library doesn't implement.

merlin
Re: Request for comment on setting binary format output per session
On Mon, Mar 20, 2023 at 7:11 PM Dave Cramer wrote:
> On Mon, 20 Mar 2023 at 19:10, Merlin Moncure wrote:
> > On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote:
> >
> > OIDs are a pain to deal with IMO. They will not survive a dump style
> > restore, and are hard to keep synchronized between databases...type names
> > don't have this problem. OIDs are an implementation artifact that ought
> > not need any extra dependency.
>
> AFAIK, OID's for built-in types don't change.
> Clearly we need more thought on how to deal with UDT's

Yeah. Not having a solution that handles arrays and composites, though, would feel pretty incomplete, since they would be one of the main beneficiaries from a performance standpoint. I guess minimally you'd need to expose some mechanic to look up oids, but being able to specify "foo"."bar" in the GUC would be pretty nice (albeit a lot more work).

> > This seems like a protocol or even a driver issue rather than a GUC issue.
> > Why does the server need to care what format the client might want to
> > prefer on a query by query basis?
>
> Actually this isn't a query by query basis. The point of this is that the
> client wants all the results for given OID's in binary.

Yep. Your rationale is starting to click. How would this interact with existing code bases? I get that JDBC is the main target, but how does this interact with libpq code that explicitly sets resultformat? Perhaps the answer should be that it shouldn't change documented behavior, and a hypothetical resultformat=2 could be reserved to default to text but allow for server control, and 3 as the same but default to binary.

merlin
Re: Request for comment on setting binary format output per session
On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer wrote: > > Dave Cramer > > > On Sat, 4 Mar 2023 at 19:39, Dave Cramer wrote: > >> >> >> On Sat, 4 Mar 2023 at 19:06, Tom Lane wrote: >> >>> Jeff Davis writes: >>> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote: >>> >> Most of the clients know how to decode the builtin types. I'm not >>> >> sure there is a use case for binary encode types that the clients >>> >> don't have a priori knowledge of. >>> >>> > The client could, in theory, have a priori knowledge of a non-builtin >>> > type. >>> >>> I don't see what's "in theory" about that. There seems plenty of >>> use for binary I/O of, say, PostGIS types. Even for built-in types, >>> do we really want to encourage people to hard-wire their OIDs into >>> applications? >>> >> >> How does a client read these? I'm pretty narrowly focussed. The JDBC API >> doesn't really have a way to read a non built-in type. There is a facility >> to read a UDT, but the user would have to provide that transcoder. I guess >> I'm curious how other clients read binary UDT's ? >> >>> >>> I don't see a big problem with driving this off a GUC, but I think >>> it should be a list of type names not OIDs. We already have plenty >>> of precedent for dealing with that sort of thing; see search_path >>> for the canonical example. IIRC, there's similar caching logic >>> for temp_tablespaces. >>> >> >> I have no issue with allowing names, OID's were compact, but we could >> easily support both >> > > Attached is a preliminary patch that takes a list of OID's. I'd like to > know if this is going in the right direction. > > Next step would be to deal with type names as opposed to OID's. > This will be a bit more challenging as type names are schema specific. > OIDs are a pain to deal with IMO. They will not survive a dump style restore, and are hard to keep synchronized between databases...type names don't have this problem. OIDs are an implementation artifact that ought not need any extra dependency. 
This seems like a protocol or even a driver issue rather than a GUC issue. Why does the server need to care what format the client might want to prefer on a query by query basis? I just don't see it. The resultformat switch in libpq works pretty well, except that it's "all in" on getting data from the server, with the dead simple workaround of casting to text which might even be able to be managed from within the driver itself. merlin
feature request: IN clause optimized through append nodes with UNION ALL
In the script below, the presence of an IN clause forces the internal components of the UNION ALL clause to be fully computed even though they are fully optimizable. = ANY doesn't have this issue, so I wonder if there is any opportunity to convert the 'slow' variant (see below) to the 'fast' variant. Thank you!

merlin

drop table a cascade;
drop table b cascade;
drop table c cascade;

create table a (a_id int primary key);
create table b (b_id int primary key, a_id int references a);
create table c (c_id int primary key, b_id int references b);

insert into a select s from generate_series(1, 5) s;
insert into b select s, (s % 5) + 1 from generate_series(1, 10) s;
insert into c select s, (s % 10) + 1 from generate_series(1, 100) s;

create index on b (a_id, b_id);
create index on c (b_id, c_id);

analyze a;
analyze b;
analyze c;

create temp table d (a_id int);
insert into d values (99);
insert into d values (999);
insert into d default values;
analyze d;

create or replace view v as
  select * from a join b using (a_id) join c using (b_id)
  union all
  select * from a join b using (a_id) join c using (b_id);

explain analyze select * from v where a_id in (select a_id from d);             -- this is slow
explain analyze select * from v where a_id = any (array(select a_id from d));   -- this is fast
Re: pgcon unconference / impact of block size on performance
On Sat, Jun 4, 2022 at 6:23 PM Tomas Vondra wrote:
> Hi,
>
> At one of the pgcon unconference sessions a couple days ago, I presented
> a bunch of benchmark results comparing performance with different
> data/WAL block size. Most of the OLTP results showed significant gains
> (up to 50%) with smaller (4k) data pages.

Wow, the random I/O numbers are fantastic. The significant reduction in sequential throughput is a little painful, though; I see a 40% reduction in some cases, if I'm reading that right. Any thoughts on why that's the case? Are there mitigations possible?

merlin
Re: weird interaction between asynchronous queries and pg_sleep
On Thu, Apr 8, 2021 at 1:05 PM Merlin Moncure wrote:
> This effect is only noticeable when the remote query is returning
> volumes of data. My question is, is there any way to sleep loop
> client side without giving up a 3x performance penalty? Why is it
> that when more local sleep queries are executed, performance improves?

Looking at this more, it appears that when sleeping with pg_sleep(), libpq does not receive the data. I think for this type of pattern to work correctly, dblink would need a custom sleep function wrapping poll() (or epoll) that consumes input on the socket when it is signalled read-ready.

merlin
weird interaction between asynchronous queries and pg_sleep
Consider the following snippet:

create table data as select generate_series(1,100) s;

do $d$
begin
  PERFORM * FROM dblink_connect('test','');
  PERFORM * FROM dblink_send_query('test', 'SELECT * FROM data');
  LOOP
    IF dblink_is_busy('test') = 0 THEN
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      RETURN;
    END IF;
    PERFORM pg_sleep(.001);
  END LOOP;
  PERFORM * FROM dblink_disconnect('test');
END;
$d$;

What's interesting here is that, when I vary the sleep parameter, I get:

0:      .4 seconds (per top, this is busywait), same as running synchronous
0.0001: 1.4 seconds
0.001:  2.4 seconds
0.01:   10.6 seconds
0.1:    does not terminate

This effect is only noticeable when the remote query is returning volumes of data. My question is: is there any way to sleep loop client side without giving up a 3x performance penalty? Why is it that when more local sleep queries are executed, performance improves?

merlin
Re: unconstrained memory growth in long running stored procedure after upgrading 11-12
On Tue, Mar 30, 2021 at 7:14 PM Tom Lane wrote: > > Justin Pryzby writes: > > On Tue, Mar 30, 2021 at 04:17:03PM -0500, Merlin Moncure wrote: > >> We just upgraded from postgres 11 to 12.6 and our server is running > >> out of memory and rebooting about 1-2 times a day. > > > I haven't tried your test, but this sounds a lot like the issue I reported > > with > > JIT, which is enabled by default in v12. > > FWIW, I just finished failing to reproduce any problem with that > test case ... but I was using a non-JIT-enabled build. Yep. Disabling jit (fully, via jit=off, not what was suggested upthread) eliminated the issue, or at least highly mitigated the leak. I was using pgdg rpm packaging, which enables jit by default. Thanks everyone for looking at this, and the workaround is quick and easy. merlin
unconstrained memory growth in long running stored procedure after upgrading 11-12
Hello all, We just upgraded from postgres 11 to 12.6 and our server is running out of memory and rebooting about 1-2 times a day. Application architecture is a single threaded stored procedure, executed with CALL, that loops and never terminates. With postgres 11 we had no memory issues. Ultimately the crash looks like this:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
2021-03-29 04:34:31.262 CDT [1413] LOG: server process (PID 9792) was terminated by signal 6: Aborted
2021-03-29 04:34:31.262 CDT [1413] DETAIL: Failed process was running: CALL Main()
2021-03-29 04:34:31.262 CDT [1413] LOG: terminating any other active server processes
2021-03-29 04:34:31.264 CDT [9741] WARNING: terminating connection because of crash of another server process
2021-03-29 04:34:31.264 CDT [9741] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-03-29 04:34:31.264 CDT [9741] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2021-03-29 04:34:31.267 CDT [1413] LOG: archiver process (PID 9742) exited with exit code 1
2021-03-29 04:34:31.267 CDT [1413] LOG: all server processes terminated; reinitializing

Attached is a self-contained test case which reproduces the problem. Instructions:
1. Run the attached script, pgtask_test.sql, in psql. It will create a database, initialize it, and run the main procedure. dblink must be available.
2. In another window, run SELECT CreateTaskChain('test', 'DEV');

In the console that ran main(), you should see output that the procedure began to do work. Once it does, a 'top' should show resident memory growth immediately. It's about a gigabyte an hour in my test. Sorry for the large-ish attachment. merlin

Attachments: pgtask_test.sql, pgtask.sql
Re: plpgsql variable assignment with union is broken
On Thu, Jan 7, 2021 at 10:55 AM Pavel Stehule wrote: >> Again, if this is narrowly confined to assignment into set query >> operations, maybe this is not so bad. But is it? > > PLpgSQL_Expr: opt_target_list > <--><--><-->from_clause where_clause > <--><--><-->group_clause having_clause window_clause > <--><--><-->opt_sort_clause opt_select_limit opt_for_locking_clause > <--><--><--><-->{ > > So SELECT INTO and assignment are not fully compatible now. OK. Well, I went ahead and checked the code base for any suspicious assignments that might fall foul of the tightened syntax. It was cursory, but didn't turn up anything obvious. So I'm going to change my position on this. My feedback would be to take backwards compatibility very seriously for language changes in the future, and to rely less on documentation arguments as justification for change. The current behavior has been in place for decades and is therefore a de facto standard. Freedom to change ought to be asserted in scenarios where behavior is reserved as undefined or is non-standard, rather than assumed. Rereading the development thread, it seems a fairly short timeframe passed between idea origination and commit, and hypothetical impacts to existing code bases were not rigorously assessed. I guess it's possible this may ultimately cause some heartburn, but it's hard to say without strong data to justify that position. Having said all of that, I'm very excited about the array assignment improvements, and investment in this space is very much appreciated. merlin
Re: plpgsql variable assignment with union is broken
On Wed, Jan 6, 2021 at 11:07 PM Pavel Stehule wrote: > On Thu, Jan 7, 2021 at 4:20 Merlin Moncure wrote: >> >> On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: >> > >> > easter...@verfriemelt.org writes: >> > > i found, that the behaviour of variable assignment in combination with >> > > union is not working anymore: >> > > DO $$ >> > > DECLARE t bool; >> > > begin >> > > t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS >> > > a; >> > > END $$; >> > >> > > is this an intended change or is it a bug? >> > >> > It's an intended change, or at least I considered the case and thought >> > that it was useless because assignment will reject any result with more >> > than one row. Do you have any non-toy example that wouldn't be as >> > clear or clearer without using UNION? The above sure seems like an >> > example of awful SQL code. >> >> What is the definition of broken here? What is the behavior of the >> query with the change and why? >> >> OP's query provably returns a single row and ought to always assign >> true as written. A real world example might evaluate multiple >> condition branches so that the assignment resolves true if any branch >> is true. It could be rewritten with 'OR' of course. >> >> Is this also "broken"? >> t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT >> 'something else' AS a WHERE NOT _Flag; >> >> What about this? >> SELECT INTO t true WHERE false >> UNION select true; > > > ANSI SQL allows only SELECT INTO or var := SQL expression and SQL expression > can be (subquery) too. This is PL/pgSQL, not ANSI SQL, so that's irrelevant. If queries along the lines of: var := FROM (SELECT ..) UNION .. are narrowly broken, OK, but is that all that's going on here? I guess I ought to test. I have a 300k line pl/pgsql project; this thread is terrifying me. 
I am going to be blunt here and say I am not comfortable with tightening pl/pgsql syntax without an impact assessment. The point that this is undocumented behavior is weak, and it's already turning up problem reports. IMO, the expectation has been clearly set that var := expression; is more or less interchangeable with SELECT INTO var expression; Again, if this is narrowly confined to assignment into set query operations, maybe this is not so bad. But is it? merlin
Re: plpgsql variable assignment with union is broken
On Wed, Jan 6, 2021 at 9:39 PM Tom Lane wrote: > > Merlin Moncure writes: > > On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: > >> easter...@verfriemelt.org writes: > >>> i found, that the behaviour of variable assignment in combination with > >>> union is not working anymore: > >>> DO $$ > >>> DECLARE t bool; > >>> begin > >>> t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; > >>> END $$; > >>> is this an intended change or is it a bug? > > >> It's an intended change, or at least I considered the case and thought > >> that it was useless because assignment will reject any result with more > >> than one row. Do you have any non-toy example that wouldn't be as > >> clear or clearer without using UNION? The above sure seems like an > >> example of awful SQL code. > > > What is the definition of broken here? What is the behavior of the > > query with the change and why? > > The OP is complaining that that gets a syntax error since c9d529848. > > > OP's query provably returns a single row and ought to always assign > > true as written. > > My opinion is that (a) it's useless and (b) there has never been any > documentation that claimed that you could do this. Here is what the documentation says: > variable { := | = } expression; > As explained previously, the expression in such a statement is evaluated by > means of an SQL SELECT command sent to the main database engine. This is valid SQL: SELECT a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; So I'd argue that OP's query *is* syntactically valid per the rules as I understand them, and my opinion is entirely consistent with the documentation in that it (a) resolves exactly one row, and (b) is made syntactically valid by prefixing the expression with SELECT. Aesthetic considerations are irrelevant IMO. merlin
Re: plpgsql variable assignment with union is broken
On Tue, Jan 5, 2021 at 3:40 PM Tom Lane wrote: > > easter...@verfriemelt.org writes: > > i found, that the behaviour of variable assignment in combination with > > union is not working anymore: > > DO $$ > > DECLARE t bool; > > begin > > t := a FROM ( SELECT true WHERE false ) t(a) UNION SELECT true AS a; > > END $$; > > > is this an intended change or is it a bug? > > It's an intended change, or at least I considered the case and thought > that it was useless because assignment will reject any result with more > than one row. Do you have any non-toy example that wouldn't be as > clear or clearer without using UNION? The above sure seems like an > example of awful SQL code. What is the definition of broken here? What is the behavior of the query with the change and why? OP's query provably returns a single row and ought to always assign true as written. A real world example might evaluate multiple condition branches so that the assignment resolves true if any branch is true. It could be rewritten with 'OR' of course. Is this also "broken"? t := a FROM ( SELECT 'something' WHERE _Flag) t(a) UNION SELECT 'something else' AS a WHERE NOT _Flag; What about this? SELECT INTO t true WHERE false UNION select true; merlin
Re: Zedstore - compressed in-core columnar storage
On Mon, Nov 16, 2020 at 10:07 AM Tomas Vondra wrote: > > > On 11/16/20 1:59 PM, Merlin Moncure wrote: > > On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra > > wrote: > >>masterzedstore/pglzzedstore/lz4 > >> - > >>copy 1855680922131 > >>dump 751 905 811 > >> > >> And the size of the lineitem table (as shown by \d+) is: > >> > >> master: 64GB > >> zedstore/pglz: 51GB > >> zedstore/lz4: 20GB > >> > >> It's mostly expected lz4 beats pglz in performance and compression > >> ratio, but this seems a bit too extreme I guess. Per past benchmarks > >> (e.g. [1] and [2]) the difference in compression/decompression time > >> should be maybe 1-2x or something like that, not 35x like here. > > > > I can't speak to the ratio, but in basic backup/restore scenarios pglz > > is absolutely killing me; Performance is just awful; we are cpubound > > in backups throughout the department. Installations defaulting to > > plgz will make this feature show very poorly. > > > > Maybe. I'm not disputing that pglz is considerably slower than lz4, but > judging by previous benchmarks I'd expect the compression to be slower > maybe by a factor of ~2x. So the 30x difference is suspicious. Similarly > for the compression ratio - lz4 is great, but it seems strange it's 1/2 > the size of pglz. Which is why I'm speculating that something else is > going on. > > As for the "plgz will make this feature show very poorly" I think that > depends. I think we may end up with pglz doing pretty well (compared to > heap), but lz4 will probably outperform that. OTOH for various use cases > it may be more efficient to use something else with worse compression > ratio, but allowing execution on compressed data, etc. hm, you might be right. Doing some number crunching, I'm getting about 23mb/sec compression on a 600gb backup image on a pretty typical aws server. That's obviously not great, but your numbers are much worse than that, so maybe something else might be going on. 
> I think we may end up with pglz doing pretty well (compared to heap) I *don't* think so, or at least I'm skeptical as long as insertion times are part of the overall performance measurement. Naturally, with column stores, insertion times are often very peripheral to the overall performance picture but for cases that aren't I suspect the results are not going to be pleasant, and advise planning accordingly. Aside, I am very interested in this work. I may be able to support testing in an enterprise environment; lmk if interested -- thank you merlin
Re: Zedstore - compressed in-core columnar storage
On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra wrote: >masterzedstore/pglzzedstore/lz4 > - >copy 1855680922131 >dump 751 905 811 > > And the size of the lineitem table (as shown by \d+) is: > > master: 64GB > zedstore/pglz: 51GB > zedstore/lz4: 20GB > > It's mostly expected lz4 beats pglz in performance and compression > ratio, but this seems a bit too extreme I guess. Per past benchmarks > (e.g. [1] and [2]) the difference in compression/decompression time > should be maybe 1-2x or something like that, not 35x like here. I can't speak to the ratio, but in basic backup/restore scenarios pglz is absolutely killing me; Performance is just awful; we are cpubound in backups throughout the department. Installations defaulting to plgz will make this feature show very poorly. merlin
Re: Optimize memory allocation code
On Fri, Sep 25, 2020 at 7:32 PM Li Japin wrote: > > > > > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote: > > > > Hi, > > > > On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote: > >> > >> Hi, hackers! > >> > >> I find the palloc0() is similar to the palloc(), we can use palloc() > >> inside palloc0() > >> to allocate space, thereby I think we can reduce duplication of code. > > > > The code is duplicated on purpose. There's a comment at the beginning > > that mentions it: > > > > /* duplicates MemoryContextAllocZero to avoid increased overhead */ > > > > Same for MemoryContextAllocZero() itself. > > Thanks! How big is this overhead? Is there any way I can test it? Profiler. For example, oprofile. In hot areas of the code (memory allocation is very hot), profiling is the first step. merlin
Re: dblink_is_busy returns 1 for dead connections
On Sun, Aug 2, 2020 at 9:55 PM Merlin Moncure wrote: > > On Sun, Aug 2, 2020 at 7:18 PM Merlin Moncure wrote: > > > > Hackers, > > > > I have a situation that I am observing where dblink_is_busy returns 1 > > even though the connection is long gone. tcp keepalives are on and > > the connection has been dead for several hours. Looking at the call > > for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). > > If I attempt to call dblink_get_result(), the result comes back with > > an error message, 'invalid socket'. This however is not helpful since > > there is no way to probe for dead connections in dblink that appears > > to be 100% reliable. My workaround that I had been relying on was to > > call dblink_get_notify twice, which for some weird reason forced the > > connection error to the surface. However for whatever reason, that is > > not working here. > > > > In this case the connection was cancelled via dblink_cancel_query(), so in > > some scenarios connections cancelled that way seem to become 'stuck'. > > Any thoughts on this? > > Correction, keepalives are probably not on, because dblink does not > have an option to set them. Also, it looks like dblink_is_busy() > calls pqConsumeInput without checking the error code. Is that safe? I could not reproduce this with an application-external test script (see attached if curious). I also noticed you can set keepalives in the libpq connection string, so I'll do that and see if it helps, and report back for posterity. Thanks, sorry for the noise. merlin

Attachment: dblink_test.sql
Re: dblink_is_busy returns 1 for dead connections
On Sun, Aug 2, 2020 at 7:18 PM Merlin Moncure wrote: > > Hackers, > > I have a situation that I am observing where dblink_is_busy returns 1 > even though the connection is long gone. tcp keepalives are on and > the connection has been dead for several hours. Looking at the call > for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). > If I attempt to call dblink_get_result(), the result comes back with > an error message, 'invalid socket'. This however is not helpful since > there is no way to probe for dead connections in dblink that appears > to be 100% reliable. My workaround that I had been relying on was to > call dblink_get_notify twice, which for some weird reason forced the > connection error to the surface. However for whatever reason, that is > not working here. > > In this case the connection was cancelled via dblink_cancel_query(), so in > some scenarios connections cancelled that way seem to become 'stuck'. > Any thoughts on this? Correction, keepalives are probably not on, because dblink does not have an option to set them. Also, it looks like dblink_is_busy() calls pqConsumeInput without checking the error code. Is that safe? merlin
dblink_is_busy returns 1 for dead connections
Hackers, I have a situation that I am observing where dblink_is_busy returns 1 even though the connection is long gone. tcp keepalives are on and the connection has been dead for several hours. Looking at the call for dblink_is_busy, I see that it is a thin wrapper to PQisBusy(). If I attempt to call dblink_get_result(), the result comes back with an error message, 'invalid socket'. This however is not helpful since there is no way to probe for dead connections in dblink that appears to be 100% reliable. My workaround that I had been relying on was to call dblink_get_notify twice, which for some weird reason forced the connection error to the surface. However for whatever reason, that is not working here. In this case the connection was cancelled via dblink_cancel_query(), so in some scenarios connections cancelled that way seem to become 'stuck'. Any thoughts on this? merlin
Re: database stuck in __epoll_wait_nocancel(). Are infinite timeouts safe?
On Fri, Mar 13, 2020 at 2:28 PM Andres Freund wrote: > > Hi, > > On March 13, 2020 12:08:32 PM PDT, Merlin Moncure wrote: > >I have 5 servers in a testing environment that are comprise a data > >warehousing cluster. They will typically get each get exactly the > >same query at approximately the same time. Yesterday, around 1pm, 3 > >of the five got stuck on the same query. Each of them yields similar > >stack traces. This happens now and then. The server is 9.6.12 > >(which is obviously old, but I did not see any changes in relevant > >code). > > > >(gdb) bt > >#0 0x7fe856c0b463 in __epoll_wait_nocancel () from > >/lib64/libc.so.6 > >#1 0x006b4416 in WaitEventSetWaitBlock (nevents=1, > >occurred_events=0x7ffc9f2b0f60, cur_timeout=-1, set=0x27cace8) at > >latch.c:1053 > >#2 WaitEventSetWait (set=0x27cace8, timeout=timeout@entry=-1, > >occurred_events=occurred_events@entry=0x7ffc9f2b0f60, > >nevents=nevents@entry=1) at latch.c:1007 > >#3 0x005f26dd in secure_write (port=0x27f16a0, > >ptr=ptr@entry=0x27f5528, len=len@entry=192) at be-secure.c:255 > >#4 0x005fb51b in internal_flush () at pqcomm.c:1410 > >#5 0x005fb72a in internal_putbytes (s=0x2a4f245 "14M04", > >s@entry=0x2a4f228 "", len=70) at pqcomm.c:1356 > >#6 0x005fb7f0 in socket_putmessage (msgtype=68 'D', > >s=0x2a4f228 "", len=) at pqcomm.c:1553 > >#7 0x005fd5d9 in pq_endmessage (buf=buf@entry=0x7ffc9f2b1040) > >at pqformat.c:347 > >#8 0x00479a63 in printtup (slot=0x2958fc8, self=0x2b6bca0) at > >printtup.c:372 > >#9 0x005c1cc9 in ExecutePlan (dest=0x2b6bca0, > >direction=, numberTuples=0, sendTuples=1 '\001', > >operation=CMD_SELECT, > >use_parallel_mode=, planstate=0x2958cf8, > >estate=0x2958be8) at execMain.c:1606 > >#10 standard_ExecutorRun (queryDesc=0x2834998, direction= >out>, count=0) at execMain.c:339 > >#11 0x006d69a7 in PortalRunSelect > >(portal=portal@entry=0x2894e38, forward=forward@entry=1 '\001', > >count=0, count@entry=9223372036854775807, > >dest=dest@entry=0x2b6bca0) at pquery.c:948 > >#12 
0x006d7dbb in PortalRun (portal=0x2894e38, > >count=9223372036854775807, isTopLevel=, dest=0x2b6bca0, > >altdest=0x2b6bca0, > >completionTag=0x7ffc9f2b14e0 "") at pquery.c:789 > >#13 0x006d5a06 in PostgresMain (argc=, > >argv=, dbname=, username= >out>) at postgres.c:1109 > >#14 0x0046fc28 in BackendRun (port=0x27f16a0) at > >postmaster.c:4342 > >#15 BackendStartup (port=0x27f16a0) at postmaster.c:4016 > >#16 ServerLoop () at postmaster.c:1721 > >#17 0x00678119 in PostmasterMain (argc=argc@entry=3, > >argv=argv@entry=0x27c8c90) at postmaster.c:1329 > >#18 0x0047088e in main (argc=3, argv=0x27c8c90) at main.c:228 > >(gdb) quit > > > >Now, the fact that this happened to multiple servers at time strongly > >suggest an external (to the database) problem. The system initiating > >the query, a cross database query over dblink, has been has given up > >(and has been restarted as a precaution) a long time ago, and the > >connection is dead. secure_write() sets however an infinite timeout > >to the latch, and there are clearly scenarios where epoll waits > >forever for an event that is never going to occur. If/when this > >happens, the only recourse is to restart the impacted database. The > >question is, shouldn't the latch have a looping timeout that checks > >for interrupts? What would the risks be of jumping directly out of > >the latch loop? > > Unless there is a kernel problem latches are interruptible by signals, as the > signal handler should do a SetLatch(). > > This backtrace just looks like the backend is trying to send data to the > client? What makes you think it's stuck? Well, the client has been gone for > 24 hours. 
But you're right; when I send cancel to the backend, here is what happens according to strace:

epoll_wait(3, 0x2915e08, 1, -1) = -1 EINTR (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=5024, si_uid=26} ---
write(13, "\0", 1) = 1
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
sendto(11, "\0\0\0\00545780\0\0\0\003508D\0\0\0d\0\t\0\0\0\00615202"..., 5640, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(3, [{EPOLLIN, {u32=43081176, u64=43081176}}], 1, -1) = 1
read(12, "\0", 16) = 1
epoll_wait(3, ...

pg_terminate_backend(), however, does properly kill the query.

> If the connection is dead, epoll should return (both because we ask for the > relevant events, and because it just always implicitly does do so). > > So it seems likely that either your connection isn't actually dead (e.g. > waiting for tcp timeouts), or you have a kernel bug.

Maybe; I suspect a firewall issue. Hard to say. merlin
database stuck in __epoll_wait_nocancel(). Are infinite timeouts safe?
I have 5 servers in a testing environment that comprise a data warehousing cluster. They will typically each get exactly the same query at approximately the same time. Yesterday, around 1pm, three of the five got stuck on the same query. Each of them yields similar stack traces. This happens now and then. The server is 9.6.12 (which is obviously old, but I did not see any changes in relevant code).

(gdb) bt
#0 0x7fe856c0b463 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x006b4416 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc9f2b0f60, cur_timeout=-1, set=0x27cace8) at latch.c:1053
#2 WaitEventSetWait (set=0x27cace8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc9f2b0f60, nevents=nevents@entry=1) at latch.c:1007
#3 0x005f26dd in secure_write (port=0x27f16a0, ptr=ptr@entry=0x27f5528, len=len@entry=192) at be-secure.c:255
#4 0x005fb51b in internal_flush () at pqcomm.c:1410
#5 0x005fb72a in internal_putbytes (s=0x2a4f245 "14M04", s@entry=0x2a4f228 "", len=70) at pqcomm.c:1356
#6 0x005fb7f0 in socket_putmessage (msgtype=68 'D', s=0x2a4f228 "", len=) at pqcomm.c:1553
#7 0x005fd5d9 in pq_endmessage (buf=buf@entry=0x7ffc9f2b1040) at pqformat.c:347
#8 0x00479a63 in printtup (slot=0x2958fc8, self=0x2b6bca0) at printtup.c:372
#9 0x005c1cc9 in ExecutePlan (dest=0x2b6bca0, direction=, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, use_parallel_mode=, planstate=0x2958cf8, estate=0x2958be8) at execMain.c:1606
#10 standard_ExecutorRun (queryDesc=0x2834998, direction=, count=0) at execMain.c:339
#11 0x006d69a7 in PortalRunSelect (portal=portal@entry=0x2894e38, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x2b6bca0) at pquery.c:948
#12 0x006d7dbb in PortalRun (portal=0x2894e38, count=9223372036854775807, isTopLevel=, dest=0x2b6bca0, altdest=0x2b6bca0, completionTag=0x7ffc9f2b14e0 "") at pquery.c:789
#13 0x006d5a06 in PostgresMain (argc=, argv=, dbname=, username=) at postgres.c:1109
#14 0x0046fc28 in BackendRun (port=0x27f16a0) at postmaster.c:4342
#15 BackendStartup (port=0x27f16a0) at postmaster.c:4016
#16 ServerLoop () at postmaster.c:1721
#17 0x00678119 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x27c8c90) at postmaster.c:1329
#18 0x0047088e in main (argc=3, argv=0x27c8c90) at main.c:228
(gdb) quit

Now, the fact that this happened to multiple servers at once strongly suggests an external (to the database) problem. The system initiating the query, a cross database query over dblink, gave up (and was restarted as a precaution) a long time ago, and the connection is dead. secure_write() however sets an infinite timeout on the latch, and there are clearly scenarios where epoll waits forever for an event that is never going to occur. If/when this happens, the only recourse is to restart the impacted database. The question is, shouldn't the latch have a looping timeout that checks for interrupts? What would the risks be of jumping directly out of the latch loop? merlin
Re: Error on failed COMMIT
On Mon, Feb 24, 2020 at 4:06 PM Vladimir Sitnikov wrote: > > Merlin>My biggest sense of alarm with the proposed change is that it could > Merlin>leave applications in a state where the transaction is hanging there > > How come? > The spec says commit ends the transaction. > Can you please clarify where the proposed change leaves a hanging transaction? > > Just in case, the proposed change is as follows: > > postgres=# begin; > BEGIN > postgres=# aslkdfasdf; > ERROR: syntax error at or near "aslkdfasdf" > LINE 1: aslkdfasdf; > ^ > postgres=# commit; > ROLLBACK <-- this should be replaced with "ERROR: can't commit the > transaction because ..." > postgres=# commit; > WARNING: there is no transaction in progress <-- this should be as it is > currently. Even if commit throws an error, the transaction should be > terminated. > COMMIT Ok, you're right; I missed the point in that it's not nearly as bad as I thought you were suggesting (to treat commit as a bad statement), but the transaction would still terminate. Still, this is very sensitive stuff; do you think most common connection poolers would continue to work after making this change? merlin
Re: Error on failed COMMIT
On Sun, Feb 23, 2020 at 7:59 PM Dave Cramer wrote: > > I think the fact that this is a violation of the SQL SPEC lends considerable > credence to the argument for changing the behaviour. > Since this can lead to losing a transaction I think there is even more reason > to look at changing the behaviour. The assumption that COMMIT terminates the transaction is going to be deeply embedded into many applications. It's just too convenient not to rely on. For example, I maintain a bash based deployment framework that assembles large SQL files from bits and pieces and tacks a COMMIT at the end. It's not *that* much work to test for failure and add a rollback, but it's the kind of surprise our users hate during the upgrade process. Over the years we've tightened the behavior of postgres to be in line with the spec (example: Tom cleaned up the row-wise comparison behavior in 8.2), but in other cases we had to punt (IS NULL/coalesce disagreement over composites, identifier case sensitivity, etc.). The point is, changing this stuff can be really painful and we have to evaluate the benefits vs the risks. My biggest sense of alarm with the proposed change is that it could leave applications in a state where the transaction is hanging where it could previously assume it had resolved; this could be catastrophic in impact in certain real world scenarios. Tom is right, a GUC is the equivalent of "sweeping the problem under the rug" (if you want examples of the long term consequences of that vision read through this: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initialization.html). The value proposition of the change is however a little light relative to the risks IMO. I do think we need to have a good page summarizing non-spec behaviors in the documentation however. merlin
Re: DB running out of memory issues after upgrade
On Tue, Feb 18, 2020 at 12:10 PM Nagaraj Raj wrote: > > Below are the same configurations in the .conf file before and after upgrade > > show max_connections; = 1743 > show shared_buffers = "4057840kB" > show effective_cache_size = "8115688kB" > show maintenance_work_mem = "259MB" > show checkpoint_completion_target = "0.9" > show wal_buffers = "16MB" > show default_statistics_target = "100" > show random_page_cost = "1.1" > show effective_io_concurrency =" 200" > show work_mem = "4MB" > show min_wal_size = "256MB" > show max_wal_size = "2GB" > show max_worker_processes = "8" > show max_parallel_workers_per_gather = "2" This smells like the OOM killer for sure. How did you arrive at some of these values, in particular max_connections and effective_cache_size? How much memory is in this server? merlin
Re: [HACKERS] emergency outage requiring database restart
On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut wrote: > > On 11/7/16 5:31 PM, Merlin Moncure wrote: > > Regardless, it seems like you might be on to something, and I'm > > inclined to patch your change, test it, and roll it out to production. > > If it helps or at least narrows the problem down, we ought to give it > > consideration for inclusion (unless someone else can think of a good > > reason not to do that, heh!). > > Any results yet? Not yet. But I do have some interesting findings. At this point I do not think the problem is within pl/sh itself, but that when a process invoked from pl/sh misbehaves, that misbehavior can penetrate into the database processes. I also believe that this problem is fd related, so the 'close on exec' flag might reasonably fix it. All cases of database damage I have observed remain completely mitigated by enabling database checksums. Recently, a sqsh process kicked off via pl/sh crashed with signal 11 but the database process was otherwise intact and fine. This is strong supporting evidence for my points above, I think. I've also turned up a fairly reliable reproduction case from some unrelated application changes. If I can demonstrate that the close-on-exec flag works and prevents these occurrences we can close the book on this. merlin
Re: Greatest Common Divisor
On Mon, Jan 6, 2020 at 6:52 AM Fabien COELHO wrote: > > > Hello Robert, > > >> if (arg1 == PG_INT32_MIN) > >> if (arg2 == 0 || arg2 == PG_INT32_MIN) > >> > >> And possibly a "likely" on the while. > > > > I don't think decoration the code with likely() and unlikely() all > > over the place is a very good idea. > > > Odds are good that we'll end up with a bunch that are actually > > non-optimal, and nobody will ever figure it out because it's hard to > > figure out. > > My 0.02€: I'd tend to disagree. > > Modern pipelined processors can take advantage of speculative execution on > branches, so if you know which branch is the more likely it can help. > > Obviously if you get it wrong it does not, but for the above cases it > seems to me that they are rather straightforward. > > It also provides some "this case is expected to be exceptional" semantics > to people reading the code. > > > I have a hard time believing that we're going to be much > > worse off if we just write the code normally. > > I think that your point applies to more general programming in postgres, > but this is not the context here. > > For low-level arithmetic code like this one, with tests and loops > containing very few hardware instructions, I think that helping compiler > optimizations is a good idea. Do you have any performance data to back that up? merlin
Re: Greatest Common Divisor
On Fri, Jan 3, 2020 at 1:32 PM Robert Haas wrote: > > On Fri, Jan 3, 2020 at 2:27 PM Chapman Flack wrote: > > On 1/3/20 2:11 PM, Robert Haas wrote: > > > and moving things to another schema does not help with that. It does > > > potentially help with the namespace pollution issue, but how much of > > > an issue is that anyway? Unless you've set up an unusual search_path > > > configuration, your own schemas probably precede pg_catalog in your > > > search path, besides which it seems unlikely that many people have a > > > gcd() function that does anything other than take the greatest common > > > divisor. > > > > As seen in this thread though, there can be edge cases of "take the > > greatest common divisor" that might not be identically treated in a > > thoroughly-reviewed addition to core as in someone's hastily-rolled > > local version. > > True, but because of the way search_path is typically set, they'd > probably continue to get their own version anyway, so I'm not sure > what the problem is. Is that right? The default search_path resolves pg_catalog before public. Lightly testing with a hand rolled pg_advisory_lock implementation that raises a notice, my default database seemed to prefer the built-in function. Maybe I'm not following you. > There are counter-arguments to that, though. Maintaining a lot of > extensions with only one or two functions in them is a nuisance. > Having things installed by default is convenient for wanting to use > them. Maintaining contrib code so that it works whether or not the SQL > definitions have been updated via ALTER EXTENSION .. UPDATE takes some > work and thought, and sometimes we screw it up. If the external contract changes (which seems likely for gcd) then I would much rather have the core team worry about this than force your users to worry about it, which is what putting the function in core would require them to do (if version < x call it this way, > y then that way etc). 
This is exactly why we shouldn't be putting non-standard items in core (maybe excepting some pg_ prefixed administration functions). Now, it's quite unfair to $OP to saddle his proposal and patch with the broader considerations of core/extension packaging, so if some kind of rational framework can be applied to the NEXT submission, or at least a discussion about this can start, those are all good options. But we need to start from somewhere, and moving forward with, "If it's not sql standard or prefixed with pg_, it ought not to be in pg_catalog" might be a good way to open the discussion. merlin
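The search_path behavior at issue in this thread can be checked directly; a sketch (the override function is purely illustrative):

```sql
SHOW search_path;   -- typically: "$user", public
-- pg_catalog is searched implicitly *before* the explicit entries unless it
-- is listed in search_path itself, so a same-named function in public does
-- not shadow the built-in:
CREATE FUNCTION public.now() RETURNS timestamptz
    LANGUAGE sql AS $$ SELECT 'epoch'::timestamptz $$;
SELECT now();          -- still resolves to pg_catalog.now()
SELECT public.now();   -- the override must be schema-qualified
```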
Re: Greatest Common Divisor
On Fri, Jan 3, 2020 at 10:24 AM Robert Haas wrote: > > On Fri, Jan 3, 2020 at 10:23 AM Tom Lane wrote: > > Now, those functions were just exposing libc functionality, so there > > wasn't a lot of code to write. There might be a good argument that > > gcd isn't useful enough to justify the amount of code we'd have to > > add (especially if we allow it to scope-creep into needing to deal > > with "numeric" calculations). But I'm not on board with just > > dismissing it as uninteresting. > > Yeah. There's always the question with things like this as to whether > we ought to push certain things into contrib modules that are not > installed by default to avoid bloating the set of things built into > the core server. But it's hard to know where to draw the line. Just stop doing it. It's very little extra work to package an item into an extension and this protects your hapless users who might have implemented a function called gcd() that does something different. Ideally, the public namespace should contain (by default) only sql standard functions with all non-standard material in an appropriate extension. Already released material is obviously problematic and needs more thought but we ought to at least stop making the problem worse if possible. merlin
Re: Greatest Common Divisor
On Sat, Dec 28, 2019 at 12:15 PM Fabien COELHO wrote: > > > Bonsoir Vik, > > > I recently came across the need for a gcd function (actually I needed > > lcm) and was surprised that we didn't have one. > > Why not. Proliferation of code in the public namespace; it can displace code that is written by others during the upgrade. merlin
Re: inherits clause for CREATE TYPE?
On Wed, Dec 18, 2019 at 12:38 PM Pavel Stehule wrote: > > Hi > > I had a talk with one boy about development in plpgsql. He uses table's > functions. More times he uses returns types based on some table type + few > attributes. Now he use a ugly hack - he create a view on table plus some > columns - and then he use the view related type as table function result > type. For similar uses cases there can be interesting to have a possibility > to create types by extending other types. Probably almost all functionality > is inside now - so it should not be hard work. > > My idea is implement inherits clause for CREATE TYPE command. > > Some like > > CREATE TYPE fx_rt (xx int) INHERITS(pg_class); > > What do you think about this idea? How about using composition style approaches? create type base as (...) create type extended as (b base, ...) create function foo() returns extended as ... merlin
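The composition-style alternative sketched above, fleshed out slightly (type and function names are illustrative):

```sql
-- instead of CREATE TYPE fx_rt (xx int) INHERITS (pg_class),
-- embed the "base" type as a column of the extended type:
CREATE TYPE base AS (id int, name text);
CREATE TYPE extended AS (b base, xx int);

CREATE FUNCTION foo() RETURNS extended LANGUAGE sql AS
$$ SELECT (ROW(1, 'a')::base, 42)::extended $$;

-- nested fields are reachable with parenthesized field selection:
SELECT ((f).b).name, (f).xx FROM foo() f;
```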
Re: Why JIT speed improvement is so modest?
On Mon, Nov 25, 2019 at 9:09 AM Konstantin Knizhnik wrote: > JIT was not able to significantly (times) increase speed on Q1 query? > Experiment with VOPS shows that used aggregation algorithm itself is not > a bottleneck. > Profile also give no answer for this question. > Any ideas? Well, in the VOPS variant around 2/3 of the time is spent in routines that are obviously aggregation. In the JIT version, it's around 20%. So this suggests that the replacement execution engine is more invasive. I would also guess (!) that the VOPS engine optimizes fewer classes of query plan. ExecScan, for example, looks to be completely optimized out in VOPS but is still utilized in the JIT engine. I experimented with Vitessa a couple of years back and this was consistent with my recollection. merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Tue, Aug 27, 2019 at 5:52 PM Merlin Moncure wrote: > > On Mon, Aug 26, 2019 at 12:01 PM Tom Lane wrote: > > > > Justin Pryzby writes: > > > On Mon, Aug 26, 2019 at 12:45:24PM -0400, Tom Lane wrote: > > >> However ... there is some pretty interesting info at > > >> https://bugzilla.redhat.com/show_bug.cgi?id=1338673 > > >> suggesting that compiling with a late-model gcc against older RHEL6 > > >> headers could result in bad code. I wonder whether the reporters' > > >> servers were built using such a configuration. (Although the linkage, > > >> if any, to this report still wouldn't be very clear.) > > > > > I can tell it was compiled using > > > version | PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) > > > 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit > > > > Ah, that appears to be the default compiler for RHEL6, so that theory > > is out the window. It's still interesting that we're only seeing this > > reported from RHEL6 ... maybe there's something specific to the code > > that this gcc version generates? > > FWIW, I've got the same compiler (which is natural, we are likely > using the same packaging): > PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 > (Red Hat 4.4.7-23), 64-bit I'm looking for more common threads here. One interesting fact about this server is that we just realized (!) that cron was attempting to email huge log files and failing due to misconfigured email. The mail queue was building into the gigabytes so that on a half-hour cadence the server was running out of memory until the mail process crashed. Postgres ran just fine through this process (which is pretty cool). So, question: were there any memory issues on the other server on or around the crash? merlin
no mailing list hits in google
Hackers, [apologies if this is the incorrect list or is already discussed material] I've noticed that mailing list discussions in -hackers and other mailing lists appear to not be indexed by google -- at all. We are also not being tracked by any mailing list aggregators -- in contrast to a decade ago, when we had nabble and other systems to collect and organize results (tbh, often better than we do), we are now at an extreme disadvantage; mailing list activity was formerly an absolutely fantastic resource for researching solutions to obscure technical problems in the database via google. Limited access to this information will directly lead to increased bug reports, lack of solution confidence, etc. My test case here is the query: pgsql-hackers ExecHashJoinNewBatch I was searching out a link to a recent bug report for copy/paste into corporate email. In the old days this would fire right up but now returns no hits even though the discussion is available in the archives (which I had to find by looking up the specific day the thread was active). Just a heads up. merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Mon, Aug 26, 2019 at 12:01 PM Tom Lane wrote: > > Justin Pryzby writes: > > On Mon, Aug 26, 2019 at 12:45:24PM -0400, Tom Lane wrote: > >> However ... there is some pretty interesting info at > >> https://bugzilla.redhat.com/show_bug.cgi?id=1338673 > >> suggesting that compiling with a late-model gcc against older RHEL6 > >> headers could result in bad code. I wonder whether the reporters' > >> servers were built using such a configuration. (Although the linkage, > >> if any, to this report still wouldn't be very clear.) > > > I can tell it was compiled using > > version | PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) > > 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit > > Ah, that appears to be the default compiler for RHEL6, so that theory > is out the window. It's still interesting that we're only seeing this > reported from RHEL6 ... maybe there's something specific to the code > that this gcc version generates? FWIW, I've got the same compiler (which is natural, we are likely using the same packaging): PostgreSQL 9.6.12 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit merlin
Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
On Sun, Aug 25, 2019 at 9:35 PM Thomas Munro wrote: > > On Mon, Aug 26, 2019 at 1:44 PM Justin Pryzby wrote: > > On Mon, Aug 26, 2019 at 01:09:19PM +1200, Thomas Munro wrote: > > > On Sun, Aug 25, 2019 at 3:15 PM Peter Geoghegan wrote: > > > > I was reminded of this issue from last year, which also appeared to > > > > involve BufFileClose() and a double-free: > > > > > > > > https://postgr.es/m/87y3hmee19@news-spur.riddles.org.uk > > > > > > > > That was a BufFile that was under the control of a tuplestore, so it > > > > was similar to but different from your case. I suspect it's related. > > > > > > Hmm. tuplestore.c follows the same coding pattern as nodeHashjoin.c: > > > it always nukes its pointer after calling BufFileFlush(), so it > > > shouldn't be capable of calling it twice for the same pointer, unless > > > we have two copies of that pointer somehow. > > > > > > Merlin's reported a double-free apparently in ExecHashJoin(), not > > > ExecHashJoinNewBatch() like this report. Unfortunately that tells us > > > very little. > > Here's another one: > > https://www.postgresql.org/message-id/flat/20170601081104.1500.56202%40wrigleys.postgresql.org > > Hmm. Also on RHEL/CentOS 6, and also involving sorting, hashing, > BufFileClose() but this time the glibc double free error is in > repalloc(). > > And another one (repeatedly happening): > > https://www.postgresql.org/message-id/flat/3976998C-8D3B-4825-9B10-69ECB70A597A%40appnexus.com > > Also on RHEL/CentOS 6, this time a sort in once case and a hash join > in another case. > > Of course it's entirely possible that we have a bug here and I'm very > keen to find it, but I can't help noticing the common factor here is > that they're all running ancient RHEL 6.x releases, except Merlin who > didn't say. Merlin? Just noticed this. redhat-release: "Red Hat Enterprise Linux Server release 6.9 (Santiago)" merlin
Re: Hstore OID bigger than an integer
On Fri, Aug 23, 2019 at 9:26 AM Roberto Mireles wrote: > > Hello team, > > This is the first time I post here, if you can provide some help, would be > much appreciated. > > I have an application that can not access the database due to OID value for > hstore extension is bigger than an integer value. Application uses a NpgSql > driver that only supports integer types for OIDs. > > We have a new app version, which uses a new driver version that supports > bigint and has no issues at all, but for that specific database, we still > need to use that old version of the app. > > I have searched for a workaround or something that can help us to be able to > connect to the database, but have not found anything related. > > I also tried by dropping, creating extension again, but same result. > > Does any of you know any workaround that can help here? A full dump/restore of the database (via pg_dump) might work, as long as the oids are not dumped with the database, which I believe to be the default. This ought to reset the oid counter. Some while back, oids were changed so the counter was table specific (IIRC). Not sure if that was after 9.2. If it was, upgrading the database (which you should be looking at anyways) might help. Also, raise an issue upstream. merlin
Re: double free in ExecHashJoin, 9.6.12
On Wed, Jul 24, 2019 at 11:01 PM Thomas Munro wrote: > > On Thu, Jul 25, 2019 at 2:39 AM Merlin Moncure wrote: > > Server is generally running pretty well, and is high volume. This > > query is not new and is also medium volume. Database rebooted in > > about 4 seconds with no damage; fast enough we didn't even trip alarms > > (I noticed this troubleshooting another issue). We are a couple of > > bug fixes releases behind but I didn't see anything obvious in the > > release notes suggesting a resolved issue. Anyone have any ideas? > > thanks in advance. > > > postgres: rms ysconfig 10.33.190.21(36788) > > SELECT(ExecHashJoin+0x5a2)[0x5e2d32] > > Hi Merlin, > > Where's the binary from (exact package name, if installed with a > package)? Not sure if this is going to help, but is there any chance > you could disassemble that function so we can try to see what it's > doing at that offset? For example on Debian if you have > postgresql-9.6 and postgresql-9.6-dbg installed you could run "gdb > /usr/lib/postgresql/9.6/bin/postgres" and then "disassemble > ExecHashJoin". The code at "<+1442>" (0x5a2) is presumably calling > free or some other libc thing (though I'm surprised not to see an > intervening palloc thing). Thanks -- great suggestion. I'll report back with any interesting findings. merlin
double free in ExecHashJoin, 9.6.12
Server is generally running pretty well, and is high volume. This query is not new and is also medium volume. Database rebooted in about 4 seconds with no damage; fast enough we didn't even trip alarms (I noticed this troubleshooting another issue). We are a couple of bug fixes releases behind but I didn't see anything obvious in the release notes suggesting a resolved issue. Anyone have any ideas? thanks in advance. merlin *** glibc detected *** postgres: rms ysconfig 10.33.190.21(36788) SELECT: double free or corruption (!prev): 0x01fb2140 *** === Backtrace: = /lib64/libc.so.6(+0x75dee)[0x7f4fde053dee] /lib64/libc.so.6(+0x78c80)[0x7f4fde056c80] postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecHashJoin+0x5a2)[0x5e2d32] postgres: rms ysconfig 10.33.190.21(36788) SELECT(ExecProcNode+0x208)[0x5cf728] postgres: rms ysconfig 10.33.190.21(36788) SELECT(standard_ExecutorRun+0x18a)[0x5cd1ca] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e5607] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PortalRun+0x188)[0x6e67d8] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x6e2af3] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostgresMain+0x75a)[0x6e456a] postgres: rms ysconfig 10.33.190.21(36788) SELECT(PostmasterMain+0x1875)[0x6840b5] postgres: rms ysconfig 10.33.190.21(36788) SELECT(main+0x7a8)[0x60b528] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f4fddffcd1d] postgres: rms ysconfig 10.33.190.21(36788) SELECT[0x46c589] 2019-07-23 09:41:41 CDT[:@]:LOG: server process (PID 18057) was terminated by signal 6: Aborted 2019-07-23 09:41:41 CDT[:@]:DETAIL: Failed process was running: SELECT JR.job_id as jobId,JR.job_execution_id as jobResultId,JR.created as lastRunDate, JR.status as status, JR.status_message as statusMessage, JR.output_format as outputFormat, JS.schedule_name as scheduleName, JS.job_name as reportName, JS.created_by as scheduledBy, JS.product as source FROM (SELECT JR.job_id, MAX(JR.created) AS MaxCreated FROM job_schedule JS JOIN job_result JR ON 
JR.job_id=JS.job_id WHERE (lower(JS.recepients) like lower($1) OR lower(JS.created_by) = lower($2)) GROUP BY JR.job_id) TMP JOIN job_result JR ON JR.job_id = TMP.job_id AND JR.created = TMP.MaxCreated JOIN job_schedule JS ON JS.job_id = JR.job_id AND JS.job_type='CRON' merlin
Re: crosstab/repivot...any interest?
On Tue, Feb 26, 2019 at 8:31 AM Joe Conway wrote: > On 2/25/19 8:34 PM, Merlin Moncure wrote: > > The interface I'm looking at is: > > SELECT repivot( > > query TEXT, > > static_attributes INT, /* number of static attributes that are > > unchanged around key; we need this in our usages */ > > attribute_query TEXT); /* query that draws up the pivoted attribute > > list */ > > > > The driving query is expected to return 0+ static attributes which are > > essentially simply pasted to the output. The next two columns are the > > key column and the pivoting column. So if you had three attributes, > > the input set would be: > > > > a1, a2, a3, k1, p, v1...vn > > > > Where the coordinates v and p would exchange. I need to get this done > > quickly and so am trying to avoid more abstracted designs (maybe multi > > part keys should be supported though...this is a big limitation of > > crosstab albeit with some obnoxious work arounds). > > Perhaps not enough coffee yet, but I am not sure I fully grok the design > here. A fully fleshed out example would be useful. Thanks for your interest. A lot of our application data is organized in columnar type formats. Typically, the analytical reports on the back of the database are organized with the output columns being a KPI or a unit of time. KPI Columns: Country,Make,Model,TimeSlice,Sales,Units US,Ford,Fusion,Y2018Q1,165MM$,250k US,Ford,Fusion,Y2018Q2,172MM$,261k US,Ford,Fusion,Y2018Q3,183MM$,268k ... Time Columns: Country,Make,Model,KPI,Y2018Q1,Y2018Q2,Y2018Q3 US,Ford,Fusion,Sales,165MM$,172MM$,183MM$ US,Ford,Fusion,Units,250k,261k,268k SELECT repivot( , 1, /* only one static attribute */ 2, /* number of key columns, only needed if multi part keys are supported */ ); In this example supporting multi-part, the repivot column is the 4th, assumed to be attributes first, keys second, pivot column third, data block last. 
Multi column pivots might be considered but I don't need them and that's a significant expansion in scope, so I'm avoiding that consideration. What we need to do is convert from the first format above (which is how data is materialized in the table) to the second format. Just as within crosstab, if the key column(s) are ordered into the function we can exploit that structure for an efficient implementation. 'Ford' and 'Fusion' are key columns; 'US' is an uninteresting attribute column (it is unambiguously represented by 'Ford', 'Fusion') and so would simply be copied from input to output set. Our data is materialized in KPI style (which is pretty typical) but we have large analytical reports that want it with columns representing units of time. This is fairly common in the industry IMO. There are various pure SQL approaches to do this but they tend to force you to build out the X,Y columns and then join it all back together; this can spiral out of control quite quickly from a performance standpoint. A crosstab style crawling cursor over an ordered set ought to get the job done very efficiently. tablefunc doesn't support multi part keys today, and the workaround is that you have to stuff a key into a formatted string and then parse it out for downstream joining; some of the need to do that joining might hopefully be eliminated by allowing the attribute columns to be copied through. merlin
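For comparison, one of the "pure SQL approaches" mentioned above might look like the following against the sample data, using filtered aggregates (table and column names are assumed); every output time column has to be built out by hand, which is part of why it scales poorly:

```sql
-- KPI-style rows repivoted to time-style rows: one branch per KPI,
-- one hand-written column per time slice
SELECT country, make, model, 'Sales' AS kpi,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q1') AS y2018q1,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q2') AS y2018q2,
       max(sales) FILTER (WHERE timeslice = 'Y2018Q3') AS y2018q3
  FROM sales_by_quarter
 GROUP BY country, make, model
UNION ALL
SELECT country, make, model, 'Units',
       max(units) FILTER (WHERE timeslice = 'Y2018Q1'),
       max(units) FILTER (WHERE timeslice = 'Y2018Q2'),
       max(units) FILTER (WHERE timeslice = 'Y2018Q3')
  FROM sales_by_quarter
 GROUP BY country, make, model;
```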
crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 9:14 PM Morris de Oryx wrote: > > Hello, I'm not a C coder and can't help but I love cross-tab/pivot tables. They're the best, and just fantastic for preparing data to feed into various analysis tools. The tablefunc module is helpful, but a bit awkward to use (as I remember it.) > > From a user's point of view, a high-performance cross-tab generator would be just fantastic. > > As I understand it, this is what's involved in a pivot: > > 1. Identify rows that should be grouped (consolidated.) > 2. Distinguish the value that identifies each derived column. > 3. Distinguish the value that identifies each row-column value. > 4. Collapse the rows, build the columns, and populate the 'cells' with data. > > In an ideal world, you would be able to perform different grouping operations. Such as count, sum, avg, etc. > > If there's a way to do this in a system-wide and standards-pointing way, so much the better. > > Apologies if I'm violating list etiquette by jumping in here. I've been lurking on several Postgres lists for a bit and picking up interesting details every day. If I've been Unintentionally and Cluelessly Off, I'm fine with being told. No worries, sir! Apologies on the late reply. I've made some headway on this item. Waiting for postgres to implement the SQL standard pivoting (regardless of whether it implements the cases I need) is not an option for my personal case. I can't use the SQL approach either as it's very slow and imposes some scaling limits that we need to work around in the short run. My strategy is to borrow [steal] from crosstab_hash and make a new version called repivot which takes an already pivoted data set and repivots it against an identified column. Most of the code can be shared with tablefunc so ideally this could live as an amendment to that extension. 
That might not happen though, so I'm going to package it as a separate extension (removing the majority of tablefunc that I don't need) and submit it to this group for consideration. If we punt, it'll end up as a private extension or live the life of an orphan in pgxn. If there's some interest here, we can consider a new contrib extension (which I personally rate very unlikely) or recasting as an extra routine to tablefunc. Any way we slice it, huge thanks to Joe Conway for giving us such an awesome function to work with all these years (not to mention the strategic plr language). SRF crosstab() is still somewhat baroque, but it still fills a niche that nothing else implements. The interface I'm looking at is: SELECT repivot( query TEXT, static_attributes INT, /* number of static attributes that are unchanged around key; we need this in our usages */ attribute_query TEXT); /* query that draws up the pivoted attribute list */ The driving query is expected to return 0+ static attributes which are essentially simply pasted to the output. The next two columns are the key column and the pivoting column. So if you had three attributes, the input set would be: a1, a2, a3, k1, p, v1...vn Where the coordinates v and p would exchange. I need to get this done quickly and so am trying to avoid more abstracted designs (maybe multi part keys should be supported though...this is a big limitation of crosstab albeit with some obnoxious work arounds). merlin
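A purely hypothetical invocation of the proposed interface (nothing here was ever committed; the record definition in the AS clause follows the usual crosstab() convention, and the table/column names are invented):

```sql
SELECT *
  FROM repivot(
    $$ SELECT country, model, timeslice, sales
         FROM sales_by_quarter ORDER BY model $$,   -- driving query
    1,                                              -- one static attribute
    $$ SELECT DISTINCT timeslice
         FROM sales_by_quarter ORDER BY 1 $$        -- pivoted attribute list
  ) AS t(country text, model text,
         y2018q1 text, y2018q2 text, y2018q3 text);
```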
Re: Early WIP/PoC for inlining CTEs
On Thu, Feb 14, 2019 at 10:02 AM Alvaro Herrera wrote: > > On 2019-Feb-14, Peter Eisentraut wrote: > > > On 14/02/2019 16:11, Tom Lane wrote: > > > ... so, have we beaten this topic to death yet? Can we make a decision? > > > > > > Personally, I'd be happy with either of the last two patch versions > > > I posted (that is, either AS [[NOT] MATERIALIZED] or > > > AS [MATERIALIZE [ON|OFF]] syntax). But we gotta pick something. > > > > If we're not really planning to add any more options, I'd register a > > light vote for MATERIALIZED. It reads easier, seems more grammatically > > correct, and uses an existing word. > > +1 for MATERIALIZED, as I proposed in > https://postgr.es/m/20170503173305.fetj4tz7kd56tjlr@alvherre.pgsql Seconded! merlin
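For reference, what ultimately shipped (in PostgreSQL 12) was the keyword form, with inlining as the default:

```sql
-- default: the planner is free to inline the CTE into the outer query
WITH w AS (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w;

-- restore the old optimization-fence behavior explicitly:
WITH w AS MATERIALIZED (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w;

-- force inlining even where the planner would otherwise materialize
-- (e.g. a CTE referenced more than once):
WITH w AS NOT MATERIALIZED (SELECT * FROM big_tab WHERE key = 123)
SELECT * FROM w, w AS w2 WHERE w.key = w2.key;
```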
Re: Early WIP/PoC for inlining CTEs
On Tuesday, January 29, 2019, Tom Lane wrote: > Merlin Moncure writes: > > On Tue, Jan 29, 2019 at 1:53 PM Tom Lane wrote: > >> I propose that we implement and document this as > >> WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query ) > > > I hate to bikeshed here, but I think it's better english using that > > style of syntax to say, > > WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query ) > > Hm. Doesn't really seem to fit with our style for options elsewhere; > for instance, EXPLAIN's options are things like ANALYZE ON/OFF and > VERBOSE ON/OFF, not ANALYSIS and VERBOSITY. > > regards, tom lane > Yep...I'll concede the point. merlin
Re: Early WIP/PoC for inlining CTEs
On Tue, Jan 29, 2019 at 1:53 PM Tom Lane wrote: > > Michael Paquier writes: > > On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote: > >> Yeah, I thought about that too, but it doesn't seem like an improvement. > >> If the query is very long (which isn't unlikely) I think people would > >> prefer to see the option(s) up front. > > > Having these options at the front of the WITH clause looks more > > natural to me. > > Well, we've managed to get agreement on the semantics of this thing, > let's not get hung up on the syntax details. > > I propose that we implement and document this as > > WITH ctename AS [ MATERIALIZE { ON | OFF } ] ( query ) > > which is maybe a bit clunky but not awful, and it would leave room > to generalize it to "AS [ optionname optionvalue [ , ... ] ]" if we > ever need to. Looking at the precedent of e.g. EXPLAIN, we could > probably allow just "MATERIALIZE" as well, with the boolean value > defaulting to true. I hate to bikeshed here, but I think it's better english using that style of syntax to say, WITH ctename AS [ MATERIALIZATION { ON | OFF } ] ( query ) merlin
Re: crosstab/repivot...any interest?
On Fri, Jan 25, 2019 at 3:16 PM David Fetter wrote: > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote: > > Hackers, > > > > We have a strong need to make a variant to the crosstab interface so > > that data that is pivoted one way would be sent through a crosstab > > like function so that it would be pivoted another way. For example, > > if you had > > > > row 0: a1, a2, a3, k1, c1, c2, ... > > row 1: a1, a2, a3, k2, c1, c2, ... > > row 2: a1, a2, a3, k3, c1, c2, ... > > ... > > > > where 'a' columns are uninteresting attribute columns, 'k' is the > > dimension we want to pivot on, and c1->cN would be stacked vertically, > > so that we'd end up with, > > row 0: a1, a2, a3, c1, k1, k2, ... > > row 1: a1, a2, a3, c2, k1, k2, ... > > row 2: a1, a2, a3, c3, k1, k2, ... > > > > There are various SQL level approaches to this but they tend to be > > imperformant with large datasets so that I think a crosstab-like C > > implementation ought to be able to do better (or at least I hope so) > > since you have to cross product rows and columns in such a way that > > you can get a clean join. Cribbing from tablefunc.c I don't think > > this is a terrible challenge to do in hash table style. > > > > Questions on the table: > > *) Has anyone done anything like this or know of any current > > implementations? > > *) Would there be any interest in expanding tablefunc along these lines? > > There's something in SQL:2016 that I read as crosstabs, or at least as > enabling crosstabs. > https://www.iso.org/standard/69776.html > > If we're going to put work into crosstabs, it seems to me that the > "we" needs to be the project as a whole, and the work should be, to > the extent reasonable, toward standard compliance. Interesting. Do you see that the spec (it makes my brain hurt) can handle that kind of repivoting? merlin
crosstab/repivot...any interest?
Hackers, We have a strong need to make a variant to the crosstab interface so that data that is pivoted one way would be sent through a crosstab like function so that it would be pivoted another way. For example, if you had row 0: a1, a2, a3, k1, c1, c2, ... row 1: a1, a2, a3, k2, c1, c2, ... row 2: a1, a2, a3, k3, c1, c2, ... ... where 'a' columns are uninteresting attribute columns, 'k' is the dimension we want to pivot on, and c1->cN would be stacked vertically, so that we'd end up with, row 0: a1, a2, a3, c1, k1, k2, ... row 1: a1, a2, a3, c2, k1, k2, ... row 2: a1, a2, a3, c3, k1, k2, ... There are various SQL level approaches to this but they tend to be imperformant with large datasets so that I think a crosstab-like C implementation ought to be able to do better (or at least I hope so) since you have to cross product rows and columns in such a way that you can get a clean join. Cribbing from tablefunc.c I don't think this is a terrible challenge to do in hash table style. Questions on the table: *) Has anyone done anything like this or know of any current implementations? *) Would there be any interest in expanding tablefunc along these lines? thanks in advance, merlin
Re: Early WIP/PoC for inlining CTEs
On Tue, Jul 24, 2018 at 6:10 PM Andres Freund wrote: > My read of the consensus (in which I am in the majority, so I might be > biased) is that we do want inlining to be the default. We were thinking > that it'd be necessary to provide a way to force inlining on the SQL > level for individual CTEs. This is correct. I'm suggesting that we need syntax for disabling inlining at the CTE level, and/or a GUC to control the behavior (which I agree should default to inlining). Something like enable_cte_inline=true; I'm not very enthusiastic about explicitly breaking intentionally introduced optimization fences and then forcing people to inject our OFFSET 0 hack. This is just too unpleasant to contemplate...what happens if we come up with a better implementation of OFFSET? yuck. Thanks for providing this, CTE plan problems are a real bugaboo. merlin
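For readers unfamiliar with it, the OFFSET 0 trick referenced above fences a subquery much as CTEs historically did implicitly; a sketch (table and function names are illustrative):

```sql
-- normally the planner flattens the subquery into the outer query:
SELECT * FROM (SELECT * FROM t WHERE f(x)) s WHERE s.y = 1;

-- OFFSET 0 is a no-op for the result but blocks the pull-up, so the
-- inner query is planned as written -- the fence CTEs used to provide:
SELECT * FROM (SELECT * FROM t WHERE f(x) OFFSET 0) s WHERE s.y = 1;
```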
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 5:15 PM Andrew Dunstan wrote: > On 11/02/2018 05:20 PM, Andres Freund wrote: > > Hi, > > > > On 2018-11-02 17:02:24 -0400, Andrew Dunstan wrote: > >> On 11/02/2018 11:34 AM, Merlin Moncure wrote: > >>> Binary format consuming applications already have to deal with these > >>> kinds of issues. We already expose internal structures in the other > >>> functions -- not sure why jsonb is held to a different standard. For > >>> other data types where format changes were made, the standard of > >>> 'caveat version' was in place to protect the user. For jsonb we > >>> decided to implement a version flag within the type itself, which I > >>> thought a mistake at the time -- better to have a version header in the > >>> COPY BINARY if needed. > >>> > >> > >> jsonb_send does output a version header, as I pointed out upthread. > > That's Merlin's point I think. For reasons I don't quite understand he > > doesn't like that. Yes, a global solution would have been prettier than > > per-datum version flag, but that obviously wasn't realistic to introduce > > around the feature freeze of the version that introduced jsonb. > > > Oh, hmm. It would have been a big change of little value, ISTM. One byte > of overhead per jsonb datum for a version indicator doesn't seem a huge > price to pay. Yeah -- it really isn't. My concern was more that it seemed odd to protect one type with a version flag where others aren't; format agreement strikes me as more of a protocol negotiation thing than an aspect of each individual data point. It also makes for slightly odd (IMO) client side coding. The contract is of concern, not the overhead. It works well enough though...we discussed this a bit when the header was introduced and the discussion ended exactly the same way :-). merlin
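The per-datum version flag under discussion is easy to see from SQL; in the current implementation jsonb_send emits a one-byte version (1) followed by the text rendering of the value (observed behavior, not a wire-format guarantee):

```sql
-- the leading 01 is the jsonb binary-format version byte:
SELECT jsonb_send('{"a": 1}'::jsonb);
-- e.g. \x017b2261223a20317d   (0x01, then the text '{"a": 1}')
```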
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 2:34 PM Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: > As for what language it's written in- I don't think that matters much. > I'd very much expect it to be more performant to use binary if you're > working in C, of course, but there's no point comparing C-parsed json > into some C structure vs. psycopg2 ingesting binary data and building a > Python json object- what matters is if it'd be faster for psycopg2 to > pull in binary-json data and put it into a Python json object, or if > it'd be faster to parse the text-json data and put the result into the > same Python object. In my view, there's something clearly quite wrong > if the text-json data format is faster at that. Yeah, I figure the language would be C or another language with drivers smart enough to speak the binary protocol. > > In my experience with arrays and composites, you can see significant > > latency reduction and throughput increase in certain classes of > > queries. However, some of the types that were the worst offenders > > (like timestamps) have been subsequently optimized and/or are > > irrelevant to json since they'd be passed as text anyways. > > I've had very good success transferring timestamps as binary, so I'm not > quite sure what you're getting at here. Er, yes, timestamps are much faster in binary -- that is what I'd observed and was the point I was trying to make. The advantage has shrunk somewhat with recent optimizations, but binary is still faster. In the old days I saw as much as +50% throughput binary vs text in certain contrived situations; that may no longer be true today. merlin
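To make the timestamp example concrete: with integer datetimes (the default for many releases now), the binary wire format for a timestamp is an 8-byte big-endian integer of microseconds since 2000-01-01, versus a text string the client has to format and re-parse. A rough sketch of the binary encode/decode path:

```python
import struct
from datetime import datetime, timedelta

PG_EPOCH = datetime(2000, 1, 1)  # postgres timestamps count from 2000-01-01

def ts_to_binary(ts: datetime) -> bytes:
    """Encode in the shape of the binary wire format: int64 usec, network order."""
    delta = ts - PG_EPOCH
    # exact integer arithmetic; avoids float rounding from total_seconds()
    usec = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("!q", usec)

def ts_from_binary(data: bytes) -> datetime:
    """Decode eight fixed bytes back into a datetime."""
    usec, = struct.unpack("!q", data)
    return PG_EPOCH + timedelta(microseconds=usec)
```

Decoding here is a fixed-width shift-and-add, while the text path has to render and then re-parse 'YYYY-MM-DD HH:MM:SS.ffffff' on every value, which is where the observed throughput difference comes from.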
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 11:15 AM Stephen Frost wrote: > > Greetings, > > * Merlin Moncure (mmonc...@gmail.com) wrote: > > On Fri, Nov 2, 2018 at 10:53 AM Tom Lane wrote: > > > Andres' point about alignment is a pretty good one as well, if it applies > > > here --- I don't recall just what internal alignment requirements jsonb > > > has. We have not historically expected clients to have to deal with that. > > > > I see your (and Andres') point; the binary wire format ought to lie on > > top of the basic contracts established by other types. It can be > > binary; just not a straight memcpy out of the server. The array and > > composite type serializers should give some inspiration there on > > serialization. > > Right- I agree w/ Tom and Andres on this part also. > > > I'll still stand by the other point I made, though; I'd > > really want to see some benchmarks demonstrating benefit over > > competing approaches that work over the current formats. That should > > frame the argument as to whether this is a good idea. > > What are the 'competing approaches' you're alluding to here? Sending > text-format json across as we do today? Yep -- exactly. For example, write a C client program that recursed the structure and dumped it to stdout or assigned to dummy variables (being mindful of compiler optimizations). I'd contrast this to a C client parsing text json that did essentially the same thing, and rig a high scale test on the back of that. The assumption here is that the ultimate consumer is not, say, a browser, but some client app that can actually exploit the performance advantages (else, why bother?). In my experience with arrays and composites, you can see significant latency reduction and throughput increase in certain classes of queries. However, some of the types that were the worst offenders (like timestamps) have been subsequently optimized and/or are irrelevant to json since they'd be passed as text anyways. merlin
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Fri, Nov 2, 2018 at 10:53 AM Tom Lane wrote: > Merlin Moncure writes: > > On Wed, Oct 31, 2018 at 10:23 AM Andres Freund wrote: > >> It's entirely unacceptable afaict. Besides the whole "exposing > >> internals" issue, it's also at least not endianness safe, depends on the > >> local alignment requirements (which differ both between platforms and > >> 32/64 bit), numeric's internal encoding and probably more. > > > Binary format consuming applications already have to deal with these > > kinds of issues. We already expose internal structures in the other > > functions -- not sure why jsonb is held to a different standard. > > I don't think it's being held to a different standard at all. Even for > data as simple as integers/floats, we convert to uniform endianness on the > wire. Moreover, we do not expose the exact bits for anything more complex > than machine floats. Numeric, for instance, gets disassembled into fields > rather than showing the exact header format (let alone several different > header formats, as actually found on disk). > > Andres' point about alignment is a pretty good one as well, if it applies > here --- I don't recall just what internal alignment requirements jsonb > has. We have not historically expected clients to have to deal with that. I see your (and Andres') point; the binary wire format ought to lie on top of the basic contracts established by other types. It can be binary; just not a straight memcpy out of the server. The array and composite type serializers should give some inspiration there on serialization. I'll still stand by the other point I made, though; I'd really want to see some benchmarks demonstrating benefit over competing approaches that work over the current formats. That should frame the argument as to whether this is a good idea. merlin
Re: WIP Patch: Add a function that returns binary JSONB as a bytea
On Wed, Oct 31, 2018 at 10:23 AM Andres Freund wrote: > > Hi, > > On 2018-10-31 11:13:13 -0400, Andrew Dunstan wrote: > > I agree that just sending a blob of the internal format isn't a great idea. > > It's entirely unacceptable afaict. Besides the whole "exposing > internals" issue, it's also at least not endianness safe, depends on the > local alignment requirements (which differ both between platforms and > 32/64 bit), numeric's internal encoding and probably more. Binary format consuming applications already have to deal with these kinds of issues. We already expose internal structures in the other functions -- not sure why jsonb is held to a different standard. For other data types where format changes were made, the standard of 'caveat version' was in place to protect the user. For jsonb we decided to implement a version flag within the type itself, which I thought was a mistake at the time -- better to have a version header in the COPY BINARY if needed. People using binary format are basically interested in one thing: performance. It's really the fastest way to get data in and out of the database. For my part, I'd like to see some observed demonstrable benefit in terms of performance to justify making the change. merlin
Re: transction_timestamp() inside of procedures
On Wed, Sep 26, 2018 at 10:55 AM Alvaro Herrera wrote: > > On 2018-Sep-26, Tom Lane wrote: > > > Alvaro Herrera writes: > > > On 2018-Sep-26, Tom Lane wrote: > > >> I agree that it would be surprising for transaction timestamp to be newer > > >> than statement timestamp. So for now at least, I'd be satisfied with > > >> documenting the behavior. > > > > > Really? I thought it was practically obvious that for transaction- > > > controlling procedures, the transaction timestamp would not necessarily > > > be aligned with the statement timestamp. The surprise would come > > > together with the usage of the new feature, so existing users would not > > > be surprised in any way. > > > > Nope. That's the same poor reasoning we've fallen into in some other > > cases, of assuming that "the user" is a point source of knowledge. > > But DBMSes tend to interact with lots of different code. If some part > > of application A starts using intraprocedure transactions, and then > > application B breaks because it wasn't expecting to see xact_start > > later than query_start in pg_stat_activity, you've still got a problem. > > While that's true, I think it's also highly hypothetical. > > What could be the use for the transaction timestamp? I think one of the > most important uses (at least in pg_stat_activity) is to verify that > transactions are not taking an excessively long time to complete; +1 I think the existing behavior is broken, and extremely so. Transaction timestamp has a very clear definition to me. I'm planning to move a lot of code into stored procedures from bash, and upon doing so it's going to trip all kinds of nagios alarms that are looking at the longest running transaction. merlin
Re: Stored procedures and out parameters
On Mon, Sep 17, 2018 at 7:45 AM Jonathan S. Katz wrote: > > Hi, > > On 9/2/18 4:32 PM, Robert Haas wrote: > > On Thu, Aug 30, 2018 at 4:14 PM, Dave Cramer wrote: > >> Reading this from the (JDBC) drivers perspective, which is probably a > >> fairly > >> popular one, > >> We now have a standard that we can't really support. Either the driver will > >> have to support > >> the new PROCEDURE with the {call } mechanism or stay with the existing > >> FUNCTIONS. > >> This puts the drivers in a no win situation. > > > > I understand and I agree. > > > >> Undoubtedly.. surely the opportunity to do something about this has not > >> passed as this has not been > >> officially released ? > > > > I agree with that, too, but I can't make other people do things they > > don't want to do, and then there's the question of time. On the basis > > of previous experience, there is going to be little appetite for > > holding up the release to address this issue no matter how badly > > busted it is. Ultimately, it's the RMT that must decide what to do in > > cases like this. I have confidence that they are watching this > > thread, but I don't know what they will decide to do about it. > > > > First, I want everyone to know that the RMT took this very seriously and > took time collect feedback and consult with as many people as we could > in order to make the best possible decision for v11 and ensure that any > decision we made did not hinder any future implementation for stored > procedures nor introduced something that would break backwards > compatibility. > > Ultimately, the decision came down to one of four options: > > #1: Do nothing and remove the open item > #2: Introduce nonstandard syntax for calling functions / procedures > #3: Have CALL support both functions & procedures (or SELECT support > calling functions) > #4: Disable CALL > > The RMT has decided to go with option #1 and will be removing the open > item for the PostgreSQL 11 release. 
Hooray for making the right choice. This is exhibit "Z" as to why abstracting the SQL language behind a programming API can lead to problems. The workaround is to simply not do that and you can get precise control of behavior. It's a little unfortunate that API {call} maps to a select but such is life. merlin
Re: I'd like to discuss scaleout at PGCon
On Fri, Jun 22, 2018 at 12:34 PM Bruce Momjian wrote: > > On Fri, Jun 1, 2018 at 11:29:43AM -0500, Merlin Moncure wrote: > > FWIW, Distributed analytical queries is the right market to be in. > > This is the field in which I work, and this is where the action is at. > > I am very, very, sure about this. My view is that many of the > > existing solutions to this problem (in particular hadoop class > > soltuions) have major architectural downsides that make them > > inappropriate in use cases that postgres really shines at; direct > > hookups to low latency applications for example. postgres is > > fundamentally a more capable 'node' with its multiple man-millennia of > > engineering behind it. Unlimited vertical scaling (RAC etc) is > > interesting too, but this is not the way the market is moving as > > hardware advancements have reduced or eliminated the need for that in > > many spheres. > > > > The direction of the project is sound and we are on the cusp of the > > point where multiple independent coalescing features (FDW, logical > > replication, parallel query, executor enhancements) will open new > > scaling avenues that will not require trading off the many other > > benefits of SQL that competing contemporary solutions might. The > > broader development market is starting to realize this and that is a > > major driver of the recent upswing in popularity. This is benefiting > > me tremendously personally due to having gone 'all-in' with postgres > > almost 20 years ago :-D. (Time sure flies)These are truly > > wonderful times for the community. > > While I am glad people know a lot about how other projects handle > sharding, these can be only guides to how Postgres will handle such > workloads. I think we need to get to a point where we have all of the > minimal sharding-specific code features done, at least as > proof-of-concept, and then test Postgres with various workloads like > OLTP/OLAP and read-write/read-only. 
> This will tell us where sharding-specific code will have the greatest impact. > > What we don't want to do is to add a bunch of sharding-specific code > without knowing which workloads it benefits, and how many of our users > will actually use sharding. Some projects have done that, and it > didn't end well since they then had a lot of product complexity with > little user value. Key features from my perspective: *) FDW in parallel. How do I do it today? Ghetto-implemented parallel queries with asynchronous dblink *) column store *) automatic partition management through shards Probably some more, gotta run :-) merlin
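The "ghetto parallel FDW" workaround amounts to scatter/gather: fire the same query at every shard asynchronously (dblink_send_query), then collect the results (dblink_get_result) and concatenate. A toy sketch of that control flow, with a stub standing in for the per-node dblink round trip:

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_shard(shard_id, query):
    # stand-in for dblink_send_query()/dblink_get_result() against one node;
    # here we just fabricate three rows tagged with the shard id
    return [(shard_id, row) for row in range(3)]

def scatter_gather(shards, query):
    """Issue the query on every shard concurrently, then append the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(run_on_shard, s, query) for s in shards]
        results = []
        for f in futures:
            results.extend(f.result())
    return results
```

The asynchronous-execution work discussed in this thread would effectively push this gather step into the executor itself instead of leaving it to application code.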
Re: I'd like to discuss scaleout at PGCon
On Wed, May 30, 2018 at 9:26 PM Robert Haas wrote: > The FDW approach, of which I have been a supporter for some years now, > is really aiming at a different target, which is to allow efficient > analytics queries across a multi-node cluster. I think we're getting > pretty close to being able to do that -- IMHO, the last fundamental > building block that we need is asynchronous execution, which Andres is > working on. After that, it's a matter of adding other features that > people want (like cross-node MVCC) and improving the plans for queries > that still don't perform well (like joins that involve redistributing > one of the tables involved). FWIW, distributed analytical queries are the right market to be in. This is the field in which I work, and this is where the action is at. I am very, very, sure about this. My view is that many of the existing solutions to this problem (in particular hadoop-class solutions) have major architectural downsides that make them inappropriate in use cases that postgres really shines at; direct hookups to low latency applications for example. postgres is fundamentally a more capable 'node' with its multiple man-millennia of engineering behind it. Unlimited vertical scaling (RAC etc) is interesting too, but this is not the way the market is moving as hardware advancements have reduced or eliminated the need for that in many spheres. The direction of the project is sound and we are on the cusp of the point where multiple independent coalescing features (FDW, logical replication, parallel query, executor enhancements) will open new scaling avenues that will not require trading off the many other benefits of SQL that competing contemporary solutions might. The broader development market is starting to realize this and that is a major driver of the recent upswing in popularity. This is benefiting me tremendously personally due to having gone 'all-in' with postgres almost 20 years ago :-D. 
(Time sure flies.) These are truly wonderful times for the community. merlin
Re: [HACKERS] Clock with Adaptive Replacement
On Wed, May 9, 2018 at 11:00 AM Robert Haas wrote: > Independently of that, it would probably also be useful to avoid > bumping the reference count multiple times when a buffer is accessed > by the same backend several times in quick succession. Perhaps this > could even be as simple as maintaining a list of the last two buffer > IDs for which we bumped the usage count; if we see one of them again, > don't bump the usage count again. Hm. Is the objective here to optimize access patterns or to reduce atomic operations (or both)? All else being equal, an algorithm that delivers similar eviction results with less cache synchronization ought to be preferred...are the various algorithms scored in that way? merlin
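Robert's suggestion is cheap to prototype. A sketch of the dedup idea (the usage-count cap of 5 matches BM_MAX_USAGE_COUNT; the two-entry "recently touched" window is his hypothetical, and the class name is mine):

```python
class UsageCounter:
    """Clock-sweep usage counts with the proposed dedup: skip the
    bump when the buffer was one of the last two this backend touched."""
    MAX_USAGE = 5  # postgres caps usage_count at BM_MAX_USAGE_COUNT (5)

    def __init__(self):
        self.usage = {}
        self.recent = (None, None)  # last two buffer IDs touched

    def touch(self, buf_id):
        if buf_id in self.recent:
            return  # accessed again in quick succession; no bump, no atomic op
        self.usage[buf_id] = min(self.usage.get(buf_id, 0) + 1, self.MAX_USAGE)
        self.recent = (buf_id, self.recent[0])
```

Note the dedup serves both goals at once: it saves the atomic increment and it keeps hot-loop re-touches from inflating a buffer's eviction score.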
Re: Built-in connection pooling
On Fri, May 4, 2018 at 2:25 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Fri, May 4, 2018 at 11:22 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >> If we are breaking 1:1 backend:session relationship, what controls >> would we have to manage resource consumption? > > I mean, if you have a large number of sessions open, it's going to > take more memory in any design. If there are multiple sessions per > backend, there may be some possibility to save memory by allocating it > per-backend rather than per-session; it shouldn't be any worse than if > you didn't have pooling in the first place. It is absolutely worse, or at least can be. plpgsql plan caches can be GUC dependent due to search_path; you might get a different plan depending on which tables resolve into the function. You might rightfully regard this as an edge case but there are other 'leakages', for example, sessions with different planner settings obviously ought not to share backend plans. Point being, there are many interdependent things in the session that will make it difficult to share some portions but not others. > However, I think that's probably worrying about the wrong end of the > problem first. IMHO, what we ought to start by doing is considering > what a good architecture for this would be, and how to solve the > general problem of per-backend session state. If we figure that out, > then we could worry about optimizing whatever needs optimizing, e.g. > memory usage. Exactly -- being able to manage down resource consumption by controlling session count is a major feature that ought not to be overlooked. So I'm kind of signalling that if given a choice between that (funneling a large pool of connections down to a smaller number of backends) and externalized shared sessions I'd rather have the funnel; it solves a number of very important problems with respect to server robustness. So I'm challenging (in a friendly, curious way) if breaking session:backend 1:1 is really a good idea. 
Maybe a connection pooler implementation can do both of those things, or maybe it's unfair to expect one implementation to do both. merlin
Re: Built-in connection pooling
On Thu, May 3, 2018 at 12:01 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Fri, Apr 27, 2018 at 4:43 PM, Merlin Moncure <mmonc...@gmail.com> wrote: >> What _I_ (maybe not others) want is a >> faster pgbouncer that is integrated into the database; IMO it does >> everything exactly right. > > I have to admit that I find that an amazing statement. Not that > pgbouncer is bad technology, but saying that it does everything > exactly right seems like a vast overstatement. That's like saying > that you don't want running water in your house, just a faster motor > for the bucket you use to draw water from the well. Well you certainly have a point there; I do have a strong tendency for overstatement :-). Let's put it like this: being able to have connections funnel down to a smaller number of sessions is a nice feature. Applications that are large, complex, or super high volume have a tendency towards stateless (with respect to the database session) architecture anyways, so I tend not to mind the lack of session features when pooling (prepared statements perhaps being the big outlier here). It really opens up a lot of scaling avenues. So a better-phrased statement might be, "I like the way pgbouncer works, in particular transaction mode pooling from the perspective of the applications using it". The current main pain points are the previously mentioned administrative headaches; better performance from a different architecture (pthreads vs. libev) would also be nice. I'm a little skeptical that we're on the right path if we are pushing a lot of memory consumption into the session level where a session is pinned all the way back to a client connection. plpgsql function plan caches can be particularly hungry on memory, and since sessions have their own GUCs, ISTM each session has to have its own set of them since plans depend on the search_path GUC, which is session specific. 
Previous discussions on managing cache memory consumption centrally (I dimly recall you making a proposal on that very thing) haven't gone past the planning stages AFAIK. If we are breaking the 1:1 backend:session relationship, what controls would we have to manage resource consumption? merlin
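The search_path hazard above is easy to illustrate: a cached plan is only valid for the GUC state it was planned under, so any shared plan cache would have to fold those settings into its key. A toy model (names and the fake "planning" step are mine, purely illustrative):

```python
class PlanCache:
    """Per-session plan cache sketch: the key must include session GUCs
    (search_path here) that change which objects a query resolves to,
    which is why plans can't be naively shared across sessions."""

    def __init__(self):
        self.plans = {}

    def get_plan(self, func_name, search_path):
        key = (func_name, tuple(search_path))  # GUC state is part of the key
        if key not in self.plans:
            # pretend to plan: resolve the table in the first schema on the path
            self.plans[key] = f"scan {search_path[0]}.t for {func_name}"
        return self.plans[key]
```

Two sessions calling the same plpgsql function under different search_path settings get two distinct cache entries, which is exactly the duplicated memory consumption being discussed.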
Re: Built-in connection pooling
On Fri, Apr 27, 2018 at 11:44 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > > > On 27.04.2018 18:33, Merlin Moncure wrote: >> On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik >> <k.knizh...@postgrespro.ru> wrote: >>> On 27.04.2018 16:49, Merlin Moncure wrote: >> I'm confused here...could be language issues or terminology (I'll look >> at your latest code). Here is how I understand things: >> Backend=instance of postgres binary >> Session=application state within postgres binary (temp tables, >> prepared statement etc) >> Connection=Client side connection > > Backend is a process, forked by postmaster. right, we are saying the same thing here. >> AIUI (I could certainly be wrong), withing connection pooling, ratio >> of backend/session is still 1:1. The idea is that client connections >> when they issue SQL to the server reserve a Backend/Session, use it >> for the duration of a transaction, and release it when the transaction >> resolves. So many client connections share backends. As with >> pgbouncer, the concept of session in a traditional sense is not really >> defined; session state management would be handled within the >> application itself, or within data within tables, but not within >> backend private memory. Does that align with your thinking? > > No. Number of sessions is equal to number of client connections. > So client is not reserving "Backend/Session" as it happen in pgbouncer. > One backend keeps multiple sessions. And for each session it maintains > session context which included client's connection. > And it is backend's decision transaction of which client it is going to > execute now. > This is why built-in pooler is able to provide session semantic without > backend/session=1:1 requirement. I see. I'm not so sure that is a good idea in the general sense :(. Connection sharing sessions is normal and well understood, and we have tooling to manage that already (DISCARD). 
Having the session state abstracted out and pinned to the client connection seems complex and wasteful, at least sometimes. What _I_ (maybe not others) want is a faster pgbouncer that is integrated into the database; IMO it does everything exactly right. merlin
Re: Built-in connection pooling
On Fri, Apr 27, 2018 at 10:05 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > On 27.04.2018 16:49, Merlin Moncure wrote: >> *) How are you pinning client connections to an application managed >> transaction? (IMNSHO, this feature is useless without being able to do >> that) > > Sorry, I do not completely understand the question. > Rescheduling is now done at transaction level - it means that the backend > cannot be switched to another session until the current transaction completes. > The main argument for transaction level pooling is that it allows one not to > worry about heavyweight locks, which are associated with procarray entries. I'm confused here...could be language issues or terminology (I'll look at your latest code). Here is how I understand things: Backend=instance of postgres binary Session=application state within postgres binary (temp tables, prepared statements etc) Connection=Client side connection AIUI (I could certainly be wrong), within connection pooling, the ratio of backends to sessions is still 1:1. The idea is that client connections, when they issue SQL to the server, reserve a backend/session, use it for the duration of a transaction, and release it when the transaction resolves. So many client connections share backends. As with pgbouncer, the concept of session in a traditional sense is not really defined; session state management would be handled within the application itself, or within data within tables, but not within backend private memory. Does that align with your thinking? merlin
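To restate the transaction-level model in code: the pool pins a backend to a client only for the life of a transaction, which is why anything kept in backend-private memory cannot be relied on across transactions. A toy sketch (class and method names are mine, not from the patch):

```python
from collections import deque

class TxnPooler:
    """Transaction-mode pooling sketch: a backend is reserved at
    transaction start and released at commit/rollback, so many client
    connections share a small, fixed set of backends."""

    def __init__(self, n_backends):
        self.free = deque(range(n_backends))
        self.pinned = {}  # client -> backend, for the current transaction only

    def begin(self, client):
        if not self.free:
            raise RuntimeError("all backends busy; client must wait")
        self.pinned[client] = self.free.popleft()

    def commit(self, client):
        # releasing at transaction end is what makes it "transaction level";
        # the next transaction may land on a different backend entirely
        self.free.append(self.pinned.pop(client))
```

The heavyweight-lock argument falls out of this: since a transaction never migrates mid-flight, its procarray entry and lock state stay on one backend.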
Re: Built-in connection pooling
On Thu, Apr 26, 2018 at 6:04 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > On 25.04.2018 20:02, Merlin Moncure wrote: >> Yep. The main workaround today is to disable them. Having said that, >> it's not that difficult to imagine hooking prepared statement creation >> to a backend starting up (feature: run X,Y,Z SQL before running user >> queries). > > Sorry, I do not completely understand your idea. > Yes, it is somehow possible to simulate session semantics by prepending all > session specific commands (mostly setting GUCs) to each SQL statement. > But it doesn't work for prepared statements: the idea of prepared statements > is that compilation of a statement should be done only once. The idea is that you have arbitrary SQL that runs after the backend (postgres binary) is forked from the postmaster. This would be an ideal place to introduce prepared statements in a way that is pooling compatible; you still couldn't PREPARE from the application but you'd be free to call already prepared statements (via SQL level EXECUTE or libpq PQexecPrepared()). Of course, if somebody throws a DEALLOCATE or DISCARD ALL, or issues a problematic DROP x CASCADE, you'd be in trouble, but that's not a big deal IMO because you can control for those things in the application. > Database performance is mostly limited by disk, so optimal number of > backends may be different from number of cores. > But certainly possibility to launch "optimal" number of backends is one of > the advantages of builtin session pooling. Sure, but some workloads are cpu limited (all- or mostly- read with data < memory, or very complex queries on smaller datasets). So we would measure and configure based on expectations exactly as is done today with pgbouncer. 
This is a major feature of pgbouncer: being able to _reduce_ the number of session states relative to the number of connections is an important feature; it isolates your database from various unpleasant failure modes such as runaway memory consumption. Anyways, I'm looking at your patch. I see you've separated the client connection count ('sessions') from the server backend instances ('backends') in the GUC. Questions: *) Should non pooled connections be supported simultaneously with pooled connections? *) Should there be multiple pools with independent configurations (yes, please)? *) How are you pinning client connections to an application managed transaction? (IMNSHO, this feature is useless without being able to do that) FYI, it's pretty clear you've got a long road building consensus and hammering out a reasonable patch through the community here. Don't get discouraged -- there is value here, but it's going to take some work. merlin
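The "run X,Y,Z SQL before running user queries" idea boils down to a per-backend startup hook: the prepared statements exist before any client transaction lands on the backend, so every client sees them regardless of which backend it is routed to. A sketch of the shape of it (the SETUP_SQL contents and the Backend class are hypothetical, not a real API):

```python
SETUP_SQL = [
    # hypothetical statements run once when a pooled backend starts,
    # so every client can EXECUTE the same prepared statements
    "PREPARE get_user(int) AS SELECT * FROM users WHERE id = $1",
    "SET work_mem = '64MB'",
]

class Backend:
    """Stand-in for a pooled backend that runs setup SQL at fork time."""

    def __init__(self, setup_sql):
        self.executed = []
        for sql in setup_sql:   # runs before any user query arrives
            self.execute(sql)

    def execute(self, sql):
        self.executed.append(sql)
        return f"ok: {sql.split()[0]}"
```

As noted above, this only stays coherent if the application refrains from DEALLOCATE/DISCARD ALL, which is under the application's control.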
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 2:58 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Apr 25, 2018 at 10:00 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >> systems. If we get that tor free I'd be all for it but reading >> Robert's email I'm skeptical there are easy wins here. So +1 for >> further R and -1 for holding things up based on full >> transparency...no harm in shooting for that, but let's look at things >> from a cost/benefit perspective (IMO). > > Also, I think it's worth considering that the kinds of failures users > will get out of anything that's not handled are really the worst kind. > If you have an application that relies on session state other than > what his patch knows how to preserve, your application will appear to > work in light testing because your connection won't actually be > swapped out underneath you -- and then fail unpredictably in > production when such swapping occurs. There will be no clear way to > tell which error messages or behavior differences are due to > limitations of the proposed feature, which ones are due to defects in > the application, and which ones might be due to PostgreSQL bugs. > They'll all look the same, and even experienced PG hackers won't Connection pooling is not a new phenomenon, and many stacks (in particular java) tend to pool connection by default. All of the problems we discuss here for the most part affect competitive solutions and I humbly submit the tradeoffs are _very_ widely understood. FWICT we get occasional reports that are simply and clearly answered. I guess there are some people dumb enough to flip GUC settings involving seemingly important things in production without testing or reading any documentation or the innumerable articles and blogs that will pop up...hopefully they are self selecting out of the industry :-). 
Looking at pgbouncer, they produce a chart that says, 'these features don't work, and please consider that before activating this feature' (https://wiki.postgresql.org/wiki/PgBouncer#Feature_matrix_for_pooling_modes) and that ought to be entirely sufficient to avoid that class of problems. This is very clear and simple. The main gripes with pgbouncer FWICT related to the postgres JDBC driver's unavoidable tendency (later fixed) to prepare 'BEGIN', causing various problems; that was really a bug (in the JDBC driver) which did in fact spill onto this list. For this feature to be really attractive we'd want to simultaneously allow pooled and non-pooled connections on different ports, or even multiple pools (say, for different applications). Looking at things from your perspective, we might want to consider blocking (with error) features that are not 'pooling compatible' if they arrive through a pooled connection. merlin
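Blocking pooling-incompatible features could be as simple as a statement screen at the pooled port, in the spirit of pgbouncer's feature matrix (the specific command list below is illustrative, not a proposal for the exact set):

```python
# commands that depend on session state a transaction-mode pool can't
# preserve across backend switches (illustrative list)
SESSION_ONLY = ("PREPARE", "LISTEN", "DECLARE", "SET ")  # SET LOCAL is fine

def check_pooled(sql: str):
    """Reject session-scoped commands arriving through a pooled connection."""
    stmt = sql.lstrip().upper()
    if stmt.startswith("SET LOCAL"):
        return  # transaction-scoped, so pooling-safe
    for prefix in SESSION_ONLY:
        if stmt.startswith(prefix):
            raise ValueError(
                f"{prefix.strip()} is not supported on pooled connections")
```

Failing fast like this converts the "appears to work in light testing, fails unpredictably in production" failure mode Robert describes into an immediate, diagnosable error.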
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 9:43 AM, Christophe Pettus <x...@thebuild.com> wrote: > >> On Apr 25, 2018, at 07:00, Merlin Moncure <mmonc...@gmail.com> wrote: >> The limitation headaches that I suffer with the pgbouncer project (which >> I love and use often) are mainly administrative and performance >> related, not lack of session based server features. > > For me, the most common issue I run into with pgbouncer (after general > administrative overhead of having another moving part) is that it works at > cross purposes with database-based sharding, as well as useful role and > permissions schemes. Since each server connection is specific to a > database/role pair, you are left with some unappealing options to handle that > in a pooling environment. Would integrated pooling help the sharding case (genuinely curious)? I don't quite have my head around the issue. I've always wanted pgbouncer to be able to do things like round-robin queries to a non-sharded replica for simple load balancing, but it doesn't (yet) have that capability. That type of functionality would not fit into an in-core pooler AIUI. Totally agree that the administrative benefits (user/role/.conf/etc/etc) are a huge win. > The next most common problem are prepared statements breaking, which > certainly qualifies as a session-level feature. Yep. The main workaround today is to disable them. Having said that, it's not that difficult to imagine hooking prepared statement creation to a backend starting up (feature: run X,Y,Z SQL before running user queries). This might be less effort than, uh, moving backend session state to a shareable object. I'll go further; managing cache memory consumption (say for pl/pgsql cached plans) is a big deal for certain workloads. The only really effective way to deal with that is to manage the server connection count and/or recycle server connections on intervals. 
Using pgbouncer to control backend count is a very effective way to deal with this problem, and allowing virtualized connections to each manage their independent cache would be a step in the opposite direction. I very much like having control so that I have exactly 8 backends for my 8 core server with 8 copies of cache. Advisory locks are a completely separate problem. I suspect they might be used more than you realize, and they operate against a very fundamental subsystem of the database: the locking engine. I'm struggling to see why we would take any approach other than 'don't use the non-xact variants of them in a pooling environment'. merlin
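The pgbouncer knob doing the work in the scenario above is the pool size. A minimal sketch (database name, ports, and sizes are hypothetical; check settings against the pgbouncer documentation for your version):

```ini
; pgbouncer.ini -- cap server backends regardless of client count
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; multiplex server connections across transactions
default_pool_size = 8     ; at most 8 backends (and 8 plan caches) per db/user pair
max_client_conn = 500     ; many clients funneled onto those 8 backends
```

Note default_pool_size applies per database/user pair, so the effective backend ceiling is the pool size times the number of distinct pairs in use.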
Re: Built-in connection pooling
On Wed, Apr 25, 2018 at 12:34 AM, Christophe Pettus <x...@thebuild.com> wrote: > >> On Apr 24, 2018, at 06:52, Merlin Moncure <mmonc...@gmail.com> wrote: >> Why does it have to be completely transparent? > > The main reason to move it into core is to avoid the limitations that a > non-core pooler has. The limitations headaches that I suffer with the pgbouncer project (which I love and use often) are mainly administrative and performance related, not lack of session based server features. Applications that operate over a very large number of virtual connections or engage a very high level of small transaction traffic are going to avoid session based features for a lot of other reasons anyways, at least in my experience. Probably the most useful feature I miss is async notifications, so much so that at one point we hacked pgbouncer to support them. Point being, full transparency is nice, but there are workarounds for most of the major issues and there are a lot of side channel benefits to making your applications 'stateless' (defined as state in application or database but not in between). Absent any other consideration, OP has proven to me that there are massive potential performance gains possible from moving the pooling mechanism into the database core process, and I'm already very excited about not having an extra server process to monitor and worry about. Tracking session state out of process seems pretty complicated and would probably add complexity or overhead to multiple internal systems. If we got that for free I'd be all for it, but reading Robert's email I'm skeptical there are easy wins here. So +1 for further R&D and -1 for holding things up based on full transparency...no harm in shooting for that, but let's look at things from a cost/benefit perspective (IMO). merlin
Re: Built-in connection pooling
On Mon, Apr 23, 2018 at 3:14 PM, Robert Haas wrote: > In other words, transparent connection pooling is going to require > some new mechanism, which third-party code will have to know about, > for tracking every last bit of session state that might need to be > preserved or cleared. That's going to be a big project. Maybe some > of that can piggyback on existing infrastructure like > InvalidateSystemCaches(), but there's probably still a ton of ad-hoc > state to deal with. And no out-of-core pooler has a chance of > handling all that stuff correctly; an in-core pooler will be able to > do so only with a lot of work. Why does it have to be completely transparent? As long as the feature is optional (say, a .conf setting) the tradeoffs can be managed. It's reasonable to expect to exchange some functionality for pooling; pgbouncer provides a 'release' query (say, DISCARD ALL) to be called upon release back to the pool. Having session state objects (not all of which we are talking about; advisory locks and notifications deserve consideration) 'just work' would be wonderful but ought not to hold up other usages of the feature. merlin
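For reference, the pgbouncer 'release' query mentioned above is configured via server_reset_query; a minimal sketch (the value shown is pgbouncer's documented default for session mode, but verify against your version's docs):

```ini
; pgbouncer.ini -- run a cleanup query when a server connection
; is released back to the pool (session pooling mode)
[pgbouncer]
pool_mode = session
server_reset_query = DISCARD ALL   ; drops prepared statements, temp tables, session GUCs, etc.
```

In transaction pooling mode the reset query is normally left empty, since there is no well-defined "release" point a client can rely on.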
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 4:19 PM, Merlin Moncure <mmonc...@gmail.com> wrote: > A) you can't assign output variables with into: > CALL p(1) INTO i; // gives syntax error > > B) you can't assign via assignment > i := p(1); // gives error, 'use CALL' > > C) but you *can* via execute > EXECUTE 'CALL p(1)' INTO i; // this works! > > ...I'm glad 'C' works, as without that there would be no useful way to > get values out of procedures called from within other > procedures/functions as things stand today. 'A' ideally also out to > work, but I'm not sure 'B' should be expected to work since it's > really a thin layer around SELECT. What do you think? Also (sorry for spam), A procedure created via: create procedure p() as $$begin call p(); end; $$ language plpgsql; ...will segfault when called -- there ought to be a stack depth check. merlin
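As a point of comparison, unbounded recursion in plpgsql functions normally errors cleanly once max_stack_depth is exceeded rather than crashing, which is presumably the check missing here. A hypothetical sketch of what a depth-guarded recursive procedure looks like once CALL works from within plpgsql (names are illustrative):

```sql
-- recursion with an explicit depth guard; without the guard one would
-- expect a "stack depth limit exceeded" error, not a segfault
CREATE PROCEDURE p(depth int DEFAULT 0) AS $$
BEGIN
  IF depth >= 100 THEN
    RAISE NOTICE 'stopping at depth %', depth;
    RETURN;
  END IF;
  CALL p(depth + 1);
END;
$$ LANGUAGE plpgsql;

CALL p();  -- terminates cleanly instead of exhausting the stack
```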
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 10:09 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > 2018-03-20 15:18 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>: >> >> postgres=# create or replace procedure p(a inout int default 7) as $$ >> >> begin return; end; $$ language plpgsql; >> >> CREATE PROCEDURE >> >> Time: 1.182 ms >> >> postgres=# call p(); >> >> a >> >> ─── >> >> 0 >> >> (1 row) >> > >> > >> > I wrote patch >> >> Confirmed this fixes the issue. > > Thanks for info You're welcome. Working with this feature some more, I noticed that: A) you can't assign output variables with into: CALL p(1) INTO i; // gives syntax error B) you can't assign via assignment i := p(1); // gives error, 'use CALL' C) but you *can* via execute EXECUTE 'CALL p(1)' INTO i; // this works! ...I'm glad 'C' works, as without that there would be no useful way to get values out of procedures called from within other procedures/functions as things stand today. 'A' ideally also ought to work, but I'm not sure 'B' should be expected to work since it's really a thin layer around SELECT. What do you think? merlin
Re: INOUT parameters in procedures
On Tue, Mar 20, 2018 at 9:09 AM, Pavel Stehule wrote: >> Edit: In one case, after dropping the function and recreating it, I >> got the procedure to return 0 where it had not before, so this smells >> like a bug. >> postgres=# call p(); >> 2018-03-20 09:04:50.543 CDT [21494] ERROR: function p() does not >> exist at character 6 >> 2018-03-20 09:04:50.543 CDT [21494] HINT: No function matches the >> given name and argument types. You might need to add explicit type >> casts. >> 2018-03-20 09:04:50.543 CDT [21494] STATEMENT: call p(); >> ERROR: function p() does not exist >> LINE 1: call p(); >> ^ >> HINT: No function matches the given name and argument types. You >> might need to add explicit type casts. >> Time: 0.297 ms >> postgres=# create or replace procedure p(a inout int default 7) as $$ >> begin return; end; $$ language plpgsql; >> CREATE PROCEDURE >> Time: 1.182 ms >> postgres=# call p(); >> a >> ─── >> 0 >> (1 row) > > > I wrote patch Confirmed this fixes the issue. merlin
Re: INOUT parameters in procedures
On Wed, Feb 28, 2018 at 4:28 PM, Peter Eisentraut wrote: > This patch set adds support for INOUT parameters to procedures. > Currently, INOUT and OUT parameters are not supported. > > A top-level CALL returns the output parameters as a result row. In > PL/pgSQL, I have added special support to pass the output back into the > variables, as one would expect. > > These patches apply on top of the "prokind" patch set v2. (Tom has > submitted an updated version of that, which overlaps with some of the > changes I've made here. I will work on consolidating that soon.) I did a pull from master to play around with INOUT parameters and got some strange interactions with DEFAULT. Specifically, DEFAULT doesn't do much beyond 'return the last supplied value given'. I'm not sure if this is expected behavior; it seems odd: postgres=# create or replace procedure p(a inout int default 7) as $$ begin return; end; $$ language plpgsql; CREATE PROCEDURE postgres=# call p(); a ─── (1 row) postgres=# call p(3); a ─── 3 (1 row) postgres=# call p(); a ─── 3 (1 row) I got null,3,3. I would have expected 7,3,7. Default arguments might remove quite a bit of the pain associated with having to supply bogus arguments to get the INOUT parameters working. Edit: In one case, after dropping the function and recreating it, I got the procedure to return 0 where it had not before, so this smells like a bug. postgres=# call p(); 2018-03-20 09:04:50.543 CDT [21494] ERROR: function p() does not exist at character 6 2018-03-20 09:04:50.543 CDT [21494] HINT: No function matches the given name and argument types. You might need to add explicit type casts. 2018-03-20 09:04:50.543 CDT [21494] STATEMENT: call p(); ERROR: function p() does not exist LINE 1: call p(); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. 
Time: 0.297 ms postgres=# create or replace procedure p(a inout int default 7) as $$ begin return; end; $$ language plpgsql; CREATE PROCEDURE Time: 1.182 ms postgres=# call p(); a ─── 0 (1 row) merlin
Re: JIT compiling with LLVM v9.0
On Thu, Jan 25, 2018 at 9:40 AM, Konstantin Knizhnik wrote: > As far as I understand generation of native code is now always done for all > supported expressions and individually by each backend. > I wonder it will be useful to do more efforts to understand when compilation > to native code should be done and when interpretation is better. > For example many JIT-able languages like Lua are using traces, i.e. query is > first interpreted and trace is generated. If the same trace is followed > more than N times, then native code is generated for it. > > In context of DBMS executor it is obvious that only frequently executed or > expensive queries have to be compiled. > So we can use estimated plan cost and number of query executions as simple > criteria for JIT-ing the query. > May be compilation of simple queries (with small cost) should be done only > for prepared statements... > > Another question is whether it is sensible to redundantly do expensive work > (llvm compilation) in all backends. > This question refers to shared prepared statement cache. But even without > such cache, it seems to be possible to use for library name some signature > of the compiled expression and allow > to share this libraries between backends. So before starting code > generation, ExecReadyCompiledExpr can first build signature and check if > correspondent library is already present. > Also it will be easier to control space used by compiled libraries in this > case. Totally agree; these considerations are very important. I tested several queries in my application that had >30-second compile times against a one-second run time. Not being able to manage when compilation happens is making it difficult to get a sense of llvm performance in the general case. Having explain analyze print compile time and being able to prepare llvm compiled queries ought to help measurement and tuning. 
There may be utility here beyond large analytical queries, as the ability to optimize spreads through the executor with the right trade-off management. This work is very exciting...thank you. merlin
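As it happens, the cost-threshold approach suggested in this thread is roughly what eventually shipped in PostgreSQL 11; a sketch of those knobs (the table name is hypothetical, and the threshold values shown are illustrative, not recommendations):

```sql
-- JIT is gated on planner cost estimates rather than execution traces
SET jit = on;
SET jit_above_cost = 100000;           -- compile only plans estimated above this cost
SET jit_optimize_above_cost = 500000;  -- spend extra LLVM optimization effort above this

-- EXPLAIN (ANALYZE) reports a JIT section with generation, inlining,
-- optimization, and emission timings, addressing the measurement concern
EXPLAIN (ANALYZE) SELECT sum(x) FROM big_table;
```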
Re: JIT compiling with LLVM v9.0
On Thu, Feb 1, 2018 at 8:16 PM, Thomas Munro wrote: > On Fri, Feb 2, 2018 at 2:05 PM, Andres Freund wrote: >> On 2018-02-01 09:32:17 -0800, Jeff Davis wrote: >>> On Wed, Jan 31, 2018 at 12:03 AM, Konstantin Knizhnik >>> wrote: >>> > The same problem takes place with old versions of GCC: I have to upgrade >>> > GCC >>> > to 7.2 to make it possible to compile this code. >>> > The problem in not in compiler itself, but in libc++ headers. >>> >>> How can I get this branch to compile on ubuntu 16.04? I have llvm-5.0 >>> and gcc-5.4 installed. Do I need to compile with clang or gcc? Any >>> CXXFLAGS required? >> >> Just to understand: You're running in the issue with the header being >> included from within the extern "C" {}? Hm, I've pushed a quick fix for >> that. > > That change wasn't quite enough: to get this building against libc++ > (Clang's native stdlb) I also needed this change to llvmjit.h so that > wouldn't be included with the wrong linkage (perhaps > you can find a less ugly way): > > +#ifdef __cplusplus > +} > +#endif > #include > +#ifdef __cplusplus > +extern "C" > +{ > +#endif This did the trick -- thanks. Sitting through 20-minute, computer-crashing link times really brings back C++ nightmares -- if anyone else needs to compile llvm/clang as I did (I'm stuck on 3.2 with my aging mint box), I strongly encourage you to use the gold linker. 
Question: when watching the compilation log, I see quite a few files being compiled with both O2 and O1, for example: clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2 -O1 -Wno-ignored-attributes -Wno-unknown-warning-option -Wno-ignored-optimization-argument -I../../../../src/include -D_GNU_SOURCE -I/home/mmoncure/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -flto=thin -emit-llvm -c -o nbtsort.bc nbtsort.c Is this intentional? (didn't check standard compilation, it just jumped out). merlin
Re: JIT compiling with LLVM v9.0
On Wed, Jan 31, 2018 at 1:45 PM, Robert Haas wrote: > On Wed, Jan 31, 2018 at 1:34 PM, Andres Freund wrote: >>> The first one is a problem that's not going to go away. If the >>> problem of JIT being enabled "magically" is something we're concerned >>> about, we need to figure out a good solution, not just disable the >>> feature by default. >> >> That's a fair argument, and I don't really have a good answer to it. We >> could have a jit = off/try/on, and use that to signal things? I.e. it >> can be set to try (possibly default in version + 1), and things will >> work if it's not installed, but if set to on it'll refuse to work if not >> enabled. Similar to how huge pages work now. > > We could do that, but I'd be more inclined just to let JIT be > magically enabled. In general, if a user could do 'yum install ip4r' > (for example) and have that Just Work without any further database > configuration, I think a lot of people would consider that to be a > huge improvement. Unfortunately we can't really do that for various > reasons, the biggest of which is that there's no way for installing an > OS package to modify the internal state of a database that may not > even be running at the time. But as a general principle, I think > having to configure both the OS and the DB is an anti-feature, and > that if installing an extra package is sufficient to get the > new-and-improved behavior, users will like it. Bonus points if it > doesn't require a server restart. You bet. It'd be helpful to have some obvious, well advertised ways to determine when it's enabled and when it isn't, and to have a straightforward process to determine what to fix when it's not enabled and the user thinks it ought to be, though. merlin
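For what it's worth, the shipped PostgreSQL 11 implementation did grow this kind of introspection; a quick sketch of checking whether JIT is actually available and enabled (verify the details against the current documentation):

```sql
SHOW jit;                   -- the GUC discussed above; 'on' degrades gracefully
                            -- when the JIT provider library is not installed
SELECT pg_jit_available();  -- true only if the provider library actually loaded
```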
Re: procedures and plpgsql PERFORM
On Thu, Dec 14, 2017 at 11:56 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > 2017-12-14 18:33 GMT+01:00 Merlin Moncure <mmonc...@gmail.com>: >> >> On Thu, Dec 14, 2017 at 10:46 AM, Pavel Stehule <pavel.steh...@gmail.com> >> wrote: >> > >> > >> > 2017-12-14 17:10 GMT+01:00 David G. Johnston >> > <david.g.johns...@gmail.com>: >> >> >> >> On Thu, Dec 14, 2017 at 8:22 AM, Merlin Moncure <mmonc...@gmail.com> >> >> wrote: >> >>> >> >>> On Thu, Dec 14, 2017 at 8:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> >>> > Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes: >> >>> >> We allow a function to be invoked as part of PERFORM statement in >> >>> >> plpgsql >> >>> >> ... >> >>> >> But we do not allow a procedure to be invoked this way >> >>> > >> >>> >> Procedures fit that category and like functions, I think, we should >> >>> >> allow them be invoked directly without any quoting and CALL >> >>> >> decoration. >> >>> > >> >>> > How is that going to work? What if the procedure tries to commit >> >>> > the >> >>> > current transaction? >> >>> > >> >>> > IOW, this is not merely a syntactic-sugar question. >> >>> >> >>> BTW, We've already come to (near-but good enough) consensus that >> >>> PERFORM syntax is really just unnecessary, and I submitted a patch to >> >>> make it optional (which I really need to dust off and complete). >> >> >> >> >> >> Except right now PERFORM doesn't exist in SQL and is a pl/pgsql keyword >> >> to >> >> specify a specific limited form of the SQL SELECT command. CALL is an >> >> SQL >> >> command. I don't see any real upside to allowing pl/pgsql to accept >> >> omission of the command tag while SQL cannot - at least not without a >> >> use-case describe why such syntax would be beneficial. And likely >> >> those use >> >> cases would revolve around some looping variant as opposed to a single >> >> stand-alone, result-less, CALL. 
>> >> >> >> If we do keep "PERFORM" in the pl/pgsql vocab I'd consider the >> >> following >> >> enhancement: >> >> PERFORM func() => SELECT func() >> >> PERFORM proc() => CALL proc() >> > >> > >> > I don't like this idea - functions are not procedures - can be nice if >> > it >> > will be visible. >> >> We need to get rid of PERFORM ASAP. Agree that we need to not obfuscate >> CALL. > > If we have a procedures, then functions without returned values lost a sense > - and I don't see any changes with PERFORM necessary. I don't think the presence of procedures really changes the thinking here. Having to simulate procedures with void returning functions wasn't really the point; it's an annoying syntax departure from SQL for little benefit other than assuming the users are wrong when they are not explicitly capturing the result. The topic was heavily discussed: https://www.postgresql.org/message-id/CAHyXU0zYbeT-FzuonaaycbS9Wd8d5JO%2B_niAygzYtv5FMdx4rg%40mail.gmail.com merlin
Re: procedures and plpgsql PERFORM
On Thu, Dec 14, 2017 at 10:46 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > 2017-12-14 17:10 GMT+01:00 David G. Johnston <david.g.johns...@gmail.com>: >> >> On Thu, Dec 14, 2017 at 8:22 AM, Merlin Moncure <mmonc...@gmail.com> >> wrote: >>> >>> On Thu, Dec 14, 2017 at 8:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> > Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes: >>> >> We allow a function to be invoked as part of PERFORM statement in >>> >> plpgsql >>> >> ... >>> >> But we do not allow a procedure to be invoked this way >>> > >>> >> Procedures fit that category and like functions, I think, we should >>> >> allow them be invoked directly without any quoting and CALL >>> >> decoration. >>> > >>> > How is that going to work? What if the procedure tries to commit the >>> > current transaction? >>> > >>> > IOW, this is not merely a syntactic-sugar question. >>> >>> BTW, We've already come to (near-but good enough) consensus that >>> PERFORM syntax is really just unnecessary, and I submitted a patch to >>> make it optional (which I really need to dust off and complete). >> >> >> Except right now PERFORM doesn't exist in SQL and is a pl/pgsql keyword to >> specify a specific limited form of the SQL SELECT command. CALL is an SQL >> command. I don't see any real upside to allowing pl/pgsql to accept >> omission of the command tag while SQL cannot - at least not without a >> use-case describe why such syntax would be beneficial. And likely those use >> cases would revolve around some looping variant as opposed to a single >> stand-alone, result-less, CALL. >> >> If we do keep "PERFORM" in the pl/pgsql vocab I'd consider the following >> enhancement: >> PERFORM func() => SELECT func() >> PERFORM proc() => CALL proc() > > > I don't like this idea - functions are not procedures - can be nice if it > will be visible. We need to get rid of PERFORM ASAP. Agree that we need to not obfuscate CALL. merlin
Re: procedures and plpgsql PERFORM
On Thursday, December 14, 2017, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > Hi, > We allow a function to be invoked as part of PERFORM statement in plpgsql > do $$ > begin perform pg_relation_size('t1'); end; $$ language plpgsql; > DO > > But we do not allow a procedure to be invoked this way > create procedure dummy_proc(a int) as $$ > begin null; end; > $$ language plpgsql; > CREATE PROCEDURE > > do $$ > begin perform dummy_proc(1); end; $$ language plpgsql; > ERROR: dummy_proc(integer) is a procedure > LINE 1: SELECT dummy_proc(1) >^ > HINT: To call a procedure, use CALL. > QUERY: SELECT dummy_proc(1) > CONTEXT: PL/pgSQL function inline_code_block line 2 at PERFORM > > The documentation of PERFORM [1] says > "For any SQL command that does not return rows, for example INSERT > without a RETURNING clause, you can execute the command within a > PL/pgSQL function just by writing the command." > > Procedures fit that category and like functions, I think, we should > allow them be invoked directly without any quoting and CALL > decoration. > I disagree. The SQL command is 'CALL'. The documentation is really only clarifying when PERFORM is explicitly required. merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Dec 6, 2017 at 8:41 AM, Peter Eisentraut wrote: > On 12/5/17 13:33, Robert Haas wrote: >> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut >> wrote: >>> I think ROLLBACK in a cursor loop might not make sense, because the >>> cursor query itself could have side effects, so a rollback would have to >>> roll back the entire loop. That might need more refined analysis before >>> it could be allowed. >> >> COMMIT really has the same problem; if the cursor query has side >> effects, you can't commit those side effects piecemeal as the loop >> executed and have things behave sanely. > > The first COMMIT inside the loop would commit the cursor query. This > isn't all that different from what you'd get now if you coded this > manually using holdable cursors or just plain client code. Clearly, you > can create a mess if the loop body interacts with the loop expression, > but that's already the case. > > But if you coded something like this yourself now and ran a ROLLBACK > inside the loop, the holdable cursor would disappear (unless previously > committed), so you couldn't proceed with the loop. > > The SQL standard for persistent stored modules explicitly prohibits > COMMIT and ROLLBACK in cursor loop bodies. But I think people will > eventually want it. They may want it, but silently promoting all cursors to held ones is not the way to give it to them, unless we narrow it down to the 'for-loop derived cursor' only. Even then we should consider syntax decoration: FOR x IN SELECT WITH HOLD LOOP END LOOP; merlin
Re: feature request: consume asynchronous notification via a function
On Tue, Nov 21, 2017 at 2:16 PM, Merlin Moncure <mmonc...@gmail.com> wrote: > On Tue, Nov 21, 2017 at 12:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Robert Haas <robertmh...@gmail.com> writes: >>> On Tue, Nov 21, 2017 at 11:32 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >>>> I am very much looking at the new stored procedure functionality and >>>> imaging a loop like this: >>>> >>>> LOOP >>>> FOR r IN SELECT * FROM pg_get_notifications(30) >>>> LOOP >>>> PERFORM do_stuff(r); >>>> END LOOP; >>>> COMMIT; -- advance xmin etc >>>> END LOOP; >> >>> Yeah, if you keep the timeout fairly short, it would probably work OK >>> (with Peter's stuff). >> >> Traditionally, NOTIFY messages are delivered to the client only between >> transactions, so that there is no question about whether the >> message-delivery should roll back if the surrounding transaction aborts. >> It's not very clear to me what the behavior of pg_get_notifications() >> inside a transaction ought to be. Is it OK if it's a volatile function >> and the messages are just gone once the function has returned them, >> even if you fail to do anything about them because your transaction >> fails later? > > I think destroying upon consumption is OK. There are a lot of > mitigation strategies to deal with that issue and NOTIFY is for > signalling, not queuing. > >> (I'd be against having a function that returns more than one at a time, >> in any case, as that just complicates matters even more.) Hm, a less controversial approach might be to only allow consumption of notifications that were delivered at the start of transaction. Procedures could then issue an intervening 'COMMIT' statement to pick up new notifications. There'd be no reason for a timeout argument in that case obviously, so the end user would have to poll in order to pick up the notification, which I don't like. 
This would be an alternative approach to the way I do it today, which is to poll for a set table flag in a non-serializable transaction, maybe with enough differentiation in use to merit introduction. merlin
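The table-flag polling workaround described here can be sketched as follows (table and column names are hypothetical):

```sql
-- signaling table: a writer raises the flag, a poller consumes it
CREATE TABLE work_flag (id int PRIMARY KEY, pending boolean NOT NULL DEFAULT false);
INSERT INTO work_flag VALUES (1, false);

-- writer: raise the flag instead of (or in addition to) NOTIFY
UPDATE work_flag SET pending = true WHERE id = 1;

-- poller: atomically test-and-clear; a returned row means there is work
UPDATE work_flag SET pending = false
WHERE id = 1 AND pending
RETURNING id;
```

Unlike NOTIFY, the flag survives crashes and rollbacks of the consumer, which is one of the mitigation strategies alluded to above, at the cost of polling traffic and table bloat.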
Re: feature request: consume asynchronous notification via a function
On Tue, Nov 21, 2017 at 12:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> On Tue, Nov 21, 2017 at 11:32 AM, Merlin Moncure <mmonc...@gmail.com> wrote: >>> I am very much looking at the new stored procedure functionality and >>> imaging a loop like this: >>> >>> LOOP >>> FOR r IN SELECT * FROM pg_get_notifications(30) >>> LOOP >>> PERFORM do_stuff(r); >>> END LOOP; >>> COMMIT; -- advance xmin etc >>> END LOOP; > >> Yeah, if you keep the timeout fairly short, it would probably work OK >> (with Peter's stuff). > > Traditionally, NOTIFY messages are delivered to the client only between > transactions, so that there is no question about whether the > message-delivery should roll back if the surrounding transaction aborts. > It's not very clear to me what the behavior of pg_get_notifications() > inside a transaction ought to be. Is it OK if it's a volatile function > and the messages are just gone once the function has returned them, > even if you fail to do anything about them because your transaction > fails later? I think destroying upon consumption is OK. There are a lot of mitigation strategies to deal with that issue and NOTIFY is for signalling, not queuing. > (I'd be against having a function that returns more than one at a time, > in any case, as that just complicates matters even more.) ok. merlin
feature request: consume asynchronous notification via a function
Hackers, Currently the only way that I know of to consume async notifications via SQL (as opposed to a client application) is via dblink_get_notify. This method isn't very good; it requires some extra support coding, eats a connection and a backend, and doesn't have any timeout facilities. The lack of a good facility to do this will become more troublesome if/when Peter's recent fantastic work to implement stored procedures in the database gets accepted; asynchronous notifications could be a more efficient mechanism for backend processes to signal each other than the current method of signalling via fields in a table. A good interface might look something like: pg_get_notifications( TimeOut INT DEFAULT 0, notify_name OUT TEXT, payload OUT TEXT, pid OUT INT) RETURNS SETOF RECORD AS... The function would return immediately by default, or wait until TimeOut seconds transpired. We'd still have to poll internally, so that signals could be checked etc, but this would be a nice way to consume notifications without any dependencies -- what do you think? merlin
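Usage of the proposed (not existing) function might look like this, following the signature sketched above:

```sql
-- hypothetical: block up to 30 seconds, then process whatever arrived
SELECT notify_name, payload, pid
FROM pg_get_notifications(30);

-- or a non-blocking drain, using the default zero timeout
SELECT * FROM pg_get_notifications();
```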
Re: [HACKERS] Transaction control in procedures
On Thu, Nov 16, 2017 at 5:35 PM, Simon Riggs wrote: > On 14 November 2017 at 13:09, Peter Eisentraut > wrote: > >>> *) Will pg_cancel_backend() cancel the currently executing statement >>> or the procedure? (I guess probably the procedure but I'm curious) >> >> Same as the way it currently works. It will raise an exception, which >> will travel up the stack and eventually issue an error or be caught. If >> someone knows more specific concerns here I could look into it, but I >> don't see any problem. >> >>> *) Will long running procedures be subject to statement timeout (and >>> does it apply to the entire procedure)? >> >> See previous item. >> >>> Will they be able to control >>> statement_timeout from within the procedure itself? >> >> The statement timeout alarm is set by the top-level execution loop, so >> you can't change a statement timeout that is already in progress. But >> you could change the GUC and commit it for the next top-level statement. >> >>> *) Will pg_stat_activity show the invoking CALL or the currently >>> executing statement? I see a strong argument for showing both of >>> these things. although I understand that's out of scope here. >> >> Not different from a function execution, i.e., top-level statement. > > Which is the "top-level statement"? The CALL or the currently > executing statement within the proc? I think you mean former. > > For the first two answers above the answer was "currently executing > statement", yet the third answer seems to be the procedure. So that is > a slight discrepancy. > > ISTM we would like > > 1) a way to cancel execution of a procedure > 2) a way to set a timeout to cancel execution of a procedure > > as well as > > 1) a way to cancel execution of a statement that is running within a procedure > 2) a way to set a timeout to cancel execution of a statement in a procedure > > Visibility of what a routine is currently executing is the role of a > debugger utility/API. 
How could you cancel a statement but not the procedure itself? Cancel-type errors (whether from a timeout or administrative action) are untrappable by design for very good reasons, and untrapped errors ought to return the database all the way to 'ready for query'. merlin
Re: pspg - psql pager
On Thu, Nov 16, 2017 at 11:04 AM, Pavel Stehule wrote: > 2017-11-16 16:40 GMT+01:00 Aaron W. Swenson : >> I likes it. I likes it a lot. >> >> And, you’ve got a PR on GitHub from me. >> >> It’s available on Gentoo now! This would be a great package for inclusion in pgdg packaging as well. Dependencies are minor and compile risk is low. merlin
Re: Transaction control in procedures
On Thu, Nov 16, 2017 at 12:36 PM, Peter Eisentraut wrote: > On 11/16/17 07:04, legrand legrand wrote: >> We are just opening the "close cursors on/at commit" specification ;o) >> >> - MS SQL server: cursor_close_on_commit >> - Firebird: close_cursors_at_commit >> - DB2: "with hold" syntax >> - ... >> >> I think it a plus to support keeping opened cursors at commit time, >> but impacts have to be checked in details ... > > I think the facilities to support this in PostgreSQL are already there. > We'd just have to tweak PL/pgSQL to make some of its internal portals > "held" and then clean them up manually at some later point. So I think > this is a localized detail, not a fundamental problem. Automatically persisting cursors (WITH HOLD) can have some very surprising performance considerations, except when the current code execution depends on that particular cursor, in which case the current behavior of raising a (hopefully better-worded) error seems appropriate. Cursors based on temporary tables could be exempt from having to be closed or checked on COMMIT. plpgsql does not have the facility to create held cursors FWICT...automatic promotion seems pretty dubious. It could certainly be added, and cursors so held could be exempt from being force closed/errored as well. In lieu of that, having users materialize data into temp tables for such cases seems reasonable. merlin
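The temp-table approach suggested at the end can be sketched like this (table and column names are hypothetical):

```sql
-- materialize the driving query up front, so the iteration no longer
-- depends on a cursor surviving an intervening COMMIT
CREATE TEMP TABLE pending_work AS
  SELECT id, payload FROM work_queue WHERE done = false;

DO $$
DECLARE r record;
BEGIN
  FOR r IN SELECT id, payload FROM pending_work LOOP
    RAISE NOTICE 'processing %', r.id;
  END LOOP;
END;
$$;
```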
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 15, 2017 at 3:42 PM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 11/15/17 09:54, Merlin Moncure wrote: >> ... I noticed that: >> *) now() did not advance with commit and, >> *) xact_start via pg_stat_activity did not advance >> >> Shouldn't both of those advance with the in-loop COMMIT? > > I think you are correct. I'll include that in the next patch version. > It shouldn't be difficult. Thanks. A couple more things. *) This error message is incorrect now: postgres=# CREATE OR REPLACE PROCEDURE foo() AS $$ BEGIN LOOP SAVEPOINT x; END LOOP; END; $$ LANGUAGE PLPGSQL; CREATE PROCEDURE Time: 0.912 ms postgres=# call foo(); ERROR: cannot begin/end transactions in PL/pgSQL HINT: Use a BEGIN block with an EXCEPTION clause instead. CONTEXT: PL/pgSQL function foo() line 5 at SQL statement I guess there are a few places that assume pl/pgsql is always run from an in-transaction function. *) Exception handlers seem to override COMMITs. The following procedure will not insert any rows. I wonder if this is the correct behavior. I think there's a pretty good case to be made to raise an error if a COMMIT is issued while you're in an exception block. CREATE OR REPLACE PROCEDURE foo() AS $$ BEGIN LOOP INSERT INTO foo DEFAULT VALUES; COMMIT; RAISE EXCEPTION 'test'; END LOOP; EXCEPTION WHEN OTHERS THEN RAISE NOTICE '%', SQLERRM; END; $$ LANGUAGE PLPGSQL; *) The documentation could use some work. Would you like some help? merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 15, 2017 at 7:38 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Tue, Nov 14, 2017 at 5:27 PM, Peter Eisentraut
>>> Can we zero in on this? The question implied, 'can you do this
>>> without being in a transaction'? PERFORM do_stuff() is an implicit
>>> transaction, so it ought to end when the function returns right?
>>> Meaning, assuming I was not already in a transaction when hitting this
>>> block, I would not be subject to an endless transaction duration?
>>
>> In the server, you are always in a transaction, so that's not how this
>> works. I think this also ties into my first response above.
>
> I'll try this out myself, but as long as we can have a *bounded*
> transaction lifetime (basically the time to do stuff + 1 second) via
> something like:
> LOOP
>
>   COMMIT;
>   PERFORM pg_sleep(1);
> END LOOP;
>
> ... I'm good. I'll try your patch out ASAP. Thanks for answering all
> my questions.

Trying this out (v2, both patches, compiled clean, thank you!):

postgres=# CREATE OR REPLACE PROCEDURE foo() AS
$$
BEGIN
  LOOP
    PERFORM 1;
    COMMIT;
    RAISE NOTICE '%', now();
    PERFORM pg_sleep(1);
  END LOOP;
END;
$$ LANGUAGE PLPGSQL;
CREATE PROCEDURE
Time: 0.996 ms
postgres=# call foo();
NOTICE:  2017-11-15 08:52:08.936025-06
NOTICE:  2017-11-15 08:52:08.936025-06
...

I noticed that:
*) now() did not advance with commit and,
*) xact_start via pg_stat_activity did not advance

Shouldn't both of those advance with the in-loop COMMIT?

merlin
Re: pspg - psql pager
On Wed, Nov 15, 2017 at 4:45 AM, Andreas Joseph Krogh wrote:
> On Wednesday, November 15, 2017 at 10:41:45, Pavel Stehule wrote:
>> Hi
>>
>> I finished new pager, that can be interesting for postgresql expert users.
>
> Thanks for making this, "-X --no-mouse" made my day.

This was my first bugfix/suggestion :-D. I use psql all day, every day. PSPG IS AWESOME, IF YOU USE PSQL GET IT AND DON'T LOOK BACK. There are no major issues with it; it's wonderful.

I do not advise a global definition of PAGER... it will break other utilities (like man). I find the best way to invoke the functionality is via a bash alias:

alias psql='PAGER="/home/mmoncure/pspg -s0 -X --no-mouse" psql'

Thank you Pavel.

merlin
Re: [HACKERS] Transaction control in procedures
On Wed, Nov 8, 2017 at 5:48 PM, Simon Riggs wrote:
> On 31 October 2017 at 15:38, Peter Eisentraut wrote:
>> Here is a patch that implements transaction control in PL/Python
>> procedures.  (This patch goes on top of "SQL procedures" patch v1.)
>
> The patch is incredibly short for such a feature, which is probably a
> good indication that it is feasible.
>
> Amazing!

I have to agree with that. I'm really excited about this... Some questions:

*) Will it be possible to do operations like this in pl/pgsql?

BEGIN
  SELECT INTO r * FROM foo;
  START TRANSACTION;  -- perhaps we ought to have a special function for this instead (BEGIN is reserved, etc.)
  SET transaction_isolation TO serializable;
  ...

*) Will there be any negative consequences to a procedure running with an unbounded run time? For example, something like:

LOOP
  SELECT check_for_stuff_to_do();

  IF stuff_to_do THEN
    do_stuff();
  ELSE
    PERFORM pg_sleep(1);
  END IF;
END LOOP;

*) Will pg_cancel_backend() cancel the currently executing statement or the procedure? (I guess probably the procedure, but I'm curious)

*) Will long running procedures be subject to statement_timeout (and does it apply to the entire procedure)? Will they be able to control statement_timeout from within the procedure itself?

*) Will pg_stat_activity show the invoking CALL or the currently executing statement? I see a strong argument for showing both of these things, although I understand that's out of scope here.

If these questions (especially the first two) come down the correct way, then it will mean that I can stop coding in other languages (primarily bash) for a fairly large number of cases that I really think belong in the database itself. This would really simplify coding; some things in bash are really awkward to get right, such as a mutex to guarantee single script invocation.
My only real dependency on the operating system environment at that point would be cron to start the background daemon process (which would immediately set an advisory lock). I'm somewhat surprised that SPI is the point of attack for this functionality, but if it works that's really the best case scenario (the only downside I can see is that the various out-of-core PLs have to implement the interface individually).

merlin
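The single-invocation mutex mentioned above maps naturally onto session-level advisory locks. A hypothetical sketch, assuming the transaction-control patch lands (the procedure name, lock key 42, and do_stuff() are all made-up examples):

    CREATE OR REPLACE PROCEDURE daemon_loop() AS
    $$
    BEGIN
      -- Session-level advisory locks survive COMMIT, so taking one here
      -- guards the whole loop; a second cron-launched instance bails out.
      IF NOT pg_try_advisory_lock(42) THEN
        RAISE NOTICE 'another instance is already running';
        RETURN;
      END IF;

      LOOP
        PERFORM do_stuff();  -- stands in for the real work
        COMMIT;              -- keeps transaction lifetime bounded
        PERFORM pg_sleep(1);
      END LOOP;
    END;
    $$ LANGUAGE plpgsql;

The lock is released automatically when the backend exits, so a crashed daemon doesn't leave a stale mutex behind, which is exactly what's awkward to get right with lock files in bash.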