Re: Support prepared statement invalidation when result types change

2024-01-07 Thread Shay Rojansky
On Mon, Sep 18, 2023 at 1:31 PM Jelte Fennema-Nio  wrote:

> Furthermore caching RowDescription is also not super useful, most
> clients request it every time because it does not require an extra
> round trip, so there's almost no overhead in requesting it.

Just to point out, FWIW, that the .NET Npgsql driver does indeed cache
RowDescriptions... The whole point of preparation is to optimize things as
much as possible for repeated execution of the query; I get that the value
there is much lower than e.g. doing another network roundtrip, but that's
still extra work that's better off being cut if it can be.


Re: Missing docs on AT TIME ZONE precedence?

2023-11-26 Thread Shay Rojansky
>> Is there a missing line in the operator precedence table in the docs?
>
> I think the big question is whether AT TIME ZONE is significant enough
> to list there because there are many other clauses we could potentially
> add there.

Just to give more context, I'm a maintainer on Entity Framework Core (the
.NET ORM), and this caused the provider to generate incorrect SQL etc.

If you decide not to have a comprehensive operator precedence table (though
I do hope you do), I'd at least amend the "any other operator" and "all
other native and user-defined operators" entries to clearly indicate that
some operators aren't listed and have undocumented precedences, so
implementers can at least be aware of, and test, the unlisted ones.


Missing docs on AT TIME ZONE precedence?

2023-11-26 Thread Shay Rojansky
Greetings hackers,

In the operator precedence table[1], AT TIME ZONE isn't explicitly listed;
that means it falls into the "any other operator" category.

However, it seems that the precedence of AT TIME ZONE is actually higher
than that of the addition operator:

-- Fails with "function pg_catalog.timezone(unknown, interval) does not
exist
SELECT now() + INTERVAL '14 days' AT TIME ZONE 'UTC';

-- Works:
SELECT (now() + INTERVAL '14 days') AT TIME ZONE 'UTC';

Note that missing parentheses for this were discussed in the context
of pg_catalog.pg_get_viewdef[2].

Is there a missing line in the operator precedence table in the docs?

Thanks,

Shay

[1]
https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-PRECEDENCE
[2]
https://www.postgresql.org/message-id/flat/f41566aa-a057-6628-4b7c-b48770ecb...@deepbluecap.com


Re: CREATE COLLATION must be specified

2022-10-14 Thread Shay Rojansky
> > CREATE COLLATION some_collation (LC_COLLATE = 'en-u-ks-primary',
> >  LC_CTYPE = 'en-u-ks-primary',
> >  PROVIDER = icu,
> >  DETERMINISTIC = False
> > );
> >
> > This works on PG14, but on PG15 it errors with 'parameter "locale" must
> > be specified'.
> >
> > I wanted to make sure this breaking change is intentional (it doesn't
> > seem documented in the release notes or in the docs for CREATE COLLATION).
>
> This change is intentional, but the documentation could be improved.

I think this is still missing in the PG15 release notes (
https://www.postgresql.org/docs/15/release-15.html).


CREATE COLLATION must be specified

2022-05-28 Thread Shay Rojansky
Hi all,

Testing on the PG15 beta, I'm getting new failures when trying to create a
collation:

CREATE COLLATION some_collation (
    LC_COLLATE = 'en-u-ks-primary',
    LC_CTYPE = 'en-u-ks-primary',
    PROVIDER = icu,
    DETERMINISTIC = False
);

This works on PG14, but on PG15 it errors with 'parameter "locale" must be
specified'.

I wanted to make sure this breaking change is intentional (it doesn't seem
documented in the release notes or in the docs for CREATE COLLATION).
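
For reference, the following reformulation does seem to be the accepted form
on PG15 (assuming, as the error message suggests, that ICU collations now
require the single "locale" parameter):

CREATE COLLATION some_collation (
    PROVIDER = icu,
    LOCALE = 'en-u-ks-primary',
    DETERMINISTIC = False
);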

Shay


Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread Shay Rojansky
> > > That seems very reasonable; if the situation is similar on PostgreSQL,
> > > then I'd suggest making that very clear in the INSERT[2] and UPDATE[3]
> > > docs.
> >
> > There is clearly no mention of such a guarantee in our documentation.
>
> Yes, which is just how SQL works: a set doesn't have any ordering unless
> an explicit one has been defined, RETURNING is no exception to that.

Thanks for confirming that such a guarantee doesn't exist. I would still
suggest explicitly calling that out in the docs around RETURNING, since
that seems like an understandable pitfall; personally speaking, this
certainly wasn't clear to me when first looking at it (even if it is now).


Document ordering guarantees on INSERT/UPDATE RETURNING clause

2022-02-26 Thread Shay Rojansky
Hi all,

I've seen various discussions around whether PG makes any guarantees on the
ordering of rows returned by the RETURNING clause (e.g. [1]). In a
nutshell, when executing a statement such as the following:

CREATE TABLE foo (id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, data INT);
INSERT INTO foo (data) VALUES (8), (9), (10) RETURNING id, data;

... is the INSERT guaranteed to return the rows (1,8), (2,9) and (3,10)
(and in that order)? This point is important when inserting multiple rows
and wanting to e.g. match a database-generated ID back to memory structures
on the client.

FWIW I've received feedback from a SQL Server engineer that one definitely
should *not* depend on such ordering there, and that future optimizations
(e.g. parallel insertion of many rows) could result in row ordering which
differs from the lexical ordering of the VALUES clause. That seems very
reasonable; if the situation is similar on PostgreSQL, then I'd suggest
making that very clear in the INSERT[2] and UPDATE[3] docs. I'd also
possibly point to the workaround of wrapping the INSERT/UPDATE in a CTE
which then defines the ordering.
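
For example, a sketch of that CTE workaround, imposing an explicit order on
the returned rows:

WITH inserted AS (
    INSERT INTO foo (data) VALUES (8), (9), (10)
    RETURNING id, data
)
SELECT id, data FROM inserted ORDER BY id;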

Thanks,

Shay

[1]
https://stackoverflow.com/questions/5439293/is-insert-returning-guaranteed-to-return-things-in-the-right-order
[2] https://www.postgresql.org/docs/current/sql-insert.html
[3] https://www.postgresql.org/docs/current/sql-update.html


Re: Privilege required for IF EXISTS even if the object already exists

2021-12-16 Thread Shay Rojansky
> As I said before, your position seems reasonable.  I've also got a couple
> of reasonable complaints about IF EXISTS out there.  But there is little
> interest in changing the status quo with regards to the promises that IF
> EXISTS makes. And even with my less constrained views I find that doing
> anything but returning an error to a user that issues CREATE SCHEMA on a
> database for which they lack CREATE privileges is of limited benefit.
> Would I support a well-written patch that made this the new rule?
> Probably.  Would I write one or spend time trying to convince others to
> write one?  No.

Fair enough, thanks.


Re: Privilege required for IF EXISTS even if the object already exists

2021-12-16 Thread Shay Rojansky
>> Now, before creating tables, the ORM generates CREATE SCHEMA IF NOT
>> EXISTS, to ensure that the schema exists before CREATE TABLE; that's
>> reasonable general-purpose behavior.
>
> If the user hasn’t specified they want the schema created it’s arguable
> that executing create schema anyway is reasonable.  The orm user should
> know whether they are expected/able to create the schema as part of their
> responsibilities and instruct the orm accordingly and expect it to only
> create what it has been explicitly directed to create.

I think the point being missed here, is that the user isn't interacting
directly with PostgreSQL - they're doing so via an ORM which isn't
necessarily aware of everything. Yes, a switch could be added to the ORM
where the user instructs it to not ensure that the schema exists, but
that's placing unnecessary burden on the user - having the "ensure"
operation silently no-op (instead of throwing) if the schema already exists
just makes everything smoother.

Put another way, let's say I introduce a user-facing flag in my ORM to opt
out of CREATE SCHEMA IF NOT EXISTS. If the user forgets to pre-create the
schema, they would still get an error when trying to create the tables
(since the schema doesn't exist). So failure to properly set up would
generate an error in any case, either when trying to create the schema (if
no flag is added), or when trying to create the table (if the flag is
added). This makes the flag pretty useless and an unnecessary extra burden
on the user, when the database could simply be ignoring CREATE SCHEMA IF
NOT EXISTS for the case where the schema already exists.

Is there any specific reason you think this shouldn't be done?


Re: Privilege required for IF EXISTS even if the object already exists

2021-12-15 Thread Shay Rojansky
> I would say it is reasonable in theory.  But I cannot think of an actual
> scenario that would benefit from such a change.  Your stated use case is
> rejected since you explicitly do not want tenants to be able to create
> schemas - so the simple act of issuing "CREATE SCHEMA" is disallowed.
> [...]
> Because tenants are not allowed to CREATE SCHEMA you should replace
> "CREATE SCHEMA" in the body of that DO block with "RAISE ERROR 'Schema foo
> required but not present!';"  Or, just tell them to create objects in the
> presumed present schema and let them see the "schema not found" error that
> would occur in the rare case the schema didn't exist.

The point here concerns layers/ORMs which are used without necessarily being
aware of the multi-tenant scenario. In my concrete real-world complaints here,
users instruct the ORM to generate the database schema for them. Now,
before creating tables, the ORM generates CREATE SCHEMA IF NOT EXISTS, to
ensure that the schema exists before CREATE TABLE; that's reasonable
general-purpose behavior (again, it does not know about multi-tenancy).
It's the user's responsibility to have already created the schema and
assigned rights to the right PG user, at which point everything could work
transparently (schema creation is skipped because it already exists, CREATE
TABLE succeeds).


Privilege required for IF EXISTS even if the object already exists

2021-12-15 Thread Shay Rojansky
Hi all,

I've received numerous complaints about CREATE SCHEMA IF NOT EXISTS failing
when the user lacks CREATE privileges on the database - even if the schema
already exists. A typical scenario would be a multi-tenant
schema-per-tenant setup, where the schema and tenant user are created
beforehand, but then some database layer or ORM wants to ensure that the
schema is there so the above is issued.

Would it be reasonable to have the above not error if the schema already
exists? That would make it similar to the following (which I'm switching to
in the Entity Framework Core ORM):

DO $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_namespace WHERE nspname = 'foo') THEN
        CREATE SCHEMA "foo";
    END IF;
END $$;

The same could apply to other CREATE ... IF NOT EXISTS variations.

Shay


Re: Should AT TIME ZONE be volatile?

2021-11-11 Thread Shay Rojansky
> Yeah, it's not clear that forbidding this would make anyone's life any
> better.  If you want an index on the UTC equivalent of a local time,
> you're going to have to find a way to cope with potential mapping
> changes.  But refusing to let you use a generated column doesn't
> seem to help that.

Makes sense, thanks Tom.


Re: Should AT TIME ZONE be volatile?

2021-11-10 Thread Shay Rojansky
> > It seems that PostgreSQL 14 allows using the AT TIME ZONE operator
> > within generated column definitions; according to the docs, that means
> > the operator is considered immutable. However, unless I'm mistaken, the
> > result of AT TIME ZONE depends on the time zone database, which is
> > external and can change. I think that means that generated column data
> > can become out-of-date upon tz database changes.
>
> Yeah, we generally don't take such hazards into account.  The poster
> child here is that if we were strict about this, text comparisons
> couldn't be immutable, because the underlying collation rules can
> (and do) change from time to time.  That's obviously unworkable.

Thanks for the explanation Tom. I get the logic, though I think there may
be a difference between "dependent on external rules which may
theoretically change but almost never actually do" and "dependent on
something that really does change frequently"... Countries really do change
their daylight saving rules quite frequently, whereas I'm assuming collation
rules are relatively stable, and changes are very rare.

> I'm not sure how big a deal this really is for timestamps.  The actual
> stored time is either UTC or local time, and those are generally pretty
> well-defined.  If you make the wrong choice of which one to store for
> your use-case, you might be unhappy.

The example I'm working with is storing a user-provided local timestamp
and time zone ID, but also having an indexed generated column in UTC, to be
able to order all rows on the global timeline regardless of time zone (see
the blog post by Jon Skeet for some context). If the time zone database
changes after the generated column is computed, the UTC timestamp is out of
sync with reality. This seems unsafe.

On the other hand, it could be argued that this should be allowed, and that
it should be the user's responsibility to update generated columns when the
time zone database changes (or periodically, or whatever). Users always
have the option to define a trigger anyway, so we may as well make this
easier via a generated column.
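
For illustration, the trigger-based alternative might look something like
this (a sketch assuming utc_timestamp is a plain column rather than a
generated one):

CREATE FUNCTION events_compute_utc() RETURNS trigger AS $$
BEGIN
    -- recompute the UTC instant from the user-provided local timestamp
    NEW.utc_timestamp := NEW.local_timestamp AT TIME ZONE NEW.time_zone_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_utc
    BEFORE INSERT OR UPDATE ON events
    FOR EACH ROW EXECUTE FUNCTION events_compute_utc();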

In any case, if this is the intended behavior, no problem - I was a bit
surprised by it, and found the difference with SQL Server interesting.

Shay


Should AT TIME ZONE be volatile?

2021-11-10 Thread Shay Rojansky
Greetings hackers.

It seems that PostgreSQL 14 allows using the AT TIME ZONE operator within
generated column definitions; according to the docs, that means the
operator is considered immutable. However, unless I'm mistaken, the result
of AT TIME ZONE depends on the time zone database, which is external and
can change. I think that means that generated column data can become
out-of-date upon tz database changes.

Sample table creation DDL:

CREATE TABLE events (
    id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    local_timestamp timestamp without time zone NOT NULL,
    utc_timestamp timestamp with time zone GENERATED ALWAYS AS
        (local_timestamp AT TIME ZONE time_zone_id) STORED,
    time_zone_id text NULL
);

For comparison, SQL Server does consider AT TIME ZONE to be
non-deterministic, and therefore does not allow it in stored generated
columns (it does allow it in non-stored ones).

Shay


starts_with, ^@ and index usage

2021-10-09 Thread Shay Rojansky
Greetings hackers,

I'm seeing some odd behavior around string prefix searching -
hopefully I've missed something here (thanks to Nino Floris for
originally flagging this).

In PostgreSQL 11, a starts_with function and a ^@ operator were added
for string prefix checking, as an alternative to LIKE 'foo%' [1] [2].
I've run a few scenarios and have seen the following behavior:

Queries tested:

1. EXPLAIN SELECT * FROM data WHERE name LIKE 'foo10%';
2. EXPLAIN SELECT * FROM data WHERE name ^@ 'foo10';
3. EXPLAIN SELECT * FROM data WHERE starts_with(name, 'foo10');

... running against a table with 500k rows and enable_seqscan turned
off. Results:

Index  | Operator class   | LIKE 'X%'         | ^@                | starts_with
------ | ---------------- | ----------------- | ----------------- | -----------
btree  | text_ops         | Parallel seq scan | Parallel seq scan | Seq scan
btree  | text_pattern_ops | Index scan        | Parallel seq scan | Seq scan
spgist |                  | Index scan        | Index scan        | Seq scan
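
For reference, a minimal reproduction along these lines (an assumed setup;
the exact test schema isn't shown above):

CREATE TABLE data (name text);
INSERT INTO data (name) SELECT 'foo' || i FROM generate_series(1, 500000) AS i;
CREATE INDEX ON data USING spgist (name); -- or btree with text_ops / text_pattern_ops
SET enable_seqscan = off;
EXPLAIN SELECT * FROM data WHERE starts_with(name, 'foo10');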

First, starts_with doesn't seem to use SP-GIST indexes, contrary to
the patch description (and also doesn't trigger a parallel seq scan) -
is this intentional? The function is listed front-and-center on the
string functions and operators page[3], and receives mention on the
pattern matching page[4], without any mention of it being so
problematic.

Note that ^@ isn't documented on the string functions and operators page,
so it's not very discoverable; if added to the docs, I'd recommend
adding a note on SP-GIST being required, since uninformed new users
would probably expect a default btree index to work as well.

Shay

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=710d90da1fd8c1d028215ecaf7402062079e99e9
[2] 
https://www.postgresql.org/message-id/flat/03300255-cff2-b508-50f4-f00cca0a57a1%40sigaev.ru#38d2020edf92f96d204cd2679d362c38
[3] https://www.postgresql.org/docs/current/functions-string.html
[4] https://www.postgresql.org/docs/current/functions-matching.html




Making cancellations safe

2020-11-04 Thread Shay Rojansky
Hi all.

Back in 2016 I started a thread about making cancellations safer[1]; I'd
like to try to pick this up again. Here is a summary of the previous
conversation:

The main ask here is to allow clients to specify which command to cancel,
to avoid various race conditions where the wrong command is accidentally
cancelled. The implementation proposal by Tom was for the client and server
to simply count messages on each side, and to modify the CancelRequest
message for the client to integrate the requested sequence number to be
cancelled.

The CancelRequest format change can probably happen in a non-breaking way,
e.g. by defining a new cancel request code (the 1234/5678 at the beginning
of the message). The client could know that the server is able to accept
the new CancelRequest format via a minor protocol bump, or some other means.

There was also discussion of allowing "cancel up to" semantics, where all
messages up to the provided sequence number are cancelled. While this could
be useful, it's important to allow setting a lower bound as well. A new
experimental mode in the Npgsql driver "multiplexes" queries from unrelated
threads into the same physical connection, possibly pipelining two
unrelated queries into a single outgoing TCP packet. In this case, the
second query's producer may issue a cancellation and unintentionally cancel
the first query, which is still in progress. So ideally the CancelRequest
would contain both a lower and an upper bound (with the lower being
optional).

Since cancellation requests may arrive before their target queries do, the
backend should be able to track unfulfilled requests and apply them as soon
as the target query is received.

Finally, there was also an orthogonal discussion about widening the cancel
key (currently 32-bit). However, that seems like it would be a protocol
breaking change, so possibly better treated separately.

[1]
https://www.postgresql.org/message-id/flat/CADT4RqDh1CEgz7QgKwYSLT9TMCk7O%3DncauUaSQKVt_nPNTE9wQ%40mail.gmail.com#00832c72f4b93d57d6b0ac59de8eca85


Re: dynamic result sets support in extended query protocol

2020-10-20 Thread Shay Rojansky
Very interesting conversation, thanks for including me, Dave. Here are some
thoughts from the Npgsql perspective:

Re the binary vs. text discussion... A long time ago, Npgsql became a
"binary-only" driver, meaning that it never sends or receives values in
text encoding, and practically always uses the extended protocol. This was
because in most (all?) cases, encoding/decoding binary is more efficient,
and maintaining two encoders/decoders (one for text, one for binary) made
less and less sense. So by default, Npgsql just requests "all binary" in
all Bind messages it sends (there's an API for the user to request text, in
which case they get pure strings which they're responsible for parsing).
Binary handling is implemented for almost all PG types which support it,
and I've hardly seen any complaints about this for the last few years. I'd
be interested in any arguments against this decision (Dave, when have you
seen that decoding binary is slower than decoding text?).

Given the above, allowing the client to specify in advance which types
should be in binary sounds good, but wouldn't help Npgsql much (since by
default it already requests binary for everything). It would slightly help
in allowing binary-unsupported types to automatically come back as text
without manual user API calls, but as I wrote above this is an extremely
rare scenario that people don't care much about.

> Is there really a good reason for forcing the client to issue
> NextResult, Describe, Execute for each of the dynamic result sets?

I very much agree - it should be possible to execute a procedure and
consume all results in a single roundtrip, otherwise this is quite a perf
killer.

Peter, from your original message:

> Following the model from the simple query protocol, CommandComplete
> really means one result set complete, not the whole top-level command.
> ReadyForQuery means the whole command is complete. This is perhaps
> debatable, and interesting questions could also arise when considering
> what should happen in the simple query protocol when a query string
> consists of multiple commands each returning multiple result sets. But it
> doesn't really seem sensible to cater to that

Npgsql implements batching of multiple statements via the extended protocol
in a similar way. In other words, the .NET API allows users to pack
multiple SQL statements and execute them in one roundtrip, and Npgsql does
this by sending
Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So
CommandComplete signals completion of a single statement in the batch,
whereas ReadyForQuery signals completion of the entire batch. This means
that the "interesting questions" mentioned above are possibly relevant to
the extended protocol as well.


Re: Error on failed COMMIT

2020-03-30 Thread Shay Rojansky
Apologies for not responding earlier, busy times.


>> Fourth, it is not clear how many applications would break if COMMIT
>> started issuing an error rather than return success with a ROLLBACK tag.
>> Certainly SQL scripts would be fine.  They would have one additional
>> error in the script output, but if they had ON_ERROR_STOP enabled, they
>> would have exited before the commit.  Applications that track statement
>> errors and issue rollbacks will be fine.  So, we are left with
>> applications that issue COMMIT and expect success after a transaction
>> block has failed.  Do we know how other database systems handle this?
>>
>
> Well I know pgjdbc handles my patch fine without any changes to the code
> As I mentioned upthread 2 of the 3 go drivers already error if rollback is
> returned. 1 of them does not.
>
> I suspect npgsql would be fine. Shay ?
>

Npgsql would be fine. In fact, Npgsql doesn't have any specific
expectations nor any specific logic around commit; it assumes errors may be
returned for any command (COMMIT or otherwise), and surfaces those errors
as .NET exceptions. The transaction status is tracked via CommandComplete
only, and as mentioned several times, PostgreSQL can already error on
commit for various other reasons (e.g. deferred constraint checks). This
direction makes a lot of sense to me.
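
For reference, a minimal example of PostgreSQL already erroring at COMMIT
time, via a deferred constraint check:

CREATE TABLE t (id int UNIQUE DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO t VALUES (1), (1); -- succeeds; the unique check is deferred
COMMIT; -- fails here with a unique violation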


Creating a database with a non-predefined collation

2020-03-29 Thread Shay Rojansky
Greetings hackers.

While working on first-class support for PG collations in the Entity
Framework Core ORM, I've come across an interesting problem: it doesn't
seem to be possible to create a database with a collation that isn't
predefined, and there doesn't seem to be a way to add to that list. I'm
specifically looking into creating a database with an ICU collation.
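
For illustration, this is roughly the kind of thing that fails today, since
the locale is validated against the predefined (libc) collations (the exact
error wording may vary):

CREATE DATABASE tenant_db
    TEMPLATE template0
    LC_COLLATE 'en-u-ks-primary'
    LC_CTYPE 'en-u-ks-primary';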

I've seen the discussion in
https://www.postgresql.org/message-id/flat/99faa8eb-9de2-8ec0-0a25-1ad1276167cc%402ndquadrant.com,
though to my understanding, that is about importing ICU rather than libc
predefined collations, and not adding an arbitrary collation to the list
from which databases can be created.

This seems particularly problematic as a database collation cannot be
altered once created, leading to an odd chicken-and-egg problem. My initial
expectation was for collations in the template database to be taken into
account, but that doesn't seem to be the case.

Finally, just a word to say that better support for non-deterministic
collations would be greatly appreciated - specifically LIKE support (though
I realize that isn't trivial). At the moment their actual usefulness seems
somewhat limited because of this.

Thanks,

Shay


Re: Error on failed COMMIT

2020-02-24 Thread Shay Rojansky
>> If we think the current *user-visible* behavior is problematic (commit
>> on failed transaction completes without throwing), then the only
>> remaining question is where this behavior should be fixed - at the server
>> or at the driver. As I wrote above, from the user's perspective it makes
>> no difference - the change would be identical (and just as breaking)
>> either way. So while drivers *could* implement the new behavior, what
>> advantages would that have over doing it at the server? Some
>> disadvantages do seem clear (repetition of the logic across each driver -
>> leading to inconsistency across drivers, changing semantics at the driver
>> by turning a non-error into an exception...).
>
> The advantage is that it doesn't cause a compatibility break.

I think it's very important to expand the reasoning here from "server and
client" to "server, drivers, users". As I wrote above, changing this
behavior in a driver is just as much a compatibility break for any user of
that driver, as a server change; it's true that PostgreSQL would not be
"responsible" ot "at fault" but rather the driver writer, but as far as
angry users go there's very little difference. A break is a break, whether
it happens because of a PostgreSQL change, or because of a .NET/Java driver
change.

> 2. It would be better to fix the driver than the server because this
> behavior is very old and there are probably many applications (and
> perhaps some drivers) that rely on it, and changing the server would
> break them.

As above, if Dave and I make this change in the JDBC driver and/or Npgsql,
all applications relying on the previous behavior would be just as broken.

>> If we are assuming that most user code is already written to avoid
>> committing on failed transactions (by tracking transaction state etc.),
>> then making this change at the server wouldn't affect those applications;
>> the only applications affected would be those that do commit on failed
>> transactions today, and it could be argued that those are likely to be
>> broken today (since drivers today don't really expose the rollback in an
>> accessible/discoverable way).
>
> libpq exposes it just fine, so I think you're overgeneralizing here.

The question is more whether typical user applications are actually
checking for rollback-on-commit, not whether they theoretically can. An
exception is something you have to actively swallow to ignore; an
additional returned status saying "hey, this didn't actually commit" is
extremely easy to ignore unless you've specifically been aware of the
situation.

Even so, a quick look at psycopg and Ruby (in addition to JDBC and .NET)
shows that commit APIs generally don't return anything - this is just how
the API abstractions are, probably because across databases nothing like
that is needed (the expectation is that a non-throwing commit implies the
commit occurred).

Shay

On Mon, Feb 24, 2020 at 2:34 PM Robert Haas  wrote:

> On Mon, Feb 24, 2020 at 1:56 PM Shay Rojansky  wrote:
> > As Dave wrote, the problem here isn't with the driver, but with
> framework or user-code which swallows the initial exception and allows code
> to continue to the commit. Npgsql (and I'm sure the JDBC driver too) does
> surface PostgreSQL errors as exceptions, and internally tracks the
> transaction status provided in the CommandComplete message. That means
> users have the ability - but not the obligation - to know about failed
> transactions, and some frameworks or user coding patterns could lead to a
> commit being done on a failed transaction.
>
> Agreed. All of that can be fixed in the driver, though.
>
> > If we think the current *user-visible* behavior is problematic (commit
> on failed transaction completes without throwing), then the only remaining
> question is where this behavior should be fixed - at the server or at the
> driver. As I wrote above, from the user's perspective it makes no
> difference - the change would be identical (and just as breaking) either
> way. So while drivers *could* implement the new behavior, what advantages
> would that have over doing it at the server? Some disadvantages do seem
> clear (repetition of the logic across each driver - leading to
> inconsistency across drivers, changing semantics at the driver by turning a
> non-error into an exception...).
>
> The advantage is that it doesn't cause a compatibility break.
>
> > > Well, it seems quite possible that there are drivers and applications
> that don't have this issue; I've never had a problem with this behavior,
> and I've been using PostgreSQL for something like two decades [...]
> >
> > If we are assuming that most user code is already written to avoid
> committing on failed transactions (by tracking transaction state etc.),
> then making this change at the server wouldn't affect those applications;
> the o

Re: Error on failed COMMIT

2020-02-24 Thread Shay Rojansky
> First, to repeat what I said before, the COMMIT only returns a ROLLBACK
> command tag if there's been a previous ERROR. So, if you haven't ignored
> the prior ERROR, you should be fine. [...]
> I am really struggling to see why this is anything but a bug in the JDBC
> driver

As Dave wrote, the problem here isn't with the driver, but with framework
or user-code which swallows the initial exception and allows code to
continue to the commit. Npgsql (and I'm sure the JDBC driver too) does
surface PostgreSQL errors as exceptions, and internally tracks the
transaction status provided in the CommandComplete message. That means
users have the ability - but not the obligation - to know about failed
transactions, and some frameworks or user coding patterns could lead to a
commit being done on a failed transaction.

> So, if you haven't ignored the prior ERROR, you should be fine. Second,
> there's nothing to keep the driver itself from translating ROLLBACK into
> an exception, if that's more convenient for some particular driver. [...]

This is the main point here IMHO, and I don't think it's a question of
convenience, or of behavior that should vary across drivers.

If we think the current *user-visible* behavior is problematic (commit on
failed transaction completes without throwing), then the only remaining
question is where this behavior should be fixed - at the server or at the
driver. As I wrote above, from the user's perspective it makes no
difference - the change would be identical (and just as breaking) either
way. So while drivers *could* implement the new behavior, what advantages
would that have over doing it at the server? Some disadvantages do seem
clear (repetition of the logic across each driver - leading to
inconsistency across drivers, changing semantics at the driver by turning a
non-error into an exception...).

> Well, it seems quite possible that there are drivers and applications
> that don't have this issue; I've never had a problem with this behavior,
> and I've been using PostgreSQL for something like two decades [...]

If we are assuming that most user code is already written to avoid
committing on failed transactions (by tracking transaction state etc.),
then making this change at the server wouldn't affect those applications;
the only applications affected would be those that do commit on failed
transactions today, and it could be argued that those are likely to be
broken today (since drivers today don't really expose the rollback in an
accessible/discoverable way).

Shay

On Mon, Feb 24, 2020 at 3:31 AM Robert Haas  wrote:

> On Sun, Feb 23, 2020 at 11:11 AM Shay Rojansky  wrote:
> > I'd like to second Dave on this, from the .NET perspective - actual
> client access is done via standard drivers in almost all cases, and these
> drivers generally adhere to database API abstractions (JDBC for Java,
> ADO.NET for .NET, and so on). AFAIK, in almost all such abstractions,
> commit can either complete (implying success) or throw an exception - there
> is no third way to return a status code. It's true that a driver may expose
> NOTICE/WARNING messages via some other channel (Npgsql emits .NET events
> for these), but this is a separate message "channel" that is disconnected
> API-wise from the commit; this makes the mechanism very "undiscoverable".
>
> I'm still befuddled here. First, to repeat what I said before, the
> COMMIT only returns a ROLLBACK command tag if there's been a previous
> ERROR. So, if you haven't ignored the prior ERROR, you should be fine.
> Second, there's nothing to keep the driver itself from translating
> ROLLBACK into an exception, if that's more convenient for some
> particular driver. Let's go back to Bernhard's example upthread:
>
> composeTransaction() {
> Connection con = getConnection(); // implicitly "begin"
> try {
>insertFrameworkLevelState(con);
>insertApplicationLevelState(con);
>con.commit();
>publishNewState();
> } catch (Throwable ex) {
>con.rollback();
> }
> }
>
> If insertFrameworkLevelState() or insertApplicationLevelState()
> perform database operations that fail, then an exception should be
> thrown and we should end up at con.rollback(), unless there is an
> internal catch block inside those functions that swallows the
> exception, or unless the JDBC driver ignores the error from the
> server. If those things succeed, then COMMIT could still fail with an
> ERROR but it shouldn't return ROLLBACK. But, for extra security,
> con.commit() could be made to throw an exception if the command tag
> returned by COMMIT is not COMMIT. It sounds like Dave doesn't want to
> do that, but it would solve this problem without requiring a server
> behavior change.
>
> Actually, an even better idea might be to make the driver error out
> when the tr

Re: Error on failed COMMIT

2020-02-22 Thread Shay Rojansky
On Fri, 14 Feb 2020 at 14:37, Robert Haas  wrote:
>
>> On Fri, Feb 14, 2020 at 2:08 PM Dave Cramer 
>> wrote:
>> > Well now you are asking the driver to re-interpret the results in a
>> different way than the server which is not what we tend to do.
>> >
>> > The server throws an error we throw an error. We really aren't in the
>> business of re-interpreting the servers responses.
>>
>> I don't really see a reason why the driver has to throw an exception
>> if and only if there is an ERROR on the PostgreSQL side. But even if
>> you want to make that rule for some reason, it doesn't preclude
>> correct behavior here. All you really need is to have con.commit()
>> return some indication of what the command tag was, just as, say, psql
>> would do. If the server provides you with status information and you
>> throw it out instead of passing it along to the application, that's
>> not ideal.
>>
>
> Well con.commit() returns void :(
>

I'd like to second Dave on this, from the .NET perspective - actual client
access is done via standard drivers in almost all cases, and these drivers
generally adhere to database API abstractions (JDBC for Java, ADO.NET for
.NET, and so on). AFAIK, in almost all such abstractions, commit can either
complete (implying success) or throw an exception - there is no third way
to return a status code. It's true that a driver may expose NOTICE/WARNING
messages via some other channel (Npgsql emits .NET events for these), but
this is a separate message "channel" that is disconnected API-wise from the
commit; this makes the mechanism very "undiscoverable".

In other words, if we do agree that there are some legitimate cases where a
program may end up executing commit on a failed transaction (e.g. because
of a combination of framework and application code), and we think that a
well-written client should be aware of the failed transaction and behave in
an exceptional way around a non-committing commit, then I think that's a
good case for a server-side change:

   - Asking drivers to do this at the client has the exact same breakage
   impact as the server change, since the user-visible behavior changes in
   the same way (the change is just shifted from server to driver). What's
   worse is that every driver now has to reimplement the same new logic, and
   we'd most probably end up with some drivers implementing it and others
   not (so behavioral differences across languages).
   - Asking end-users (i.e. application code) to do this seems even worse,
   as every user/application in the world now has to be made somehow aware of
   a somewhat obscure and very un-discoverable situation.

Shay


Re: strpos behavior change around empty substring in PG12

2019-10-29 Thread Shay Rojansky
Thanks for the quick turnaround!

Tom Lane  schrieb am Mo., 28. Okt. 2019, 16:57:

> Robert Haas  writes:
> > On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky  wrote:
> >> Before PG12, select strpos('test', '') returns 1 (empty substring found
> at first position of the string), whereas starting with PG12 it returns 0
> (empty substring not found).
>
> > It looks to me like this got broken here:
>
> > commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
> > Author: Heikki Linnakangas 
> > Date:   Fri Jan 25 16:25:05 2019 +0200
> > Use single-byte Boyer-Moore-Horspool search even with multibyte
> encodings.
>
> > Not sure what happened exactly.
>
> I think the problem is lack of clarity about the edge cases.
> The patch added this short-circuit right at the top of text_position():
>
> +   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
> +   return 0;
>
> and as this example shows, that's the Wrong Thing.  Fortunately,
> it also seems easily fixed.
>
> regards, tom lane
>


strpos behavior change around empty substring in PG12

2019-10-28 Thread Shay Rojansky
Greetings hackers,

Before PG12, select strpos('test', '') returns 1 (empty substring found at
first position of the string), whereas starting with PG12 it returns 0
(empty substring not found).

Is this behavior change intentional? If so, it doesn't seem to be
documented in the release notes...

First raised by Austin Drenski in
https://github.com/npgsql/efcore.pg/pull/1068#issuecomment-546795826

Thanks,

Shay


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-08-01 Thread Shay Rojansky
Tom,

> I have in fact committed that patch.  It won't do anything for your
> problem with respect to existing installations that may have picked
>"localtime", but it'll at least prevent new initdb runs from picking
> that.

Thanks! At least over time the problem will hopefully diminish.


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-08-01 Thread Shay Rojansky
> I'm not sure we're any closer to a meeting of the minds on whether
> consulting zone[1970].tab is a good thing to do, but we got an actual
> user complaint[1] about how "localtime" should not be a preferred
> spelling.  So I want to go ahead and insert the discussed anti-preference
> against "localtime" and "posixrules", as per 0001 below.  If we do do
> something with zone[1970].tab, we'd still need these special rules,
> so I don't think this is blocking anything.

Just want to stress this point from a PostgreSQL driver maintainer
perspective (see here[1] for the full details). Having "localtime" as the
PostgreSQL timezone basically means that the timezone is completely opaque
from a client point of view - there is no way for clients to know what
actual timezone the server is in, and react to that. This is a limiting
factor in client development, I hope a consensus on this specific point can
be reached.

[1]
https://www.postgresql.org/message-id/cadt4rqccnj6fklisvt8ttpftp4azphhdfjqdf1jfbboh5w4...@mail.gmail.com


Re: "localtime" value in TimeZone

2019-07-25 Thread Shay Rojansky
> Yeah, this is something that some tzdb packagers do --- they put a
> "localtime" file into /usr/share/zoneinfo that is a symlink or hard link
> to the active zone file, and then initdb tends to seize on that as being
> the shortest available spelling of the active zone.

I see, I wasn't aware that this was a distribution-level mechanism - I
understand the situation better now.

I followed the conversations you linked to, and disagreements seem to be
mostly about other aspects of timezone selection. Does it make sense to
have a limited, restricted conversation specifically about avoiding
"localtime"?


"localtime" value in TimeZone

2019-07-24 Thread Shay Rojansky
Greetings everyone.

In (certain) out-of-the-box PostgreSQL installations, the timezone GUC is
set to "localtime", which seems to mean to query the OS for the value.
Unless I'm mistaken, the issue with this is that it doesn't allow clients
inspecting the TimeZone GUC to actually know what timezone the server is
in, making the GUC largely useless (and creating friction, as the GUC can't
be expected to always contain valid IANA/Olson values). It would be more
useful if PostgreSQL exposed the actual timezone provided by the OS.

Does this make sense?

As a side note, there doesn't seem to be any specific documentation on the
special "localtime" value of this GUC (e.g.
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES
).

Shay


Re: Proposal to add GUC_REPORT to lc_monetary, lc_numeric and search_path

2019-07-05 Thread Shay Rojansky
> The latter is important for similar reasons. JDBC caches prepared
> statements internally and if the user changes the search path without
> using setSchema or uses a function to change it then internally it would
> be necessary to invalidate the cache. Currently if this occurs these
> statements fail.

While Npgsql specifically doesn't care about any locale/formatting (being a
binary-only driver), knowing about search_path changes would benefit Npgsql
in the same way as it would JDBC.

> This seems like a rather innocuous change as the protocol is not
> changed, rather the amount of information returned on startup is
> increased marginally.

Although these specific parameters would be easy to add, we could also
think of a more generic way for clients to subscribe to parameter updates
(am not sure if this was previously discussed - I cannot see anything
obvious in the wiki TODO page). At its simplest, this could be a new
parameter containing a comma-separated list of parameters for which
asynchronous updates should be sent.  This new parameter would default to
the current hard-coded list (as documented in
https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-ASYNC).
Unless I'm mistaken, one issue (as in general with new parameters) is that
drivers wouldn't be able to send this new parameter in the startup packet
because they don't yet know whether they're talking to a PostgreSQL version
which supports it.
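
To illustrate the idea - with a hypothetical parameter name, invented here
purely for illustration:

-- "reported_parameters" is hypothetical; it does not exist in PostgreSQL
SET reported_parameters = 'search_path, lc_monetary, lc_numeric';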


Re: [PATCH] Allow UNLISTEN during recovery

2019-01-25 Thread Shay Rojansky
>
> > Thanks for insisting - I ended up setting up the environment and running
> > the tests, and discovering that some test-related changes were missing.
> > Here's a 3rd version of the patch. Hope this is now in good shape, let me
> > know if you think anything else needs to be done.
>
> Lotta work for a one-line code change, eh?  Pushed now.
>

Yes, especially when you're new to building and testing PostgreSQL :)
Thanks for accepting!


Re: [PATCH] Allow UNLISTEN during recovery

2019-01-16 Thread Shay Rojansky
>
> On Tue, Jan 15, 2019 at 10:17 AM Shay Rojansky  wrote:
> > Unfortunately I'm extremely tight for time at the moment and don't have
> time to do the appropriate hot standby setup to test this... As the patch
> is pretty straightforward, and since I'm hoping you guys execute the tests
> on your end, I hope that's acceptable. If it's absolutely necessary for me
> to test the patch locally, let me know and I'll do so.
>
> I would definitely prefer if you could run the tests. You might
> discover that your patch does not sufficiently address tests. Please
> do so and confirm here that it works for you and then I can do another
> review.
>

Thanks for insisting - I ended up setting up the environment and running
the tests, and discovering that some test-related changes were missing.
Here's a 3rd version of the patch. Hope this is now in good shape, let me
know if you think anything else needs to be done.
From c5497cc4f7fbad4eafecfa72bc99dfebacd0f9bd Mon Sep 17 00:00:00 2001
From: Shay Rojansky 
Date: Tue, 15 Jan 2019 18:49:40 +0100
Subject: [PATCH] Allow UNLISTEN during recovery

---
 doc/src/sgml/high-availability.sgml  | 16 ++--
 src/backend/tcop/utility.c   |  2 +-
 src/test/regress/expected/hs_standby_allowed.out |  3 +++
 .../regress/expected/hs_standby_disallowed.out   |  4 
 src/test/regress/sql/hs_standby_allowed.sql  |  4 
 src/test/regress/sql/hs_standby_disallowed.sql   |  2 --
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 4882b20828..bb1c86f73e 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1767,6 +1767,11 @@ if (!triggered)
Plugins and extensions - LOAD
   
  
+ 
+  
+   UNLISTEN
+  
+ 
 

 
@@ -1856,18 +1861,17 @@ if (!triggered)
  
  
   
-   LISTEN, UNLISTEN, NOTIFY
+   LISTEN, NOTIFY
   
  
 

 

-In normal operation, read-only transactions are allowed to
-use LISTEN, UNLISTEN, and
-NOTIFY, so Hot Standby sessions operate under slightly tighter
-restrictions than ordinary read-only sessions.  It is possible that some
-of these restrictions might be loosened in a future release.
+In normal operation, read-only transactions are allowed to use
+LISTEN, and NOTIFY, so Hot Standby sessions
+operate under slightly tighter restrictions than ordinary read-only sessions.
+It is possible that some of these restrictions might be loosened in a future release.

 

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 27ae6be751..44136060d6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -629,7 +629,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			{
 UnlistenStmt *stmt = (UnlistenStmt *) parsetree;
 
-PreventCommandDuringRecovery("UNLISTEN");
+/* allow UNLISTEN during recovery, which is a noop */
 CheckRestrictedOperation("UNLISTEN");
 if (stmt->conditionname)
 	Async_Unlisten(stmt->conditionname);
diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 526f88f2be..00b8faf9eb 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -208,6 +208,9 @@ LOCK hs1 IN ACCESS SHARE MODE;
 LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;
+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
 -- LOAD
 -- should work, easier if there is no test for that...
 -- ALLOWED COMMANDS
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index bc117413ff..dff0953e9a 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -118,10 +118,6 @@ listen a;
 ERROR:  cannot execute LISTEN during recovery
 notify a;
 ERROR:  cannot execute NOTIFY during recovery
-unlisten a;
-ERROR:  cannot execute UNLISTEN during recovery
-unlisten *;
-ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
 ERROR:  cannot execute ANALYZE during recovery
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index a33199dbbd..6debddc5e9 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -110,6 +110,10 @@ LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;
 
+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
+
 -- LOAD
 -- should work, easier if there is no test for that...
 
diff --git a/src/test/regress/sql/hs_standby_disallowed.sql b/src/test/regress/sql/hs_standby_disallowed.sql
index 21bbf526b7..a470600eec 100644
--- a/src/test/regress/sql/hs_standby_disallowed.sql
+++ b/src

Re: [PATCH] Allow UNLISTEN during recovery

2019-01-15 Thread Shay Rojansky
Mitar, thanks for giving this your attention!

> So patch looks good to me, but documentation changes are missing from it.
> UNLISTEN should be removed from the list of commands not allowed and moved
> to the list of those which are allowed [1]. Moreover,
> src/test/regress/sql/hs_standby_allowed.sql and
> src/test/regress/sql/hs_standby_disallowed.sql tests should be updated
> based on these changes in the patch. What is surprising to me is that make
> check-world does not fail with this change, but there is an explicit check
> for UNLISTEN *. So does this mean those tests are not run or does it mean
> that this patch does not fix the problem?
>

I've made the requested changes to the docs and to the regression tests.

I think that specifically the standby regression tests do not get executed
by check-world - see section
https://www.postgresql.org/docs/current/regress-run.html#id-1.6.20.5.8. I'm
guessing this should be executed as part of the build verification pipeline
for patches, but I don't know anything about the PostgreSQL build
infrastructure.

Unfortunately I'm extremely tight for time at the moment and don't have
time to do the appropriate hot standby setup to test this... As the patch
is pretty straightforward, and since I'm hoping you guys execute the tests
on your end, I hope that's acceptable. If it's absolutely necessary for me
to test the patch locally, let me know and I'll do so.
From 8c816354e820bf3d0be69d55dbf0052b1d27feeb Mon Sep 17 00:00:00 2001
From: Shay Rojansky 
Date: Tue, 15 Jan 2019 18:49:40 +0100
Subject: [PATCH] Allow unlisten when in hot standby:wq

---
 doc/src/sgml/high-availability.sgml| 16 ++--
 src/backend/tcop/utility.c |  2 +-
 src/test/regress/sql/hs_standby_allowed.sql|  4 
 src/test/regress/sql/hs_standby_disallowed.sql |  2 --
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 4882b20828..bb1c86f73e 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1767,6 +1767,11 @@ if (!triggered)
Plugins and extensions - LOAD
   
  
+ 
+  
+   UNLISTEN
+  
+ 
 

 
@@ -1856,18 +1861,17 @@ if (!triggered)
  
  
   
-   LISTEN, UNLISTEN, NOTIFY
+   LISTEN, NOTIFY
   
  
 

 

-In normal operation, read-only transactions are allowed to
-use LISTEN, UNLISTEN, and
-NOTIFY, so Hot Standby sessions operate under slightly tighter
-restrictions than ordinary read-only sessions.  It is possible that some
-of these restrictions might be loosened in a future release.
+In normal operation, read-only transactions are allowed to use
+LISTEN, and NOTIFY, so Hot Standby sessions
+operate under slightly tighter restrictions than ordinary read-only sessions.
+It is possible that some of these restrictions might be loosened in a future release.

 

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 27ae6be751..44136060d6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -629,7 +629,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			{
 UnlistenStmt *stmt = (UnlistenStmt *) parsetree;
 
-PreventCommandDuringRecovery("UNLISTEN");
+/* allow UNLISTEN during recovery, which is a noop */
 CheckRestrictedOperation("UNLISTEN");
 if (stmt->conditionname)
 	Async_Unlisten(stmt->conditionname);
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index a33199dbbd..1f1cdf5a00 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -110,6 +110,10 @@ LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;
 
+-- UNLISTEN
+unlisten a;
+unlisten *;
+
 -- LOAD
 -- should work, easier if there is no test for that...
 
diff --git a/src/test/regress/sql/hs_standby_disallowed.sql b/src/test/regress/sql/hs_standby_disallowed.sql
index 21bbf526b7..a470600eec 100644
--- a/src/test/regress/sql/hs_standby_disallowed.sql
+++ b/src/test/regress/sql/hs_standby_disallowed.sql
@@ -88,8 +88,6 @@ COMMIT;
 -- Listen
 listen a;
 notify a;
-unlisten a;
-unlisten *;
 
 -- disallowed commands
 
-- 
2.19.1



[PATCH] Allow UNLISTEN during recovery

2018-11-18 Thread Shay Rojansky
Hi all,

Here is a tiny patch removing PreventCommandDuringRecovery() for UNLISTEN.
See previous discussion in
https://www.postgresql.org/message-id/CADT4RqBweu7QKRYAYzeRW77b%2BMhJdUikNe45m%2BfL4GJSq_u2Fg%40mail.gmail.com
.

In a nutshell, this prevents an error being raised when UNLISTEN is issued
during recovery. The operation is a no-op (since LISTEN is still
disallowed). The logic here is that some clients (namely Npgsql) issue
UNLISTEN * to clear connection state (in the connection pool), but this
needlessly breaks when the backend is in recovery.

On a related note, there currently doesn't seem to be a good way for
clients to know whether the backend is in recovery. As a backend can come
out of recovery at any point, perhaps an asynchronous ParameterStatus
announcing this state change could be useful.

Hopefully this also qualifies for backporting to earlier version branches.

Shay
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 970c94ee80..3efd262cb8 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -629,7 +629,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			{
 UnlistenStmt *stmt = (UnlistenStmt *) parsetree;
 
-PreventCommandDuringRecovery("UNLISTEN");
+/* allow UNLISTEN during recovery, which is a noop */
 CheckRestrictedOperation("UNLISTEN");
 if (stmt->conditionname)
 	Async_Unlisten(stmt->conditionname);


Re: UNLISTEN, DISCARD ALL and readonly standby

2018-10-25 Thread Shay Rojansky
Thanks for the explanation.

> I guess there's an argument to be made that it'd be OK to allow UNLISTEN
> but not LISTEN, but I find that argument fairly fishy.  I think issuing
> UNLISTEN to a standby is a likely sign of application misdesign, so I'm
> not convinced we'd be doing anyone any favors by not throwing an error.

I understand the argument. It would definitely be a fix for my specific
problem, and I do think in some application designs it may make sense for
an application to say "UNLISTEN from everything" regardless of whether any
LISTENs were previously issued - this could make things easier as the
application doesn't need to be aware of the backend it's connected to
(master, standby) from the place where UNLISTEN is managed So I don't think
it *necessarily* points to bad application design.

> Can't you skip that step when talking to a read-only session?  I think
> you're going to end up writing that code anyway, even if we accepted
> this request, because it'd be a long time before you could count on
> all servers having absorbed the patch.

That possibility was raised, but the problem is how to manage the client's
idea of whether the backend is read-only. If we call pg_is_in_recovery()
every time a connection is returned to the pool, that's a performance
killer. If we cache that information, we miss any state changes that may
happen along the way. I'm open to any suggestions on this though.

Regarding older versions of PostgreSQL, I'm hoping this is small enough to be
backported to at least active branches. In any case, a workaround already
exists to tell Npgsql to not reset connection state at all when returning
to the pool. This is what I'm recommending to the affected user in the
meantime.

On Thu, Oct 25, 2018 at 10:45 AM Tom Lane  wrote:

> Shay Rojansky  writes:
> > The documentation for DISCARD ALL[1] states that it is equivalent to a
> > series of commands which includes UNLISTEN *. On the other hand, the docs
> > for hot standby mode[2] state that UNLISTEN * is unsupported while
> > DISCARD is (although the docs don't specify whether this includes
> > DISCARD ALL). I haven't done a test, but this seems to indicate that
> > while DISCARD ALL is supported in hot standby mode, UNLISTEN is not,
> > although its functionality is included in the former.
>
> Well, if there weren't
>
> PreventCommandDuringRecovery("UNLISTEN");
>
> in it, UNLISTEN on a standby would just be a no-op, because there'd be
> nothing for it to do.  DISCARD lacks that error check, but its call to
> the UNLISTEN support code is still a no-op.
>
> The reason for disallowing LISTEN/UNLISTEN on a standby is that they
> wouldn't behave as a reasonable person might expect, ie seeing NOTIFY
> events from the master.  At some point we might allow that to happen, and
> we don't want to have complaints about backwards compatibility if we do.
>
> I guess there's an argument to be made that it'd be OK to allow UNLISTEN
> but not LISTEN, but I find that argument fairly fishy.  I think issuing
> UNLISTEN to a standby is a likely sign of application misdesign, so I'm
> not convinced we'd be doing anyone any favors by not throwing an error.
>
> > If this is indeed the case, is there any specific reason UNLISTEN can't
> be
> > supported? This is actually quite important in the case of Npgsql (.NET
> > driver). When a connection is returned to the connection pool, its state
> is
> > reset - usually with a DISCARD ALL. However, if prepared statements
> exist,
> > we avoid DISCARD ALL since it destroys those (the DEALLOCATE ALL
> component
> > of DISCARD ALL), and we want to keep prepared statements across
> connection
> > lifecycles for performance. So instead of issuing DISCARD ALL, Npgsql
> sends
> > the equivalent commands excluding DEALLOCATE ALL - but UNLISTEN * fails
> > when in recovery.
>
> Can't you skip that step when talking to a read-only session?  I think
> you're going to end up writing that code anyway, even if we accepted
> this request, because it'd be a long time before you could count on
> all servers having absorbed the patch.
>
> regards, tom lane
>


UNLISTEN, DISCARD ALL and readonly standby

2018-10-25 Thread Shay Rojansky
Hi hackers.

The documentation for DISCARD ALL[1] states that it is equivalent to a
series of commands which includes UNLISTEN *. On the other hand, the docs
for hot standby mode[2] state that UNLISTEN * is unsupported while DISCARD
is (although the docs don't specify whether this includes DISCARD ALL). I
haven't done a test, but this seems to indicate that while DISCARD ALL is
supported in hot standby mode, UNLISTEN is not, although its functionality
is included in the former.

If this is indeed the case, is there any specific reason UNLISTEN can't be
supported? This is actually quite important in the case of Npgsql (.NET
driver). When a connection is returned to the connection pool, its state is
reset - usually with a DISCARD ALL. However, if prepared statements exist,
we avoid DISCARD ALL since it destroys those (the DEALLOCATE ALL component
of DISCARD ALL), and we want to keep prepared statements across connection
lifecycles for performance. So instead of issuing DISCARD ALL, Npgsql sends
the equivalent commands excluding DEALLOCATE ALL - but UNLISTEN * fails
when in recovery.
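
For reference, the reset sequence Npgsql sends instead of DISCARD ALL is
roughly the equivalence list from the DISCARD documentation minus
DEALLOCATE ALL (a sketch; the exact command set varies a bit by server
version):

-- Everything DISCARD ALL does, except DEALLOCATE ALL, so prepared
-- statements survive the reset:
CLOSE ALL;
SET SESSION AUTHORIZATION DEFAULT;
RESET ALL;
UNLISTEN *;                        -- the command that fails on a standby
SELECT pg_advisory_unlock_all();
DISCARD PLANS;
DISCARD SEQUENCES;
DISCARD TEMP;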

So ideally UNLISTEN would be made to work in standby mode, just like
DISCARD ALL. If DISCARD ALL is in fact unsupported in hot standby mode,
then the docs should probably be updated to reflect that.

Thanks for any information or advice on this!

[1] https://www.postgresql.org/docs/current/static/sql-discard.html
[2]
https://www.postgresql.org/docs/current/static/hot-standby.html#HOT-STANDBY-USERS


Re: Stored procedures and out parameters

2018-08-16 Thread Shay Rojansky
Peter,

I think this is all coming from Microsoft.  The JDBC driver API was
> modeled after the ODBC API, and the ODBC specification also contains the
> {call} escape.  Microsoft SQL Server is also the only SQL implementation
> to handle this stored function/procedure stuff totally differently: They
> only have procedures, but they return values, and they are invoked by an
> EXEC command.  (They don't support transaction control AFAIK.)  The .NET
> stuff is obviously also from Microsoft.
>
> So from Microsoft's perspective, this makes some sense: They only have
> one invokable object type, and their invocation syntax is different from
> everyone else's.  So they made a compatibility wrapper in their client
> libraries.
>
> Everyone else, however, has two invokable object types and standard ways
> to invoke them.  And they all seemingly faced this problem of how to jam
> these two into this one hole provided by the JDBC spec and ended up with
> slightly different, and incompatible, solutions.
>
> I think, if you want to write a portable-sans-Microsoft JDBC
> application, you can just run CALL or SELECT directly.  If you want to
> write something that is compatible with Microsoft, you can map {call} to
> a function invocation as before, which is actually more similar to a
> procedure in MS SQL Server.
>

Am going to repeat some of Vladimir's responses here...

I don't really know (or care much) about the history of how language
database APIs evolved to where they are; I'm more concerned with what the
introduction of stored procedures will do... The problem we're describing
seems to go beyond JDBC or .NET. Looking at psycopg, for example, there's a
callproc() function that internally translates to SELECT * FROM (see
http://www.postgresqltutorial.com/postgresql-python/call-stored-procedures/)
- at the very least there are going to be some very confused users when
callproc() becomes a way to only invoke functions, whereas calling
procedures requires something else. I don't think there's anything really
Microsoft-specific about any of this (except maybe in the history) - just
like JDBC and psycopg, there's simply a single standard way in the database
API for invoking server-side things, and not two ways.

It's true that users will always be able to simply avoid the standard API
altogether and do SELECT * FROM func() or CALL proc(), but it really isn't
ideal to force users down this road, which once again, hurts portability
and general adoption.
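
To make the mismatch concrete, here's a minimal sketch of today's two
invocation syntaxes, plus the unified CALL we're asking for (the last
statement is the requested behavior and errors today):

CREATE FUNCTION get_answer() RETURNS int LANGUAGE sql AS $$ SELECT 42 $$;
CREATE PROCEDURE do_nothing() LANGUAGE plpgsql AS $$ BEGIN NULL; END; $$;

SELECT * FROM get_answer();  -- functions: invoked via SELECT
CALL do_nothing();           -- procedures: invoked via CALL

-- The request: let the driver's single "invoke server-side code" API map
-- onto CALL for both. This errors today because get_answer() is a function:
CALL get_answer();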

Andres,

> Are you actually suggesting we effectively drop procedure support?

The ideal solution here is to allow functions to be invoked with CALL,
rather than rolling back the entire feature (which obviously nobody wants).
This would allow drivers to simply change their API implementation to
translate to CALL instead of SELECT * FROM. I have no idea what the risk of
that is, what it would entail etc. - I'm just expressing the driver writer
perspective here with Vladimir. Hopefully some satisfactory solution can be
found here.


Re: Stored procedures and out parameters

2018-08-15 Thread Shay Rojansky
> Well, no, actually I think it wouldn't.  Multiple rowsets coming back
> from a single query is, to my mind anyway, forbidden in the extended query
> mode.  Yeah, we could probably get away with it in simple query mode
> (PQexec), but it's very likely to break clients in extended mode, because
> they're going to be expecting just a single PGresult from a single SQL
> command.  Moreover, there are aspects of the protocol, such as the
> Describe command, that aren't capable of dealing with more than one
> result row descriptor per query.  It would take some investigation to
> determine the consequences of changing that.  Even if you can weasel-word
> your way into claiming that it's not a complete protocol break, I for one
> would not vote to allow it unless the client has specifically said it
> could handle it.
>
> The protocol extension features we recently put in could be used to tell
> whether libpq or equivalent wire-level driver allows the case, but I'm
> just as concerned about breaking application-layer logic above the driver,
> and it's pretty unclear how we ought to deal with telling whether that
> code is OK with this.
>
> As long as we're sure that the case is prevented in v11, it's something
> that we can leave to work on later.
>

Just to say that from the perspective of a driver writer, this is
absolutely true. The protocol docs explicitly say that the response to
Describe is "a RowDescription message describing the rows that will be
returned by executing the portal", and any deviation from this will likely
cause significant breakage client-side. So a protocol version change is
necessary in my opinion for this.

By the way, from a purely protocol point of view, if you allow stored
procedures to return multiple resultsets, you may as well consider allowing
regular statements to contain semicolons and return multiple resultsets as
well - just like the simple protocol... This obviously would have
consequences beyond a pure protocol change, but it would make things more
consistent and would also simplify certain client-side implementation
details.
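
For comparison, the simple protocol already produces multiple resultsets
today when several statements are sent in one Query message - each SELECT
below comes back as its own RowDescription/DataRow sequence:

-- Sent as a single simple-protocol Query message, this already yields
-- two separate resultsets on the wire:
SELECT 1 AS first; SELECT 'two' AS second;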

> >> Also another request by Vladimir and myself to consider allowing
> >> functions to be invoked with CALL, in order to provide a single way to
> >> call both procedures and functions - this is important as language
> >> database APIs typically have a single, database-independent way to
> >> invoke server-side code that does not distinguish between functions and
> >> procedures.
>
> > I am familiar with the Java {call} escape.  But I think it's pretty
> > useless.
>

It would be good to understand why you think it's useless (am not familiar
at all with JDBC, am genuinely interested). On the .NET side it's a pretty
common/simple API (CommandType.StoredProcedure) that most users expect
coming from other databases, hence important for portability and user
acquisition.

I'd also be -1 on enabling this without a lot more thought.  It might
> be fine to allow it, but if it turns out it's not fine, we'd have painted
> ourselves into a corner.  I don't think that late in the release cycle
> is the time to be making such decisions.
>

I realize this is late and it's obviously not a quick and easy decision. On
the other hand, releasing *without* this also has its consequence, namely
setting in stone that the database-independent language API cannot be used
for invoking the new stored procedures. Even if you decide to add this for
PostgreSQL 12, users will have already written code that will need to
execute against PostgreSQL 11, and will therefore have to avoid the
database-independent API altogether and construct CALL statements
themselves.

So I hope you at least consider going through the thought process about
allowing this.


Re: Stored procedures and out parameters

2018-08-12 Thread Shay Rojansky
Peter, Tom,

Would it be possible for you to review the following two questions? Some
assertions have been made in this thread about the new stored procedures
(support for dynamic and multiple resultsets) whose compatibility with the
current PostgreSQL protocol is unclear to me as a client driver
maintainer... Some clarification would really help.

Also another request by Vladimir and myself to consider allowing functions
to be invoked with CALL, in order to provide a single way to call both
procedures and functions - this is important as language database APIs
typically have a single, database-independent way to invoke server-side
code that does not distinguish between functions and procedures.

Thanks for your time!

> 1. A stored procedure should be able to return multiple resultsets with
> different structures.
> > 2. A stored procedure can decide dynamically of the structure of the
> resultset(s) it returns, and the caller will discover it as they're
> returned, not before.
>
> Both of the above seem to be simply incompatible with the current
> PostgreSQL protocol. Describe currently returns a single RowDescription,
> which describes a single resultset, not more. And as I wrote before, I
> don't see how it's possible with the current protocol for the caller to
> discover the structure of the resultset(s) "as they're returned" - type
> information simply isn't included in the responses to Execute, only field
> lengths and values. It also leads me to wonder what exactly is returned in
> the current implementation when Describe is sent on a stored procedure
> call: something *is* returned as Vladimir wrote, meaning that stored
> procedures aren't as dynamic as they're made out to be?
>
> To summarize, it seems to me that if the multiple resultsets and/or
> dynamic resultset structure are a real feature of stored procedure,
> attention must be given to possible impact on the protocol and especially
> how client-side drivers are supposed to interact with the resultsets.
>
> The missing part is "invoke functions via CALL statement".
>>
>
> I agree. This is definitely not a JDBC-specific issue - I'm guessing most
> database APIs out there have their (single) way to invoke server-side code,
> and that way is currently set to send SELECT because only functions existed
> before. The distinction between stored functions and stored procedures
> seems to be PostgreSQL-specific, and the different invocation syntax causes
> a mismatch. Hope you consider allowing invoking the new stored procedures
> with CALL.
>


Re: Stored procedures and out parameters

2018-08-04 Thread Shay Rojansky
> Shay>Npgsql currently always sends a Describe as part of statement
> execution (for server-prepared statements the Describe is done only once, at
> preparation-time). Vladimir, are you doing things differently here?
>
> The same thing is for pgjdbc. It does use describe to identify result row
> format.
> However, "CALL my_proc()" works just fine with current git master for both
> simple and extended protocol.
>

In one way that's good, but I wonder how this squares with the following
written by David above:

> 1. A stored procedure should be able to return multiple resultsets with
different structures.
> 2. A stored procedure can decide dynamically of the structure of the
resultset(s) it returns, and the caller will discover it as they're
returned, not before.

Both of the above seem to be simply incompatible with the current
PostgreSQL protocol. Describe currently returns a single RowDescription,
which describes a single resultset, not more. And as I wrote before, I
don't see how it's possible with the current protocol for the caller to
discover the structure of the resultset(s) "as they're returned" - type
information simply isn't included in the responses to Execute, only field
lengths and values. It also leads me to wonder what exactly is returned in
the current implementation when Describe is sent on a stored procedure
call: something *is* returned as Vladimir wrote, meaning that stored
procedures aren't as dynamic as they're made out to be?

To summarize, it seems to me that if the multiple resultsets and/or dynamic
resultset structure are a real feature of stored procedure, attention must
be given to possible impact on the protocol and especially how client-side
drivers are supposed to interact with the resultsets.

The missing part is "invoke functions via CALL statement".
>

I agree. This is definitely not a JDBC-specific issue - I'm guessing most
database APIs out there have their (single) way to invoke server-side code,
and that way is currently set to send SELECT because only functions existed
before. The distinction between stored functions and stored procedures
seems to be PostgreSQL-specific, and the different invocation syntax causes
a mismatch. Hope you consider allowing invoking the new stored procedures
with CALL.


Re: Stored procedures and out parameters

2018-08-02 Thread Shay Rojansky
Apologies for disappearing from this conversation for a week.

First off, on the .NET side I have the exact same issue that Vladimir
Sitnikov described for the Java side. The .NET database API (ADO.NET) has a
standard, portable way for calling "server-side code". Since stored
procedures are in PostgreSQL, this portable API was implemented years ago
to invoke functions, which were the only thing in existence (so Npgsql
issues SELECT * FROM function()). Now that stored procedures have been
introduced, it's impossible to change what the portable API means without
massive backwards compatibility issues for all programs which already rely
on the API calling *functions*.

In other words, I really do hope that on the PostgreSQL side you consider
allowing both functions and procedures to be invoked via CALL. Npgsql (and
likely pgjdbc) would then be able to change the portable API to send CALL
instead of SELECT, avoiding all backwards compatibility issues (they would
do that only for PostgreSQL 11 and above). For now I'm telling users on the
beta version to avoid the API altogether (writing CALL SQL manually), which
as Vladimir wrote above is bad for portability.

> If you "DESCRIBE CALL my_func()" you get back a NoData response; it
doesn't try to inspect the RETURNS clause of the function even though in
theory it could.  The client is using CALL so that is it should expect to
receive.  That said I'm not entirely clear whether the NoData response is a
fixed bug or not...

Uh, this sounds like something we really need to understand... How is a
driver supposed to know what data types are being returned if it can't use
Describe? DataRow messages contain only field lengths and values, so having
a type OID is critical for knowing how to interpret the data, and that
currently is only available by sending a Describe on a statement... Npgsql
currently always sends a Describe as part of statement execution (for
server-prepared statements the Describe is done only once, at
preparation-time). Vladimir, are you doing things differently here?


On Tue, Jul 24, 2018 at 7:57 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Jul 24, 2018 at 11:31 AM, Daniel Verite 
> wrote:
>
>> David G. Johnston wrote:
>>
>> > > 2. A stored procedure can decide dynamically of
>> > > the structure of the resultset(s) it returns,
>> > > and the caller will discover it as they're returned, not
>> > > before.
>> > >
>> >
>> > The function itself doesn't care - this concern is about SELECT vs CALL
>> > invocation only, not the script definition.
>>
>> It does care, because CREATE FUNCTION has a RETURNS clause and
>> matching RETURN statements in the body, whereas CREATE PROCEDURE
>> doesn't and (will?) have a different syntax for producing resultsets.
>>
>
> But why does whatever code that implements CALL have to care?
>
> In Object Oriented terms why can not both procedures and functions
> implement a "EXECUTE_VIA_CALL" interface; while functions additionally
> implement a "EXECUTE_VIA_SELECT" interface - the one that they do today.
>
> ISTM that any (most) function could be trivially rewritten into a
> procedure (or wrapped by one) in a mechanical fashion which could then be
> executed via CALL.  I'm proposing that instead of having people write their
> own wrappers we figure out what the mechanical wrapper looks like, ideally
> based upon the public API of the function, and create it on-the-fly
> whenever said function is executed via a CALL statement.
>
> As for the invocation, that's just the starting point.  At this point
>> the driver doesn't know from '{call x}' whether x is a procedure or a
>> function in Postgres, hence the request for a syntax that would work
>> for both.  Okay, but aside from that, if there are results, the driver
>> needs to get them in a way that works without knowing whether it's a
>> function or procedure. How would that happen?
>>
>
> I'm saying that the driver needs to rewrite {call x} as "CALL x()" and
> expect optional resultsets and optional output arguments.  For functions
> invoked as procedures this would be a single resultset with zero output
> arguments.  Which is exactly the same end-user result that is received
> today when "SELECT * FROM x()" is used.
>
>
>> Back to the first message of the thread, Shay Rojansky was saying:
>>
>>   "However, connecting via Npgsql, which uses the extended protocol, I
>>   see something quite different. As a response to a Describe PostgreSQL
>>   message, I get back a NoData response rather than a RowDescription
>>   message"
>>
>> Why would a Describe on a "CALL myproc()"...

Stored procedures and out parameters

2018-07-23 Thread Shay Rojansky
Hi hackers, I've encountered some odd behavior with the new stored
procedure feature, when using INOUT parameters, running PostgreSQL 11-beta2.

With the following procedure:

CREATE OR REPLACE PROCEDURE my_proc(INOUT results text)
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
select 'test' into results;
END;
$BODY$;

executing CALL my_proc('whatever') yields a resultset with a "results"
column and a single row, containing "test". This is expected and is also
how functions work.

However, connecting via Npgsql, which uses the extended protocol, I see
something quite different. As a response to a Describe PostgreSQL message,
I get back a NoData response rather than a RowDescription message. In other
words, it would seem that the behavior of stored procedures differs between
the simple and extended protocols, when INOUT parameters are involved. Let
me know if you need any more info.

It may be worth adding some more documentation in
https://www.postgresql.org/docs/11/static/sql-createprocedure.html which
doesn't mention OUT/INOUT parameters at all (for instance, the fact that
OUT parameters aren't allowed, and an explanation of why INOUT parameters
are allowed, etc.).

Thanks for your help!


Re: citext function overloads for text parameters

2018-05-07 Thread Shay Rojansky
>
>
>> Thanks for the input. It's worth noting that the equality operator
>> currently works in the same way: citext = text comparison is (surprisingly
>> for me) case-sensitive.
>>
>> My expectation was that since citext is supposed to be a case-insensitive
>> *type*, all comparison operations involving it should be case-insensitive;
>>
>
> Comparison requires both things to be the same type.  The rules for
> implicitly converting one type to another prefer the core type text over
> the extension type citext.
>
> IOW, there is no such operator =(citext,text) and thus "citext = text
> comparison" is technically invalid.
>
> At this point we're sorta stuck with our choice, and while individual
> databases can implement their own functions and operators there is value in
> doing things the way the system provides to minimize future confusion and
> bugs.
>

OK, thanks for everyone's input.


Re: citext function overloads for text parameters

2018-05-07 Thread Shay Rojansky
>
> > Do you think it would be appropriate to simply add an strpos(citext,
> text)
> > overload to the extension to make sure this behaves more as expected? If
> so
> > I can try to submit a patch at some point.
>
> To me, if there's both a citext and a text parameter, then it's simply
> unclear which behavior is wanted; I do not think there's a principled
> argument that it should be case-insensitive.  Thus, adding a function
> to reverse the current interpretation would amount to breaking some
> currently-working apps to favor others that currently don't work.
> Doesn't really sound like a net gain.
>

Thanks for the input. It's worth noting that the equality operator
currently works in the same way: citext = text comparison is (surprisingly
for me) case-sensitive.
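
A quick illustration (text is the preferred type in the string category,
so the mixed comparison resolves to =(text,text)):

SELECT 'Aa'::citext = 'aa'::citext;  -- true: =(citext,citext) is used
SELECT 'Aa'::citext = 'aa'::text;    -- false: the citext side is
                                     -- implicitly cast to text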

My expectation was that since citext is supposed to be a case-insensitive
*type*, all comparison operations involving it should be case-insensitive;
at least there seems to be more value in things being this way. But if that
doesn't sound like a convincing/principled argument this can be dropped
(and I do agree that the breakage may not be worth it).


Re: citext function overloads for text parameters

2018-05-06 Thread Shay Rojansky
Thanks for your answer Pavel.

This is expected - it is side effect of PostgreSQL implementation of
> function overloading and type conversions
>
> after installation citext, you will have more instances of function strpos
>
> strpos(citext, citext)
> strpos(text, text)
>

Do you think it would be appropriate to simply add an strpos(citext, text)
overload to the extension to make sure this behaves more as expected? If so
I can try to submit a patch at some point.
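
For the record, a minimal sketch of what such an overload might look
like, modeled on how the extension already lower-cases both sides for
strpos(citext, citext) - the definition below is illustrative, not the
extension's actual code:

-- Hypothetical strpos(citext, text): treat the text argument
-- case-insensitively, like the existing citext/citext variant.
CREATE FUNCTION strpos(citext, text) RETURNS int AS $$
    SELECT pg_catalog.strpos(pg_catalog.lower($1::pg_catalog.text),
                             pg_catalog.lower($2));
$$ LANGUAGE sql IMMUTABLE STRICT;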


citext function overloads for text parameters

2018-05-06 Thread Shay Rojansky
Hi hackers.

The following works well of course:

test=# select strpos('Aa'::citext, 'a');
 strpos
--------
      1

However, if I pass a typed text parameter for the substring, I get
case-sensitive behavior instead:

test=# select strpos('Aa'::citext, 'a'::text);
 strpos
--------
      2

This seems like surprising behavior - my expectation was that the first
parameter being citext would be enough to trigger case-insensitive
behavior. The same may be happening with other string functions (e.g.
regexp_matches). This is causing some difficulties in a real scenario where
SQL and parameters are getting generated by an O/RM, and changing them
isn't trivial.
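
A workaround, for what it's worth, is forcing the parameter back to
citext, which selects the case-insensitive overload - though that's
exactly what's hard to do when an O/RM generates the SQL:

test=# select strpos('Aa'::citext, 'a'::text::citext);
 strpos
--------
      1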

Do the above seem like problematic behavior like it does to me, or is it
the expected behavior?

Shay


Re: Cached/global query plans, autopreparation

2018-02-15 Thread Shay Rojansky
>
> I am an author of one of the proposal (autoprepare which is in commit fest
> now), but I think that sooner or later Postgres has to come to solution
> with shared DB caches/prepared plans.
> Please correct me if I am wrong, but it seems to me that most of all other
> top DBMSes having something like this.
> Such decision can provide a lot of different advantages:
> 1. Better memory utilization: no need to store the same data N times where
> N is number of backends and spend time for warming cache.
> 2. Optimizer can spend more time choosing better plan which then can be
> used by all clients. Even now time of compilation of some queries several
> times exceeds time of their execution.
> 3. It is simpler to add facilities for query plan tuning and maintaining
> (storing, comparing,...)
> 4. It make is possible to control size of memory used by caches. Right now
> catalog cache for DB with hundred thousands and tables and indexes
> multiplied by hundreds of backends can consume terabytes of memory.
> 5. Shared caches can simplify invalidation mechanism.
> 6. Almost all enterprise systems working with Postgres has to use some
> kind of connection pooling (pgbouncer, pgpool,...). It almost excludes
> the possibility of using prepared statements, which can slow down
> performance by up to two times.
>

Just wanted to say I didn't see this email before my previous response, but
I agree with all of the above. The last point is particularly important,
especially for short-lived connection scenarios, the most typical of which
is web.


> There is just one (but very important) problem which needs to be solved:
> access to shared cache should be synchronized.
> But there are a lot of other shared resources in Postgres (procarray,
> shared buffers,...). So  I do not think that it is unsolvable problem and
> that it can cause degrade of performance.
>
> So it seems to be obvious that shared caches/plans can provide a lot of
> advantages. But it is still not clear to me the value of this advantages
> for real customers.
> Using -M prepared  protocol in pgbench workload can improve speed up to
> two times. But I have asked real Postgres users in Avito, Yandex, MyOffice
> and them told me
> that on their workloads advantage of prepared statements is about 10%. 10%
> performance improvement is definitely not a good compensation for rewriting
> substantial part of Postgres core...
>

Just wanted to say that I've seen more than 10% improvement in some
real-world applications when preparation was done properly. Also, I'm
assuming that implementing this wouldn't involve "rewriting substantial
part of Postgres core", and that even 10% is quite a big gain, especially
if it's a transparent/free one as far as the user is concerned (no
application changes).


Re: Cached/global query plans, autopreparation

2018-02-15 Thread Shay Rojansky
>
> > Well, the issue is that implementing this is a major piece of work. This
> > post doesn't offer either resources nor a simpler way to do so. There's
> > no huge debate about the benefit of having a global plan cache, so I'm
> > not that surprised there's not a huge debate about a post arguing that
> > we should have one.
>
> Actually, I'm pretty darn skeptical about the value of such a cache for
> most use-cases.  But as long as it can be turned off and doesn't leave
> residual overhead nor massive added code cruft, I won't stand in the way
> of someone else investing their time in it.
>
> In any case, as you say, it's moot until somebody steps up to do it.
>

Well, looking at previous conversations and also at the comment above it
doesn't seem like there's a consensus on whether this feature would even be
beneficial... The point of my email above was to have that conversation
before looking into implementation. Tom, I'm especially interested in
understanding why you think this cache wouldn't help most use-cases: I see
many applications which don't prepare (e.g. because they use data access
layers/O/RMs which either don't do it or don't expose it), and implementing this in the
driver seems like the wrong way (although Npgsql and JDBC do it, at least
some other languages don't).

In addition, there are also various options/possibilities here and there
seems no consensus about that either:

* How should statement plan caching be done for unprepared statements, what
strategy should be used? Should a threshold number of unprepared executions
be used before PostgreSQL decides to prepare? Should there be a maximum
number of autoprepared statements, ejecting the least-recently used one to
make room for a new one? Or something else?
* Should the cached plans be shared across connections ("global" cached
statements)? Are the savings from global caching greater than the cost of
the contention? The savings include both (a) not having to re-prepare the
same statement N times on different connections (typically just a one-time
application warm-up cost), and (b) not having the memory duplication of N
identical statements across connections (a constant cost, not warm-up - but
not sure how significant this is). Note that the global/shared discussion
is a bit orthogonal to the general autopreparation conversation - the
latter has value with or without the former.

Essentially I think it's a good idea to have a conversation about all this
before anyone jumps into implementation.


Re: Cached/global query plans, autopreparation

2018-02-13 Thread Shay Rojansky
Hi all,

Was wondering if anyone has a reaction to my email below about statement
preparation - was it too long? :)

(and sorry for top-posting)

On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky <r...@roji.org> wrote:

> Hi all.
>
> Various versions of having PostgreSQL caching and/or autopreparing
> statement plans have been discussed
> (https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com,
> https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru),
> without clear conclusions or even an
> agreement on what might be worthwhile to implement. I wanted to bring this
> up again from a PostgreSQL driver maintainer's perspective (I'm the owner
> of Npgsql, the open source .NET driver), apologies in advance if I'm
> repeating things or I've missed crucial information. Below I'll describe
> three relevant issues and what I've done to deal with them.
>
> When the same statement is rerun, preparing it has a very significant
> performance boost. However, in short-lived connection scenarios it's
> frequently not possible to benefit from this - think of a typical webapp
> which allocates a connection from a pool, runs a query and then returns the
> connection. To make sure prepared statements are used, Npgsql's connection
> pool doesn't send DISCARD ALL when a connection is returned (to avoid
> wiping out the prepared statements), and maintains an internal table mapping SQL
> (and parameter types) to a PostgreSQL statement name. The next time the
> application attempts to prepare the same SQL, the prepared statement is
> found in the table and no preparation needs to occur. This means that
> prepared statements persist across pooled connection open/close, and are
> never discarded unless the user uses a specific API. While this works, the
> disadvantages are that:
> 1. This kind of mechanism needs to be implemented again and again, in each
> driver:
> 2. It relies on Npgsql's internal pooling, which can track persistent
> prepared statements on physical connections. If an external pool is used
> (e.g. pgpool), this isn't really possible.
> 1. It complicates resetting the session state (instead of DISCARD ALL, a
> combination of all other reset commands except DEALLOCATE ALL needs be
> sent). This is minor.
>
> The second issue is that many applications don't work directly against the
> database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
> additional layer is used, there's a good chance that that layer doesn't
> prepare in any way, and indeed hide your access to the database API's
> preparation method. Two examples from the .NET world is dapper (a very
> popular micro-O/RM) and Entity Framework. In order to provide the best
> possible performance in these scenarios, Npgsql has an opt-in feature
> whereby it tracks how many times a given statement was executed, and once
> it passes a certain threshold automatically prepares it. An LRU cache is
> then used to determine which prepared statements to discard, to avoid
> explosion. In effect, statement auto-preparation is implemented in the
> driver. I know that the JDBC driver also implements such a mechanism (it
> was actually the inspiration for the Npgsql feature). The issues with this
> are:
>
> 1. As above, this has to be implemented by every driver (and is quite
> complex to do well)
> 2. There's a possible missed opportunity in having a single plan on the
> server, as each connection has its own (the "global plan" option). Many
> apps out there send the same statements across many connections so this
> seems relevant - but I don't know if the gains outweigh the contention
> impact in PostgreSQL.
>
> Finally, since quite a few (most?) other databases include autopreparation
> (SQL Server, Oracle...), users porting their applications - which don't
> explicitly prepare - experience a big performance drop. It can rightly be
> said that porting an application across databases isn't a trivial task and
> that adjustments need to be made, but from experience I can say that
> PostgreSQL is losing quite a few users to this.
>
> The above issues could be helped by having PostgreSQL cache on its side
> (especially the second issue, which is the most important). Ideally, any
> adopted solution would be transparent and not require any modification to
> applications. It would also not impact explicitly-prepared statements in
> any way.
>
> Note that I'm not arguing for any specific implementation on the
> PostgreSQL side (e.g. global or not), but just describing a need and hoping
> to restart a conversation that will lead somewhere.
>
> (and thanks for reading this overly long message!)
>
> Shay
>


Re: Built-in connection pooling

2018-02-09 Thread Shay Rojansky
Am a bit late to this thread, sorry if I'm slightly rehashing things. I'd
like to go back to basics on this.

Unless I'm mistaken, at least in the Java and .NET world, clients are
almost always expected to have their own connection pooling, either
implemented inside the driver (ADO.NET model) or as a separate modular
component (JDBC). This approach has a few performance advantages:

1. "Opening" a new pooled connection is virtually free - no TCP connection
needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
synchronization).
2. Important client state can be associated to physical connections. For
example, prepared statements can be tracked on the physical connection, and
persisted when the connection is returned to the pool. The next time the
physical connection is returned from the pool, if the user tries to
server-prepare a statement, we can check on the connection if it has
already been prepared in a "previous lifetime", and if so, no need to
prepare again. This is vital for scenarios with short-lived (pooled)
connections, such as web. Npgsql does this.

Regarding the problem of idle connections being kept open by clients, I'd
argue it's a client-side problem. If the client is using a connection pool,
the pool should be configurable to close idle connections after a certain
time (I think this is relatively standard behavior). If the client isn't
using a pool, it seems to be the application's responsibility to release
connections when they're no longer needed.

The one drawback is that the pooling is application-specific, so it can't
be shared by multiple applications/hosts. So in some scenarios it may make
sense to use both client pooling and proxy/server pooling.

To sum it up, I would argue that connection pooling should first and
foremost be considered as a client feature, rather than a proxy feature
(pgpool) or server feature (the PostgreSQL pooling being discussed here).
This isn't to say server-side pooling has no value though.


Cached/global query plans, autopreparation

2018-02-06 Thread Shay Rojansky
Hi all.

Various versions of having PostgreSQL caching and/or autopreparing
statement plans have been discussed (
https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com
,
https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru),
without clear conclusions or even an agreement on what might be worthwhile
to implement. I wanted to bring this up again from a PostgreSQL driver
maintainer's perspective (I'm the owner of Npgsql, the open source .NET
driver); apologies in advance if I'm repeating things or I've missed
crucial information. Below I'll describe three relevant issues and what
I've done to deal with them.

When the same statement is rerun, preparing it has a very significant
performance boost. However, in short-lived connection scenarios it's
frequently not possible to benefit from this - think of a typical webapp
which allocates a connection from a pool, runs a query and then returns the
connection. To make sure prepared statements are used, Npgsql's connection
pool doesn't send DISCARD ALL when a connection is returned (to avoid
wiping out the prepared statements), and maintains an internal table mapping SQL
(and parameter types) to a PostgreSQL statement name. The next time the
application attempts to prepare the same SQL, the prepared statement is
found in the table and no preparation needs to occur. This means that
prepared statements persist across pooled connection open/close, and are
never discarded unless the user uses a specific API. While this works, the
disadvantages are that:
1. This kind of mechanism needs to be implemented again and again, in each
driver;
2. It relies on Npgsql's internal pooling, which can track persistent
prepared statements on physical connections. If an external pool is used
(e.g. pgpool), this isn't really possible.
3. It complicates resetting the session state (instead of DISCARD ALL, a
combination of all other reset commands except DEALLOCATE ALL needs to be
sent). This is minor.

The second issue is that many applications don't work directly against the
database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
additional layer is used, there's a good chance that that layer doesn't
prepare in any way, and indeed hides access to the database API's
preparation method. Two examples from the .NET world are dapper (a very
popular micro-O/RM) and Entity Framework. In order to provide the best
possible performance in these scenarios, Npgsql has an opt-in feature
whereby it tracks how many times a given statement was executed, and once
it passes a certain threshold automatically prepares it. An LRU cache is
then used to determine which prepared statements to discard, to avoid
explosion. In effect, statement auto-preparation is implemented in the
driver. I know that the JDBC driver also implements such a mechanism (it
was actually the inspiration for the Npgsql feature); a rough SQL sketch
of this lifecycle follows the list below. The issues with this are:

1. As above, this has to be implemented by every driver (and is quite
complex to do well)
2. There's a possible missed opportunity in having a single plan on the
server, as each connection has its own (the "global plan" option). Many
apps out there send the same statements across many connections so this
seems relevant - but I don't know if the gains outweigh the contention
impact in PostgreSQL.
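
Conceptually, once the execution-count threshold is crossed the driver
does the SQL-level equivalent of the following (it actually uses
protocol-level Parse/Bind/Execute; the statement and table names here are
made up):

-- After N executions of the same SQL, promote it to a named prepared
-- statement and reuse the stored plan from then on:
PREPARE _auto_1 (int) AS SELECT * FROM users WHERE id = $1;
EXECUTE _auto_1 (42);
-- When the LRU cache is full, the least-recently-used entry is evicted:
DEALLOCATE _auto_1;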

Finally, since quite a few (most?) other databases include autopreparation
(SQL Server, Oracle...), users porting their applications - which don't
explicitly prepare - experience a big performance drop. It can rightly be
said that porting an application across databases isn't a trivial task and
that adjustments need to be made, but from experience I can say that
PostgreSQL is losing quite a few users to this.

The above issues could be helped by having PostgreSQL cache on its side
(especially the second issue, which is the most important). Ideally, any
adopted solution would be transparent and not require any modification to
applications. It would also not impact explicitly-prepared statements in
any way.

Note that I'm not arguing for any specific implementation on the PostgreSQL
side (e.g. global or not), but just describing a need and hoping to restart
a conversation that will lead somewhere.

(and thanks for reading this overly long message!)

Shay