From: Peter Eisentraut [mailto:peter.eisentr...@2ndquadrant.com]
> The difference is how error recovery works.  So this will necessarily be
> tied to how the client code or other surrounding code is structured or what
> the driver or framework is doing in the background to manage transactions.
> It would also be bad if client code that was not prepared for this new behavior
> reported the transaction as complete while some commands in the middle were
> omitted.
> 
> Drivers can already achieve this behavior, and indeed do so, by issuing savepoint
> commands internally.  The point raised in this thread was that this creates
> too much network overhead, so a backend-based solution would be preferable.
> We haven't seen any numbers or other evidence to quantify that claim, so
> maybe it's worth looking into that some more.
> 
> In principle, a backend-based solution that drivers just have to opt into
> would save a lot of duplication.  But the drivers that care or require it
> according to their standards presumably already implement this behavior
> in some other way, so it comes back to whether there is a performance or
> other efficiency gain here.
> 
> Another argument was that other SQL implementations have this behavior.
> This appears to be the case.  But as far as I can tell, it is also tied
> to their particular interfaces and the structure and flow control they
> provide.  So a client-side solution like psql already provides or something
> in the various drivers would work just fine here.
> 
> So my summary for the moment is that a GUC or similar run-time setting might
> be fine, with appropriate explanation and warnings.  But it's not clear
> whether it's worth it given the existing alternatives.

I can think of four reasons why server-side support is necessary or desirable.

First, with the driver-side savepoint approach, the server log gets filled with 
SAVEPOINT and RELEASE lines, which gets in the way when you need to investigate 
performance or audit activity.

Second, ease of use for those who migrate from other DBMSs.  With server-side 
support, only the DBA needs to be aware of the setting in postgresql.conf; the 
people who deploy applications don't need to know about any client-side parameter.
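
To make this concrete, something like the following is what I have in mind (the 
GUC name below is only a placeholder; the actual name and values would be decided 
by this proposal):

    # postgresql.conf -- hypothetical server-side switch, name to be decided
    statement_rollback = on

Today the same effect has to be switched on separately in every client: \set 
ON_ERROR_ROLLBACK on in psql, autosave=always in the PgJDBC connection URL, the 
corresponding option in each other driver that has one.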

Third, the lack of server-side support causes trouble for driver developers.  In a 
recent discussion with the psqlODBC committer, he had some trouble improving and 
fixing the statement-rollback support.  Npgsql doesn't have statement rollback 
yet.  PgJDBC has only recently supported the feature, via its autosave parameter.  
Do the drivers for other languages like Python, Go, and JavaScript have it?  We 
should reduce the burden on driver developers.
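
For reference, below is roughly the sequence each driver has to generate and 
track by itself today (table, values, and savepoint names are made up for 
illustration).  It is also exactly what fills the server log in my first point, 
and what adds the extra round trips discussed in the next one:

    BEGIN;
    SAVEPOINT s1;                     -- issued implicitly by the driver
    INSERT INTO t VALUES (1);         -- the application's statement
    RELEASE SAVEPOINT s1;             -- statement succeeded
    SAVEPOINT s2;
    INSERT INTO t VALUES (1);         -- fails, e.g. on a unique constraint
    ROLLBACK TO SAVEPOINT s2;         -- undo only the failed statement
    INSERT INTO t VALUES (2);         -- the transaction is still usable
    COMMIT;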

Fourth, runtime performance.  In a performance benchmark for one of our customers, 
where a batch application ran 1.5 or 5 million small SELECTs with primary key 
access, the total execution time of the batch became more than 30% shorter (IIRC) 
when a local connection was used instead of a remote TCP/IP one.  The 
communication overhead is not small.

Also, the PostgreSQL documentation treats the communication overhead as a serious 
concern:


https://www.postgresql.org/docs/devel/static/plpgsql-overview.html#plpgsql-advantages

[Excerpt]
--------------------------------------------------
That means that your client application must send each query to the database 
server, wait for it to be processed, receive and process the results, do some 
computation, then send further queries to the server. All this incurs 
interprocess communication and will also incur network overhead if your client 
is on a different machine than the database server.

With PL/pgSQL you can group a block of computation and a series of queries 
inside the database server, thus having the power of a procedural language and 
the ease of use of SQL, but with considerable savings of client/server 
communication overhead.


• Extra round trips between client and server are eliminated
• Intermediate results that the client does not need do not have to be marshaled or transferred between server and client
• Multiple rounds of query parsing can be avoided

This can result in a considerable performance increase as compared to an 
application that does not use stored functions.
--------------------------------------------------


Craig has also reported how large the communication overhead can be:

PATCH: Batch/pipelining support for libpq
https://www.postgresql.org/message-id/CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com#CAMsr+YFUjJytRyV4J-16bEoiZyH=4nj+sQ7JP9ajwz=b4dm...@mail.gmail.com

Re: foreign table batch insert
https://www.postgresql.org/message-id/CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=w...@mail.gmail.com

[Excerpt]
--------------------------------------------------
The time difference for 10k inserts on the local host over a unix socket
shows a solid improvement:

batch insert elapsed:      0.244293s
sequential insert elapsed: 0.375402s

... but over, say, a connection to a random AWS RDS instance fired up for
the purpose that lives about 320ms away the difference is huge:

batch insert elapsed:      9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a
bit less than an hour based on the latency numbers)

With 500 rows on the remote AWS RDS instance, once the I/O quota is already
saturated:

batch insert elapsed:      1.229024s
sequential insert elapsed: 156.962180s

which is an improvement by a factor of over 120
--------------------------------------------------
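
As a rough sanity check on those numbers:

    10,000 statements x ~0.32 s round-trip latency ≈ 3,200 s ≈ 53 minutes

which matches the "bit less than an hour" estimate above.  And if a driver has to 
wrap every statement in SAVEPOINT and RELEASE commands sent as separate messages, 
the number of round trips roughly triples.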


Regards
Takayuki Tsunakawa


