[
https://issues.apache.org/jira/browse/OAK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071272#comment-17071272
]
Solomon Rutzky commented on OAK-8977:
-------------------------------------
{quote}thanks for your research.
{quote}
You're welcome 😺 .
{quote}If handling of PreparedStatements is indeed the problem, then it would
probably better to fix that, instead of working around that introducing
vendor-specific code).
{quote}
Well, I think your test results in OAK-8926 prove that this is, indeed, a
vendor-specific issue. It is certainly understandable that folks who develop
apps that support multiple DB platforms prefer to have a singular interface as
it cuts down on code complexity, but unfortunately that's just not how these
things work. This question comes up every so often on Stack Overflow /
DBA.StackExchange, and the reality is, even if you find ANSI-compliant SQL that
works for all supported platforms, that ignores the fact that each platform is
tuned slightly differently, and something that is fast in one can be slow in
another, yet you fix the other and then you make the first one slow. I can't
say that prepared statements across all platforms are inefficient, I am only
speaking in terms of SQL Server. It might be true for others, or even all, or
perhaps SQL Server is the only one that handles prepared statements this way.
While I find the topic interesting, I absolutely do not have the time to
undertake such research. What I can say is this:
* Your test results show that *H2(file)* and *Derby(embedded)* are quite fast,
so clearly they don't have any issues with prepared statements
* I know from my research into SQL Server that it does _not_ handle prepared
statements as efficiently as truly parameterized queries or stored procedures.
Connection pooling doesn't change how this works because the mapping of
statements to their IDs is really per _session_, not per connection (see below)
* As far as I can tell from my poking around the JDBC documentation, there is
no other method offered for executing SQL statements, outside of callable
statements, which are a variation of prepared statements.
Hence, we don't want to detract from the other platforms currently operating
quite efficiently, yet what works for those is not optimal for SQL Server, and
there aren't a whole lot of options.
This, of course, does mean that the current *storeBlockInDatabase* method will
need to be overridden with a SQL Server-specific implementation, but that would
hopefully be worth the gain in performance. And hopefully that is something we
can test / prove prior to making such changes.
{quote}This benchmark is single-threaded, so it should be using the same
connection. What connection pool/datasource implementation do the statements
above apply to?{quote}
The connection is not the issue. When connection pooling is used, logging into
an existing connection will implicitly execute the {{sp_reset_connection}}
internal stored procedure that resets most of the session settings. The
prepared statement map of cached queries is one of those things that is reset.
The only way to get a performance benefit via prepared statements is to reuse
the same statement multiple times in the same _session_ (i.e. _prior_ to
calling {{connection.close()}} ), but that's not how this DB access pattern
works, and the nature of these operations does not allow for changing that.
Please see the *Update* section of my answer to [What is the sense and benefit
of using SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859].
> RDBBlobStore performance improvement for SQL Server (MSSQL)
> -----------------------------------------------------------
>
> Key: OAK-8977
> URL: https://issues.apache.org/jira/browse/OAK-8977
> Project: Jackrabbit Oak
> Issue Type: Technical task
> Components: rdbmk
> Reporter: Solomon Rutzky
> Priority: Major
>
> The performance test done in OAK-8926 indicates that one or more aspects of
> using SQL Server as the database need to be addressed (also MySQL, but this
> issue only concerns SQL Server).
>
> OAK-8963 (specify a binary collation in the {{CREATE TABLE}} statement) will
> improve performance a little, but not nearly enough to get its performance up
> to the range of the other non-MySQL databases.
>
> I did some testing and found one area that can be improved, and that should
> be feasible to improve. The current DB access pattern is to submit
> parameterized dynamic SQL. JDBC handles this as a "prepared" statement. I did
> some research four years ago on prepared statements ( [What is the sense and
> benefit of using
> SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859] on
> DBA.StackExchange ) and found that:
> # prepared statements, even when parameterized, don't cache the execution
> plan between connections (even if the connection is pooled and does not need
> to re-authenticate)
> # prepared statements only re-use the execution plan when executing the same
> prepared statement with different parameters in the _same session_ (the Java
> code opens and closes the connection per each call)
> One way to improve performance is to:
> # put the T-SQL code into a stored procedure (that way SQL Server does not
> need to parse it per each call and can re-use an execution plan), _and_
> # call the stored procedure using a "callable statement": [Using a stored
> procedure with input
> parameters|https://docs.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-input-parameters]
>
> I tested this by first copying the basic sequence of T-SQL statements found
> in the
> [storeBlockInDatabase|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?revision=1874570&view=markup#l307]
> method (assuming that everything always worked / happy path / no error
> handling) and simulating the new
> [testInsertSmallBlobs()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/blob/RDBBlobStoreTest.java?view=markup&pathrev=1874693#l308]
> test, first as dynamic SQL, and then as a stored procedure:
> {code:sql}
> ALTER PROCEDURE dbo.[StoreBlockInDatabase]
> (
> @Digest VARBINARY(32),
> @Level INT,
> @Data VARBINARY(MAX)
> )
> AS
> SET NOCOUNT ON;
> DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
> @Now BIGINT;
> SELECT @Now = [cpu_ticks]
> FROM sys.dm_os_sys_info;
> BEGIN TRY
> UPDATE dbo.[DATASTORE_META]
> SET [LASTMOD] = @Now
> WHERE [ID] = @Id;
> IF (@@ROWCOUNT = 0)
> BEGIN
> INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id,
> @Data);
> INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD])
> VALUES (@Id, @Level, @Now);
> END;
> END TRY
> BEGIN CATCH
> ;THROW;
> END CATCH;
> GO
> {code}
> I tested against both the {{Latin1_General_CI_AS}} collation and the
> {{Latin1_General_BIN2}} collation:
> * The stored procedure inserts 2.5 times more rows than the dynamic SQL
> version
> * The binary collation is slightly better than the non-binary collation per
> each test
> I will attach the test script once I get it cleaned up.
> P.S. I also tested with another pattern to see if doing a {{SELECT}} first
> (rather than assuming the {{UPDATE}}) would be better (based on my testing
> for a similar case: [UPDATE performance where no data
> changes|https://dba.stackexchange.com/a/114369/30859] on DBA.StackExchange),
> but it wasn't better.
> P.P.S. I figure the {{CREATE PROCEDURE}} statement could be placed in the
> vendor-specific area of
> [RDBBlobStoreDB|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStoreDB.java?revision=1874271&view=markup#l60]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)