[jira] [Commented] (OAK-8977) RDBBlobStore performance improvement for SQL Server (MSSQL)

Solomon Rutzky (Jira) Mon, 30 Mar 2020 13:52:13 -0700


    [ 
https://issues.apache.org/jira/browse/OAK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071272#comment-17071272
 ]


Solomon Rutzky commented on OAK-8977:
-------------------------------------

{quote}thanks for your research.
{quote}

You're welcome &#x1F63A;  .

{quote}If handling of PreparedStatements is indeed the problem, then it would 
probably better to fix that, instead of working around that introducing 
vendor-specific code).
{quote}

Well, I think your test results in OAK-8926 prove that this is, indeed, a 
vendor-specific issue. It is certainly understandable that folks who develop 
apps that support multiple DB platforms prefer to have a singular  interface as 
it cuts down on code complexity, but unfortunately that's just not how these 
things work. This question comes up every so often on Stack Overflow / 
DBA.StackExchange, and the reality is, even if you find ANSI-compliant SQL that 
works for all supported platforms, that ignores the fact that each platform is 
tuned slightly differently, and something that is fast in one can be slow in 
another, yet you fix the other and then you make the first one slow. I can't 
say that prepared statements across all platforms are inefficient, I am only 
speaking in terms of SQL Server. It might be true for others, or even all, or 
perhaps SQL Server is the only one that handles prepared statements this way. 
While I find the topic interesting, I absolutely do not have the time to 
undertake such research. What I can say is this:

* Your test results show that *H2(file)* and *Derby(embedded)* are quite fast, 
so clearly they don't have any issues with prepared statements
* I know from my research into SQL Server that it does _not_ handle prepared 
statements as efficiently as truly parameterized queries or stored procedures. 
Connection pooling doesn't change how this works because the mapping of 
statements to their IDs is really per _session_, not per connection (see below)
* As far as I can tell from my poking around the JDBC documentation, there is 
no other method offered for executing SQL statements, outside of callable 
statements, which are a variation of prepared statements.

Hence, we don't want to detract from the other platforms currently operating 
quite efficiently, yet what works for those is not optimal for SQL Server, and 
there aren't a whole lot of options.

This, of course, does mean that the current *storeBlockInDatabase* method will 
need to be overridden with a SQL Server-specific implementation, but that would 
hopefully be worth the gain in performance. And hopefully that is something we 
can test / prove prior to making such changes.

{quote}This benchmark is single-threaded, so it should be using the same 
connection. What connection pool/datasource implementation do the statements 
above apply to?{quote}

The connection is not the issue. When connection pooling is used, logging into 
an existing connection will implicitly execute the {{sp_reset_connection}} 
internal stored procedure that resets most of the session settings. The 
prepared statement map of cached queries is one of those things that is reset. 
The only way to get a performance benefit via prepared statements is to reuse 
the same statement multiple times in the same _session_ (i.e. _prior_ to 
calling {{connection.close()}} ), but that's not how this DB access pattern 
works, and the nature of these operations does not allow for changing that.

Please see the *Update* section of my answer to [What is the sense and benefit 
of using SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859].



> RDBBlobStore performance improvement for SQL Server (MSSQL)
> -----------------------------------------------------------
>
>                 Key: OAK-8977
>                 URL: https://issues.apache.org/jira/browse/OAK-8977
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: rdbmk
>            Reporter: Solomon Rutzky
>            Priority: Major
>
> The performance test done in OAK-8926 indicates that one or more aspects of 
> using SQL Server as the database need to be addressed (also MySQL, but this 
> issue only concerns SQL Server).
>  
> OAK-8963 (specify a binary collation in the {{CREATE TABLE}} statement) will 
> improve performance a little, but not nearly enough to get its performance up 
> to the range of the other non-MySQL databases.
>  
> I did some testing and found one area that can be improved, and that should 
> be feasible to improve. The current DB access pattern is to submit 
> parameterized dynamic SQL. JDBC handles this as a "prepared" statement. I did 
> some research four years ago on prepared statements (  [What is the sense and 
> benefit of using 
> SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859] on 
> DBA.StackExchange ) and found that:
> # prepared statements, even when parameterized, don't cache the execution 
> plan between connections (even if the connection is pooled and does not need 
> to re-authenticate)
> # prepared statements only re-use the execution plan when executing the same 
> prepared statement with different parameters in the _same session_ (the Java 
> code opens and closes the connection per each call)
> One way to improve performance is to:
> # put the T-SQL code into a stored procedure (that way SQL Server does not 
> need to parse it per each call and can re-use an execution plan), _and_
> # call the stored procedure using a "callable statement": [Using a stored 
> procedure with input 
> parameters|https://docs.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-input-parameters]
>  
> I tested this by first copying the basic sequence of T-SQL statements found 
> in the 
> [storeBlockInDatabase|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?revision=1874570&view=markup#l307]
>  method (assuming that everything always worked / happy path / no error 
> handling) and simulating the new 
> [testInsertSmallBlobs()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/blob/RDBBlobStoreTest.java?view=markup&pathrev=1874693#l308]
>  test, first as dynamic SQL, and then as a stored procedure:
> {code:sql}
> ALTER PROCEDURE dbo.[StoreBlockInDatabase]
> (
>       @Digest VARBINARY(32),
>       @Level INT,
>       @Data VARBINARY(MAX)
> )
> AS
> SET NOCOUNT ON;
> DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
>               @Now BIGINT;
> SELECT @Now = [cpu_ticks]
> FROM   sys.dm_os_sys_info;
> BEGIN TRY
>       UPDATE dbo.[DATASTORE_META]
>       SET    [LASTMOD] = @Now
>       WHERE  [ID] = @Id;
>       IF (@@ROWCOUNT = 0)
>       BEGIN
>               INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id, 
> @Data);
>               INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD]) 
> VALUES (@Id, @Level, @Now);
>       END;
> END TRY
> BEGIN CATCH
>       ;THROW;
> END CATCH;
> GO
> {code}
> I tested against both the {{Latin1_General_CI_AS}} collation and the 
> {{Latin1_General_BIN2}} collation:
> * The stored procedure inserts 2.5 times more rows than the dynamic SQL 
> version
> * The binary collation is slightly better than the non-binary collation per 
> each test
> I will attach the test script once I get it cleaned up.
> P.S. I also tested with another pattern to see if doing a {{SELECT}} first 
> (rather than assuming the {{UPDATE}}) would be better (based on my testing 
> for a similar case: [UPDATE performance where no data 
> changes|https://dba.stackexchange.com/a/114369/30859] on DBA.StackExchange), 
> but it wasn't better.
> P.P.S. I figure the {{CREATE PROCEDURE}} statement could be placed in the 
> vendor-specific area of 
> [RDBBlobStoreDB|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStoreDB.java?revision=1874271&view=markup#l60]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (OAK-8977) RDBBlobStore performance improvement for SQL Server (MSSQL)

Reply via email to