[ 
https://issues.apache.org/jira/browse/OAK-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Reschke updated OAK-8977:
--------------------------------
        Parent: OAK-1266
    Issue Type: Technical task  (was: Improvement)

> RDBBlobStore performance improvement for SQL Server (MSSQL)
> -----------------------------------------------------------
>
>                 Key: OAK-8977
>                 URL: https://issues.apache.org/jira/browse/OAK-8977
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: rdbmk
>            Reporter: Solomon Rutzky
>            Priority: Major
>
> The performance test done in OAK-8926 indicates that one or more aspects of 
> using SQL Server as the database need to be addressed (also MySQL, but this 
> issue only concerns SQL Server).
>  
> OAK-8963 (specify a binary collation in the {{CREATE TABLE}} statement) will 
> improve performance a little, but not nearly enough to get its performance up 
> to the range of the other non-MySQL databases.
>  
> I did some testing and found one area that can be improved, and that should 
> be feasible to improve. The current DB access pattern is to submit 
> parameterized dynamic SQL. JDBC handles this as a "prepared" statement. I did 
> some research four years ago on prepared statements (  [What is the sense and 
> benefit of using 
> SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859] on 
> DBA.StackExchange ) and found that:
> # prepared statements, even when parameterized, don't cache the execution 
> plan between connections (even if the connection is pooled and does not need 
> to re-authenticate)
> # prepared statements only re-use the execution plan when executing the same 
> prepared statement with different parameters in the _same session_ (the Java 
> code opens and closes the connection per each call)
> One way to improve performance is to:
> # put the T-SQL code into a stored procedure (that way SQL Server does not 
> need to parse it per each call and can re-use an execution plan), _and_
> # call the stored procedure using a "callable statement": [Using a stored 
> procedure with input 
> parameters|https://docs.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-input-parameters]
>  
> I tested this by first copying the basic sequence of T-SQL statements found 
> in the 
> [storeBlockInDatabase|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?revision=1874570&view=markup#l307]
>  method (assuming that everything always worked / happy path / no error 
> handling) and simulating the new 
> [testInsertSmallBlobs()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/blob/RDBBlobStoreTest.java?view=markup&pathrev=1874693#l308]
>  test, first as dynamic SQL, and then as a stored procedure:
> {code:sql}
> ALTER PROCEDURE dbo.[StoreBlockInDatabase]
> (
>       @Digest VARBINARY(32),
>       @Level INT,
>       @Data VARBINARY(MAX)
> )
> AS
> SET NOCOUNT ON;
> DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
>               @Now BIGINT;
> SELECT @Now = [cpu_ticks]
> FROM   sys.dm_os_sys_info;
> BEGIN TRY
>       UPDATE dbo.[DATASTORE_META]
>       SET    [LASTMOD] = @Now
>       WHERE  [ID] = @Id;
>       IF (@@ROWCOUNT = 0)
>       BEGIN
>               INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id, 
> @Data);
>               INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD]) 
> VALUES (@Id, @Level, @Now);
>       END;
> END TRY
> BEGIN CATCH
>       ;THROW;
> END CATCH;
> GO
> {code}
> I tested against both the {{Latin1_General_CI_AS}} collation and the 
> {{Latin1_General_BIN2}} collation:
> * The stored procedure inserts 2.5 times more rows than the dynamic SQL 
> version
> * The binary collation is slightly better than the non-binary collation per 
> each test
> I will attach the test script once I get it cleaned up.
> P.S. I also tested with another pattern to see if doing a {{SELECT}} first 
> (rather than assuming the {{UPDATE}}) would be better (based on my testing 
> for a similar case: [UPDATE performance where no data 
> changes|https://dba.stackexchange.com/a/114369/30859] on DBA.StackExchange), 
> but it wasn't better.
> P.P.S. I figure the {{CREATE PROCEDURE}} statement could be placed in the 
> vendor-specific area of 
> [RDBBlobStoreDB|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStoreDB.java?revision=1874271&view=markup#l60]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to