Solomon Rutzky created OAK-8977:
-----------------------------------
Summary: RDBBlobStore performance improvement for SQL Server
(MSSQL)
Key: OAK-8977
URL: https://issues.apache.org/jira/browse/OAK-8977
Project: Jackrabbit Oak
Issue Type: Improvement
Components: rdbmk
Reporter: Solomon Rutzky
The performance test done in OAK-8926 indicates that one or more aspects of
using SQL Server as the database need to be addressed (also MySQL, but this
issue only concerns SQL Server).
OAK-8963 (specify a binary collation in the {{CREATE TABLE}} statement) will
improve performance a little, but not nearly enough to get its performance up
to the range of the other non-MySQL databases.
I did some testing and found one area that can be improved, and that should be
feasible to improve. The current DB access pattern is to submit parameterized
dynamic SQL. JDBC handles this as a "prepared" statement. I did some research
four years ago on prepared statements ( [What is the sense and benefit of
using SqlCommand.Prepare()?|https://dba.stackexchange.com/a/129721/30859] on
DBA.StackExchange ) and found that:
# prepared statements, even when parameterized, don't cache the execution plan
between connections (even if the connection is pooled and does not need to
re-authenticate)
# prepared statements only re-use the execution plan when executing the same
prepared statement with different parameters in the _same session_ (the Java
code opens and closes the connection per each call)
One way to improve performance is to:
# put the T-SQL code into a stored procedure (that way SQL Server does not need
to parse it per each call and can re-use an execution plan), _and_
# call the stored procedure using a "callable statement": [Using a stored
procedure with input
parameters|https://docs.microsoft.com/en-us/sql/connect/jdbc/using-a-stored-procedure-with-input-parameters]
I tested this by first copying the basic sequence of T-SQL statements found in
the
[storeBlockInDatabase|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStore.java?revision=1874570&view=markup#l307]
method (assuming that everything always worked / happy path / no error
handling) and simulating the new
[testInsertSmallBlobs()|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/blob/RDBBlobStoreTest.java?view=markup&pathrev=1874693#l308]
test, first as dynamic SQL, and then as a stored procedure:
{code:sql}
ALTER PROCEDURE dbo.[StoreBlockInDatabase]
(
@Digest VARBINARY(32),
@Level INT,
@Data VARBINARY(MAX)
)
AS
SET NOCOUNT ON;
DECLARE @Id VARCHAR(64) = CONVERT(VARCHAR(64), @Digest, 2),
@Now BIGINT;
SELECT @Now = [cpu_ticks]
FROM sys.dm_os_sys_info;
BEGIN TRY
UPDATE dbo.[DATASTORE_META]
SET [LASTMOD] = @Now
WHERE [ID] = @Id;
IF (@@ROWCOUNT = 0)
BEGIN
INSERT INTO dbo.[DATASTORE_DATA] ([ID], [DATA]) VALUES (@Id,
@Data);
INSERT INTO dbo.[DATASTORE_META] ([ID], [LVL], [LASTMOD])
VALUES (@Id, @Level, @Now);
END;
END TRY
BEGIN CATCH
;THROW;
END CATCH;
GO
{code}
I tested against both the {{Latin1_General_CI_AS}} collation and the
{{Latin1_General_BIN2}} collation:
* The stored procedure inserts 2.5 times more rows than the dynamic SQL version
* The binary collation is slightly better than the non-binary collation per
each test
I will attach the test script once I get it cleaned up.
P.S. I also tested with another pattern to see if doing a {{SELECT}} first
(rather than assuming the {{UPDATE}}) would be better (based on my testing for
a similar case: [UPDATE performance where no data
changes|https://dba.stackexchange.com/a/114369/30859] on DBA.StackExchange),
but it wasn't better.
P.P.S. I figure the {{CREATE PROCEDURE}} statement could be placed in the
vendor-specific area of
[RDBBlobStoreDB|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBBlobStoreDB.java?revision=1874271&view=markup#l60]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)