[ https://issues.apache.org/jira/browse/OAK-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Miroslav Smiljanic updated OAK-9469:
------------------------------------
Description:
The lease for *repo.lock* is initially
[requested|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.40.0/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureRepositoryLock.java#L68]
for a period of 60 seconds and is then periodically
[renewed|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.40.0/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureRepositoryLock.java#L95]
in a separate thread.
Renewal of the lease can fail when a StorageException is thrown, in which
case the shutdown hook is
[invoked|https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.40.0/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureRepositoryLock.java#L107].
The shutdown hook defined in AzurePersistence is effectively a no-op that only
logs a warning, so the process running Oak continues to run.
Once the unrenewed lease expires, another process running Oak can acquire the
same lock. From that moment two processes may have write access to
the remote repo and can conflict with each other.
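The sequence above can be sketched without the Azure SDK. This is a minimal illustration only: FakeLease, OakProcess, and writersAfterFailedRenewal are hypothetical names, time is passed in explicitly instead of using a renewal thread, and the variant whose hook stops writes is an assumption used for contrast, not the actual fix.

```java
class LeaseLossSketch {
    static final long LEASE_MS = 60_000L; // 60-second lease, as in the report

    /** Hypothetical in-memory stand-in for the lease on repo.lock. */
    static final class FakeLease {
        private String owner;
        private long expiresAt;

        /** Grants the lease unless another owner holds an unexpired one. */
        boolean acquireOrRenew(String who, long now) {
            boolean heldByOther = owner != null && now < expiresAt && !owner.equals(who);
            if (heldByOther) {
                return false;
            }
            owner = who;
            expiresAt = now + LEASE_MS;
            return true;
        }
    }

    /** Hypothetical model of one process running Oak. */
    static final class OakProcess {
        final String name;
        final boolean hookStopsWrites;
        boolean writable; // whether this process believes it may write

        OakProcess(String name, boolean hookStopsWrites) {
            this.name = name;
            this.hookStopsWrites = hookStopsWrites;
        }

        void start(FakeLease lease, long now) {
            writable = lease.acquireOrRenew(name, now);
        }

        /** Mirrors the renew step: on failure only the shutdown hook runs. */
        void refreshLease(FakeLease lease, long now, boolean simulateTimeout) {
            try {
                if (simulateTimeout) {
                    throw new IllegalStateException("simulated operation timeout");
                }
                lease.acquireOrRenew(name, now);
            } catch (IllegalStateException e) {
                shutdownHook();
            }
        }

        void shutdownHook() {
            if (hookStopsWrites) {
                writable = false; // assumed alternative: stop writing
            }
            // otherwise a no-op, like the hook described in this report
        }
    }

    /** Counts the processes that believe they may write after A's renewal fails. */
    static int writersAfterFailedRenewal(boolean hookStopsWrites) {
        FakeLease lease = new FakeLease();
        OakProcess a = new OakProcess("A", hookStopsWrites);
        OakProcess b = new OakProcess("B", hookStopsWrites);
        a.start(lease, 0L);                   // A acquires the lease at t=0
        a.refreshLease(lease, 30_000L, true); // A's renewal fails at t=30s
        b.start(lease, 61_000L);              // lease expired at t=60s, B acquires it
        return (a.writable ? 1 : 0) + (b.writable ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("no-op hook: " + writersAfterFailedRenewal(false) + " writer(s)");
        System.out.println("hook stops writes: " + writersAfterFailedRenewal(true) + " writer(s)");
    }
}
```

With the no-op hook both A and B end up believing they may write; with a hook that stops writes only B does.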
Test case that demonstrates the issue in AzureRepositoryLock:
[^OAK-9469_test.patch]
One cause of the StorageException that I have seen is an operation
timeout:
{noformat}
[pool-13-thread-1] org.apache.jackrabbit.oak.segment.azure.AzureRepositoryLock Can't renew the lease
com.microsoft.azure.storage.StorageException: The client could not finish the operation within specified maximum execution timeout.
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:254)
    at com.microsoft.azure.storage.blob.CloudBlob.renewLease(CloudBlob.java:2866)
    at com.microsoft.azure.storage.blob.CloudBlob.renewLease(CloudBlob.java:2832)
    at org.apache.jackrabbit.oak.segment.azure.AzureRepositoryLock.refreshLease(AzureRepositoryLock.java:102) [org.apache.jackrabbit.oak-segment-azure:1.39.0.R1888564]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.TimeoutException: The client could not finish the operation within specified maximum execution timeout.
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:253)
    ... 8 common frames omitted
{noformat}
> Unsuccessful lease refresh in AzureRepositoryLock can cause two processes
> using Oak to have write access
> ---------------------------------------------------------------------------------------------------------
>
> Key: OAK-9469
> URL: https://issues.apache.org/jira/browse/OAK-9469
> Project: Jackrabbit Oak
> Issue Type: Bug
> Reporter: Miroslav Smiljanic
> Assignee: Miroslav Smiljanic
> Priority: Major
> Fix For: 1.42.0
>
> Attachments: OAK-9469_op_timeout.patch, OAK-9469_test.patch
>