Re: Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-09-08 Thread Amit Jain
Thanks Matt!! I will take a look tomorrow.

Regards
Amit

On Thu, Sep 8, 2016 at 8:53 PM, Matt Ryan  wrote:

> New patch attached to address feedback in OAK-4712 on the last patch.
>
> On Thu, Sep 1, 2016 at 11:00 AM, Matt Ryan  wrote:
>
>> This patch I believe addresses the issues identified with the previous
>> patch.  I've also uploaded it to OAK-4712.
>>
>> Looking for review and feedback.
>>
>> This patch introduces a new interface I've named NodeIdMapper.java.  Not
>> sure where to put this.  I've added it in order to be able to obtain the
>> blob ID for a named node within the context of a NodeStoreService class by
>> using an anonymous class.  However, most of the functionality in each case
>> (DocumentNodeStoreService and SegmentNodeStoreService) is similar.  This
>> could probably be addressed with a lot less code duplication by using an
>> abstract base class, but since I'm not sure where would be best to put such
>> a class.  Ideas?
>>
>> I also want to call attention to how the blob ID is being used in these
>> anonymous classes in the *NodeStoreService classes.  The IDs I was getting
>> back had trailing information with a '#' followed by some number of
>> digits.  I just split the string and discarded the last part.  Is there a
>> better way to do this?
>>
>>
>> On Sun, Aug 28, 2016 at 10:04 PM, Amit Jain  wrote:
>>
>>> Hi Matt,
>>>
>>> I have directly replied to your comments on the jira.
>>>
>>> Thanks
>>> Amit
>>>
>>> On Sat, Aug 27, 2016 at 4:22 AM, Matt Ryan  wrote:
>>>
>>> > Use this patch instead; updated patch from latest in trunk.
>>> >
>>> > On Fri, Aug 26, 2016 at 4:41 PM, Matt Ryan  wrote:
>>> >
>>> >> Hi Oak Devs,
>>> >>
>>> >> I've created OAK-4712 and submitted a patch for the same.  I've
>>> attached
>>> >> the same patch to this email.
>>> >>
>>> >> The submission is to add a new MBean, S3DataStoreStats, which will
>>> allow
>>> >> reporting via JMX about the state of the S3DataStore.  Two metrics are
>>> >> intended.  The first is to report the number of files that are in the
>>> sync
>>> >> state, meaning they have been added to the S3DataStore but are not yet
>>> >> completely copied into S3.  The second is to allow to query the sync
>>> state
>>> >> of a file - a return value of true would mean that the file provided
>>> is
>>> >> fully synchronized.  This uses an external file name, not the
>>> internal ID.
>>> >>
>>> >> I have some questions about the implementation first:
>>> >> 1. For the request to query the sync state of a file, should the file
>>> >> name provided be a full path to a local file or a path to a file
>>> within OAK
>>> >> (e.g. /content/dam/myimage.jpg)?  Current implementation uses a local
>>> file
>>> >> path but I have been wondering if it should be an OAK path.
>>> >> 2. For the request to query the sync state of a file, when converting
>>> >> from the externally-supplied file name to an internal DataIdentifier,
>>> this
>>> >> implementation is performing the same calculation to determine the
>>> internal
>>> >> ID name as is done when a file is stored.  I have a number of
>>> concerns with
>>> >> this:
>>> >>- It is inefficient - the entire file has to be read and digested
>>> in
>>> >> order to compute the internal ID.  This takes a long time for large
>>> assets.
>>> >>- I've essentially duplicated the logic from CachingDataStore into
>>> >> S3DataStore to compute the internal ID.  I hate duplicating the code,
>>> but I
>>> >> am trying to avoid exposing internal IDs in API, and I am not seeing
>>> a good
>>> >> way in the current implementation to avoid this without either
>>> modifying
>>> >> public API to CachingDataStore, or exposing the internal ID via API,
>>> or
>>> >> both.
>>> >>
>>> >> Any suggestions on these two issues?
>>> >>
>>> >>
>>> >> I'm also experiencing a problem with this patch.  In my testing it
>>> >> appears to work fine, until I delete a file.  For example, if I
>>> delete an
>>> >> asset via the REST API, I will see the asset deleted in CRXDE.
>>> However,
>>> >> the file still remains in S3.  This MBean as implemented only knows
>>> how to
>>> >> check with S3DataStore and the corresponding backend, and these all
>>> appear
>>> >> to believe the file still exists.  So the MBean continues to report
>>> that
>>> >> the file's sync state is synchronized (i.e. isFileSynced() returns
>>> true)
>>> >> even though the file has been removed from the JCR.  Any suggestions
>>> on how
>>> >> to resolve this?
>>> >>
>>> >>
>>> >> Finally, any feedback on the patch is welcome.  And if I did the
>>> process
>>> >> wrong please correct me (gently) - first time submission here.  Thanks
>>> >>
>>> >>
>>> >
>>>
>>
>>
>


Re: Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-09-08 Thread Matt Ryan
New patch attached to address feedback in OAK-4712 on the last patch.

On Thu, Sep 1, 2016 at 11:00 AM, Matt Ryan  wrote:

> This patch I believe addresses the issues identified with the previous
> patch.  I've also uploaded it to OAK-4712.
>
> Looking for review and feedback.
>
> This patch introduces a new interface I've named NodeIdMapper.java.  Not
> sure where to put this.  I've added it in order to be able to obtain the
> blob ID for a named node within the context of a NodeStoreService class by
> using an anonymous class.  However, most of the functionality in each case
> (DocumentNodeStoreService and SegmentNodeStoreService) is similar.  This
> could probably be addressed with a lot less code duplication by using an
> abstract base class, but since I'm not sure where would be best to put such
> a class.  Ideas?
>
> I also want to call attention to how the blob ID is being used in these
> anonymous classes in the *NodeStoreService classes.  The IDs I was getting
> back had trailing information with a '#' followed by some number of
> digits.  I just split the string and discarded the last part.  Is there a
> better way to do this?
>
>
> On Sun, Aug 28, 2016 at 10:04 PM, Amit Jain  wrote:
>
>> Hi Matt,
>>
>> I have directly replied to your comments on the jira.
>>
>> Thanks
>> Amit
>>
>> On Sat, Aug 27, 2016 at 4:22 AM, Matt Ryan  wrote:
>>
>> > Use this patch instead; updated patch from latest in trunk.
>> >
>> > On Fri, Aug 26, 2016 at 4:41 PM, Matt Ryan  wrote:
>> >
>> >> Hi Oak Devs,
>> >>
>> >> I've created OAK-4712 and submitted a patch for the same.  I've
>> attached
>> >> the same patch to this email.
>> >>
>> >> The submission is to add a new MBean, S3DataStoreStats, which will
>> allow
>> >> reporting via JMX about the state of the S3DataStore.  Two metrics are
>> >> intended.  The first is to report the number of files that are in the
>> sync
>> >> state, meaning they have been added to the S3DataStore but are not yet
>> >> completely copied into S3.  The second is to allow to query the sync
>> state
>> >> of a file - a return value of true would mean that the file provided is
>> >> fully synchronized.  This uses an external file name, not the internal
>> ID.
>> >>
>> >> I have some questions about the implementation first:
>> >> 1. For the request to query the sync state of a file, should the file
>> >> name provided be a full path to a local file or a path to a file
>> within OAK
>> >> (e.g. /content/dam/myimage.jpg)?  Current implementation uses a local
>> file
>> >> path but I have been wondering if it should be an OAK path.
>> >> 2. For the request to query the sync state of a file, when converting
>> >> from the externally-supplied file name to an internal DataIdentifier,
>> this
>> >> implementation is performing the same calculation to determine the
>> internal
>> >> ID name as is done when a file is stored.  I have a number of concerns
>> with
>> >> this:
>> >>- It is inefficient - the entire file has to be read and digested in
>> >> order to compute the internal ID.  This takes a long time for large
>> assets.
>> >>- I've essentially duplicated the logic from CachingDataStore into
>> >> S3DataStore to compute the internal ID.  I hate duplicating the code,
>> but I
>> >> am trying to avoid exposing internal IDs in API, and I am not seeing a
>> good
>> >> way in the current implementation to avoid this without either
>> modifying
>> >> public API to CachingDataStore, or exposing the internal ID via API, or
>> >> both.
>> >>
>> >> Any suggestions on these two issues?
>> >>
>> >>
>> >> I'm also experiencing a problem with this patch.  In my testing it
>> >> appears to work fine, until I delete a file.  For example, if I delete
>> an
>> >> asset via the REST API, I will see the asset deleted in CRXDE.
>> However,
>> >> the file still remains in S3.  This MBean as implemented only knows
>> how to
>> >> check with S3DataStore and the corresponding backend, and these all
>> appear
>> >> to believe the file still exists.  So the MBean continues to report
>> that
>> >> the file's sync state is synchronized (i.e. isFileSynced() returns
>> true)
>> >> even though the file has been removed from the JCR.  Any suggestions
>> on how
>> >> to resolve this?
>> >>
>> >>
>> >> Finally, any feedback on the patch is welcome.  And if I did the
>> process
>> >> wrong please correct me (gently) - first time submission here.  Thanks
>> >>
>> >>
>> >
>>
>
>
diff --git a/oak-blob-cloud/pom.xml b/oak-blob-cloud/pom.xml
index 800f716..ad3cb3c 100644
--- a/oak-blob-cloud/pom.xml
+++ b/oak-blob-cloud/pom.xml
@@ -41,7 +41,7 @@
 maven-bundle-plugin
 
 
-
org.apache.jackrabbit.oak.blob.cloud.aws.s3
+
org.apache.jackrabbit.oak.blob.cloud.aws.s3,org.apache.jackrabbit.oak.blob.cloud.aws.s3.stats
 sun.io
   

Re: Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-09-01 Thread Matt Ryan
This patch I believe addresses the issues identified with the previous
patch.  I've also uploaded it to OAK-4712.

Looking for review and feedback.

This patch introduces a new interface I've named NodeIdMapper.java.  Not
sure where to put this.  I've added it in order to be able to obtain the
blob ID for a named node within the context of a NodeStoreService class by
using an anonymous class.  However, most of the functionality in each case
(DocumentNodeStoreService and SegmentNodeStoreService) is similar.  This
could probably be addressed with a lot less code duplication by using an
abstract base class, but since I'm not sure where would be best to put such
a class.  Ideas?

I also want to call attention to how the blob ID is being used in these
anonymous classes in the *NodeStoreService classes.  The IDs I was getting
back had trailing information with a '#' followed by some number of
digits.  I just split the string and discarded the last part.  Is there a
better way to do this?


On Sun, Aug 28, 2016 at 10:04 PM, Amit Jain  wrote:

> Hi Matt,
>
> I have directly replied to your comments on the jira.
>
> Thanks
> Amit
>
> On Sat, Aug 27, 2016 at 4:22 AM, Matt Ryan  wrote:
>
> > Use this patch instead; updated patch from latest in trunk.
> >
> > On Fri, Aug 26, 2016 at 4:41 PM, Matt Ryan  wrote:
> >
> >> Hi Oak Devs,
> >>
> >> I've created OAK-4712 and submitted a patch for the same.  I've attached
> >> the same patch to this email.
> >>
> >> The submission is to add a new MBean, S3DataStoreStats, which will allow
> >> reporting via JMX about the state of the S3DataStore.  Two metrics are
> >> intended.  The first is to report the number of files that are in the
> sync
> >> state, meaning they have been added to the S3DataStore but are not yet
> >> completely copied into S3.  The second is to allow to query the sync
> state
> >> of a file - a return value of true would mean that the file provided is
> >> fully synchronized.  This uses an external file name, not the internal
> ID.
> >>
> >> I have some questions about the implementation first:
> >> 1. For the request to query the sync state of a file, should the file
> >> name provided be a full path to a local file or a path to a file within
> OAK
> >> (e.g. /content/dam/myimage.jpg)?  Current implementation uses a local
> file
> >> path but I have been wondering if it should be an OAK path.
> >> 2. For the request to query the sync state of a file, when converting
> >> from the externally-supplied file name to an internal DataIdentifier,
> this
> >> implementation is performing the same calculation to determine the
> internal
> >> ID name as is done when a file is stored.  I have a number of concerns
> with
> >> this:
> >>- It is inefficient - the entire file has to be read and digested in
> >> order to compute the internal ID.  This takes a long time for large
> assets.
> >>- I've essentially duplicated the logic from CachingDataStore into
> >> S3DataStore to compute the internal ID.  I hate duplicating the code,
> but I
> >> am trying to avoid exposing internal IDs in API, and I am not seeing a
> good
> >> way in the current implementation to avoid this without either modifying
> >> public API to CachingDataStore, or exposing the internal ID via API, or
> >> both.
> >>
> >> Any suggestions on these two issues?
> >>
> >>
> >> I'm also experiencing a problem with this patch.  In my testing it
> >> appears to work fine, until I delete a file.  For example, if I delete
> an
> >> asset via the REST API, I will see the asset deleted in CRXDE.  However,
> >> the file still remains in S3.  This MBean as implemented only knows how
> to
> >> check with S3DataStore and the corresponding backend, and these all
> appear
> >> to believe the file still exists.  So the MBean continues to report that
> >> the file's sync state is synchronized (i.e. isFileSynced() returns true)
> >> even though the file has been removed from the JCR.  Any suggestions on
> how
> >> to resolve this?
> >>
> >>
> >> Finally, any feedback on the patch is welcome.  And if I did the process
> >> wrong please correct me (gently) - first time submission here.  Thanks
> >>
> >>
> >
>
diff --git a/oak-blob-cloud/pom.xml b/oak-blob-cloud/pom.xml
index 800f716..ad3cb3c 100644
--- a/oak-blob-cloud/pom.xml
+++ b/oak-blob-cloud/pom.xml
@@ -41,7 +41,7 @@
 maven-bundle-plugin
 
 
-
org.apache.jackrabbit.oak.blob.cloud.aws.s3
+
org.apache.jackrabbit.oak.blob.cloud.aws.s3,org.apache.jackrabbit.oak.blob.cloud.aws.s3.stats
 sun.io
 
 
@@ -101,6 +101,13 @@
 ${jackrabbit.version}
 
 
+
+
+org.apache.jackrabbit
+oak-commons
+${project.version}
+
+
 
 
 com.amazonaws
@@ -140,6 +147,12 @@
   

Re: Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-08-28 Thread Amit Jain
Hi Matt,

I have directly replied to your comments on the jira.

Thanks
Amit

On Sat, Aug 27, 2016 at 4:22 AM, Matt Ryan  wrote:

> Use this patch instead; updated patch from latest in trunk.
>
> On Fri, Aug 26, 2016 at 4:41 PM, Matt Ryan  wrote:
>
>> Hi Oak Devs,
>>
>> I've created OAK-4712 and submitted a patch for the same.  I've attached
>> the same patch to this email.
>>
>> The submission is to add a new MBean, S3DataStoreStats, which will allow
>> reporting via JMX about the state of the S3DataStore.  Two metrics are
>> intended.  The first is to report the number of files that are in the sync
>> state, meaning they have been added to the S3DataStore but are not yet
>> completely copied into S3.  The second is to allow to query the sync state
>> of a file - a return value of true would mean that the file provided is
>> fully synchronized.  This uses an external file name, not the internal ID.
>>
>> I have some questions about the implementation first:
>> 1. For the request to query the sync state of a file, should the file
>> name provided be a full path to a local file or a path to a file within OAK
>> (e.g. /content/dam/myimage.jpg)?  Current implementation uses a local file
>> path but I have been wondering if it should be an OAK path.
>> 2. For the request to query the sync state of a file, when converting
>> from the externally-supplied file name to an internal DataIdentifier, this
>> implementation is performing the same calculation to determine the internal
>> ID name as is done when a file is stored.  I have a number of concerns with
>> this:
>>- It is inefficient - the entire file has to be read and digested in
>> order to compute the internal ID.  This takes a long time for large assets.
>>- I've essentially duplicated the logic from CachingDataStore into
>> S3DataStore to compute the internal ID.  I hate duplicating the code, but I
>> am trying to avoid exposing internal IDs in API, and I am not seeing a good
>> way in the current implementation to avoid this without either modifying
>> public API to CachingDataStore, or exposing the internal ID via API, or
>> both.
>>
>> Any suggestions on these two issues?
>>
>>
>> I'm also experiencing a problem with this patch.  In my testing it
>> appears to work fine, until I delete a file.  For example, if I delete an
>> asset via the REST API, I will see the asset deleted in CRXDE.  However,
>> the file still remains in S3.  This MBean as implemented only knows how to
>> check with S3DataStore and the corresponding backend, and these all appear
>> to believe the file still exists.  So the MBean continues to report that
>> the file's sync state is synchronized (i.e. isFileSynced() returns true)
>> even though the file has been removed from the JCR.  Any suggestions on how
>> to resolve this?
>>
>>
>> Finally, any feedback on the patch is welcome.  And if I did the process
>> wrong please correct me (gently) - first time submission here.  Thanks
>>
>>
>


Re: Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-08-26 Thread Matt Ryan
Use this patch instead; updated patch from latest in trunk.

On Fri, Aug 26, 2016 at 4:41 PM, Matt Ryan  wrote:

> Hi Oak Devs,
>
> I've created OAK-4712 and submitted a patch for the same.  I've attached
> the same patch to this email.
>
> The submission is to add a new MBean, S3DataStoreStats, which will allow
> reporting via JMX about the state of the S3DataStore.  Two metrics are
> intended.  The first is to report the number of files that are in the sync
> state, meaning they have been added to the S3DataStore but are not yet
> completely copied into S3.  The second is to allow to query the sync state
> of a file - a return value of true would mean that the file provided is
> fully synchronized.  This uses an external file name, not the internal ID.
>
> I have some questions about the implementation first:
> 1. For the request to query the sync state of a file, should the file name
> provided be a full path to a local file or a path to a file within OAK
> (e.g. /content/dam/myimage.jpg)?  Current implementation uses a local file
> path but I have been wondering if it should be an OAK path.
> 2. For the request to query the sync state of a file, when converting from
> the externally-supplied file name to an internal DataIdentifier, this
> implementation is performing the same calculation to determine the internal
> ID name as is done when a file is stored.  I have a number of concerns with
> this:
>- It is inefficient - the entire file has to be read and digested in
> order to compute the internal ID.  This takes a long time for large assets.
>- I've essentially duplicated the logic from CachingDataStore into
> S3DataStore to compute the internal ID.  I hate duplicating the code, but I
> am trying to avoid exposing internal IDs in API, and I am not seeing a good
> way in the current implementation to avoid this without either modifying
> public API to CachingDataStore, or exposing the internal ID via API, or
> both.
>
> Any suggestions on these two issues?
>
>
> I'm also experiencing a problem with this patch.  In my testing it appears
> to work fine, until I delete a file.  For example, if I delete an asset via
> the REST API, I will see the asset deleted in CRXDE.  However, the file
> still remains in S3.  This MBean as implemented only knows how to check
> with S3DataStore and the corresponding backend, and these all appear to
> believe the file still exists.  So the MBean continues to report that the
> file's sync state is synchronized (i.e. isFileSynced() returns true) even
> though the file has been removed from the JCR.  Any suggestions on how to
> resolve this?
>
>
> Finally, any feedback on the patch is welcome.  And if I did the process
> wrong please correct me (gently) - first time submission here.  Thanks
>
>
diff --git a/oak-blob-cloud/pom.xml b/oak-blob-cloud/pom.xml
index 800f716..ad3cb3c 100644
--- a/oak-blob-cloud/pom.xml
+++ b/oak-blob-cloud/pom.xml
@@ -41,7 +41,7 @@
 maven-bundle-plugin
 
 
-
org.apache.jackrabbit.oak.blob.cloud.aws.s3
+
org.apache.jackrabbit.oak.blob.cloud.aws.s3,org.apache.jackrabbit.oak.blob.cloud.aws.s3.stats
 sun.io
 
 
@@ -101,6 +101,13 @@
 ${jackrabbit.version}
 
 
+
+
+org.apache.jackrabbit
+oak-commons
+${project.version}
+
+
 
 
 com.amazonaws
@@ -140,6 +147,12 @@
 logback-classic
 test
 
+
+org.mockito
+mockito-core
+1.10.19
+test
+
 
 
 
diff --git 
a/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
 
b/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
index fc21bf6..be2ed54 100644
--- 
a/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
+++ 
b/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
@@ -16,17 +16,64 @@
  */
 package org.apache.jackrabbit.oak.blob.cloud.aws.s3;
 
+import java.io.FileInputStream;
+import java.io.OutputStream;
+import java.security.DigestOutputStream;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.Properties;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import org.apache.commons.io.IOUtils;
+import org.apache.commons.io.output.NullOutputStream;
 import org.apache.jackrabbit.core.data.Backend;
 import org.apache.jackrabbit.core.data.CachingDataStore;
+import org.apache.jackrabbit.core.data.DataIdentifier;
+import 

Feedback on S3DataStore JMX stats patch (OAK-4712)

2016-08-26 Thread Matt Ryan
Hi Oak Devs,

I've created OAK-4712 and submitted a patch for the same.  I've attached
the same patch to this email.

The submission is to add a new MBean, S3DataStoreStats, which will allow
reporting via JMX about the state of the S3DataStore.  Two metrics are
intended.  The first is to report the number of files that are in the sync
state, meaning they have been added to the S3DataStore but are not yet
completely copied into S3.  The second is to allow to query the sync state
of a file - a return value of true would mean that the file provided is
fully synchronized.  This uses an external file name, not the internal ID.

I have some questions about the implementation first:
1. For the request to query the sync state of a file, should the file name
provided be a full path to a local file or a path to a file within OAK
(e.g. /content/dam/myimage.jpg)?  Current implementation uses a local file
path but I have been wondering if it should be an OAK path.
2. For the request to query the sync state of a file, when converting from
the externally-supplied file name to an internal DataIdentifier, this
implementation is performing the same calculation to determine the internal
ID name as is done when a file is stored.  I have a number of concerns with
this:
   - It is inefficient - the entire file has to be read and digested in
order to compute the internal ID.  This takes a long time for large assets.
   - I've essentially duplicated the logic from CachingDataStore into
S3DataStore to compute the internal ID.  I hate duplicating the code, but I
am trying to avoid exposing internal IDs in API, and I am not seeing a good
way in the current implementation to avoid this without either modifying
public API to CachingDataStore, or exposing the internal ID via API, or
both.

Any suggestions on these two issues?


I'm also experiencing a problem with this patch.  In my testing it appears
to work fine, until I delete a file.  For example, if I delete an asset via
the REST API, I will see the asset deleted in CRXDE.  However, the file
still remains in S3.  This MBean as implemented only knows how to check
with S3DataStore and the corresponding backend, and these all appear to
believe the file still exists.  So the MBean continues to report that the
file's sync state is synchronized (i.e. isFileSynced() returns true) even
though the file has been removed from the JCR.  Any suggestions on how to
resolve this?


Finally, any feedback on the patch is welcome.  And if I did the process
wrong please correct me (gently) - first time submission here.  Thanks
diff --git a/oak-blob-cloud/pom.xml b/oak-blob-cloud/pom.xml
index 800f716..ad3cb3c 100644
--- a/oak-blob-cloud/pom.xml
+++ b/oak-blob-cloud/pom.xml
@@ -41,7 +41,7 @@
 maven-bundle-plugin
 
 
-
org.apache.jackrabbit.oak.blob.cloud.aws.s3
+
org.apache.jackrabbit.oak.blob.cloud.aws.s3,org.apache.jackrabbit.oak.blob.cloud.aws.s3.stats
 sun.io
 
 
@@ -101,6 +101,13 @@
 ${jackrabbit.version}
 
 
+
+
+org.apache.jackrabbit
+oak-commons
+${project.version}
+
+
 
 
 com.amazonaws
@@ -140,6 +147,12 @@
 logback-classic
 test
 
+
+org.mockito
+mockito-core
+1.10.19
+test
+
 
 
 
diff --git 
a/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
 
b/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
index fc21bf6..be2ed54 100644
--- 
a/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
+++ 
b/oak-blob-cloud/src/main/java/org/apache/jackrabbit/oak/blob/cloud/aws/s3/S3DataStore.java
@@ -16,17 +16,64 @@
  */
 package org.apache.jackrabbit.oak.blob.cloud.aws.s3;
 
+import java.io.FileInputStream;
+import java.io.OutputStream;
+import java.security.DigestOutputStream;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
 import java.util.Properties;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import org.apache.commons.io.IOUtils;
+import org.apache.commons.io.output.NullOutputStream;
 import org.apache.jackrabbit.core.data.Backend;
 import org.apache.jackrabbit.core.data.CachingDataStore;
+import org.apache.jackrabbit.core.data.DataIdentifier;
+import org.apache.jackrabbit.core.data.DataStoreException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
  * An Amazon S3 data store.
  */
 public class S3DataStore extends CachingDataStore {
+
+/**
+ * Logger instance.
+ */
+