Re: S3DataStore leverage Cross-Region Replication
Maybe not a bottleneck, but for large data sets, would it reduce the data transfer costs, since the inter-bucket S3 transfer would use AWS’ internal links rather than routing over external interfaces?

-Bruce

> Sure, adding more complexity only make sense if we can demonstrate this is a bottleneck.
> Regards, Timothee
RE: S3DataStore leverage Cross-Region Replication
Yes Michael. It would be in Oak's BlobStore.

Thanks,
-shashank

-----Original Message-----
From: Michael Marth [mailto:mma...@adobe.com]
Sent: Tuesday, June 30, 2015 4:50 PM
To: oak-dev@jackrabbit.apache.org
Subject: Re: S3DataStore leverage Cross-Region Replication

Shashank,

In case we think it’s needed to implement multiple chained S3 DSs then I think we should model it after Jackrabbit’s Multidatastore which allows arbitrary DS implementations to be chained:
http://jackrabbit.510166.n4.nabble.com/MultiDataStore-td4655772.html

Michael
Re: S3DataStore leverage Cross-Region Replication
Hi Shashank,

Thanks for this.

2015-06-30 12:11 GMT+02:00 Shashank Gupta shgu...@adobe.com:

> Hi Tim,
> There is no time bound SLA provided by AWS when a given binary would be successfully replicated to destination S3 bucket. There would be cases of missing binaries if mongo nodes sync faster than S3 replication.

Yes, this would be expected, until the buckets replicate.

> Also S3 replication works between a given pair of buckets. So one S3 bucket can replicate to a single S3 destination bucket.

Yes, the setup would be limited to two regions.

> I think we can implement a tiered S3Datastore which writes/reads to/from multiple S3 buckets. The tiered S3DS first tries to read from same-region bucket and if not found than fallback to cross-geo buckets.

Great, although I see it would be valuable only in a limited set of use cases (only two regions involved).

>> Has this been tested already ? Generally, wdyt ?
> No. I suggest to first test cross geo mongo deployment with single S3 bucket. There shouldn't be functional issue in using single S3 bucket. Few customers use single shared S3 bucket between non-clustered cross-geo jackrabbit2 repositories in production.

Sure, adding more complexity only makes sense if we can demonstrate this is a bottleneck.

Regards,
Timothee
RE: S3DataStore leverage Cross-Region Replication
Hi Tim,

AWS provides no time-bound SLA for when a given binary will be successfully replicated to the destination S3 bucket. Binaries would appear to be missing whenever the mongo nodes sync faster than S3 replicates. Also, S3 replication works between a given pair of buckets, so one S3 bucket can replicate to only a single destination bucket.

I think we can implement a tiered S3DataStore which writes to and reads from multiple S3 buckets. The tiered S3DS first tries to read from the same-region bucket and, if the binary is not found, falls back to the cross-geo buckets.

> Has this been tested already ? Generally, wdyt ?

No. I suggest first testing a cross-geo mongo deployment with a single S3 bucket. There shouldn't be any functional issue in using a single S3 bucket. A few customers use a single shared S3 bucket between non-clustered cross-geo jackrabbit2 repositories in production.

Thanks,
-shashank

-----Original Message-----
From: maret.timot...@gmail.com [mailto:maret.timot...@gmail.com] On Behalf Of Timothée Maret
Sent: Monday, June 29, 2015 4:05 PM
To: oak-dev@jackrabbit.apache.org
Subject: S3DataStore leverage Cross-Region Replication

Hi,

In a cross region setup using the S3 data store, it may make sense to leverage the Cross-Region auto replication of S3 buckets [0,1].

In order to avoid data replication issues it would make sense IMO to allow configuring the S3DataStore with two S3 buckets, one for writing and one for reading. The writing bucket would be shared among all instance (from all regions) while the reading bucket would be in each region (thus decreasing the latency). The writing bucket would auto replicate to the reading buckets.

Has this been tested already ? Generally, wdyt ?

Regards,
Timothee

[0] https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html
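The tiered read path described in the message above (try the same-region bucket first, then fall back to cross-geo buckets) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Oak's or Jackrabbit's actual API: `InMemoryBucket` and `TieredDataStore` are hypothetical names, a plain dictionary stands in for S3, and writing only to the local bucket is an assumption, since the thread does not settle the write policy.

```python
from typing import Dict, List, Optional


class InMemoryBucket:
    """Hypothetical stand-in for an S3 bucket; a real store would call the AWS SDK."""

    def __init__(self, region: str) -> None:
        self.region = region
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the object is absent (e.g. not yet replicated).
        return self._objects.get(key)


class TieredDataStore:
    """Hypothetical tiered store: same-region bucket first, cross-geo buckets second."""

    def __init__(self, local: InMemoryBucket, remotes: List[InMemoryBucket]) -> None:
        self.local = local
        self.remotes = remotes

    def write(self, key: str, data: bytes) -> None:
        # Assumption: writes go to the same-region bucket only.
        self.local.put(key, data)

    def read(self, key: str) -> Optional[bytes]:
        data = self.local.get(key)
        if data is not None:
            return data
        # Not found locally: fall back to the cross-geo buckets in order.
        for bucket in self.remotes:
            data = bucket.get(key)
            if data is not None:
                return data
        return None


# Usage: a binary written in one region is still readable from the other.
us_bucket = InMemoryBucket("us-east-1")
eu_bucket = InMemoryBucket("eu-west-1")
eu_store = TieredDataStore(local=eu_bucket, remotes=[us_bucket])

us_bucket.put("blob-1", b"binary written in the US")
found = eu_store.read("blob-1")
missing = eu_store.read("no-such-blob")
```

The fallback hides replication lag from readers at the price of an occasional higher-latency cross-region read.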
Re: S3DataStore leverage Cross-Region Replication
Shashank,

If we think it’s necessary to implement multiple chained S3 DSs, then I think we should model it after Jackrabbit’s MultiDataStore, which allows arbitrary DS implementations to be chained:
http://jackrabbit.510166.n4.nabble.com/MultiDataStore-td4655772.html

Michael
S3DataStore leverage Cross-Region Replication
Hi,

In a cross-region setup using the S3 data store, it may make sense to leverage the Cross-Region auto replication of S3 buckets [0,1].

In order to avoid data replication issues, it would make sense IMO to allow configuring the S3DataStore with two S3 buckets, one for writing and one for reading. The writing bucket would be shared among all instances (from all regions), while a reading bucket would exist in each region (thus decreasing latency). The writing bucket would auto-replicate to the reading buckets.

Has this been tested already? Generally, wdyt?

Regards,
Timothee

[0] https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html
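To make the two-bucket proposal above concrete, here is a minimal sketch, assuming a data store configured with one shared writing bucket and one regional reading bucket. All names (`SplitBucketDataStore`, `replicate`) are hypothetical, dictionaries stand in for S3 buckets, and `replicate` only imitates what S3 Cross-Region Replication would do asynchronously on the AWS side.

```python
from typing import Dict, Optional


class SplitBucketDataStore:
    """Hypothetical store: writes go to a shared bucket, reads come from a regional one."""

    def __init__(self, write_bucket: Dict[str, bytes], read_bucket: Dict[str, bytes]) -> None:
        self.write_bucket = write_bucket
        self.read_bucket = read_bucket

    def write(self, key: str, data: bytes) -> None:
        self.write_bucket[key] = data

    def read(self, key: str) -> Optional[bytes]:
        # Only the regional reading bucket is consulted, which keeps latency low
        # but returns None until replication has delivered the object.
        return self.read_bucket.get(key)


def replicate(source: Dict[str, bytes], destination: Dict[str, bytes]) -> None:
    """Stand-in for S3 Cross-Region Replication; in reality it runs asynchronously."""
    destination.update(source)


# Usage: the binary becomes readable in the region only after replication.
shared_write_bucket: Dict[str, bytes] = {}
regional_read_bucket: Dict[str, bytes] = {}
store = SplitBucketDataStore(shared_write_bucket, regional_read_bucket)

store.write("blob-1", b"payload")
before_replication = store.read("blob-1")
replicate(shared_write_bucket, regional_read_bucket)
after_replication = store.read("blob-1")
```

In this sketch a read issued between the write and the replication finds nothing, which is the window a real deployment would have to account for.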