Re: OAK-9238 oak-run explore should support Azure Segment Store (Review)

2020-09-28 Thread Andrei Dulceanu
Hi Aravindo,

Thank you for your contribution! I will review the changes and commit the
patch once done.

Regards,
Andrei

On Sat, Sep 26, 2020 at 11:35 PM Aravindo Wingeier wrote:

> Hi
>
> In my personal time, I added support for AzureSegmentStore to `oak-run
> explore`. While it is outdated, I still find it a handy tool. The actual
> change is small; the diff is just bloated by the new abstract class I added.
>
> Please review my PR.
> https://github.com/apache/jackrabbit-oak/pull/255/files
>
> Story: https://issues.apache.org/jira/browse/OAK-9238
>
> Thanks,
> Aravindo
>


OAK-9238 oak-run explore should support Azure Segment Store (Review)

2020-09-26 Thread Aravindo Wingeier
Hi

In my personal time, I added support for AzureSegmentStore to `oak-run 
explore`. While it is outdated, I still find it a handy tool. The actual 
change is small; the diff is just bloated by the new abstract class I added.

Please review my PR: https://github.com/apache/jackrabbit-oak/pull/255/files

Story: https://issues.apache.org/jira/browse/OAK-9238
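
For anyone who wants to try it, the invocation should look roughly like the
line below. The az: URI scheme is my assumption, based on how other oak-run
commands address a remote segment store, and the account, container and
directory names are placeholders:

    java -jar oak-run-*.jar explore az:https://myaccount.blob.core.windows.net/oak/repository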

Thanks,
Aravindo


Re: Azure Segment Store

2018-03-12 Thread Ian Boston
Hi,
I misread the documentation in the patch.
Thank you for pointing out my mistake.
Best Regards
Ian

On 6 March 2018 at 09:53, Tomek Rękawek wrote:

> Hi Ian,
>
> > On 5 Mar 2018, at 17:47, Ian Boston wrote:
> >
> > I assume that the patch deals with the 50K limit[1] to the number of
> > blocks per Azure Blob store?
>
> As far as I understand, it’s the limit that applies to the number of
> blocks in a single blob. A block is a single write. Since the segments are
> immutable (written at once), we don’t need to worry about this limit for
> the segments. It’s a different case for the journal file: a single commit
> leads to a single append, which adds a block. However, the patch takes care
> of this by creating journal.log.001, .002, etc. when we’re close to the
> limit [1].
>
> Regards,
> Tomek
>
> [1] https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureJournalFile.java#L37
>
> --
> Tomek Rękawek | Adobe Research | www.adobe.com
> reka...@adobe.com
>
>


Re: Azure Segment Store

2018-03-06 Thread Tomek Rękawek
Hi Ian,

> On 5 Mar 2018, at 17:47, Ian Boston wrote:
> 
> I assume that the patch deals with the 50K limit[1] to the number of blocks
> per Azure Blob store?

As far as I understand, it’s the limit that applies to the number of blocks in 
a single blob. A block is a single write. Since the segments are immutable 
(written at once), we don’t need to worry about this limit for the segments. 
It’s a different case for the journal file: a single commit leads to a single 
append, which adds a block. However, the patch takes care of this by creating 
journal.log.001, .002, etc. when we’re close to the limit [1].
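
Schematically, the rolling behaviour amounts to the sketch below. All names
here are illustrative, not the real Oak API; the actual logic lives in
AzureJournalFile [1]:

    // Illustrative sketch of the rolling described above: one commit appends
    // one block, and the writer rolls over to journal.log.001, .002, ...
    // before a blob reaches Azure's 50,000-blocks-per-blob limit.
    public class RollingJournal {
        private static final int MAX_BLOCKS_PER_BLOB = 50_000;

        private final java.util.function.Function<String, JournalBlob> blobFactory;
        private JournalBlob current;
        private int index = 0;

        public RollingJournal(java.util.function.Function<String, JournalBlob> factory) {
            this.blobFactory = factory;
            this.current = factory.apply(name(index));
        }

        public synchronized void writeCommit(String journalEntry) {
            if (current.blockCount() >= MAX_BLOCKS_PER_BLOB - 1) {
                current = blobFactory.apply(name(++index)); // roll to the next journal file
            }
            current.appendBlock(journalEntry);              // one commit = one block
        }

        private static String name(int i) {
            return i == 0 ? "journal.log" : String.format("journal.log.%03d", i);
        }

        public interface JournalBlob {
            int blockCount();               // blocks written to this blob so far
            void appendBlock(String entry); // a single append adds a single block
        }
    }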

Regards,
Tomek

[1] https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureJournalFile.java#L37

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com





Re: Azure Segment Store

2018-03-06 Thread Valentin Olteanu
> I assume that the patch deals with the 50K limit[1] to the number of
> blocks per Azure Blob store?

I read that limit differently: when you upload a large blob, it is split
into up to 50K blocks of max 100 MiB each, so a single blob cannot be
larger than 4.75 TiB.
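
(For reference, the arithmetic behind that figure: 50,000 blocks x 100 MiB
per block = 5,000,000 MiB, i.e. roughly 4.77 TiB, in line with the documented
~4.75 TiB cap per block blob.)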

Regarding the max number of blobs, that page states:
"Max number of blob containers, blobs, file shares, tables, queues,
entities, or messages per storage account - No limit"

One could do a quick test and upload 50K+ blobs to check that :)

Valentin

[1] https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
Re: Azure Segment Store

2018-03-05 Thread Ian Boston
On 5 March 2018 at 16:04, Michael Dürig <mic...@gmail.com> wrote:

> > How does it perform compared to TarMK
> > a) when the entire repo doesn't fit into RAM allocated to the container?
> > b) when the working set doesn't fit into RAM allocated to the container?
> 
> I think these are some of the things we need to find out along the way.
> Currently my thinking is to move from off-heap caching (mmap) to on-heap
> caching (leveraging the segment cache). For that to work we likely need to
> better understand the locality of the working set (see
> https://issues.apache.org/jira/browse/OAK-5655) and rethink the
> granularity of the cached items. There will likely be many more issues
> coming through Jira regarding this.
>

Agreed.
All of that will help minimise the IO in this case. Or are you saying that
if the IO is managed explicitly, rather than left to the OS via mmap, it may
be possible to use a network disk cached by the OS VFS disk cache, provided
TarMK has been optimised for that type of disk?

@Tomek
I assume that the patch deals with the 50K limit[1] to the number of blocks
per Azure Blob store?
With a compacted TarEntry size averaging 230 KB, the max repo size per Azure
Blob store will be about 10 GB.
I checked the patch but didn't see anything to indicate that the size of
each tar entry was increased.
Azure Blob stores are also limited to 500 IOPS (API requests/s), which is
about the same as a magnetic disk.
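
(Back-of-the-envelope arithmetic for that estimate, assuming one tar entry
per block: 50,000 blocks x 230 KB per entry ~ 11.5 GB per blob, hence the
"about 10 GB" above.)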

Best Regards
Ian

1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits



>
> Michael

Re: Azure Segment Store

2018-03-05 Thread Michael Dürig
> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container?
> b) when the working set doesn't fit into RAM allocated to the container?

I think these are some of the things we need to find out along the way.
Currently my thinking is to move from off-heap caching (mmap) to on-heap
caching (leveraging the segment cache). For that to work we likely need to
better understand the locality of the working set (see
https://issues.apache.org/jira/browse/OAK-5655) and rethink the
granularity of the cached items. There will likely be many more issues
coming through Jira regarding this.

Michael


Re: Azure Segment Store

2018-03-02 Thread Ian Boston
Hi Tomek,
Thank you for the pointers and the description in OAK-6922. It all makes
sense and seems like a reasonable approach. I assume the description is
up to date.

How does it perform compared to TarMK
a) when the entire repo doesn't fit into RAM allocated to the container?
b) when the working set doesn't fit into RAM allocated to the container?

Since you mentioned cost, have you done a cost-based analysis of RAM vs
attached disk, assuming that TarMK has already been highly optimised to
cope with deployments where the working set may only just fit into RAM?

IIRC the Azure attached disks mount Azure Blobs behind a kernel block
device driver and use local SSD to optimise caching (in read and
write-through mode). Since there is a kernel block device, they also benefit
from the Linux kernel VFS disk cache and support memory mapping via the page
cache. So an Azure attached disk often behaves like a local SSD (IIUC). I
realise that some containerisation frameworks in Azure don't yet support
easy native Azure disk mounting (e.g. Mesos), but others do (e.g. AKS [1]).

Best regards
Ian


1 https://azure.microsoft.com/en-us/services/container-service/
https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv





Re: Azure Segment Store

2018-03-01 Thread Matt Ryan
Hi Tomek,

Some time ago (November 2016 Oakathon, IIRC) some people explored a similar
concept using AWS (S3) instead of Azure. If you haven’t discussed it with
them already, it may be worth doing so. IIRC Stefan Egli and, I believe,
Michael Duerig were involved, and probably some others as well.

-MR


On March 1, 2018 at 5:42:07 AM, Tomek Rekawek (reka...@adobe.com.invalid)
wrote:

Hi Tommaso,

so, the goal is to run Oak in the cloud, in this case Azure. In order to
do this in a scalable way (e.g. multiple instances on a single VM,
containerized), we need to take care of provisioning a sufficient amount
of space for the segmentstore. Mounting physical SSD/HDD disks (in
Azure they’re called “Managed Disks”, aka EBS in Amazon) has two drawbacks:

* it’s expensive,
* it’s complex (each disk is a separate /dev/sdX that has to be formatted,
mounted, etc.)

The point of the Azure Segment Store is to deal with these two issues by
replacing the need for local file system space with a remote service that
will be (a) cheaper and (b) easier to provision (as it’ll be configured at
the application layer rather than the VM layer).

Another option would be using Azure File Storage (which mounts an SMB
file share, not a “physical” disk). However, in this case we’d have remote
storage that emulates local storage, which SegmentMK doesn’t really
expect. Instead, it’s better to create a full-fledged remote storage
implementation, so we can work out the issues caused by the higher
latency, etc.

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com

> On 1 Mar 2018, at 11:16, Tommaso Teofili <tommaso.teof...@gmail.com>
> wrote:
>
> Hi Tomek,
>
> While I think it's an interesting feature, I'd also be interested to hear
> about the user story behind your prototype.
>
> Regards,
> Tommaso
>
>
> On Thu, 1 Mar 2018 at 10:31, Tomek Rękawek <tom...@apache.org>
> wrote:
>
>> Hello,
>>
>> I prepared a prototype for the Azure-based Segment Store, which allows us
>> to persist all the SegmentMK-related resources (segments, journal,
>> manifest, etc.) on a remote service, namely Azure Blob Storage [1]. The
>> whole description of the approach, data structures, etc., as well as the
>> patch, can be found in OAK-6922. It uses the extension points introduced
>> in OAK-6921.
>>
>> While it's still experimental code, I'd like to commit it to trunk sooner
>> rather than later. The patch is already pretty big and I'd like to avoid
>> developing it "privately" on my own branch. It's a new, optional Maven
>> module, which doesn't change any existing behaviour of Oak or SegmentMK.
>> The only change it makes externally is adding a few exports to
>> oak-segment-tar, so it can use the SPI introduced in OAK-6921. We may
>> narrow these exports to a single package if you think it'd be good for
>> encapsulation.
>>
>> There's a related issue, OAK-7297, which introduces a new fixture for
>> benchmarks and ITs. After merging it, all the Oak integration tests pass
>> on the Azure Segment Store.
>>
>> Looking forward to your feedback.
>>
>> Regards,
>> Tomek
>>
>> [1] https://azure.microsoft.com/en-us/services/storage/blobs/
>>
>> --
>> Tomek Rękawek | Adobe Research | www.adobe.com
>> reka...@adobe.com
>>
>>


Azure Segment Store

2018-03-01 Thread Tomek Rękawek
Hello,

I prepared a prototype for the Azure-based Segment Store, which allows us to 
persist all the SegmentMK-related resources (segments, journal, manifest, etc.) 
on a remote service, namely Azure Blob Storage [1]. The whole description of 
the approach, data structures, etc., as well as the patch, can be found in 
OAK-6922. It uses the extension points introduced in OAK-6921.
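
For anyone who wants to experiment with it, wiring up a FileStore against the
Azure persistence should look roughly like the sketch below. The class and
method names (AzurePersistence, withCustomPersistence) follow the OAK-6921 and
OAK-6922 patches as far as I can tell; the connection string, container and
directory names are placeholders. Treat this as a sketch, not the final API:

    // Rough sketch: building a FileStore backed by Azure Blob Storage via
    // the SPI from OAK-6921. Container and directory names are placeholders.
    import com.microsoft.azure.storage.CloudStorageAccount;
    import com.microsoft.azure.storage.blob.CloudBlobDirectory;
    import org.apache.jackrabbit.oak.segment.azure.AzurePersistence;
    import org.apache.jackrabbit.oak.segment.file.FileStore;
    import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;

    import java.io.File;

    public class AzureSegmentStoreDemo {
        public static void main(String[] args) throws Exception {
            CloudStorageAccount account =
                CloudStorageAccount.parse(System.getenv("AZURE_CONNECTION_STRING"));
            CloudBlobDirectory directory = account.createCloudBlobClient()
                .getContainerReference("oak")          // placeholder container
                .getDirectoryReference("repository");  // segment store directory

            FileStore fileStore = FileStoreBuilder
                .fileStoreBuilder(new File("segmentstore")) // local dir for caches
                .withCustomPersistence(new AzurePersistence(directory))
                .build();
            try {
                // ... use the store ...
            } finally {
                fileStore.close();
            }
        }
    }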

While it’s still experimental code, I’d like to commit it to trunk sooner 
rather than later. The patch is already pretty big and I’d like to avoid 
developing it “privately” on my own branch. It’s a new, optional Maven module, 
which doesn’t change any existing behaviour of Oak or SegmentMK. The only 
change it makes externally is adding a few exports to oak-segment-tar, so it 
can use the SPI introduced in OAK-6921. We may narrow these exports to a 
single package if you think it’d be good for encapsulation.

There’s a related issue, OAK-7297, which introduces a new fixture for 
benchmarks and ITs. After merging it, all the Oak integration tests pass on 
the Azure Segment Store.

Looking forward to your feedback.

Regards,
Tomek

[1] https://azure.microsoft.com/en-us/services/storage/blobs/

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com


