Re: OAK-9238 oak-run explore should support Azure Segment Store (Review)
Hi Aravindo,

Thank you for your contribution! I will review the changes and commit the patch once done.

Regards,
Andrei

On Sat, Sep 26, 2020 at 11:35 PM Aravindo Wingeier wrote:
> Hi
>
> In my personal time, I added support for AzureSegmentStore to `oak-run explore`. While outdated, I still find it a handy tool. The actual change is small; the diff is just bloated by the new abstract class I added.
>
> Please review my PR: https://github.com/apache/jackrabbit-oak/pull/255/files
>
> Story: https://issues.apache.org/jira/browse/OAK-9238
>
> Thanks,
> Aravindo
OAK-9238 oak-run explore should support Azure Segment Store (Review)
Hi

In my personal time, I added support for AzureSegmentStore to `oak-run explore`. While outdated, I still find it a handy tool. The actual change is small; the diff is just bloated by the new abstract class I added.

Please review my PR: https://github.com/apache/jackrabbit-oak/pull/255/files

Story: https://issues.apache.org/jira/browse/OAK-9238

Thanks,
Aravindo
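The shape of the change can be sketched roughly like this (class and method names below are hypothetical illustrations, not the actual code in the PR): `explore` is written against a small abstraction, and Azure support becomes a second concrete backend, selected from the command-line argument.

```java
// Hypothetical sketch of the abstraction described above; the real PR's
// class names and wiring differ. oak-run conventionally uses an "az:" URI
// prefix for Azure segment stores, which is assumed here.
abstract class ExplorerBackend {
    abstract String describe();

    static ExplorerBackend forArgument(String arg) {
        // dispatch on the argument: "az:" URIs open an Azure Segment Store,
        // anything else is treated as a local TarMK directory
        return arg.startsWith("az:") ? new AzureBackend(arg) : new TarBackend(arg);
    }
}

class TarBackend extends ExplorerBackend {
    private final String path;
    TarBackend(String path) { this.path = path; }
    String describe() { return "tar:" + path; }
}

class AzureBackend extends ExplorerBackend {
    private final String uri;
    AzureBackend(String uri) { this.uri = uri; }
    String describe() { return "azure:" + uri; }
}

public class ExploreBackendSketch {
    public static void main(String[] args) {
        System.out.println(ExplorerBackend.forArgument(
                "az:https://myaccount.blob.core.windows.net/oak").describe());
        System.out.println(ExplorerBackend.forArgument("segmentstore").describe());
    }
}
```

The point of such an abstract class is that the explore logic itself stays backend-agnostic, which is why the diff is larger than the functional change.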
Re: Azure Segment Store
Hi,

I misread the documentation in the patch. Thank you for pointing out my mistake.

Best Regards
Ian

On 6 March 2018 at 09:53, Tomek Rękawek wrote:
> Hi Ian,
>
>> On 5 Mar 2018, at 17:47, Ian Boston wrote:
>> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store?
>
> As far as I understand, it’s the limit that applies to the number of blocks in a single blob. A block is a single write. Since the segments are immutable (written at once), we don’t need to worry about this limit for the segments. It’s a different case for the journal file: a single commit leads to a single append, which adds a block. However, the patch takes care of this by creating journal.log.001, .002, … when we’re close to the limit [1].
>
> Regards,
> Tomek
>
> [1] https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureJournalFile.java#L37
>
> --
> Tomek Rękawek | Adobe Research | www.adobe.com
> reka...@adobe.com
Re: Azure Segment Store
Hi Ian,

> On 5 Mar 2018, at 17:47, Ian Boston wrote:
> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store?

As far as I understand, it’s the limit that applies to the number of blocks in a single blob. A block is a single write. Since the segments are immutable (written at once), we don’t need to worry about this limit for the segments. It’s a different case for the journal file: a single commit leads to a single append, which adds a block. However, the patch takes care of this by creating journal.log.001, .002, … when we’re close to the limit [1].

Regards,
Tomek

[1] https://github.com/trekawek/jackrabbit-oak/blob/OAK-6922/oak-segment-azure/src/main/java/org/apache/jackrabbit/oak/segment/azure/AzureJournalFile.java#L37

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com
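The rollover scheme described above can be sketched as follows (a simplified illustration of the idea, not the actual AzureJournalFile code; all names here are made up, and the real implementation rolls over shortly *before* reaching the limit rather than exactly at it): each commit appends one block to the current journal blob, and once that blob has accumulated the maximum number of blocks, writes continue in the next suffixed blob.

```java
import java.util.ArrayList;
import java.util.List;

public class JournalRolloverSketch {
    // Azure block blobs hold at most 50,000 blocks
    static final int MAX_BLOCKS_PER_BLOB = 50_000;

    private final List<String> blobNames = new ArrayList<>();
    private int blocksInCurrentBlob = 0;

    JournalRolloverSketch() {
        blobNames.add(name(1)); // start with journal.log.001
    }

    static String name(int index) {
        return String.format("journal.log.%03d", index);
    }

    /** Each commit appends one block; roll over to a new blob at the limit. */
    void append(String journalEntry) {
        if (blocksInCurrentBlob >= MAX_BLOCKS_PER_BLOB) {
            blobNames.add(name(blobNames.size() + 1));
            blocksInCurrentBlob = 0;
        }
        blocksInCurrentBlob++;
        // a real implementation would upload `journalEntry` as an appended
        // block of the current blob here
    }

    String currentBlob() {
        return blobNames.get(blobNames.size() - 1);
    }

    public static void main(String[] args) {
        JournalRolloverSketch journal = new JournalRolloverSketch();
        for (int i = 0; i < 50_001; i++) {
            journal.append("root record " + i);
        }
        System.out.println(journal.currentBlob()); // journal.log.002
    }
}
```

This is why the per-blob block limit is harmless for segments (each segment is written once, as a whole) but needs handling for the journal, whose blob grows by one block per commit.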
Re: Azure Segment Store
> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store?

I read that limit differently: when you upload a large blob, it will be split into up to 50K blocks of max 100 MiB, thus a single blob cannot be larger than 4.75 TiB.

Regarding the max number of blobs, that page states: "Max number of blob containers, blobs, file shares, tables, queues, entities, or messages per storage account - No limit"

One could do a quick test and upload 50K+ blobs to check that :)

Valentin

On Mon, Mar 5, 2018 at 5:47 PM Ian Boston wrote:
> @Tomek
> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store?
> With a compacted TarEntry size averaging 230K, the max repo size per Azure Blob store will be about 10GB.
> I checked the patch but didn't see anything to indicate that the size of each tar entry was increased.
> Azure Blob stores are also limited to 500 IOPS (API requests/s), which is about the same as a magnetic disk.
>
> Best Regards
> Ian
>
> 1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
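Valentin's reading checks out arithmetically: 50,000 blocks of at most 100 MiB each cap a single block blob at about 4.77 TiB (commonly rounded to ~4.75 TiB). A quick check of the numbers:

```java
public class BlobSizeLimit {
    public static void main(String[] args) {
        long maxBlocks = 50_000L;                       // blocks per block blob
        long maxBlockBytes = 100L * 1024 * 1024;        // 100 MiB per block
        long maxBlobBytes = maxBlocks * maxBlockBytes;  // max bytes per blob
        double tib = maxBlobBytes / (1024.0 * 1024 * 1024 * 1024);
        System.out.printf("max blob size: %.2f TiB%n", tib); // ~4.77 TiB
    }
}
```

So the 50K figure bounds the size of one blob, not the number of blobs in the store; the number of blobs per storage account is unlimited, as the quoted limits page states.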
Re: Azure Segment Store
On 5 March 2018 at 16:04, Michael Dürig wrote:
>> How does it perform compared to TarMK
>> a) when the entire repo doesn't fit into RAM allocated to the container?
>> b) when the working set doesn't fit into RAM allocated to the container?
>
> I think these are some of the things we need to find out along the way. Currently my thinking is to move from off-heap caching (mmap) to on-heap caching (leveraging the segment cache). For that to work we likely need to better understand the locality of the working set (see https://issues.apache.org/jira/browse/OAK-5655) and rethink the granularity of the cached items. There will likely be many more issues coming through Jira re. this.

Agreed. All that will help minimise the IO in this case. Or are you saying that if the IO is managed, and not left to the OS via mmap, it may be possible to use a network disk cached by the OS VFS disk cache, if TarMK has been optimised for that type of disk?

@Tomek
I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store?
With a compacted TarEntry size averaging 230K, the max repo size per Azure Blob store will be about 10GB. I checked the patch but didn't see anything to indicate that the size of each tar entry was increased.

Azure Blob stores are also limited to 500 IOPS (API requests/s), which is about the same as a magnetic disk.

Best Regards
Ian

1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
Re: Azure Segment Store
On 2 March 2018 at 09:45, Ian Boston wrote:
> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container?
> b) when the working set doesn't fit into RAM allocated to the container?

I think these are some of the things we need to find out along the way. Currently my thinking is to move from off-heap caching (mmap) to on-heap caching (leveraging the segment cache). For that to work we likely need to better understand the locality of the working set (see https://issues.apache.org/jira/browse/OAK-5655) and rethink the granularity of the cached items. There will likely be many more issues coming through Jira re. this.

Michael
Re: Azure Segment Store
Hi Tomek,

Thank you for the pointers and the description in OAK-6922. It all makes sense and seems like a reasonable approach. I assume the description is up to date.

How does it perform compared to TarMK
a) when the entire repo doesn't fit into RAM allocated to the container?
b) when the working set doesn't fit into RAM allocated to the container?

Since you mentioned cost, have you done a cost-based analysis of RAM vs attached disk, assuming that TarMK has already been highly optimised to cope with deployments where the working set may only just fit into RAM?

IIRC the Azure attached disks mount Azure Blobs behind a kernel block device driver and use local SSD to optimise caching (in read and write-through mode). Since they are kernel block devices, they also benefit from the Linux kernel VFS disk cache and support memory mapping via the page cache. So an Azure attached disk often behaves like a local SSD (IIUC). I realise that some containerisation frameworks in Azure don't yet support easy native Azure disk mounting (e.g. Mesos), but others do (e.g. AKS[1]).

Best regards
Ian

1 https://azure.microsoft.com/en-us/services/container-service/
https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
Re: Azure Segment Store
Hi Tomek,

Some time ago (November 2016 Oakathon IIRC) some people explored a similar concept using AWS (S3) instead of Azure. If you haven’t discussed it with them already, it may be worth doing so. IIRC Stefan Egli and I believe Michael Duerig were involved, and probably some others as well.

-MR
Re: Azure Segment Store
Hi Tommaso,

So, the goal is to run Oak in a cloud, in this case Azure. In order to do this in a scalable way (e.g. multiple instances on a single VM, containerized), we need to take care of provisioning a sufficient amount of space for the segmentstore. Mounting physical SSD/HDD disks (in Azure they’re called “Managed Disks”, aka EBS in Amazon) has two drawbacks:

* it’s expensive,
* it’s complex (each disk is a separate /dev/sdX that has to be formatted, mounted, etc.)

The point of the Azure Segment Store is to deal with these two issues by replacing the need for local file system space with a remote service that will be (a) cheaper and (b) easier to provision (as it’ll be configured on the application layer rather than the VM layer).

Another option would be using Azure File Storage (which mounts an SMB file system, not a “physical” disk). However, in this case we’d have remote storage that emulates a local one, and SegmentMK doesn’t really expect this. Rather than that, it’s better to create a full-fledged remote storage implementation, so we can work out the issues caused by the higher latency, etc.

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com

> On 1 Mar 2018, at 11:16, Tommaso Teofili wrote:
>
> Hi Tomek,
>
> While I think it's an interesting feature, I'd also be interested to hear about the user story behind your prototype.
>
> Regards,
> Tommaso
Re: Azure Segment Store
Hi Tomek,

While I think it's an interesting feature, I'd also be interested to hear about the user story behind your prototype.

Regards,
Tommaso
Azure Segment Store
Hello,

I prepared a prototype for the Azure-based Segment Store, which allows persisting all the SegmentMK-related resources (segments, journal, manifest, etc.) on a remote service, namely the Azure Blob Storage [1]. The whole description of the approach, data structure, etc., as well as the patch, can be found in OAK-6922. It uses the extension points introduced in OAK-6921.

While it’s still experimental code, I’d like to commit it to trunk sooner rather than later. The patch is already pretty big and I’d like to avoid developing it “privately” on my own branch. It’s a new, optional Maven module, which doesn’t change any existing behaviour of Oak or SegmentMK. The only change it makes externally is adding a few exports to the oak-segment-tar, so it can use the SPI introduced in OAK-6921. We may narrow these exports to a single package if you think it’d be good for encapsulation.

There’s a related issue, OAK-7297, which introduces a new fixture for benchmarks and ITs. After merging it, all the Oak integration tests pass on the Azure Segment Store.

Looking forward to the feedback.

Regards,
Tomek

[1] https://azure.microsoft.com/en-us/services/storage/blobs/

--
Tomek Rękawek | Adobe Research | www.adobe.com
reka...@adobe.com
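As a rough illustration of what such an extension point enables (the interface below is a simplified placeholder, not the actual SPI from OAK-6921, which covers the journal, gc journal, manifest, and more): SegmentMK reads and writes its resources through a persistence abstraction, so a remote Azure-backed implementation can stand in for the local tar-file layout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the kind of SPI described above;
// the real Oak interfaces are richer and named differently.
interface SegmentPersistence {
    void writeSegment(String segmentId, byte[] data);
    byte[] readSegment(String segmentId);
    boolean containsSegment(String segmentId);
}

// An in-memory implementation standing in for a remote backend such as
// Azure Blob Storage; a real one would issue blob PUT/GET requests instead.
class InMemoryPersistence implements SegmentPersistence {
    private final Map<String, byte[]> blobs = new HashMap<>();
    public void writeSegment(String id, byte[] data) { blobs.put(id, data.clone()); }
    public byte[] readSegment(String id) { return blobs.get(id); }
    public boolean containsSegment(String id) { return blobs.containsKey(id); }
}

public class PersistenceSketch {
    public static void main(String[] args) {
        SegmentPersistence store = new InMemoryPersistence();
        store.writeSegment("0a1b-2c3d", new byte[] {1, 2, 3});
        System.out.println(store.containsSegment("0a1b-2c3d")); // true
        System.out.println(store.readSegment("0a1b-2c3d").length); // 3
    }
}
```

Because segments are immutable and written whole, an interface of this shape maps naturally onto one-shot blob uploads, which is what makes a remote implementation feasible without changing SegmentMK's behaviour.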