> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store ?
I read that limit differently: when you upload a large blob, it will be
split into up to 50K blocks of max 100MiB each, thus a single blob cannot
be larger than 4.75 TiB. Regarding the max number of blobs, that page
states: "Max number of blob containers, blobs, file shares, tables,
queues, entities, or messages per storage account - No limit". One could
do a quick test and upload 50K+ blobs to check that :)

Valentin

On Mon, Mar 5, 2018 at 5:47 PM Ian Boston <[email protected]> wrote:

> On 5 March 2018 at 16:04, Michael Dürig <[email protected]> wrote:
>
> > > How does it perform compared to TarMK
> > > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > > b) when the working set doesn't fit into RAM allocated to the container ?
> >
> > I think these are some of the things we need to find out along the way.
> > Currently my thinking is to move from off heap caching (mmap) to on
> > heap caching (leveraging the segment cache). For that to work we
> > likely need to better understand locality of the working set (see
> > https://issues.apache.org/jira/browse/OAK-5655) and rethink the
> > granularity of the cached items. There will likely be many more issues
> > coming through Jira re. this.
>
> Agreed.
> All that will help minimise the IO in this case. Or are you saying that,
> if the IO is managed and not left to the OS via mmap, it may be possible
> to use a network disk cached by the OS VFS disk cache, provided TarMK has
> been optimised for that type of disk ?
>
> @Tomek
> I assume that the patch deals with the 50K limit[1] to the number of blocks
> per Azure Blob store ?
> With a compacted TarEntry size averaging 230K, the max repo size per Azure
> Blob store would be about 10GB.
> I checked the patch but didn't see anything to indicate that the size of
> each tar entry was increased.
> Azure Blob stores are also limited to 500 IOPS (API requests/s), which is
> about the same as a magnetic disk.
>
> Best Regards
> Ian
>
> 1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
>
>
> > Michael
> >
> > On 2 March 2018 at 09:45, Ian Boston <[email protected]> wrote:
> > > Hi Tomek,
> > > Thank you for the pointers and the description in OAK-6922. It all
> > > makes sense and seems like a reasonable approach. I assume the
> > > description is up to date.
> > >
> > > How does it perform compared to TarMK
> > > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > > b) when the working set doesn't fit into RAM allocated to the container ?
> > >
> > > Since you mentioned cost, have you done a cost-based analysis of RAM vs
> > > attached disk, assuming that TarMK has already been highly optimised to
> > > cope with deployments where the working set may only just fit into RAM ?
> > >
> > > IIRC the Azure attached disks mount Azure Blobs behind a kernel block
> > > device driver and use local SSD to optimise caching (in read and
> > > write-through mode). Since they are a kernel block device they also
> > > benefit from the Linux kernel VFS disk cache and support memory mapping
> > > via the page cache. So an Azure attached disk often behaves like a
> > > local SSD (IIUC). I realise that some containerisation frameworks in
> > > Azure don't yet support easy native Azure disk mounting (eg Mesos),
> > > but others do (eg AKS[1]).
> > >
> > > Best regards
> > > Ian
> > >
> > > 1 https://azure.microsoft.com/en-us/services/container-service/
> > > https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
> > >
> > >
> > > On 1 March 2018 at 18:40, Matt Ryan <[email protected]> wrote:
> > >
> > >> Hi Tomek,
> > >>
> > >> Some time ago (November 2016 Oakathon IIRC) some people explored a
> > >> similar concept using AWS (S3) instead of Azure. If you haven't
> > >> discussed with them already it may be worth doing so.
> > >> IIRC Stefan Egli and I believe Michael Duerig were involved and
> > >> probably some others as well.
> > >>
> > >> -MR
> > >>
> > >>
> > >> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek ([email protected])
> > >> wrote:
> > >>
> > >> Hi Tommaso,
> > >>
> > >> so, the goal is to run Oak in a cloud, in this case Azure. In order to
> > >> do this in a scalable way (eg. multiple instances on a single VM,
> > >> containerized), we need to take care of provisioning a sufficient
> > >> amount of space for the segmentstore. Mounting physical SSD/HDD disks
> > >> (in Azure they're called "Managed Disks", the equivalent of EBS in
> > >> Amazon) has two drawbacks:
> > >>
> > >> * it's expensive,
> > >> * it's complex (each disk is a separate /dev/sdX that has to be
> > >> formatted, mounted, etc.)
> > >>
> > >> The point of the Azure Segment Store is to deal with these two issues
> > >> by replacing the need for local file system space with a remote
> > >> service that will be (a) cheaper and (b) easier to provision (as it'll
> > >> be configured at the application layer rather than the VM layer).
> > >>
> > >> Another option would be using the Azure File Storage (which mounts an
> > >> SMB file system, not a "physical" disk). However, in this case we'd
> > >> have a remote storage that emulates a local one, and SegmentMK doesn't
> > >> really expect this. It's better to create a full-fledged remote
> > >> storage implementation instead, so we can work out the issues caused
> > >> by the higher latency, etc.
> > >>
> > >> Regards,
> > >> Tomek
> > >>
> > >> --
> > >> Tomek Rękawek | Adobe Research | www.adobe.com
> > >> [email protected]
> > >>
> > >> > On 1 Mar 2018, at 11:16, Tommaso Teofili <[email protected]>
> > >> wrote:
> > >> >
> > >> > Hi Tomek,
> > >> >
> > >> > While I think it's an interesting feature, I'd also be interested
> > >> > to hear about the user story behind your prototype.
> > >> >
> > >> > Regards,
> > >> > Tommaso
> > >> >
> > >> >
> > >> > On Thu, 1 Mar 2018 at 10:31, Tomek Rękawek <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> Hello,
> > >> >>
> > >> >> I prepared a prototype for the Azure-based Segment Store, which
> > >> >> allows persisting all the SegmentMK-related resources (segments,
> > >> >> journal, manifest, etc.) on a remote service, namely the Azure Blob
> > >> >> Storage [1]. The whole description of the approach, data structure,
> > >> >> etc. as well as the patch can be found in OAK-6922. It uses the
> > >> >> extension points introduced in OAK-6921.
> > >> >>
> > >> >> While it's still experimental code, I'd like to commit it to trunk
> > >> >> sooner rather than later. The patch is already pretty big and I'd
> > >> >> like to avoid developing it "privately" on my own branch. It's a
> > >> >> new, optional Maven module, which doesn't change any existing
> > >> >> behaviour of Oak or SegmentMK. The only change it makes externally
> > >> >> is adding a few exports to oak-segment-tar, so it can use the SPI
> > >> >> introduced in OAK-6921. We may narrow these exports to a single
> > >> >> package if you think it'd be good for the encapsulation.
> > >> >>
> > >> >> There's a related issue, OAK-7297, which introduces a new fixture
> > >> >> for benchmarks and ITs. After merging it, all the Oak integration
> > >> >> tests pass on the Azure Segment Store.
> > >> >>
> > >> >> Looking forward to the feedback.
> > >> >>
> > >> >> Regards,
> > >> >> Tomek
> > >> >>
> > >> >> [1] https://azure.microsoft.com/en-us/services/storage/blobs/
> > >> >>
> > >> >> --
> > >> >> Tomek Rękawek | Adobe Research | www.adobe.com
> > >> >> [email protected]
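The size limits debated at the top of this thread are easy to sanity-check. The sketch below is an editor's back-of-the-envelope calculation (not from the thread itself), using the figures quoted above: 50K blocks of at most 100 MiB per blob from the linked Azure limits page, and Ian's 230K average compacted TarEntry size.

```python
# Sanity-check of the Azure Blob Storage figures quoted in the thread.
# 50_000 blocks x 100 MiB per block come from the Azure service-limits
# page; 230 KiB is Ian's estimated average compacted TarEntry size.

MAX_BLOCKS_PER_BLOB = 50_000
MAX_BLOCK_SIZE = 100 * 1024 ** 2            # 100 MiB, in bytes

# Valentin's reading: the 50K limit applies to blocks *within one blob*,
# capping a single blob at roughly 4.75 TiB.
max_blob_tib = MAX_BLOCKS_PER_BLOB * MAX_BLOCK_SIZE / 1024 ** 4
print(f"max single blob: {max_blob_tib:.2f} TiB")              # -> 4.77 TiB

# Ian's reading: if the 50K figure limited the number of entries in the
# store, a repo of 230 KiB average tar entries would be capped at about
# 10-11 GiB, which matches his "about 10GB" estimate.
AVG_TAR_ENTRY = 230 * 1024                  # 230 KiB, in bytes
repo_cap_gib = MAX_BLOCKS_PER_BLOB * AVG_TAR_ENTRY / 1024 ** 3
print(f"repo cap under that reading: {repo_cap_gib:.1f} GiB")  # -> 11.0 GiB
```

The two readings differ by three orders of magnitude, which is why settling which limit applies (blocks per blob vs. blobs per account) matters for the Azure Segment Store design.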
