> I assume that the patch deals with the 50K limit[1] to the number of blocks per Azure Blob store ?
I read that limit differently: when you upload a large blob, it will be
split into up to 50K blocks of max 100MiB each, thus a single blob cannot
be larger than 4.75 TiB. Regarding the max number of blobs, that page
states: "Max number of blob containers, blobs, file shares, tables,
queues, entities, or messages per storage account - No limit". One could
do a quick test and upload 50K+ blobs to check that :)

Valentin

On Mon, Mar 5, 2018 at 5:47 PM Ian Boston <[email protected]> wrote:

> On 5 March 2018 at 16:04, Michael Dürig <[email protected]> wrote:
>
> > > How does it perform compared to TarMK
> > > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > > b) when the working set doesn't fit into RAM allocated to the container ?
> >
> > I think these are some of the things we need to find out along the way.
> > Currently my thinking is to move from off heap caching (mmap) to on
> > heap caching (leveraging the segment cache). For that to work we
> > likely need to better understand locality of the working set (see
> > https://issues.apache.org/jira/browse/OAK-5655) and rethink the
> > granularity of the cached items. There will likely be many more issues
> > coming through Jira re. this.
>
> Agreed.
> All that will help minimise the IO in this case. Or are you saying that,
> if the IO is managed and not left to the OS via mmap, it may be possible
> to use a network disk cached by the OS VFS disk cache, provided TarMK has
> been optimised for that type of disk ?
>
> @Tomek
> I assume that the patch deals with the 50K limit[1] to the number of blocks
> per Azure Blob store ?
> With a compacted TarEntry size averaging 230K, the max repo size per Azure
> Blob store would be about 10GB.
> I checked the patch but didn't see anything to indicate that the size of
> each tar entry was increased.
> Azure Blob stores are also limited to 500 IOPS (API requests/s), which is
> about the same as a magnetic disk.
>
> Best Regards
> Ian
>
> 1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
>
>
> > Michael
> >
> > On 2 March 2018 at 09:45, Ian Boston <[email protected]> wrote:
> > > Hi Tomek,
> > > Thank you for the pointers and the description in OAK-6922. It all
> > > makes sense and seems like a reasonable approach. I assume the
> > > description is up to date.
> > >
> > > How does it perform compared to TarMK
> > > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > > b) when the working set doesn't fit into RAM allocated to the container ?
> > >
> > > Since you mentioned cost, have you done a cost-based analysis of RAM vs
> > > attached disk, assuming that TarMK has already been highly optimised to
> > > cope with deployments where the working set may only just fit into RAM ?
> > >
> > > IIRC the Azure attached disks mount Azure Blobs behind a kernel block
> > > device driver and use local SSD to optimise caching (in read and
> > > write-through mode). Since they are a kernel block device they also
> > > benefit from the Linux kernel VFS disk cache and support memory mapping
> > > via the page cache. So an Azure attached disk often behaves like a
> > > local SSD (IIUC). I realise that some containerisation frameworks in
> > > Azure don't yet support easy native Azure disk mounting (eg Mesos),
> > > but others do (eg AKS[1]).
> > >
> > > Best regards
> > > Ian
> > >
> > > 1 https://azure.microsoft.com/en-us/services/container-service/
> > > https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
> > >
> > >
> > > On 1 March 2018 at 18:40, Matt Ryan <[email protected]> wrote:
> > >
> > >> Hi Tomek,
> > >>
> > >> Some time ago (November 2016 Oakathon IIRC) some people explored a
> > >> similar concept using AWS (S3) instead of Azure. If you haven't
> > >> discussed with them already it may be worth doing so.
> > >> IIRC Stefan Egli and I believe Michael Duerig were involved and
> > >> probably some others as well.
> > >>
> > >> -MR
> > >>
> > >>
> > >> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek ([email protected])
> > >> wrote:
> > >>
> > >> Hi Tommaso,
> > >>
> > >> so, the goal is to run Oak in a cloud, in this case Azure. In order to
> > >> do this in a scalable way (eg. multiple instances on a single VM,
> > >> containerized), we need to take care of provisioning a sufficient
> > >> amount of space for the segmentstore. Mounting physical SSD/HDD disks
> > >> (in Azure they're called "Managed Disks", the equivalent of EBS in
> > >> Amazon) has two drawbacks:
> > >>
> > >> * it's expensive,
> > >> * it's complex (each disk is a separate /dev/sdX that has to be
> > >> formatted, mounted, etc.)
> > >>
> > >> The point of the Azure Segment Store is to deal with these two issues
> > >> by replacing the need for local file system space with a remote
> > >> service that will be (a) cheaper and (b) easier to provision (as it'll
> > >> be configured at the application layer rather than the VM layer).
> > >>
> > >> Another option would be using the Azure File Storage (which mounts an
> > >> SMB file system, not a "physical" disk). However, in this case we'd
> > >> have a remote storage that emulates a local one, and SegmentMK doesn't
> > >> really expect this. It's better to create a full-fledged remote
> > >> storage implementation instead, so we can work out the issues caused
> > >> by the higher latency, etc.
> > >>
> > >> Regards,
> > >> Tomek
> > >>
> > >> --
> > >> Tomek Rękawek | Adobe Research | www.adobe.com
> > >> [email protected]
> > >>
> > >> > On 1 Mar 2018, at 11:16, Tommaso Teofili <[email protected]>
> > >> wrote:
> > >> >
> > >> > Hi Tomek,
> > >> >
> > >> > While I think it's an interesting feature, I'd also be interested
> > >> > to hear about the user story behind your prototype.
> > >> >
> > >> > Regards,
> > >> > Tommaso
> > >> >
> > >> >
> > >> > On Thu, 1 Mar 2018 at 10:31, Tomek Rękawek <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> Hello,
> > >> >>
> > >> >> I prepared a prototype for the Azure-based Segment Store, which
> > >> >> allows persisting all the SegmentMK-related resources (segments,
> > >> >> journal, manifest, etc.) on a remote service, namely the Azure Blob
> > >> >> Storage [1]. The whole description of the approach, data structure,
> > >> >> etc. as well as the patch can be found in OAK-6922. It uses the
> > >> >> extension points introduced in OAK-6921.
> > >> >>
> > >> >> While it's still experimental code, I'd like to commit it to trunk
> > >> >> sooner rather than later. The patch is already pretty big and I'd
> > >> >> like to avoid developing it "privately" on my own branch. It's a
> > >> >> new, optional Maven module, which doesn't change any existing
> > >> >> behaviour of Oak or SegmentMK. The only change it makes externally
> > >> >> is adding a few exports to oak-segment-tar, so it can use the SPI
> > >> >> introduced in OAK-6921. We may narrow these exports to a single
> > >> >> package if you think it'd be good for the encapsulation.
> > >> >>
> > >> >> There's a related issue, OAK-7297, which introduces a new fixture
> > >> >> for benchmarks and ITs. After merging it, all the Oak integration
> > >> >> tests pass on the Azure Segment Store.
> > >> >>
> > >> >> Looking forward to the feedback.
> > >> >>
> > >> >> Regards,
> > >> >> Tomek
> > >> >>
> > >> >> [1] https://azure.microsoft.com/en-us/services/storage/blobs/
> > >> >>
> > >> >> --
> > >> >> Tomek Rękawek | Adobe Research | www.adobe.com
> > >> >> [email protected]
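The size limits debated at the top of this thread are easy to sanity-check. The sketch below is an editor's back-of-the-envelope calculation (not from the thread itself), using the figures quoted above: 50K blocks of at most 100 MiB per blob from the linked Azure limits page, and Ian's 230K average compacted TarEntry size.

```python
# Sanity-check of the Azure Blob Storage figures quoted in the thread.
# 50_000 blocks x 100 MiB per block come from the Azure service-limits
# page; 230 KiB is Ian's estimated average compacted TarEntry size.

MAX_BLOCKS_PER_BLOB = 50_000
MAX_BLOCK_SIZE = 100 * 1024 ** 2            # 100 MiB, in bytes

# Valentin's reading: the 50K limit applies to blocks *within one blob*,
# capping a single blob at roughly 4.75 TiB.
max_blob_tib = MAX_BLOCKS_PER_BLOB * MAX_BLOCK_SIZE / 1024 ** 4
print(f"max single blob: {max_blob_tib:.2f} TiB")              # -> 4.77 TiB

# Ian's reading: if the 50K figure limited the number of entries in the
# store, a repo of 230 KiB average tar entries would be capped at about
# 10-11 GiB, which matches his "about 10GB" estimate.
AVG_TAR_ENTRY = 230 * 1024                  # 230 KiB, in bytes
repo_cap_gib = MAX_BLOCKS_PER_BLOB * AVG_TAR_ENTRY / 1024 ** 3
print(f"repo cap under that reading: {repo_cap_gib:.1f} GiB")  # -> 11.0 GiB
```

The two readings differ by three orders of magnitude, which is why settling which limit applies (blocks per blob vs. blobs per account) matters for the Azure Segment Store design.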
