> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container ?
> b) when the working set doesn't fit into RAM allocated to the container ?
I think these are some of the things we need to find out along the way.
Currently my thinking is to move from off-heap caching (mmap) to on-heap
caching (leveraging the segment cache). For that to work we likely need to
better understand the locality of the working set (see
https://issues.apache.org/jira/browse/OAK-5655) and rethink the granularity
of the cached items. There will likely be many more issues coming through
Jira re. this.

Michael

On 2 March 2018 at 09:45, Ian Boston <[email protected]> wrote:

> Hi Tomek,
> Thank you for the pointers and the description in OAK-6922. It all makes
> sense and seems like a reasonable approach. I assume the description is
> up to date.
>
> How does it perform compared to TarMK
> a) when the entire repo doesn't fit into RAM allocated to the container ?
> b) when the working set doesn't fit into RAM allocated to the container ?
>
> Since you mentioned cost, have you done a cost-based analysis of RAM vs
> attached disk, assuming that TarMK has already been highly optimised to
> cope with deployments where the working set may only just fit into RAM ?
>
> IIRC the Azure attached disks mount Azure Blobs behind a kernel block
> device driver and use local SSD to optimise caching (in read and
> write-through mode). Since they are a kernel block device they also
> benefit from the Linux kernel VFS disk cache and support memory mapping
> via the page cache. So an Azure attached disk often behaves like a local
> SSD (IIUC). I realise that some containerisation frameworks in Azure
> don't yet support easy native Azure disk mounting (eg Mesos), but others
> do (eg AKS [1]).
>
> Best regards
> Ian
>
> 1 https://azure.microsoft.com/en-us/services/container-service/
> https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
>
> On 1 March 2018 at 18:40, Matt Ryan <[email protected]> wrote:
>
>> Hi Tomek,
>>
>> Some time ago (November 2016 Oakathon IIRC) some people explored a
>> similar concept using AWS (S3) instead of Azure.
>> If you haven’t discussed with them already it may be worth doing so.
>> IIRC Stefan Egli and, I believe, Michael Duerig were involved, and
>> probably some others as well.
>>
>> -MR
>>
>> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek ([email protected])
>> wrote:
>>
>> Hi Tommaso,
>>
>> so, the goal is to run Oak in a cloud, in this case Azure. In order to
>> do this in a scalable way (eg. multiple instances on a single VM,
>> containerized), we need to take care of provisioning a sufficient amount
>> of space for the segmentstore. Mounting physical SSD/HDD disks (in
>> Azure they’re called “Managed Disks”, akin to EBS in Amazon) has two
>> drawbacks:
>>
>> * it’s expensive,
>> * it’s complex (each disk is a separate /dev/sdX that has to be
>>   formatted, mounted, etc.)
>>
>> The point of the Azure Segment Store is to deal with these two issues by
>> replacing the need for local file system space with a remote service
>> that will be (a) cheaper and (b) easier to provision (as it’ll be
>> configured at the application layer rather than the VM layer).
>>
>> Another option would be using Azure File Storage (which mounts an SMB
>> file system, not a “physical” disk). However, in this case we’d have
>> remote storage that emulates local storage, and SegmentMK doesn’t really
>> expect this. Rather than that, it’s better to create a full-fledged
>> remote storage implementation, so we can work out the issues caused by
>> the higher latency, etc.
>>
>> Regards,
>> Tomek
>>
>> --
>> Tomek Rękawek | Adobe Research | www.adobe.com
>> [email protected]
>>
>> > On 1 Mar 2018, at 11:16, Tommaso Teofili <[email protected]>
>> > wrote:
>> >
>> > Hi Tomek,
>> >
>> > While I think it's an interesting feature, I'd also be interested to
>> > hear about the user story behind your prototype.
>> >
>> > Regards,
>> > Tommaso
>> >
>> > On Thu, 1 Mar 2018 at 10:31, Tomek Rękawek <[email protected]>
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> I prepared a prototype for the Azure-based Segment Store, which
>> >> allows persisting all the SegmentMK-related resources (segments,
>> >> journal, manifest, etc.) on a remote service, namely the Azure Blob
>> >> Storage [1]. The whole description of the approach, data structures,
>> >> etc., as well as the patch, can be found in OAK-6922. It uses the
>> >> extension points introduced in OAK-6921.
>> >>
>> >> While it’s still experimental code, I’d like to commit it to trunk
>> >> sooner rather than later. The patch is already pretty big and I’d
>> >> like to avoid developing it “privately” on my own branch. It’s a new,
>> >> optional Maven module, which doesn’t change any existing behaviour of
>> >> Oak or SegmentMK. The only change it makes externally is adding a few
>> >> exports to oak-segment-tar, so it can use the SPI introduced in
>> >> OAK-6921. We may narrow these exports to a single package if you
>> >> think it’d be good for encapsulation.
>> >>
>> >> There’s a related issue, OAK-7297, which introduces a new fixture for
>> >> benchmarks and ITs. After merging it, all the Oak integration tests
>> >> pass on the Azure Segment Store.
>> >>
>> >> Looking forward to the feedback.
>> >>
>> >> Regards,
>> >> Tomek
>> >>
>> >> [1] https://azure.microsoft.com/en-us/services/storage/blobs/
>> >>
>> >> --
>> >> Tomek Rękawek | Adobe Research | www.adobe.com
>> >> [email protected]
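[Editor's note] The idea discussed in the thread — persisting SegmentMK resources (segments, journal, manifest) as named blobs in a remote store instead of files on a mounted disk — can be sketched as follows. This is a hedged illustration only: the class and method names are hypothetical and do not reproduce the actual SPI from OAK-6921 or the OAK-6922 patch, and a plain in-memory map stands in for the Azure Blob Storage container so the sketch is self-contained.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a remote "segment store": every persisted
// resource becomes a named blob in one container. A ConcurrentHashMap
// stands in for the remote blob container (blob name -> bytes).
public class BlobSegmentStoreSketch {

    private final Map<String, byte[]> container = new ConcurrentHashMap<>();

    /** Persist a segment under a blob name derived from its id. */
    public void writeSegment(UUID segmentId, byte[] data) {
        container.put("segments/" + segmentId, data.clone());
    }

    /** Read a segment back, or null if it was never written. */
    public byte[] readSegment(UUID segmentId) {
        byte[] data = container.get("segments/" + segmentId);
        return data == null ? null : data.clone();
    }

    /** The journal is just another named blob in the same container. */
    public void writeJournal(byte[] journal) {
        container.put("journal", journal.clone());
    }

    public byte[] readJournal() {
        return container.get("journal");
    }

    public static void main(String[] args) {
        BlobSegmentStoreSketch store = new BlobSegmentStoreSketch();
        UUID id = UUID.randomUUID();
        store.writeSegment(id, new byte[] {1, 2, 3});
        byte[] back = store.readSegment(id);
        System.out.println(back.length); // prints "3"
    }
}
```

The point the thread makes falls out of the shape of this interface: provisioning is an application-layer concern (a container name and credentials) rather than a VM-layer one (formatting and mounting `/dev/sdX`), at the cost of each read and write becoming a higher-latency remote call.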
