Tomek Rękawek edited comment on OAK-6922 at 2/13/18 10:36 AM:

The azure implementation:

was (Author: tomek.rekawek):
The azure implementation:

> Azure support for the segment-tar
> ---------------------------------
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>         Attachments: OAK-6922.patch
> An Azure Blob Storage implementation of the segment storage, based on the 
> OAK-6921 work.
> h3. Segment files layout
> Thew new implementation doesn't use tar files. They are replaced with 
> directories, storing segments, named after their UUIDs. This approach has 
> following advantages:
> * no need to call seek(), which may be expensive on a remote file system. 
> Rather than that we can read the whole file (=segment) at once.
> * it's possible to send multiple segments at once, asynchronously, which 
> reduces the performance overhead (see below).
> The file structure is as follows:
> {noformat}
> [~]$ az storage blob list -c oak --output table
> Name                                                          Blob Type    
> Blob Tier    Length    Content Type              Last Modified
> ------------------------------------------------------------  -----------  
> -----------  --------  ------------------------  -------------------------
> oak/data00000a.tar/0000.ca1326d1-edf4-4d53-aef0-0f14a6d05b63  BlockBlob       
>           192       application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data00000a.tar/0001.c6e03426-db9d-4315-a20a-12559e6aee54  BlockBlob       
>           262144    application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data00000a.tar/0002.b3784e27-6d16-4f80-afc1-6f3703f6bdb9  BlockBlob       
>           262144    application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data00000a.tar/0003.5d2f9588-0c92-4547-abf7-0263ee7c37bb  BlockBlob       
>           259216    application/octet-stream  2018-01-31T10:59:14+00:00
> ...
> oak/data00000a.tar/006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48  BlockBlob       
>           4368      application/octet-stream  2018-01-31T12:01:09+00:00
> oak/data00000a.tar/006f.93799ae9-288e-4b32-afc2-bbc676fad7e5  BlockBlob       
>           3792      application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data00000a.tar/0070.8b2d5ff2-6a74-4ac3-a3cc-cc439367c2aa  BlockBlob       
>           3680      application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data00000a.tar/0071.2a1c49f0-ce33-4777-a042-8aa8a704d202  BlockBlob       
>           7760      application/octet-stream  2018-01-31T12:10:54+00:00
> oak/journal.log.001                                           AppendBlob      
>           1010      application/octet-stream  2018-01-31T12:10:54+00:00
> oak/manifest                                                  BlockBlob       
>           46        application/octet-stream  2018-01-31T10:59:14+00:00
> oak/repo.lock                                                 BlockBlob       
>                     application/octet-stream  2018-01-31T10:59:14+00:00
> {noformat}
> For the segment files, each name is prefixed with the index number. This 
> allows to maintain an order, as in the tar archive. This order is normally 
> stored in the index files as well, but if it's missing, the recovery process 
> needs it.
> Each file contains the raw segment data, with no padding/headers. Apart from 
> the segment files, there are 3 special files: binary references (.brf), 
> segment graph (.gph) and segment index (.idx).
> h3. Asynchronous writes
> Normally, all the TarWriter writes are synchronous, appending the segments to 
> the tar file. In case of Azure Blob Storage each write involves a network 
> latency. That's why the SegmentWriteQueue was introduced. The segments are 
> added to the blocking dequeue, which is served by a number of the consumer 
> threads, writing the segments to the cloud. There's also a map UUID->Segment, 
> which allows to return the segments in case they are requested by the 
> readSegment() method before they are actually persisted. Segments are removed 
> from the map only after a successful write operation.
> The flush() method blocks accepting the new segments and returns after all 
> waiting segments are written. The close() method waits until the current 
> operations are finished and stops all threads.
> The asynchronous mode can be disabled by setting the number of threads to 0.
> h5. Queue recovery mode
> If the Azure Blob Storage write() operation fails, the segment will be 
> re-added and the queue is switched to an "recovery mode". In this mode, all 
> the threads are suspended and new segments are not accepted (active waiting). 
> There's a single thread which retries adding the segment with some delay. If 
> the segment is successfully written, the queue will back to the normal 
> operation.
> This way the unavailable remote service is not flooded by the requests and 
> we're not accepting the segments when we can't persist them.
> The close() method finishes the recovery mode - in this case, some of the 
> awaiting segments won't be persisted.
> h5. Consistency
> The asynchronous mode isn't as reliable as the standard, synchronous case. 
> Following cases are possible:
> * TarWriter#writeEntry() returns successfully, but the segments are not 
> persisted.
> * TarWriter#writeEntry() accepts a number of segments: S1, S2, S3. The S2 and 
> S3 are persisted, but the S1 is not.
> On the other hand:
> * If the TarWriter#flush() returns successfully, it means that all the 
> accepted segments has been persisted.
> h5. Recovery
> During the segment recovery (eg. if the index file is missing), the HDFS 
> implementation checks if there's no missing segment in the middle. If so, 
> only the consecutive segments are recovered. For instance, if we have S1, S2, 
> S3, S5, S6, S7, then the recovery process will return only the first three.

This message was sent by Atlassian JIRA

Reply via email to