[jira] [Comment Edited] (OAK-6922) Azure support for the segment-tar
[ https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16402371#comment-16402371 ]

Tomek Rękawek edited comment on OAK-6922 at 3/17/18 9:49 PM:
-------------------------------------------------------------

[~frm], [~mduerig] - thanks for the comments. I moved all the required interfaces to SPI packages and applied all the suggestions (I think). Since the changes in oak-segment-tar are now quite extensive, I created a separate issue to cover the SPI updates: OAK-7355. See that issue for the patch and the summary of changes. The [^OAK-6922-3.patch] now only contains the new Azure bundle; it requires OAK-7355 to work.

was (Author: tomek.rekawek):
[~frm], [~mduerig] - thanks for the comments. I moved all the required interfaces to SPI packages and applied all the suggestions (I think). Since the changes in oak-segment-tar are now quite extensive, I created a separate issue to cover the SPI updates: OAK-7355. See that issue for the patch and the summary of changes. The [^OAK-6922-3.patch] now only contains the new Azure bundle; it requires OAK-7355 to work.

> Azure support for the segment-tar
> ---------------------------------
>
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6922-2.patch, OAK-6922-3.patch, OAK-6922.patch
>
>
> An Azure Blob Storage implementation of the segment storage, based on the
> OAK-6921 work.
> h3. Segment files layout
> The new implementation doesn't use tar files. They are replaced with
> directories that store the segments as individual files, named after their
> UUIDs. This approach has the following advantages:
> * there is no need to call seek(), which may be expensive on a remote file
> system. Instead, the whole file (= segment) can be read at once.
> * multiple segments can be sent at once, asynchronously, which reduces the
> performance overhead (see below).
> The file structure is as follows:
> {noformat}
> [~]$ az storage blob list -c oak --output table
> Name                                                      Blob Type   Blob Tier  Length  Content Type              Last Modified
> --------------------------------------------------------  ----------  ---------  ------  ------------------------  -------------------------
> oak/data0a.tar/.ca1326d1-edf4-4d53-aef0-0f14a6d05b63      BlockBlob              192     application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0001.c6e03426-db9d-4315-a20a-12559e6aee54  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0002.b3784e27-6d16-4f80-afc1-6f3703f6bdb9  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0003.5d2f9588-0c92-4547-abf7-0263ee7c37bb  BlockBlob              259216  application/octet-stream  2018-01-31T10:59:14+00:00
> ...
> oak/data0a.tar/006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48  BlockBlob              4368    application/octet-stream  2018-01-31T12:01:09+00:00
> oak/data0a.tar/006f.93799ae9-288e-4b32-afc2-bbc676fad7e5  BlockBlob              3792    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0070.8b2d5ff2-6a74-4ac3-a3cc-cc439367c2aa  BlockBlob              3680    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0071.2a1c49f0-ce33-4777-a042-8aa8a704d202  BlockBlob              7760    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/journal.log.001                                       AppendBlob             1010    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/manifest                                              BlockBlob              46      application/octet-stream  2018-01-31T10:59:14+00:00
> oak/repo.lock                                             BlockBlob                      application/octet-stream  2018-01-31T10:59:14+00:00
> {noformat}
> Each segment file name is prefixed with an index number, which preserves
> the order of the segments, as in the tar archive. This order is normally
> also stored in the index files, but if the index is missing, the recovery
> process uses the prefixes to restore it.
> Each file contains the raw segment data, with no padding or headers.
> Apart from the segment files, there are three special files: binary
> references (.brf), segment graph (.gph) and segment index (.idx).
> h3. Asynchronous writes
> Normally, all TarWriter writes are synchronous, appending the segments to
> the tar file. In the case of Azure Blob Storage, each write involves network
> latency.
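As a rough illustration of the segment naming described in the layout section, the Java sketch below builds and parses `<index>.<uuid>` blob names and restores the write order from the prefixes, the way a recovery step might when the index is missing. It is not Oak code: `SegmentFileNames` and its methods are hypothetical, and the zero-padded four-digit hexadecimal prefix is an assumption inferred from the listed names (`006e`, `006f`, `0070`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Hypothetical helper illustrating the "<index>.<uuid>" blob naming scheme.
// Assumption: the index prefix is a zero-padded, four-digit hex number.
public final class SegmentFileNames {

    private SegmentFileNames() {}

    // Builds the blob name for the segment written at the given position.
    static String segmentFileName(int index, UUID segmentId) {
        return String.format("%04x.%s", index, segmentId);
    }

    // Parses the hex index prefix back out of a blob name.
    static int indexOf(String fileName) {
        return Integer.parseInt(fileName.substring(0, fileName.indexOf('.')), 16);
    }

    // Restores the write order from the prefixes, as a recovery process
    // could do when the index file is missing.
    static List<UUID> recoverOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingInt(SegmentFileNames::indexOf));
        List<UUID> order = new ArrayList<>();
        for (String name : sorted) {
            order.add(UUID.fromString(name.substring(name.indexOf('.') + 1)));
        }
        return order;
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("7b8cf63d-849a-4120-aa7c-47c3dde25e48");
        UUID b = UUID.fromString("93799ae9-288e-4b32-afc2-bbc676fad7e5");
        // prints 006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48
        System.out.println(segmentFileName(0x6e, a));
        // A blob listing may come back in any order, so sort by prefix.
        List<UUID> order = recoverOrder(List.of(segmentFileName(0x6f, b), segmentFileName(0x6e, a)));
        System.out.println(order.equals(List.of(a, b))); // prints true
    }
}
```

Sorting by the parsed integer rather than by the raw string keeps the order correct even if the prefix width ever differed between names.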
[jira] [Comment Edited] (OAK-6922) Azure support for the segment-tar
[ https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337848#comment-16337848 ]

Tomek Rękawek edited comment on OAK-6922 at 2/27/18 11:57 AM:
--------------------------------------------------------------

The Azure implementation: https://github.com/trekawek/jackrabbit-oak/tree/OAK-6922

was (Author: tomek.rekawek):
The Azure implementation: https://github.com/trekawek/jackrabbit-oak/tree/segment-tar-trunk/azure

> Azure support for the segment-tar
> ---------------------------------
>
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6922.patch
>
>
> An Azure Blob Storage implementation of the segment storage, based on the
> OAK-6921 work.
> h3. Segment files layout
> The new implementation doesn't use tar files. They are replaced with
> directories that store the segments as individual files, named after their
> UUIDs. This approach has the following advantages:
> * there is no need to call seek(), which may be expensive on a remote file
> system. Instead, the whole file (= segment) can be read at once.
> * multiple segments can be sent at once, asynchronously, which reduces the
> performance overhead (see below).
> The file structure is as follows:
> {noformat}
> [~]$ az storage blob list -c oak --output table
> Name                                                      Blob Type   Blob Tier  Length  Content Type              Last Modified
> --------------------------------------------------------  ----------  ---------  ------  ------------------------  -------------------------
> oak/data0a.tar/.ca1326d1-edf4-4d53-aef0-0f14a6d05b63      BlockBlob              192     application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0001.c6e03426-db9d-4315-a20a-12559e6aee54  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0002.b3784e27-6d16-4f80-afc1-6f3703f6bdb9  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0003.5d2f9588-0c92-4547-abf7-0263ee7c37bb  BlockBlob              259216  application/octet-stream  2018-01-31T10:59:14+00:00
> ...
> oak/data0a.tar/006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48  BlockBlob              4368    application/octet-stream  2018-01-31T12:01:09+00:00
> oak/data0a.tar/006f.93799ae9-288e-4b32-afc2-bbc676fad7e5  BlockBlob              3792    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0070.8b2d5ff2-6a74-4ac3-a3cc-cc439367c2aa  BlockBlob              3680    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0071.2a1c49f0-ce33-4777-a042-8aa8a704d202  BlockBlob              7760    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/journal.log.001                                       AppendBlob             1010    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/manifest                                              BlockBlob              46      application/octet-stream  2018-01-31T10:59:14+00:00
> oak/repo.lock                                             BlockBlob                      application/octet-stream  2018-01-31T10:59:14+00:00
> {noformat}
> Each segment file name is prefixed with an index number, which preserves
> the order of the segments, as in the tar archive. This order is normally
> also stored in the index files, but if the index is missing, the recovery
> process relies on the prefixes to restore it.
> Each file contains the raw segment data, with no padding or headers. Apart
> from the segment files, there are three special files: binary references
> (.brf), segment graph (.gph) and segment index (.idx).
> h3. Asynchronous writes
> Normally, all TarWriter writes are synchronous, appending the segments to
> the tar file. In the case of Azure Blob Storage, each write involves network
> latency. That's why the SegmentWriteQueue was introduced. Segments are added
> to a blocking deque, which is served by a number of consumer threads that
> write the segments to the cloud. There's also a UUID->Segment map, which
> makes it possible to return segments requested by the readSegment() method
> before they are actually persisted. Segments are removed from the map only
> after a successful write operation.
> The flush() method blocks the acceptance of new segments and returns once
> all waiting segments are written.
> The close() method waits until the current operations are finished and
> stops all threads.
> The asynchronous mode can be disabled by setting the number of threads to 0.
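The queueing scheme described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Oak's actual SegmentWriteQueue: the class `WriteQueueSketch`, its method names and the `SegmentUploader` callback (standing in for the real Azure client) are all hypothetical, and retrying a failed write is an assumption, since the description only says that segments leave the map after a successful write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of an asynchronous segment write queue.
public class WriteQueueSketch implements AutoCloseable {

    // Hypothetical callback that persists one segment (e.g. uploads a blob).
    public interface SegmentUploader {
        void upload(UUID id, byte[] data) throws Exception;
    }

    private final SegmentUploader uploader;
    private final BlockingDeque<UUID> queue = new LinkedBlockingDeque<>();
    // UUID -> segment map: serves reads until the segment is persisted.
    private final Map<UUID, byte[]> pending = new ConcurrentHashMap<>();
    // Adders share the read lock; flush() takes the write lock to block new adds.
    private final ReentrantReadWriteLock flushLock = new ReentrantReadWriteLock();
    private final Object drained = new Object();
    private final List<Thread> consumers = new ArrayList<>();
    private volatile boolean running = true;

    public WriteQueueSketch(SegmentUploader uploader, int threads) {
        this.uploader = uploader;
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(this::consume, "segment-writer-" + i);
            t.setDaemon(true);
            t.start();
            consumers.add(t);
        }
    }

    public void addToQueue(UUID id, byte[] data) throws Exception {
        if (consumers.isEmpty()) {        // 0 threads: write synchronously
            uploader.upload(id, data);
            return;
        }
        flushLock.readLock().lock();
        try {
            pending.put(id, data);
            queue.putLast(id);
        } finally {
            flushLock.readLock().unlock();
        }
    }

    // Returns a segment that is still waiting to be persisted, or null.
    public byte[] read(UUID id) {
        return pending.get(id);
    }

    private void consume() {
        while (running || !queue.isEmpty()) {
            UUID id;
            try {
                id = queue.pollFirst(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                return;
            }
            if (id == null) {
                continue;
            }
            try {
                uploader.upload(id, pending.get(id));
                pending.remove(id);       // only after a successful write
                synchronized (drained) {
                    drained.notifyAll();
                }
            } catch (Exception e) {
                queue.offerFirst(id);     // assumption: retry, no back-off here
            }
        }
    }

    // Blocks new segments and returns once all waiting segments are written.
    public void flush() throws InterruptedException {
        flushLock.writeLock().lock();
        try {
            synchronized (drained) {
                while (!pending.isEmpty()) {
                    drained.wait();
                }
            }
        } finally {
            flushLock.writeLock().unlock();
        }
    }

    // Waits for outstanding writes, then stops the consumer threads.
    @Override
    public void close() throws InterruptedException {
        flush();
        running = false;
        for (Thread t : consumers) {
            t.join();
        }
    }
}
```

Because adders share the read lock, ordinary writes never block each other, while flush() takes the exclusive write lock; that is one way to get the "blocks accepting new segments" behaviour. Passing 0 threads makes the sketch write synchronously, mirroring how the description disables the asynchronous mode.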
[jira] [Comment Edited] (OAK-6922) Azure support for the segment-tar
[ https://issues.apache.org/jira/browse/OAK-6922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337848#comment-16337848 ]

Tomek Rękawek edited comment on OAK-6922 at 2/13/18 10:36 AM:
--------------------------------------------------------------

The Azure implementation: https://github.com/trekawek/jackrabbit-oak/tree/segment-tar-trunk/azure

was (Author: tomek.rekawek):
The Azure implementation: https://github.com/trekawek/jackrabbit-oak/tree/segment-tar/azure

> Azure support for the segment-tar
> ---------------------------------
>
>                 Key: OAK-6922
>                 URL: https://issues.apache.org/jira/browse/OAK-6922
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Tomek Rękawek
>            Priority: Major
>             Fix For: 1.9.0, 1.10
>
>         Attachments: OAK-6922.patch
>
>
> An Azure Blob Storage implementation of the segment storage, based on the
> OAK-6921 work.
> h3. Segment files layout
> The new implementation doesn't use tar files. They are replaced with
> directories that store the segments as individual files, named after their
> UUIDs. This approach has the following advantages:
> * there is no need to call seek(), which may be expensive on a remote file
> system. Instead, the whole file (= segment) can be read at once.
> * multiple segments can be sent at once, asynchronously, which reduces the
> performance overhead (see below).
> The file structure is as follows:
> {noformat}
> [~]$ az storage blob list -c oak --output table
> Name                                                      Blob Type   Blob Tier  Length  Content Type              Last Modified
> --------------------------------------------------------  ----------  ---------  ------  ------------------------  -------------------------
> oak/data0a.tar/.ca1326d1-edf4-4d53-aef0-0f14a6d05b63      BlockBlob              192     application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0001.c6e03426-db9d-4315-a20a-12559e6aee54  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0002.b3784e27-6d16-4f80-afc1-6f3703f6bdb9  BlockBlob              262144  application/octet-stream  2018-01-31T10:59:14+00:00
> oak/data0a.tar/0003.5d2f9588-0c92-4547-abf7-0263ee7c37bb  BlockBlob              259216  application/octet-stream  2018-01-31T10:59:14+00:00
> ...
> oak/data0a.tar/006e.7b8cf63d-849a-4120-aa7c-47c3dde25e48  BlockBlob              4368    application/octet-stream  2018-01-31T12:01:09+00:00
> oak/data0a.tar/006f.93799ae9-288e-4b32-afc2-bbc676fad7e5  BlockBlob              3792    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0070.8b2d5ff2-6a74-4ac3-a3cc-cc439367c2aa  BlockBlob              3680    application/octet-stream  2018-01-31T12:01:14+00:00
> oak/data0a.tar/0071.2a1c49f0-ce33-4777-a042-8aa8a704d202  BlockBlob              7760    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/journal.log.001                                       AppendBlob             1010    application/octet-stream  2018-01-31T12:10:54+00:00
> oak/manifest                                              BlockBlob              46      application/octet-stream  2018-01-31T10:59:14+00:00
> oak/repo.lock                                             BlockBlob                      application/octet-stream  2018-01-31T10:59:14+00:00
> {noformat}
> Each segment file name is prefixed with an index number, which preserves
> the order of the segments, as in the tar archive. This order is normally
> also stored in the index files, but if the index is missing, the recovery
> process relies on the prefixes to restore it.
> Each file contains the raw segment data, with no padding or headers. Apart
> from the segment files, there are three special files: binary references
> (.brf), segment graph (.gph) and segment index (.idx).
> h3. Asynchronous writes
> Normally, all TarWriter writes are synchronous, appending the segments to
> the tar file. In the case of Azure Blob Storage, each write involves network
> latency. That's why the SegmentWriteQueue was introduced. Segments are added
> to a blocking deque, which is served by a number of consumer threads that
> write the segments to the cloud. There's also a UUID->Segment map, which
> makes it possible to return segments requested by the readSegment() method
> before they are actually persisted. Segments are removed from the map only
> after a successful write operation.
> The flush() method blocks the acceptance of new segments and returns once
> all waiting segments are written.
> The close() method waits until the current operations are finished and
> stops all threads.
> The asynchronous mode can be disabled by setting the number of threads to 0.