On Jul 11 2020, Ivan Shapovalov <[email protected]> wrote:
> On 2020-07-11 at 12:13 +0100, Nikolaus Rath wrote:
>> On Jul 11 2020, Ivan Shapovalov <[email protected]> wrote:
>> > On 2020-07-10 at 19:54 +0100, Nikolaus Rath wrote:
>> > > On Jul 10 2020, Daniel Jagszent <[email protected]> wrote:
>> > > > > Ah yes, compression and probably encryption will indeed preclude
>> > > > > any sort of partial block caching. An implementation will have to
>> > > > > be limited to plain uncompressed blocks, which is okay for my use
>> > > > > case though (borg provides its own encryption and compression
>> > > > > anyway). [...]
>> > > > Compression and encryption are integral parts of S3QL and I would
>> > > > argue that disabling them is only an edge case.
>> > > Right, if I were to write S3QL from scratch I would probably not
>> > > support this at all. However, since the feature is present, I think we
>> > > ought to consider it fully supported ("edge case" makes it sound as if
>> > > this isn't the case).
>> > >
>> > >
>> > > > I might be wrong but I think Nikolaus (maintainer of S3QL) will not
>> > > > accept such a huge change into S3QL that is only beneficial for an
>> > > > edge case.
>> > > Never say never, but the bar is certainly high here. I think there are
>> > > more promising avenues to explore, e.g. storing the
>> > > compressed/uncompressed offset mapping to make partial retrieval work
>> > > for all cases.
>> > Hmm, I'm not sure how that's supposed to work.
>> >
>> > AFAICS, s3ql uses "solid compression", meaning that the entire block is
>> > compressed at once. It is generally impossible to extract a specific
>> > range of uncompressed data without decompressing the whole stream.[1]
>>
>> At least bzip2 always works in blocks; IIRC, blocks are at most 900 kB (at
>> the highest compression setting). I wouldn't be surprised if the same
>> holds for LZMA.
>
> True, I forgot that bzip2 is inherently block-based. Not sure about LZMA or
> gzip, but there is still a significant obstacle: how would you extract this
> information from the compression libraries?

No need to extract it: S3QL hands data to the compression library in smaller
chunks (IIRC 128 kB), so we just have to keep track of what goes into and
what comes out of the compression library.
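
For illustration, here is a rough sketch of the idea (not actual S3QL code;
I'm using zlib because its Python bindings allow a mid-stream full flush,
and the 128 kB chunk size and function names are only assumptions):

    import zlib

    CHUNK = 128 * 1024  # assumed chunk size handed to the compression library

    def compress_with_offset_map(data):
        """Return (compressed, offset_map), where each offset_map entry is an
        (uncompressed_offset, compressed_offset) pair at which decompression
        can be restarted."""
        # Raw deflate (negative wbits): no zlib header, so decoding can start
        # at any full-flush boundary, including offset zero.
        comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                                -zlib.MAX_WBITS)
        out = bytearray()
        offset_map = []
        for uoff in range(0, len(data), CHUNK):
            offset_map.append((uoff, len(out)))
            out += comp.compress(data[uoff:uoff + CHUNK])
            # Full flush: emit all pending output and reset the dictionary,
            # so a fresh decompressor can start right at this boundary.
            out += comp.flush(zlib.Z_FULL_FLUSH)
        out += comp.flush()  # finish the stream
        return bytes(out), offset_map

    def read_range(compressed, offset_map, start, length):
        """Decompress only what is needed for data[start:start+length]."""
        # Find the last restart point at or before `start`.
        uoff, coff = max(p for p in offset_map if p[0] <= start)
        decomp = zlib.decompressobj(-zlib.MAX_WBITS)
        buf = bytearray()
        pos = coff
        while len(buf) < (start - uoff) + length and pos < len(compressed):
            buf += decomp.decompress(compressed[pos:pos + CHUNK])
            pos += CHUNK
        return bytes(buf[start - uoff:start - uoff + length])

Full-flushing at every boundary resets the compression dictionary, so it
costs some compression ratio; in practice one would probably space the
restart points further apart than the I/O chunks, and bzip2/LZMA would need
their block boundaries tracked differently.
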
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«