Rabid Mutant <[email protected]> writes:
> On Saturday, January 24, 2015 at 3:00:20 AM UTC+11, Nikolaus Rath wrote:
>>
>> On 01/21/2015 08:16 PM, Rabid Mutant wrote:
>> > 
>> > Does S3QL really need 1 file handle per cache entry?
>>
>> In principle, no. The way it's currently programmed, yes.
>
> Looking briefly at the code, it seems I might be able to replace
> access to the file handle with a call to a cache manager, and
> everything should just work...but that's based on one quick
> look. Would that be a correct assessment?

It sounds right, but I haven't looked in detail either.

>>> I could use rsync to compare files and update only the new & changed
>>> files without any unnecessary network I/O. It would also allow for
>>> the possibility of offline use.
>>
>> rsync by default uses file name, modification time, and size to check if 
>> a file has changed, so it won't incur any network IO apart from what's 
>> necessary to transfer new and changed files. 
>>
>> This changes if you use the -c option, but I'd be rather curious why 
>> you'd need that. 
>>
>
>
> Some applications (notably PostgreSQL) do not update inode dates when they 
> update files, specifically to reduce IO load. i.e. the data is changed, but 
> the modification dates (and quite probably the size) are not.

Are you sure that's correct? Updating the inode dates happens in the
kernel, and as far as I know there is no way for a userspace
application to prevent this. You can, of course, reset the times to
their original values afterwards, but that would increase the IO load
rather than reduce it.
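A quick illustration of that point (a demo sketch, not S3QL code — the temp file and the backdating are made up for the example): the kernel bumps st_mtime on every write, and the only thing userspace can do is set it back afterwards with os.utime(), which is an extra syscall, not a saving:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

# Backdate the file so the effect of the write is unambiguous.
old = os.stat(path)
os.utime(path, (old.st_atime, old.st_mtime - 100))
before = os.stat(path).st_mtime

with open(path, "a") as f:
    f.write("more data")          # the kernel updates st_mtime here

after_write = os.stat(path).st_mtime
assert after_write > before       # the write bumped mtime, like it or not

# "Not updating" mtime really means resetting it afterwards -- extra IO:
os.utime(path, (old.st_atime, before))
assert abs(os.stat(path).st_mtime - before) < 1e-3
os.unlink(path)
```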

> So the -c option becomes important, at least in this case.
>
> I also (sometimes) change the file modification dates on photos to the
> original photo date after trivial edits: eg changing EXIF data. In this
> case the date and size remain the same.

That could indeed cause problems. But honestly, I'd simply stop doing
that rather than patch S3QL to support a bigger cache.

> Another factor, and I agree it's probably minor, but decryption is usually 
> considerably faster than compression, and my expectation was that using 
> 'rsync -c' on a fully cached file system (thereby comparing uncompressed 
> data) would be faster than compressing the data and comparing the 
> checksums.

The checksum is calculated before compression. Compression and
encryption only happen after the checksum has been calculated and no
matching existing block has been found.

> Since the following would occur:
>
> Normal Copy:
>
> 1. compress chunk
> 2. compare hash in DB
> 3. If different:
>   a. send compressed from step 1

No. It's

1. Calculate checksum
2. If different:
  a. compress
  b. encrypt
  c. upload
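The order of operations above could be sketched like this (illustrative only — the `encrypt` placeholder, the in-memory `known_blocks` table, and the function names are assumptions for the example, not S3QL's actual code):

```python
import hashlib
import zlib

known_blocks = {}   # checksum -> block id; stands in for S3QL's block table
uploaded = []       # records what actually goes over the wire

def encrypt(data):
    # Placeholder cipher for the sketch; S3QL uses real encryption.
    return bytes(b ^ 0x5A for b in data)

def store_block(data):
    """Checksum first; compress/encrypt/upload only if the block is new."""
    digest = hashlib.sha256(data).hexdigest()   # 1. calculate checksum
    if digest in known_blocks:                  # matching block exists:
        return known_blocks[digest]             # no compression, no upload
    blob = encrypt(zlib.compress(data))         # 2a. compress, 2b. encrypt
    uploaded.append(blob)                       # 2c. upload
    known_blocks[digest] = len(known_blocks)
    return known_blocks[digest]

store_block(b"hello" * 1000)
store_block(b"hello" * 1000)    # duplicate block: nothing new is uploaded
assert len(uploaded) == 1
```

Note that compression is never done just to compare data — the comparison happens on the checksum of the uncompressed block.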

> rsync -c copy with complete cache:
>
> 1. decompress chunk from cache
> 2. compare data
> 3. if different:
>   a. Compress chunk
>   b. send

No, that's wrong as well: the cache is stored uncompressed, so step 1
(decompression) never happens.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
