Thanks for the detailed reply. I'm now wondering about the deduplication. Do you have a sense of what it would take, from a code perspective, to split an incoming object into blocks so that there is block-level deduplication, or even variable-block (content-defined) deduplication? How hard would this be to implement? Any feedback here would be much appreciated.
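
To make the variable-block idea concrete, here is the sort of thing I have in mind: a toy content-defined chunker plus a hash-keyed block store. This is purely my own sketch to illustrate the technique, nothing from the s3ql codebase, and the window size, mask, and chunk bounds are arbitrary values I picked for the example:

    import hashlib

    # All parameters below are arbitrary illustration values, not s3ql settings.
    WINDOW = 48            # bytes in the rolling-hash window
    MASK = (1 << 13) - 1   # cut when hash & MASK == MASK (~8 KiB average chunk)
    MIN_CHUNK = 2 * 1024
    MAX_CHUNK = 64 * 1024
    BASE = 257
    MOD = (1 << 61) - 1
    POW = pow(BASE, WINDOW - 1, MOD)  # factor of the byte sliding out of the window

    def chunks(data):
        """Yield content-defined chunks: cut where the rolling hash of the
        last WINDOW bytes matches MASK, within MIN_CHUNK/MAX_CHUNK bounds."""
        start, h, n = 0, 0, 0          # chunk start, rolling hash, bytes in window
        for i, byte in enumerate(data):
            if n == WINDOW:
                h = (h - data[i - WINDOW] * POW) % MOD  # drop the oldest byte
            else:
                n += 1
            h = (h * BASE + byte) % MOD                 # add the new byte
            size = i + 1 - start
            if (size >= MIN_CHUNK and (h & MASK) == MASK) or size >= MAX_CHUNK:
                yield data[start:i + 1]
                start, h, n = i + 1, 0, 0               # restart window after a cut
        if start < len(data):
            yield data[start:]

    def dedup_write(data, store):
        """Store each chunk once, keyed by its SHA-256; return the chunk refs."""
        refs = []
        for c in chunks(data):
            digest = hashlib.sha256(c).hexdigest()
            store.setdefault(digest, c)   # identical chunks are stored only once
            refs.append(digest)
        return refs

    store = {}
    refs = dedup_write(b"\x00" * (4 * 1024 * 1024), store)
    print(len(refs), "chunks,", len(store), "stored")   # 64 chunks, 1 stored

The appeal, as I understand it, is that cut points depend only on local content, so an insert in the middle of a file only reshuffles the chunks around the edit; with fixed-size blocks everything downstream of the edit shifts and stops deduplicating. I imagine the hard part in s3ql would be the metadata side (mapping inodes to chunk lists rather than fixed block indices) rather than the chunking itself, but you would know better.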
Mike

On Tuesday, September 30, 2014 7:05:44 PM UTC-7, Nikolaus Rath wrote:
>
> [email protected] writes:
> > Hi,
> >
> > I have recently tried out s3ql on Debian testing, and I have a few
> > questions.
> >
> > I'm using s3ql with local storage, without encryption or compression.
> > I set threads to 1 as a baseline
> [...]
> > I find that when I specify cachesize manually to be small or zero, my
> > write throughput goes down by several orders of magnitude. Is using
> > no cache unsupported?
>
> Yes, this is not supported. You are right that if the backend storage
> is a local disk, this could be made to work. However, S3QL was designed
> for network storage; the "local" storage backend was added for use with
> a network file system (like sshfs) and for testing, not as an efficient
> method to utilize your local disk.
>
> In theory, there are several optimizations one could implement with the
> local backend (not requiring a cache being one of them). However, I
> don't think this is worth it. Even with additional optimizations,
> there'd be little reason not to use e.g. dm-crypt with btrfs to get
> very similar features with orders of magnitude better performance.
>
> > I don't mind a small performance loss, but when I use a zero cache
> > size I get throughput of around 50 kilobytes per second, which
> > suggests that I'm running up against an unexpected code path. Read
> > performance is okay even in that case.
>
> I think with zero cache, S3QL probably downloads, updates, uploads and
> removes a cache entry for every single write() call.
>
> > The next thing I'm wondering a lot about is the deduplication. In my
> > test, I'm writing all zeroes. I write a megabyte using one block of a
> > 1MB block size using dd, and then I write 1024 blocks of a kilobyte
> > each. I then also write 2MB or 4MB at a time. I'd expect that
> > deduplication would catch these very trivial cases and that I'd only
> > see one entry of at most 2^n bytes, where 2^n represents the
> > approximate block size of the deduplication.
>
> Yes, this is what should happen.
>
> > I'd also expect 2^n to be smaller than a megabyte (maybe like a
> > single 64k block).
>
> That's probably not the case. S3QL de-duplicates on the level of
> storage objects. You specify the maximum storage object size at
> mkfs.s3ql time with the --blocksize option, and the default is 10 MB.
>
> To see de-duplication in action, you either need to write more data, or
> you need to write smaller, but identical files:
>
> $ echo hello, world > foo
> $ echo hello, world > bar
>
> ...in this case s3ql will store only one storage object (containing
> "hello, world") in the backend.
>
> Best,
> -Nikolaus
>
> --
> GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
> Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
> »Time flies like an arrow, fruit flies like a Banana.«
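
P.S. To check that I'm reading the current behaviour right: the storage-object-level scheme described above is, in effect, the following (again just a toy model of mine, not s3ql's actual code), which would be why the two "hello, world" files end up as a single stored object:

    import hashlib

    def put_object(obj, store):
        """Whole-object de-duplication: one stored copy per distinct digest."""
        digest = hashlib.sha256(obj).hexdigest()
        store.setdefault(digest, obj)
        return digest

    store = {}
    ref_foo = put_object(b"hello, world\n", store)  # contents of foo
    ref_bar = put_object(b"hello, world\n", store)  # contents of bar
    assert ref_foo == ref_bar and len(store) == 1   # one object in the backend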
