Nikolaus Rath wrote:

> On Dec 09 2015, Daniel Jagszent <[email protected]> wrote:
>> > AFAIK S3QL does not do directory/bucket/container listings on the Swift
>> > backend. So it does not matter if there are thousands or millions of
>> > data blocks on your Swift storage.
>
> Well, fsck.s3ql does such listings. But they are paginated, so there
> should be no issues.

I can confirm that. Due to stupidity (I wanted to raise the open-file
limit with ulimit -n but actually set a hard limit on file size with
ulimit -f) I once got the sqlite database of that big file system
corrupted. After re-creating the database (with the sqlite command line
tool) I naturally had to run fsck.s3ql on the file system. It took some
time but worked flawlessly.
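(For anyone who wants to avoid the same mistake: a small Python sketch
using the standard resource module that shows which limit each of the
two ulimit flags corresponds to. Purely illustrative, not S3QL code:)

    import resource

    # RLIMIT_NOFILE is what "ulimit -n" adjusts: the maximum number
    # of open file descriptors for the process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print('nofile soft/hard:', soft, hard)

    # RLIMIT_FSIZE is what "ulimit -f" adjusts: the maximum size a
    # file may grow to (in bytes here; the shell builtin counts in
    # blocks). Setting this too low is how the sqlite database got
    # truncated and corrupted.
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    print('fsize soft/hard:', soft, hard)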

>> > [...] At the end the filesystem had 16 million directory
>> > entries and 1.5 million inodes (Burp uses hard links excessively) and
>> > the sqlite database that S3QL uses to store the filesystem structure was
>> > 1.2 GB uncompressed.
>
> This is not unreasonable though. Note that ext4 would require at least 5
> GB of metadata as well - just to store the inodes (assuming a 4096-byte
> inode size). That's not yet counting directory entry *names*.

Sure. The size of the sqlite database is reasonable for that many
inodes/directory entries. But I suspect that ext4 will scale better in
terms of execution time for normal operations such as file system
stats (df). S3QL needs to do several full table scans
<https://bitbucket.org/nikratio/s3ql/src/default/src/s3ql/fs.py?fileviewer=file-view-default#fs.py-916:918>
for that, and this takes time for tables that big (in my case
approx. 10 seconds).
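(To illustrate what I mean: the statfs answer has to be computed with
aggregate queries roughly like the ones below. Table and column names
are paraphrased from memory, not verbatim from S3QL, and without a
covering index each query is a full table scan:)

    import sqlite3

    # 'metadata.db' is a placeholder path, not where S3QL actually
    # keeps its database.
    conn = sqlite3.connect('metadata.db')

    # Count all inodes and sum their (apparent) sizes.
    inodes, inode_bytes = conn.execute(
        'SELECT COUNT(id), SUM(size) FROM inodes').fetchone()

    # Count all stored blocks and sum their sizes.
    blocks, block_bytes = conn.execute(
        'SELECT COUNT(id), SUM(size) FROM blocks').fetchone()

    print(inodes, inode_bytes, blocks, block_bytes)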

>> > Also, S3QL does not scale very well with parallel file accesses, but
>> > Burp does a ton of those. (The sqlite database is not thread safe and
>> > thus every read/write access to the database gets serialized by S3QL).
>
> Both are true, but one is not the cause of the other. Most reads/writes
> don't require access to the database and could run in parallel. However,
> S3QL itself is mostly single-threaded at the moment, so the requests are
> indeed serialized.

Thanks for the clarification!
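(For anyone following along, the serialization pattern under
discussion looks roughly like this. A hypothetical sketch of funneling
all database access through one lock, not S3QL's actual code:)

    import sqlite3
    import threading

    class SerializedDB:
        # Hypothetical: a single connection guarded by a single lock,
        # so concurrent callers take turns instead of running in
        # parallel.
        def __init__(self, path):
            # check_same_thread=False lets several threads share the
            # connection; the lock ensures only one uses it at a time.
            self.conn = sqlite3.connect(path, check_same_thread=False)
            self.lock = threading.Lock()

        def execute(self, sql, params=()):
            with self.lock:  # every read/write is serialized here
                return self.conn.execute(sql, params).fetchall()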

> However, I have plans in the drawer to fix this at some point. The idea
> is to handle reads/writes for blocks that are already cached entirely at
> the C level. This will allow concurrency *and* boost single-threaded
> performance at the same time. Just need to find the time...

That sounds great! (More performance always does :) )
Am I right in assuming that this will speed up read/write syscalls, but
not operations that work solely on the database (like opendir or the
attr and xattr calls)?
