Nikolaus Rath wrote:
> On Dec 09 2015, Daniel Jagszent <[email protected]> wrote:
>> AFAIK, S3QL does not do directory/bucket/container listings on the
>> Swift backend. So it does not matter whether there are thousands or
>> millions of data blocks on your Swift storage.
>
> Well, fsck.s3ql does such listings. But they are paginated, so there
> should be no issues.

I can confirm that. Due to my own stupidity (I wanted to raise the
nofile limit, |ulimit -n|, but actually set a hard limit on file size,
|ulimit -f|), I once managed to corrupt the sqlite database of that big
file system. After re-creating the database (with the sqlite command
line tool) I naturally needed to run fsck.s3ql on the file system. It
took some time, but it worked flawlessly.
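For anyone who wants to avoid my mistake: the two limits can be
inspected and adjusted from Python's standard |resource| module. A
minimal sketch (the 4096 is just an example value):

    import resource

    # RLIMIT_NOFILE is what "ulimit -n" controls: the maximum number
    # of open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))  # 4096 must be <= hard

    # RLIMIT_FSIZE is what "ulimit -f" controls: the maximum size of a
    # file the process may create. Writes beyond it fail with SIGXFSZ /
    # EFBIG -- which is how a big sqlite database can end up truncated.
    print(resource.getrlimit(resource.RLIMIT_FSIZE))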
>> [...] At the end the filesystem had 16 million directory
>> entries and 1.5 million inodes (Burp uses hard links excessively),
>> and the sqlite database that S3QL uses to store the filesystem
>> structure was 1.2 GB uncompressed.
>
> This is not unreasonable though. Note that ext4 would require at
> least 5 GB of metadata as well - just to store the inodes (assuming
> 4096 bytes inode size). That's not yet counting directory entry
> *names*.

Sure. The size of the sqlite database is reasonable for that many
inodes/directory entries. But I suspect that ext4 will scale better in
terms of execution time for normal operations, e.g. file system stats
(|df|). S3QL needs to do several full table scans
<https://bitbucket.org/nikratio/s3ql/src/default/src/s3ql/fs.py?fileviewer=file-view-default#fs.py-916:918>
for that, and with tables that big this takes its time (in my case
approx. 10 seconds).
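To illustrate why: those are aggregate queries, and sqlite has to visit
every row to answer them. A rough sketch with made-up table and column
names (not S3QL's actual schema):

    import sqlite3

    # In-memory stand-in for the metadata database.
    conn = sqlite3.connect(':memory:')
    conn.executescript('''
        CREATE TABLE inodes   (id INTEGER PRIMARY KEY, mode INT, size INT);
        CREATE TABLE contents (name TEXT, inode INT, parent_inode INT);
        CREATE TABLE blocks   (id INTEGER PRIMARY KEY, size INT);
    ''')

    # Each aggregate scans a whole table, so the runtime grows linearly
    # with the number of rows -- noticeable with 16 million directory
    # entries.
    (entries,) = conn.execute('SELECT COUNT(*) FROM contents').fetchone()
    (used,)    = conn.execute('SELECT SUM(size) FROM blocks').fetchone()
    print(entries, used)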
>> Also, S3QL does not scale very well with parallel file accesses, but
>> Burp does a ton of those. (The sqlite database is not thread-safe,
>> and thus every read/write access to the database gets serialized by
>> S3QL.)
>
> Both are true, but one is not the cause of the other. Most
> reads/writes don't require access to the database and could run in
> parallel. However, S3QL itself is mostly single-threaded at the
> moment, so the requests are indeed serialized.

Thanks for the clarification!

> However, I have plans in the drawer to fix this at some point. The
> idea is to handle reads/writes for blocks that are already cached
> entirely at the C level. This will allow concurrency *and* at the
> same time boost single-threaded performance as well. Just need to
> find the time...

That sounds great! (More performance always does :) ) Am I right in
assuming that this will speed up read/write syscalls, but not
operations that work solely on the database (like opendir or the attr
and xattr calls)?
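To check that I am picturing the plan correctly, here is how I imagine
that fast path -- a minimal Python sketch of the control flow only,
with made-up names (the real implementation would live at the C level):

    import threading

    class BlockCache:
        """Sketch: serve reads of already-cached blocks without
        touching the database."""

        def __init__(self):
            self.cache = {}                  # (inode, blockno) -> bytes
            self.db_lock = threading.Lock()  # serializes sqlite access

        def read(self, inode, blockno, offset, length):
            block = self.cache.get((inode, blockno))
            if block is not None:
                # Fast path: no database access, hence no lock -- reads
                # of cached blocks could run in parallel.
                return block[offset:offset + length]
            # Slow path (simplified: real code would not hold the lock
            # for the whole download): resolve the block via the
            # database, fetch it from the backend, populate the cache.
            with self.db_lock:
                block = self._fetch_from_backend(inode, blockno)
                self.cache[(inode, blockno)] = block
            return block[offset:offset + length]

        def _fetch_from_backend(self, inode, blockno):
            # Stand-in for the real lookup + download.
            return bytes(128 * 1024)

    cache = BlockCache()
    data = cache.read(inode=1, blockno=0, offset=0, length=4096)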