On 15.07.2015 at 18:53, Matthew Dillon wrote:
You should use a database, frankly. A HAMMER1 inode is 128 bytes and
for small files I think the data will run on 16-byte boundaries. Not
sure of that. Be sure to mount with 'noatime', and also use the
double buffer option because the kernel generally can't cache that
many tiny files itself.
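(For reference, assuming the double-buffer option Matt means is the
vfs.hammer.double_buffer sysctl rather than a mount flag -- check
sysctl -d vfs.hammer.double_buffer on your system -- the setup would
look roughly like this, with /data standing in for wherever the
HAMMER file system is mounted:

    mount -u -o noatime /data             # or add noatime to the fstab entry
    sysctl vfs.hammer.double_buffer=1     # persist via /etc/sysctl.conf
)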
The main issue with using millions of tiny files is that each one
imposes a great deal of *ram* overhead for caching, since each one
needs an in-memory vnode, in-memory inode, and all related file
tracking infrastructure.
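(Back-of-envelope, assuming very roughly 1 KB of kernel memory per
cached file for the vnode, in-memory inode and related structures
combined: keeping even 10% of 100 million files cached would need on
the order of 10,000,000 x 1 KB ~= 10 GB of RAM for metadata alone,
before any file data is cached. The per-file figure is a guess; the
order of magnitude is the point.)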
Secondarily, hammer's I/O optimizations are designed for large files,
not small files, so the I/O is going to be a lot more random.
Thanks for the insight. I'll go with a database :). It's a pity that
none of the databases out there support HAMMER-like, queue-less
streaming out of the box.
I wish you had finished the Backplane database :).
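(As a starting point for the LMDB route mentioned in the original
question below, here is a minimal sketch using the stock LMDB C API.
The map size, key name and ./store path are placeholders, and error
checking is omitted for brevity:

    /* cc kv.c -llmdb */
    #include <lmdb.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, val;
        char data[1024] = "roughly 1k of value bytes ...";

        mdb_env_create(&env);
        /* the map must cover the whole store; 100M x ~1k needs well over 100 GB */
        mdb_env_set_mapsize(env, (size_t)200 * 1024 * 1024 * 1024);
        mdb_env_open(env, "./store", 0, 0664);    /* ./store must already exist */

        /* write once */
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        key.mv_size = strlen("file-000000001");
        key.mv_data = "file-000000001";
        val.mv_size = sizeof(data);
        val.mv_data = data;
        mdb_put(txn, dbi, &key, &val, 0);
        mdb_txn_commit(txn);

        /* random read-back later, possibly from another process */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        if (mdb_get(txn, dbi, &key, &val) == 0)
            printf("got %zu bytes\n", val.mv_size);
        mdb_txn_abort(txn);

        mdb_env_close(env);
        return (0);
    }

LMDB keeps the whole store in a single memory-mapped file and allows
multiple processes to open the same environment, so the per-file
vnode/inode overhead discussed above disappears entirely.)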
Regards,
Michael
-Matt
On Wed, Jul 15, 2015 at 8:58 AM, Michael Neumann
<[email protected]> wrote:
Hi,
Let's say I want to store 100 million small files (each one about
1k in size) in a HAMMER file system.
Files are only written once, then kept unmodified and accessed
randomly (older files will be accessed less often).
It is basically a simple file based key/value store, but
accessible by multiple processes.
a) What is the per-file size overhead in HAMMER1? For HAMMER2 I
expect each file to take exactly 1k when the file data is below
512 bytes.
b) Can I store all files in one huge directory? Or is it better to
fan out the files into several sub-directories? (One possible
fan-out scheme is sketched below.)
c) What other issues should I expect to run into? For sure I
should enable swapcache :)
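(On question (b): the directory names, hash function and fan-out
depth below are only illustrative. This is one common way to fan
100 million names out into 256 x 256 = 65,536 leaf directories,
around 1,500 files each:

    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a: any reasonably uniform hash of the key works here */
    static uint32_t
    fnv1a(const char *s)
    {
        uint32_t h = 2166136261u;
        while (*s) {
            h ^= (unsigned char)*s++;
            h *= 16777619u;
        }
        return (h);
    }

    /* map a key to store/<xx>/<yy>/<key>, two hash bytes deep */
    static void
    key_to_path(const char *key, char *buf, size_t buflen)
    {
        uint32_t h = fnv1a(key);
        snprintf(buf, buflen, "store/%02x/%02x/%s",
            (unsigned)(h >> 24) & 0xff, (unsigned)(h >> 16) & 0xff, key);
    }

    int
    main(void)
    {
        char path[1024];
        key_to_path("file-000000001", path, sizeof(path));
        printf("%s\n", path);   /* e.g. store/6b/0e/file-000000001 */
        return (0);
    }

HAMMER keeps directory entries in its B-tree, so a single huge
directory is not catastrophic, but a fan-out like this keeps
directory listings and ordinary tools (ls, rsync, etc.) manageable.)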
I probably should use a "real" database like LMDB, but I like the
versatility of files.
Regards,
Michael