There is the same kind of issue with Hadoop, except that there the block size is 128 MB.
So lots of small files cause the same problem.

This is solved with HAR files (Hadoop Archives), which pack the small files into a single archive.

The Hadoop filesystem is usually able to access HAR contents fairly
transparently from userland.
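To illustrate the idea (not HAR's actual on-disk format), here is a minimal sketch that packs every file of a directory tree into one data file plus an index mapping each relative path to an (offset, length) pair, and then reads a single member back by name. The `.data`/`.index` layout and the `pack`/`read_member` names are hypothetical, chosen only for this example.

```python
import json
import tempfile
from pathlib import Path


def _archive_paths(archive: Path) -> tuple[Path, Path]:
    # Hypothetical layout: <archive>.data holds the raw bytes back to back,
    # <archive>.index is a JSON map: relative path -> [offset, length].
    return archive.with_name(archive.name + ".data"), archive.with_name(archive.name + ".index")


def pack(src_dir: Path, archive: Path) -> int:
    """Concatenate all files under src_dir into one data file; return the file count."""
    data_path, index_path = _archive_paths(archive)
    index: dict[str, list[int]] = {}
    offset = 0
    with open(data_path, "wb") as out:
        for path in sorted(p for p in src_dir.rglob("*") if p.is_file()):
            data = path.read_bytes()
            out.write(data)
            index[path.relative_to(src_dir).as_posix()] = [offset, len(data)]
            offset += len(data)
    index_path.write_text(json.dumps(index))
    return len(index)


def read_member(archive: Path, name: str) -> bytes:
    """Read one packed file back using a single seek into the big data file."""
    data_path, index_path = _archive_paths(archive)
    offset, length = json.loads(index_path.read_text())[name]
    with open(data_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "repo")
        (src / "pkg").mkdir(parents=True)
        for i in range(3):
            (src / "pkg" / f"method{i}.st").write_text(f"method{i} ^ {i}")
        count = pack(src, Path(tmp, "bundle"))
        print(count, read_member(Path(tmp, "bundle"), "pkg/method1.st"))
```

Copying or syncing the archive then touches two files instead of thousands, while individual members stay addressable by name, which is roughly what makes HAR access feel transparent.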

But I suspect libgit will not be very cooperative.

We can also look into how to mount such an archive as a filesystem.


Phil



On 21 May 2017 17:27, "Stephan Eggermont" <step...@stack.nl> wrote:

At the PharoDays I was painfully reminded that SSDs perform really badly
with small files. The Bloc tutorial used a GitHub filetree repo, and
that has a lot of files. The whole folder is 116 MB in 16K files. Copying
that amount of data should hardly be noticeable, taking about a third of a
second. Because it was spread over so many files, it took more than half a
minute, roughly a hundred times as long.
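A quick back-of-the-envelope check of those figures, assuming "16K files" means 16,000 and that a third of a second is the bulk-copy time for 116 MB: the implied bulk throughput is about 348 MB/s, and the extra time works out to roughly 1.85 ms of fixed overhead per file. The helper name is made up for this example.

```python
def small_file_overhead(total_mb: float, n_files: int,
                        bulk_seconds: float, observed_seconds: float) -> tuple[float, float, float]:
    """Return (bulk throughput in MB/s, extra ms per file, slowdown factor)."""
    throughput = total_mb / bulk_seconds            # what one big sequential copy achieves
    extra_seconds = observed_seconds - bulk_seconds  # time not explained by raw data volume
    per_file_ms = extra_seconds / n_files * 1000.0   # fixed cost attributed to each file
    slowdown = observed_seconds / bulk_seconds
    return throughput, per_file_ms, slowdown


if __name__ == "__main__":
    mbps, ms, factor = small_file_overhead(116, 16_000, 1 / 3, 30)
    print(f"{mbps:.0f} MB/s, {ms:.2f} ms/file, {factor:.0f}x slower")
```

With 30 s as a lower bound for "more than half a minute", the slowdown is at least about 90x, consistent with "roughly a hundred times".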

That is too much overhead. How can we improve the file format in a way that
keeps the cross-platform exchange advantages and a reasonable way to view
diffs and propose small changes using the github web tools?

Cuis uses a different format with git. How does that compare? What is used
in Squeak?

Stephan
