Andrew Deason wrote:
On Fri, 18 Sep 2009 16:38:28 -0400
Robert Milkowski <mi...@task.gda.pl> wrote:

No. We need to be able to tell how close to full we are, for
determining when to start/stop removing things from the cache
before we can add new items to the cache again.
But having a dedicated dataset will let you answer such a question
immediately, as then ZFS gives you, for that dataset, information on how
much space is used (everything: data + metadata) and how much is left.

Immediately? There isn't a delay between the write and the next commit
when the space is recorded? (Do you mean a statvfs equivalent, or some
zfs-specific call?)

And the current code is structured such that we record usage changes
before a write; it would be a huge pain to rely on the write to
calculate the usage (for that and other reasons).

There will be a delay of up to 30s currently.

But how much data do you expect to be pushed within 30s?
Let's say it were even 10GB spread over lots of small files, and you calculated the total size by only summing up the logical size of the data. Would you really expect the error to be greater than 5%, which would be 500MB? Does it matter in practice?
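
To be concrete about the statvfs question above: I mean nothing
zfs-specific, just a plain statvfs() on the mountpoint of the dedicated
dataset. A minimal sketch (the mountpoint is made up, and the numbers
reflect the last committed txg, so they can lag by up to that ~30s):

    #include <sys/statvfs.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct statvfs vfs;

        /* hypothetical mountpoint of a dedicated cache dataset */
        if (statvfs("/var/cache/openafs", &vfs) != 0) {
            perror("statvfs");
            return 1;
        }

        /* used and available space (data + metadata), in bytes;
         * reflects the state as of the last committed txg */
        printf("used:  %llu\n",
            (unsigned long long)(vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize);
        printf("avail: %llu\n",
            (unsigned long long)vfs.f_bavail * vfs.f_frsize);

        return 0;
    }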



Setting the recordsize to 1k doesn't really make sense if you have
lots of files (I assume) larger than that.
The problem with metadata is that by default it is also compressed,
so there is no easy way to tell how much disk space it occupies
for a given file using a standard API.
We do not know in advance what file sizes we'll be seeing in
general. We could of course tell people to tune the cache dataset
according to their usage pattern, but I don't think users are
generally going to know what their cache usage pattern looks like.

I can say that at least right now, usually each file will be at
most 1M long (1M is the max unless the user specifically changes
it). But within the range 1k-1M, I don't know what the
distribution looks like.

What I meant was that I believe the default recordsize of 128k
should be fine for you (files smaller than 128k will use a smaller
recordsize, larger ones will use a recordsize of 128k). The only
problem will be with files truncated to 0 and growing again, as they
will be stuck with an old recordsize. But in most cases it probably
won't be a practical problem anyway.
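
(If you really wanted to account for the recordsize yourself, a naive
estimate could look like the sketch below. It is purely illustrative:
it ignores compression, metadata and raidz/mirror overhead, and assumes
512-byte sectors.)

    #include <stdint.h>

    /*
     * Illustrative only: rough on-disk data size for a file of logical
     * length len on a dataset with the given recordsize.  A file smaller
     * than the recordsize is stored as a single block of roughly its own
     * size (rounded up to the sector size); a larger file uses whole
     * records.  Compression, metadata and raidz/mirror overhead are
     * ignored completely.
     */
    #define SECTOR_SIZE 512

    static uint64_t
    roundup64(uint64_t x, uint64_t to)
    {
        return ((x + to - 1) / to) * to;
    }

    uint64_t
    estimate_ondisk_size(uint64_t len, uint64_t recordsize)
    {
        if (len == 0)
            return 0;
        if (len <= recordsize)
            return roundup64(len, SECTOR_SIZE);
        return roundup64(len, recordsize);
    }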

Well, it may or may not be 'fine'; we may have a lot of little files in
the cache, and rounding up to 128k for each one reduces our disk
efficiency somewhat. Files are truncated to 0 and grow again quite often
in busy clients. But that's an efficiency issue; we'd still be able to
stay within the configured limit that way.

But anyway, 128k may be fine for me, but what about if someone sets
their recordsize to something different? That's why I was wondering
about the overhead if someone sets the recordsize to 1k; is there no way
to account for it even if I know the recordsize is 1k?


What if a user enables compression like lzjb or even gzip?
How would you take that into account before doing writes?

What if a user creates a snapshot? How would you take that into account?

I suspect that you are looking at this too closely, for no real benefit.
Especially if you don't want to dedicate a dataset to the cache: then you would expect other applications on the system to write to the same file system, just to different locations, and you have no control over, or ability to predict, how much data they will write at all. Be it Linux, Solaris, BSD, ... the issue will be there.

IMHO a dedicated dataset and statvfs() on it should be good enough, possibly combined with an estimate made before writing your data (the total logical file size from the application's point of view). However, with compression or dedup enabled by the user, that estimate could be totally wrong, so it probably doesn't actually make sense.
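
A minimal sketch of what I mean by that combination - the names here are
made up, it is not AFS code, and as I said the 'pending' term can be
completely off with compression or dedup, which is exactly why I doubt
it is worth it:

    #include <sys/statvfs.h>
    #include <stdint.h>

    /*
     * Rough sketch: estimate cache usage as the space ZFS reported for
     * the dedicated dataset at the last poll, plus the logical size of
     * whatever has been written since that poll.  Writes sitting in a
     * not-yet-committed txg at poll time are simply dropped from the
     * estimate - it is an approximation, nothing more.
     */
    struct cache_usage {
        uint64_t reported;   /* bytes used per the last statvfs() */
        uint64_t pending;    /* logical bytes written since that poll */
    };

    int
    cache_usage_poll(struct cache_usage *cu, const char *mountpoint)
    {
        struct statvfs vfs;

        if (statvfs(mountpoint, &vfs) != 0)
            return -1;

        cu->reported = (uint64_t)(vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize;
        cu->pending = 0;
        return 0;
    }

    void
    cache_usage_note_write(struct cache_usage *cu, uint64_t logical_len)
    {
        /* call around each write to the cache */
        cu->pending += logical_len;
    }

    uint64_t
    cache_usage_estimate(const struct cache_usage *cu)
    {
        return cu->reported + cu->pending;
    }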


--
Robert Milkowski
http://milek.blogspot.com

