In message <[EMAIL PROTECTED]>, Jeffrey Hutzelman writes:
>Well, right now we use two numbers.  One is a constant; the other is a 
>function of the chunk size.  It sounds like you're arguing for eliminating 
>the constant, or at least limiting its effect as the cache size grows very 
>large.  Fine, but without data, how do we decide where to draw the line? 

it is difficult to work without the data, but you can make some
reasonable guesses based on typical machine sizes.  for instance,
-files 500000 would probably need more memory than a typical 1GB
machine could provide.  it's much safer to err on the side of caution when
it comes to consuming kernel memory (for the same reason, someone should
get around to making the memcache safer instead of letting
people shoot themselves in the foot).
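as a rough back-of-envelope (the per-file overhead here is a guess for
illustration, not a measured number): if each cache file costs on the
order of ~300 bytes of kernel bookkeeping, then

  500000 files * 300 bytes = ~143MB

pinned before a single byte of data is cached, which is a serious bite
out of a 1GB machine.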

attached is a small perl script that takes a directory as its first
argument and prints the file count, the total and average bytes, and a
file-size histogram (each line is a size bucket, the number of files in
that bucket, and the cumulative percentage).  run it on your local
cache, for instance:

# avg.pl /usr/vice/cache
1222 files
3330850 total bytes
2725 avg bytes

< 1k 52 4%
< 2k 7 4%
< 3k 1147 98%
< 5k 2 98%
< 7k 1 98%
< 9k 3 99%
< 11k 2 99%
< 19k 2 99%
< 21k 2 99%
< 25k 1 99%
< 41k 1 99%
< 47k 1 99%

(yes, my cache looks funny since i was running a benchmark inside afs.)
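(the attachment may not survive the list archives, so here is a minimal
sketch of such a script.  the bucket boundaries and output format follow
the samples above; it's a reconstruction under those assumptions, not
necessarily the attached avg.pl.)

#!/usr/bin/perl
# report file count, total/average size, and a histogram of file
# sizes under a directory: 1k buckets, cumulative percentages.
use strict;
use warnings;
use File::Find;

my $dir = shift or die "usage: $0 <directory>\n";

my ($files, $bytes) = (0, 0);
my %bucket;

find(sub {
    return unless -f $_;              # regular files only
    my $size = -s _;                  # reuse the stat from -f
    $files++;
    $bytes += $size;
    $bucket{int($size / 1024) + 1}++; # the "< Nk" bucket this file lands in
}, $dir);

die "no files under $dir\n" unless $files;

printf "%d files\n%d total bytes\n%d avg bytes\n\n",
    $files, $bytes, $bytes / $files;

my $seen = 0;
for my $k (sort { $a <=> $b } keys %bucket) {
    $seen += $bucket{$k};
    printf "< %dk %d %d%%\n", $k, $bucket{$k}, 100 * $seen / $files;
}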
you could also run it on a sample directory to check the file
distribution for a particular workload.  for instance, if you were
going to build the 2.6 kernel sources in afs, the distribution would
look something like:

28374 files
1566439944 total bytes
55206 avg bytes

< 1k 7536 26%
< 2k 2477 35%
< 3k 1837 41%
< 4k 1411 46%
< 5k 1101 50%
< 6k 844 53%
< 7k 691 56%
< 8k 572 58%
< 9k 509 59%
< 10k 1181 63%
< 11k 635 66%
< 12k 643 68%
< 13k 565 70%
< 14k 708 72%
< 15k 709 75%

this should give you some idea for manual tuning.  
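for example (hypothetical numbers, check the afsd man page for your
release): for a 1GB disk cache serving the kernel-build workload above,
where the average file is ~55k and so mostly fits in a single 64k chunk,
you might start from something like:

  afsd -blocks 1000000 -chunksize 16 -files 40000

here -blocks is the cache size in 1k blocks, -chunksize is log2 of the
chunk size (16 gives 64k chunks), and -files caps the number of cache
files.  the point is to derive -files from the measured distribution
(roughly total blocks divided by average file size, padded for the
small-file tail) instead of from a fixed constant.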

>With a small cache and a large chunk size, we need that constant to ensure 
>we have enough files.

it would be unwise to let the autotuning pick such a configuration.
however, we can't prevent users from tuning manually.

Attachment: avg.pl