In message <[EMAIL PROTECTED]>, Jeffrey Hutzelman writes:
>Well, right now we use two numbers. One is a constant; the other is a
>function of the chunk size. It sounds like you're arguing for eliminating
>the constant, or at least limiting its effect as the cache size grows very
>large. Fine, but without data, how do we decide where to draw the line?
it is difficult to work without the data, but you can make some reasonable guesses based on the typical machine size. for instance, -files 500000 would probably need more memory than a typical 1GB machine could provide. it's much safer to err on the side of caution when it comes to consuming kernel memory (for the same reason, someone should get around to making the memcache safer instead of letting people shoot themselves in the foot).

attached is a small perl script that takes a directory as its first argument. run it on your local cache. for instance:

# avg.pl /usr/vice/cache
1222 files
3330850 total bytes
2725 avg bytes
< 1k    52   4%
< 2k     7   4%
< 3k  1147  98%
< 5k     2  98%
< 7k     1  98%
< 9k     3  99%
< 11k    2  99%
< 19k    2  99%
< 21k    2  99%
< 25k    1  99%
< 41k    1  99%
< 47k    1  99%

(yes, my cache looks funny since i was running a benchmark inside afs.)

you could also run it on a sample directory to check the file distribution. for instance, if you were going to build the 2.6 kernel sources in afs, the file distribution looks something like:

28374 files
1566439944 total bytes
55206 avg bytes
< 1k  7536  26%
< 2k  2477  35%
< 3k  1837  41%
< 4k  1411  46%
< 5k  1101  50%
< 6k   844  53%
< 7k   691  56%
< 8k   572  58%
< 9k   509  59%
< 10k 1181  63%
< 11k  635  66%
< 12k  643  68%
< 13k  565  70%
< 14k  708  72%
< 15k  709  75%

this should give you some idea for manual tuning.

>With a small cache and a large chunk size, we need that constant to ensure
>we have enough files.

it would be unwise to let the autotuning pick such a configuration. however, we can't prevent users from manually tuning.
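(the attached avg.pl did not survive the archive. for anyone who wants to reproduce the numbers above, here is a rough stand-in reconstructed from the sample output — it walks a directory, counts regular files, and prints total/average size plus a cumulative size histogram in 1k buckets. this is a sketch in python, not the original script, and the exact bucketing may differ from what avg.pl did.)

```python
#!/usr/bin/env python3
"""Rough stand-in for the avg.pl described above: walk a directory,
count regular files, and print a cumulative file-size histogram.
Reconstructed from the sample output; NOT the original attachment."""
import os
import sys
from collections import Counter

def scan(directory):
    """Collect sizes of all regular files under `directory`."""
    sizes = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                sizes.append(os.path.getsize(path))
    return sizes

def report(sizes):
    """Print file count, total/average bytes, and a cumulative histogram."""
    total = sum(sizes)
    print(f"{len(sizes)} files")
    print(f"{total} total bytes")
    print(f"{total // len(sizes)} avg bytes")
    # bucket each file into the smallest "< Nk" bin that holds it
    buckets = Counter(size // 1024 + 1 for size in sizes)
    seen = 0
    for kb in sorted(buckets):
        seen += buckets[kb]
        print(f"< {kb}k {buckets[kb]} {100 * seen // len(sizes)}%")

if __name__ == "__main__" and len(sys.argv) > 1:
    report(scan(sys.argv[1]))
```

run it the same way (e.g. `avg.py /usr/vice/cache`); the average-bytes figure it reports is the main input for sanity-checking a chunk size or -files value by hand.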
avg.pl