On Thu, Sep 16, 2010 at 6:39 PM, Leif Hedstrom <[email protected]> wrote:
> On 09/16/2010 05:04 PM, Pranav Desai wrote:
>>
>> Hi!
>>
>> I am running some performance tests with large files. As mentioned in
>> one of the earlier threads, I am using curl-loader for testing, with
>> randomization in the URL to stress the cache.
>>
>> Version: 2.0.1
>>
>> Config:
>> CONFIG proxy.config.cache.ram_cache.size LLONG 2097152000
>> CONFIG proxy.config.cache.ram_cache_cutoff LLONG 100048576
>
> First thing, can you make sure that when serving a single 15MB object out
> of cache, it serves it out of RAM cache and doesn't hit the disk at all
> (other than logs, but you might want to turn those off, to make sure the
> only disk I/O is cache)? We had a problem in the past where it'd hit the
> disk for certain large objects even though they should fit in RAM (that
> should be fixed / gone now, though).
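
One way I can probably verify this is to watch the RAM cache counters while
the test runs (just a sketch; I am assuming these stat names exist in 2.0):

  traffic_line -r proxy.process.cache.ram_cache.hits
  traffic_line -r proxy.process.cache.ram_cache.misses
  traffic_line -r proxy.process.cache.ram_cache.bytes_used

and keep an eye on iostat for the cache disks at the same time.
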
Initially, with the default cutoff value of 1MB, I didn't see any 'Bytes
Used' under RAM in cache-stats. So I figured the cutoff is essentially the
maximum object size that will be put in the RAM cache. After increasing
that value to 100MB, I started seeing those values bump up. I will still
reconfirm that everything is served from RAM.

>
>> storage.config
>> /mnt/cache/trafficserver 60368709120
>> the file is 15MB in size.
>
> The first thing I'd recommend (which holds true for all ATS versions) is
> to switch to the raw device cache. The on-filesystem cache is primarily
> intended for testing / development; real usage should use a raw device
> (for direct I/O). The raw device cache should be superior in performance
> and reliability.
>

Done. I am now using 2 disks (a rough sketch of this setup is at the end of
this mail). Do you recommend a RAID config for better performance?

>> * The url randomness is just a number within that range in the URL.
>> * There are 500 clients, each accessing the URL 50 times.
>>
>> * So in the best case scenario with only a single URL, I can get 700+
>> Mbps, and I think I can get more if I use 2 client machines and more
>> network cards. Currently the testbed is limited to 1Gbps.
>> * As I increase the randomness (so essentially there are 2000 unique
>> URLs), the performance drops significantly.
>
> This is not entirely surprising. This version of ATS (v2.0.x) partitions
> the disk(s) into 8GB partitions, and each such partition has its own disk
> position "pointer". It'd be interesting to see if you get the same
> performance up to 8GB cache size, and then notice a drop in performance
> when going from (say) 8GB to 15GB. This "problem" is completely eliminated
> in ATS v2.1.x (where each partition will be up to 0.5PB).
>

I see. So does it have to seek across those partitions, then?

>
> Yes, you definitely want to increase that, I'd recommend trying maybe 16 -
> 24 I/O threads per disk (spindle), and see if it makes a noticeable
> difference. Make sure that if your disk is RAIDed (e.g. RAID1), you adjust
> the I/O threads accordingly (ATS has no way of knowing how many spindles
> are actually behind a RAIDed disk, so it treats it as one). The setting
> would be
>
> CONFIG proxy.config.cache.threads_per_disk INT 16
>
> (for example). I don't think it's in the default records.config, so I
> think you'll have to add it manually. Another interesting configuration is
>
> proxy.config.cache.min_average_object_size
>
> (default is 8000), which doesn't really affect performance, but if you
> know that your cache is going to hold much larger objects than that,
> increasing it can save a large amount of memory (since it reduces the
> in-memory directory size).
>
> There might also be some network related kernel tuning that could improve
> the situation a bit, but I'd expect you to be able to drive the full GigE
> unless disk is becoming the bottleneck.
>

Here are my tcp mem parameters. And req/sec isn't of concern here, so I
should be OK with the listen queue and backlogs. If you have any particular
setting in mind, please let me know.

net.ipv4.tcp_mem = 1339776 1786368 2679552
net.ipv4.tcp_wmem = 4096 87380 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608

>
> We really have to fix TS-441 though. If you can figure out some reliable
> (and hopefully easy) way of reproducing it, that would help tremendously.
>

I think I can reproduce it, but only under load, so it might be a bit
difficult to debug, especially with all the threads. I will try to get to a
simpler test case to reproduce it. Maybe I can run traffic_server alone
with a single network and I/O thread?
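
Something like this in records.config might get me close to a
single-threaded run (just a sketch; I am assuming these knobs work the same
way in 2.0):

  CONFIG proxy.config.exec_thread.autoconfig INT 0
  CONFIG proxy.config.exec_thread.limit INT 1
  CONFIG proxy.config.cache.threads_per_disk INT 1

and then running traffic_server on its own with debug tags enabled, e.g.

  traffic_server -T "cache.*"
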
How do you guys normally debug it?

> Cheers,
>
> -- leif
>
>
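
P.S. For reference, the two-disk raw-device setup mentioned above would
look roughly like this; the device names are just placeholders for whatever
the two spindles actually are, and the min_average_object_size value is
only an example of raising it toward the average object size in this
workload:

storage.config:
  /dev/sdb
  /dev/sdc

records.config:
  CONFIG proxy.config.cache.threads_per_disk INT 16
  CONFIG proxy.config.cache.min_average_object_size INT 262144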
