Hi Nicolas - I agree - there's nothing wrong at all with your app consuming 60GB of RAM. It wasn't clear to me from the original post that the memory consumer was known and understood. Now I know that it is. Cool.
Given all that, and Steve's information on segmap, I have to agree that the
memory you save by tweaking segmap down buys you the headroom you need to run
your workload without hitting a memory deficit and starting to page.
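For reference, the tweak itself is a one-line /etc/system entry plus a reboot.
The checks below are only a rough sketch for verifying and watching it
afterwards -- they assume segmap_percent is readable as a kernel symbol and
that the segmap kstat is present on your release, so treat them as a starting
point rather than a recipe:

    # /etc/system -- cap segmap at ~1% of physical memory (takes effect at boot)
    set segmap_percent=1

    # after the reboot, confirm the value the kernel actually picked up
    # (assumes the segmap_percent symbol is visible to mdb on this release)
    echo "segmap_percent/D" | mdb -k

    # watch segmap activity counters every 5 seconds; little movement while
    # the application is under load supports the case that a large segmap
    # wasn't buying you much anyway (assumes the unix:0:segmap kstat exists)
    kstat -n segmap 5

If those counters barely move during a testrun, that's one more data point
that the 12% default was headroom you couldn't afford rather than cache you
were actually using.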
Thanks - I think we're all on the same page now! :^)
/jim

Nicolas Michael wrote:
> Hi Jim,
>
> thanks for your quick reply. My comments inline.
>
> Jim Mauro wrote:
>
>>> - SPARC, 64 GB memory
>>> - UFS, PxFS file systems
>>>
>>> Our application is writing some logs to disk (4 GB / hour), flushing
>>> some mmapped files from time to time (4 GB each 15 min), but is not
>>> doing much disk I/O.
>>> Once our application is started and "warm", it doesn't allocate any
>>> further memory. At this point, we have 3-4 GB of free memory
>>> (vmstat) and nothing paged out to disk (swap -l).
>> Well, something is certainly consuming memory, because you indicate
>> this is a 64GB system, and you show 3-4GB free. Who/what is consuming
>> 60GB of RAM?
>
> Our application! ;-)
> I don't want to go into the details here, but there's nothing wrong
> about that. We know where all this memory is coming from (there are
> some processes with large heaps, some large shm segments and so on).
>
> Steve has some slides on our application, in case you're really
> interested...
>
>>> Since those memory requests are not coming from our application, I
>>> assume that those 5 GB (3 GB less free memory plus 2 GB paged-out
>>> data) are used for the file system cache. I always thought the fs
>>> cache would never grow any more once memory gets short, so it should
>>> never cause paging activity (since the cache list is part of the
>>> free list). Reading Solaris Internals, I just learned that there's
>>> not only a cache list, but also a segmap cache. As I understand
>>> this, the segmap cache may very well grow up to 12% of main memory
>>> and may even cause application pages to be paged out, correct? So,
>>> this might be what's happening here. Can I somehow monitor the
>>> segmap cache (since it is kernel memory, is it reported as "Kernel"
>>> in ::memstat?)?
>>>
>> Think of UFS as having an L1 and L2 cache (like processors). segmap
>> is the L1 cache; when segmap fills up, pages get pushed out to the
>> cache list (the "L2" cache), where they can be reclaimed back into
>> segmap if they are referenced again via read/write before they are
>> recycled.
>
> Ok, thanks.
>
>> The 12% of memory being consumed by segmap is not what's hurting you
>> here (at least, I would be very surprised if it is).
>
> We easily consume ~ 60 GB of memory just with our application
> (including kernel, libs etc.). That doesn't allow us to spend 12% of
> total memory on the segmap cache in addition. If we really used all
> the segmap cache that is possible (7.68 GB), we would exceed our
> physical memory -- and I think this is what's happening.
>
> We can't reduce our application's demand for memory (in fact, we
> already reduced it by something like 20 GB to fit into 64 GB of
> memory), so we need to reduce the max segmap cache size. Otherwise we
> would need to install more memory in the system (which we don't want).
>
>>> My idea is now to set segmap_percent=1 to decrease the max size of
>>> the segmap cache and this way avoid having pages paged out due to a
>>> growing fs cache. In a testrun with this configuration, my free
>>> memory doesn't fall below 3.5 GB any more and nothing is being paged
>>> out -- saving me 4.5 GB of memory!
>>>
>> Does this machine really have 64GB of RAM (as indicated above)?
>
> Yep!
>
>>> Since we don't do much disk I/O, I would assume that we don't gain
>>> much from the segmap cache anyway, so I would like to configure it
>>> to 1%. File system pages will still be cached in the cache list as
>>> long as memory is available, right? With the advantage that the
>>> cache list is basically "free" memory and would never cause other
>>> pages to be paged out.
>> Generally, yes.
>
> Ok.
>
>>> I'm not sure, but as I understand it the segmap cache is still used
>>> during read and write operations, right? So, every time we write a
>>> file, we always write into the segmap cache. If this cache is small
>>> (let's say: 1% = 640 MB), we might be slowed down when writing more
>>> than 640 MB all at once. However, if we only write 64 MB every
>>> minute, pages from the segmap cache would migrate to the cache list
>>> and make room for more pages in the segmap cache, so next time we
>>> write 64 MB, would there again be enough space in the segmap cache
>>> for the write operation?
>>>
>> Generally, yes, assuming the writes are not to files with
>> O_SYNC/O_DSYNC, in which case every write must go through
>> the cache anyway.
>
> Thanks.
>
>>> Also, just to be sure: memory mapped files are never read or written
>>> through the segmap cache, so shrinking that cache has no effect on
>>> memory mapped files, right?
>>>
>> That is correct. mmap()'d files are not cached in segmap.
>
> Ok, that's good to know.
>
>> Something is missing here, or the 64GB value is wrong.
>>
>> You need to figure out who/what is consuming 60GB of RAM.
>> Use 'echo "::memstat" | mdb -k' for a high-order profile.
>
> As I said above, there's really nothing wrong with our application
> consuming 60 GB... ;-)
>
> But here it is:
>
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     291570              2277    4%
> Anon                      6892465             53847   83%
> Exec and libs               45966               359    1%
> Page cache                 751264              5869    9%
> Free (cachelist)           137777              1076    2%
> Free (freelist)            179173              1399    2%
>
> Total                     8298215             64829
> Physical                  8166070             63797
>
> This snapshot was taken before I reconfigured the system, so it is
> with segmap_percent=12. It was taken 2 hours after a long testrun. As
> I wrote above, free memory jumped from 1 GB to 2.5 GB one hour after
> we stopped the load. The only explanation I have for this is pages
> being freed from the segmap cache.
>
> Steve wrote that the segmap cache is part of "Page cache". Assuming
> there was 1.5 GB more data in the segmap cache during the testrun,
> that would make 7.4 GB of Page cache. 4 GB of it is memory mapped
> files, which leaves 3.4 GB for the segmap cache. That's only about
> half of its possible max size, but still too much for our system.
>
> I believe we don't need that much for segmap. All we are doing on the
> file system (except for the mmapped files) is write a large logfile
> sequentially, close it, copy it to a different location, and later on
> ftp it somewhere. This shouldn't require much segmap cache...
>
> Thanks a lot,
> Nick.
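One more suggestion for the next testrun: rather than inferring the segmap
contribution from deltas in free memory, you can watch the per-type paging
columns directly while the load runs. The commands below are standard, but
the 5-second interval and the interpretation are just a suggested starting
point:

    # paging broken down by page type, every 5 seconds:
    #   api/apo/apf = anonymous (heap, shm) page-ins/outs/frees
    #   fpi/fpo/fpf = file system page-ins/outs/frees
    vmstat -p 5

    # swap device usage before and after the run; if the 'free' column
    # shrinks relative to 'blocks', anonymous pages really were pushed out
    swap -l

    # and the same high-level breakdown as the ::memstat output quoted above
    echo "::memstat" | mdb -k

If apo stays at or near zero with segmap_percent=1 while fpo carries whatever
file traffic there is, that's a good confirmation that the paging you saw
before came from file cache pressure rather than from the application itself.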