On Thu 23 Apr 2015 12:15:04 PM CEST, Stefan Hajnoczi wrote:

>> For a cache size of 128MB, the PSS is actually ~10MB larger without
>> the patch, which seems to come from posix_memalign().
>
> Do you mean RSS or are you using a tool that reports a "PSS" number
> that I don't know about?
>
> We should understand what is going on instead of moving the code
> around to hide/delay the problem.

Both RSS and PSS ("proportional set size", also reported by the kernel).

I'm not an expert in memory allocators, but I measured the overhead like
this:

An L2 cache of 128MB implies a refcount cache of 32MB, in total 160MB.
With a default cluster size of 64k, that's 2560 cache entries.

So I wrote a test case that allocates 2560 blocks of 64k each using
posix_memalign and mmap, and here's how their /proc/<pid>/smaps compare:

-Size:             165184 kB
-Rss:               10244 kB
-Pss:               10244 kB
+Size:             161856 kB
+Rss:                   0 kB
+Pss:                   0 kB
 Shared_Clean:          0 kB
 Shared_Dirty:          0 kB
 Private_Clean:         0 kB
-Private_Dirty:     10244 kB
-Referenced:        10244 kB
-Anonymous:         10244 kB
+Private_Dirty:         0 kB
+Referenced:            0 kB
+Anonymous:             0 kB
 AnonHugePages:         0 kB
 Swap:                  0 kB
 KernelPageSize:        4 kB

Those are the 10MB I saw.

For the record I also tried with malloc() and the results are similar to
those of posix_memalign().

Berto
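
For reference, the measurement described above could be approximated
with a sketch along these lines. This is not the original test program:
the function names, the 4 kB alignment passed to posix_memalign(), the
assumption that the refcount cache is a quarter of the L2 cache (taken
from the 128MB/32MB figures above), and the helper that sums the Rss
fields of /proc/self/smaps are all illustrative. The blocks are left
untouched on purpose, so any resident memory that shows up should come
from the allocator's own bookkeeping rather than from the data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on some libcs */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define CLUSTER_SIZE (64 * 1024)    /* default cluster size from the text */
#define ALIGNMENT    4096           /* assumption: page alignment */

/* Cache entries for a given L2 cache size, assuming (as in the message)
 * that the refcount cache is one quarter of the L2 cache. */
size_t cache_entries(size_t l2_bytes)
{
    size_t refcount_bytes = l2_bytes / 4;
    return (l2_bytes + refcount_bytes) / CLUSTER_SIZE;
}

/* Allocate n cluster-sized blocks with posix_memalign().
 * Returns the number of blocks actually allocated. */
size_t alloc_posix_memalign(void **blocks, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (posix_memalign(&blocks[i], ALIGNMENT, CLUSTER_SIZE) != 0) {
            break;
        }
    }
    return i;
}

/* Allocate n cluster-sized blocks with anonymous private mmap().
 * Returns the number of blocks actually mapped. */
size_t alloc_mmap(void **blocks, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        void *p = mmap(NULL, CLUSTER_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            break;
        }
        blocks[i] = p;
    }
    return i;
}

/* Sum of all "Rss:" fields in /proc/self/smaps, in kB.
 * Returns -1 if the file cannot be opened (e.g. not on Linux). */
long total_rss_kb(void)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    char line[256];
    long total = 0, kb;

    if (!f) {
        return -1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "Rss:", 4) == 0 &&
            sscanf(line + 4, "%ld", &kb) == 1) {
            total += kb;
        }
    }
    fclose(f);
    return total;
}
```

To use it, a main() would call cache_entries() with 128MB to get the
block count, record total_rss_kb(), run one of the two allocation
functions in a fresh process, and record total_rss_kb() again. The exact
difference will likely depend on the libc version and allocator
settings.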