Re: An analysis of content process memory overhead
On Mon, Mar 21, 2016 at 3:50 PM, Nicholas Nethercote wrote:
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.

I reduced the page-cache size from 4 MiB to 1 MiB in bug 1258257, saving up
to 3 MiB per process. There was no discernible performance impact.

> On Linux64, libxul contains about 5.3 MiB of static data.

I've done some work to reduce this, as has Nathan Froyd, mostly under bug
1254777. I just did local Linux64 builds of the release branch and
mozilla-inbound. The 'data' measurement provided by the |size| utility has
dropped from 5,515,676 to 4,683,616 bytes, a reduction of 832,060 bytes.

I also double-checked this by enabling memory.system_memory_reporter (which
provides detailed OS-level memory measurements, Linux-only) and then looking
at the appropriate "libxul.so/[rw-p]" entry in about:memory. The change there
was from 5,472,256 to 4,685,824 bytes, a reduction of 786,432 bytes. I'm not
sure why these numbers don't quite match the |size| numbers -- differences
between on-disk and in-memory representations, perhaps? Nonetheless, they're
similar.

So that's the good news. The bad news is (a) there's still a long way to go
before we can reasonably ship with more than 2 or perhaps 4 content processes
enabled, (b) those changes above represent the lowest-hanging fruit I could
find, and (c) I will have very limited time to work further on this in the
medium-term.

Nick
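A quick sanity check on that mismatch: the about:memory values come from OS
mappings and are whole 4 KiB pages, while the |size| values are raw section
sizes, so some divergence is expected from page granularity alone. A minimal
standalone JS sketch (not part of any build; constants copied from the
figures above):

  // The about:memory figures are exact multiples of the 4 KiB page size,
  // while the |size| figures are not, so at least part of the mismatch is
  // page-granularity rounding.
  const PAGE = 4096;
  const sizeData = { before: 5515676, after: 4683616 };  // |size| 'data'
  const aboutMem = { before: 5472256, after: 4685824 };  // libxul.so/[rw-p]

  console.log(aboutMem.before % PAGE, aboutMem.after % PAGE);  // 0 0
  console.log(sizeData.before % PAGE, sizeData.after % PAGE);  // 2460 1888

  // Rounding the post-change |size| figure up to whole pages lands exactly
  // on the post-change about:memory figure:
  console.log(Math.ceil(sizeData.after / PAGE) * PAGE);        // 4685824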
Re: An analysis of content process memory overhead
On Tue, Mar 15, 2016 at 2:34 PM, Nicholas Nethercote wrote:
>
> ----------
> Conclusion
> ----------
>
> The overhead per content process is significant. I can see scope for
> moderate improvements, but I'm having trouble seeing how big improvements
> can be made. Without big improvements, scaling the number of content
> processes beyond 4 (*maybe* 8) won't be possible.
>
> - JS overhead is the biggest factor. We execute a lot of JS code just
>   starting up for each content process -- can that be reduced? We should
>   also consider a smaller nursery size limit for content processes.
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.
>
> - Static data is a big chunk. It's hard to make much of a dent there
>   because it has a *very* long tail.
>
> - The remaining buckets are a lot smaller.

Just to expand upon that, here are the top-level numbers for all three
platforms, both small and large processes. For this computation I assumed
that "explicit" memory is entirely a subset of "resident-unique", which is
probably true or very close to it. (Note: this data looks best with a
fixed-width font.)

Linux64, small processes
- resident-unique     38.1 MiB (100%)
  - explicit          22.8 MiB ( 60%)
    - js-non-window   11.2 MiB ( 29%)
    - other            7.8 MiB ( 20%)
    - heap-overhead    3.8 MiB ( 10%)
  - static?           15.3 MiB ( 40%)

Linux64, large processes
- resident-unique     52.6 MiB (100%)
  - explicit          38.7 MiB ( 74%)
    - js-non-window   22.3 MiB ( 42%)
    - other            9.8 MiB ( 19%)
    - heap-overhead    6.6 MiB ( 13%)
  - static?           13.9 MiB ( 26%)

Mac64, small processes
- resident-unique     49.3 MiB (100%)
  - static?           27.9 MiB ( 57%)
  - explicit          21.4 MiB ( 43%)
    - js-non-windows  11.1 MiB ( 23%)
    - other            6.9 MiB ( 14%)
    - heap-overhead    3.4 MiB (  7%)

Mac64, large processes
- resident-unique     59.4 MiB (100%)
  - explicit          30.1 MiB ( 51%)
    - js-non-windows  15.7 MiB ( 26%)
    - heap-overhead    7.7 MiB ( 13%)
    - other            6.7 MiB ( 11%)
  - static?           29.3 MiB ( 49%)

Win32, small processes
- resident-unique     39.3 MiB (100%)
  - static?           23.4 MiB ( 60%)
  - explicit          15.9 MiB ( 40%)
    - js-non-windows   8.4 MiB ( 21%)
    - heap-overhead    3.8 MiB ( 10%)
    - other            3.7 MiB (  9%)

Win32, large processes
- resident-unique     51.6 MiB (100%)
  - explicit          28.5 MiB ( 55%)
    - js-non-windows  16.1 MiB ( 31%)
    - heap-overhead    6.8 MiB ( 13%)
    - other            5.6 MiB ( 11%)
  - static?           23.1 MiB ( 45%)

The "resident-unique" increases by 38--59 MiB per content process. That's a
bit lower than erahm got in his measurements, possibly because his
methodology involved doing a lot more work in each content process.

Of that increase:

- "static?" accounts for 26--60%
- "explicit/js-non-windows" accounts for 21--42%
- "explicit/heap-overhead" accounts for 7--13%
- "explicit/other" (everything not accounted for by the above three lines)
  accounts for 9--20%

About the "static?" measure -- on Linux64, libxul contains about 5.3 MiB of
static data. Other libraries used by Firefox contain much less. So I don't
know what else is being measured in the "static?" number (i.e. what accounts
for the change in the difference between "resident-unique" and "explicit").

Nick
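For reference, a minimal sketch of how those percentages fall out of the
stated assumption (explicit being a subset of resident-unique). The
breakdown() helper is hypothetical; the numbers plugged in are the Linux64
small-process figures from the table above, in MiB:

  // "static?" is inferred as resident-unique minus explicit, and every
  // bucket is expressed as a share of resident-unique.
  function breakdown(residentUnique, explicit, explicitChildren) {
    const pct = x => Math.round((x / residentUnique) * 100) + "%";
    const result = {
      "resident-unique": pct(residentUnique),   // always 100%
      "explicit": pct(explicit),
      "static?": pct(residentUnique - explicit),
    };
    for (const [name, mib] of Object.entries(explicitChildren)) {
      result[name] = pct(mib);
    }
    return result;
  }

  console.log(breakdown(38.1, 22.8, {
    "js-non-window": 11.2,
    "other": 7.8,
    "heap-overhead": 3.8,
  }));
  // -> explicit 60%, static? 40%, js-non-window 29%, other 20%,
  //    heap-overhead 10%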
Re: An analysis of content process memory overhead
I filed bug 876173[1] about this a long time ago. Recently, I talked to
Gabor, who's started looking into enabling multiple content processes.

One other thing we should be able to do is share the self-hosting
compartment, as we do between runtimes within a process. It's not that big,
but it's not nothing, either.

till

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=876173

On Tue, Mar 15, 2016 at 4:34 AM, Nicholas Nethercote wrote:
> Greetings,
>
> erahm recently wrote a nice blog post with measurements showing the
> overhead of enabling multiple content processes:
>
> http://www.erahm.org/2016/02/11/memory-usage-of-firefox-with-e10s-enabled/
>
> The overhead is high -- 8 content processes *doubles* our physical memory
> usage -- which limits the possibility of increasing the number of content
> processes beyond a small number. Now I've done some follow-up measurements
> to find out what is causing the per-content-process overhead.
>
> I did this by measuring memory usage with four trivial web pages open,
> first with a single content process, then with four content processes, and
> then getting the diff between content processes of the two. (about:memory's
> diff algorithm normalizes PIDs in memory reports as "NNN", so multiple
> content processes naturally get collapsed together, which in this case is
> exactly what we want.) I call this the "small processes" measurement.
>
> If we divide the memory usage increase by 3 (the increase in the number of
> content processes) we get a rough measure of the minimum per-content-process
> overhead.
>
> I then did a similar thing but with four more complex web pages (gmail,
> Google Docs, TreeHerder, Bugzilla). I call this the "large processes"
> measurement.
>
> [ lots of analysis omitted to not get caught in the 40kb+ moderation queue ]
>
> ----------
> Conclusion
> ----------
>
> The overhead per content process is significant. I can see scope for
> moderate improvements, but I'm having trouble seeing how big improvements
> can be made. Without big improvements, scaling the number of content
> processes beyond 4 (*maybe* 8) won't be possible.
>
> - JS overhead is the biggest factor. We execute a lot of JS code just
>   starting up for each content process -- can that be reduced? We should
>   also consider a smaller nursery size limit for content processes.
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.
>
> - Static data is a big chunk. It's hard to make much of a dent there
>   because it has a *very* long tail.
>
> - The remaining buckets are a lot smaller.
>
> I'm happy to give copies of the raw data files to anyone who wants to look
> at them in more detail.
>
> Nick
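A toy version of the arithmetic in the quoted methodology, with made-up
totals purely to show the calculation (the real figures come from
about:memory diffs):

  // With the same four pages loaded, the increase in total memory going
  // from 1 to 4 content processes, divided by the 3 extra processes,
  // approximates the minimum per-content-process overhead.
  const totalWithOneContentProcess = 400;    // MiB, invented example
  const totalWithFourContentProcesses = 520; // MiB, invented example

  const extraProcesses = 4 - 1;
  const perProcessOverhead =
    (totalWithFourContentProcesses - totalWithOneContentProcess) /
    extraProcesses;

  console.log(perProcessOverhead + " MiB per additional content process");
  // -> 40 MiB per additional content process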
Re: An analysis of content process memory overhead
On 03/17/2016 08:05 AM, Thinker Li wrote:
> On Wednesday, March 16, 2016 at 10:22:40 PM UTC+8, Nicholas Nethercote
> wrote:
>> Even if we can fix that, it's just a lot of JS code. We can lazily import
>> JSMs; I wonder if we are failing to do that as much as we could, i.e. are
>> all these modules really needed at start-up? It would be great if we could
>> instrument module-loading code in some way that answers this question.
>
> B2G also dropped JS source, on the Tarako branch, since the source is
> useless for a loaded module except for stringify functions. (Gecko
> compresses in-memory source.) But I am not sure whether that landed on m-c.

Note, this worked on B2G, but this would not work for Gecko. For example, all
tab addons have to use toSource to patch the JS functions.

Source compression should already be enabled. I think we do not do it for
small sources, nor for huge sources, as the compression would either be
useless or would take a noticeable amount of time.

-- 
Nicolas B. Pierron
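For reference, the "lazily import JSMs" pattern mentioned in the quoted text,
as it looked in chrome JS at the time; the PlacesUtils module is just an
example and nothing here is specific to content processes:

  // Assumes a chrome-privileged scope. Cu.import pulls a module in right
  // away; defineLazyModuleGetter defers loading PlacesUtils.jsm until the
  // first time something actually touches `PlacesUtils`, so a process that
  // never needs it pays nothing at startup.
  const { utils: Cu } = Components;
  Cu.import("resource://gre/modules/XPCOMUtils.jsm");

  XPCOMUtils.defineLazyModuleGetter(this, "PlacesUtils",
                                    "resource://gre/modules/PlacesUtils.jsm");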
Re: An analysis of content process memory overhead
On 15/03/2016 04:34, Nicholas Nethercote wrote:
> - "heap-overhead" is 4 MiB per process. I've looked at this closely.
>   The numbers tend to be noisy.
>
> - "page-cache" is pages that jemalloc holds onto for fast recycling. It is
>   capped at 4 MiB per process and we can reduce that with a jemalloc
>   configuration, though this may make allocation slightly slower.

We aggressively got rid of that on B2G by sending memory-pressure events to
apps that were unused. We did have the advantage there that we had only one
page per process, so establishing whether one was not being used was very
easy. On desktop Firefox we might consider trying to minimize the memory
usage of processes which do not have active tabs (e.g. none of the tabs is
visible, or none of the tabs has received input for a while).

Besides the immediate memory usage reduction, this had the important
side-effect of reducing steady-state consumption. A lot of the structures and
caches that were purged had often been bloated by transient data required
only during startup. Once minimized they would start to grow again once a
process became active again, but never as much as before the minimization.

Gabriele
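As a rough illustration of what "minimize an idle content process" could look
like in chrome JS: the memory-pressure observer notification is the mechanism
B2G used, while isProcessIdle() is a made-up placeholder for whatever
"no visible tab, no recent input" heuristic desktop Firefox would adopt.

  // Firing a memory-pressure notification makes registered observers flush
  // their caches and asks the allocator to release unused pages.
  const { utils: Cu } = Components;
  Cu.import("resource://gre/modules/Services.jsm");

  function maybeMinimizeThisProcess() {
    if (isProcessIdle()) {  // hypothetical idleness heuristic
      Services.obs.notifyObservers(null, "memory-pressure", "low-memory");
    }
  }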
Re: An analysis of content process memory overhead
I seem to remember that our ChromeWorkers (SessionWorker, PageThumbsWorker,
OS.File Worker) were pretty memory-hungry, but I don't see any workers there.
Does this mean that they have negligible overhead or that they are only in
the parent process?

Cheers,
 David

On 15/03/16 04:34, Nicholas Nethercote wrote:
> Greetings,
>
> erahm recently wrote a nice blog post with measurements showing the
> overhead of enabling multiple content processes:
>
> http://www.erahm.org/2016/02/11/memory-usage-of-firefox-with-e10s-enabled/
>
> The overhead is high -- 8 content processes *doubles* our physical memory
> usage -- which limits the possibility of increasing the number of content
> processes beyond a small number. Now I've done some follow-up measurements
> to find out what is causing the per-content-process overhead.
>
> I did this by measuring memory usage with four trivial web pages open,
> first with a single content process, then with four content processes, and
> then getting the diff between content processes of the two. (about:memory's
> diff algorithm normalizes PIDs in memory reports as "NNN", so multiple
> content processes naturally get collapsed together, which in this case is
> exactly what we want.) I call this the "small processes" measurement.
>
> If we divide the memory usage increase by 3 (the increase in the number of
> content processes) we get a rough measure of the minimum per-content-process
> overhead.
Re: An analysis of content process memory overhead
On Thu, Mar 17, 2016 at 9:50 AM, Nicolas B. Pierron
<nicolas.b.pier...@mozilla.com> wrote:
> Source compression should already be enabled. I think we do not do it for
> small sources, nor for huge sources, as the compression would either be
> useless or would take a noticeable amount of time.

I think Luke suggested that we could compress larger JS sources off the main
thread if we implemented this bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=1001231

It's been in my queue for 2 years, unfortunately. If anyone wants to make
that happen, please feel free to steal it. :-)

Ben
Re: An analysis of content process memory overhead
On Fri, Mar 18, 2016 at 2:29 AM, David Rajchenbach-Teller
<dtel...@mozilla.com> wrote:
>
> I seem to remember that our ChromeWorkers (SessionWorker,
> PageThumbsWorker, OS.File Worker) were pretty memory-hungry, but I don't
> see any workers there. Does this mean that they have negligible overhead
> or that they are only in the parent process?

I checked the data again: they are only in the parent process, so they don't
affect content process scaling. And they're not *that* big -- here's the
biggest I saw in my data (from the Mac "large processes" data):

> 6.33 MB (04.00%) -- workers/workers(chrome)
> ├──2.15 MB (01.36%) ++ worker(resource://gre/modules/osfile/osfile_async_worker.js, 0x113881800)
> ├──2.11 MB (01.33%) ++ worker(resource:///modules/sessionstore/SessionWorker.js, 0x1297e7800)
> └──2.06 MB (01.30%) ++ worker(resource://gre/modules/PageThumbsWorker.js, 0x1169c1000)

Nick
Re: An analysis of content process memory overhead
On 3/17/16 9:50 AM, Nicolas B. Pierron wrote:
> Note, this worked on B2G, but this would not work for Gecko. For example,
> all tab addons have to use toSource to patch the JS functions.

Note that we do have the capability to lazily load the source from disk when
someone does this, and we do use it in Gecko for some things. We could use it
for more things.

-Boris