Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
I just pushed some initial documentation for this. It will show up soon, next to the memory profiling link:

http://mesos.apache.org/documentation/latest/#administration

On Fri, May 25, 2018 at 6:13 PM, Benjamin Mahler wrote:

> I'll write up some instructions with what I know so far and get it added
> to the website. In the meantime, here's what you need to do to generate a
> 60 second profile:
>
> $ sudo perf record -F 100 -a -g --call-graph dwarf -p <PID> -- sleep 60
> $ sudo perf script --header | c++filt > mesos-master.stacks
> $ gzip mesos-master.stacks
> # Share the mesos-master.stacks.gz file for analysis.
>
> It seems that frame pointer omission is ok, as long as '--call-graph
> dwarf' is provided to perf. I don't yet know if frame pointers yield better
> traces than '--call-graph dwarf' without frame pointers.
>
> If you want to use flamescope yourself, follow the instructions here and
> put the unzipped file above into the 'examples' directory:
> https://github.com/Netflix/flamescope
Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
I'll write up some instructions with what I know so far and get it added to the website. In the meantime, here's what you need to do to generate a 60 second profile:

$ sudo perf record -F 100 -a -g --call-graph dwarf -p <PID> -- sleep 60
$ sudo perf script --header | c++filt > mesos-master.stacks
$ gzip mesos-master.stacks
# Share the mesos-master.stacks.gz file for analysis.

It seems that frame pointer omission is ok, as long as '--call-graph dwarf' is provided to perf. I don't yet know if frame pointers yield better traces than '--call-graph dwarf' without frame pointers.

If you want to use flamescope yourself, follow the instructions here and put the unzipped file above into the 'examples' directory:

https://github.com/Netflix/flamescope

On Thu, May 17, 2018 at 4:51 PM, Zhitao Li wrote:

> Hi Ben,
>
> Thanks a lot, this is super informative.
>
> One question: will you write a blog/doc on how to generate flamescope
> graphs from either a micro-benchmark, or a real cluster? Also, do you know
> what configuration for compiling should be used to preserve proper debug
> symbols for both Mesos and 3rdparty libraries?
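The three profiling steps in the message above could be wrapped in a small script along these lines. This is only a sketch: the function names, the DRY_RUN switch, and the argument order are made up for illustration and are not part of Mesos or perf. The archived command has no argument after '-p', so the sketch assumes the PID of the process to profile is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper, not part of Mesos. It repeats the perf commands
# from the message and takes the target PID as its first argument.

# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# record_profile PID [SECONDS] [NAME]
# SECONDS defaults to 60 and NAME to mesos-master, as in the message.
record_profile() {
    pid=${1:-}
    seconds=${2:-60}
    name=${3:-mesos-master}

    if [ -z "$pid" ]; then
        echo "usage: record_profile PID [SECONDS] [NAME]" >&2
        return 2
    fi

    # Step 1: sample at 100 Hz with DWARF-based stack unwinding.
    run sudo perf record -F 100 -a -g --call-graph dwarf -p "$pid" -- sleep "$seconds" || return 1

    # Step 2: dump perf.data as text and demangle C++ symbol names.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "sudo perf script --header | c++filt > $name.stacks"
    else
        sudo perf script --header | c++filt > "$name.stacks" || return 1
    fi

    # Step 3: compress the stacks file so it can be shared.
    run gzip "$name.stacks"
}
```

Sourcing the file only defines the functions. Calling record_profile with DRY_RUN=1 set prints the three commands without running them, which is a cheap way to check the arguments before profiling a production master.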
Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
Hi,

Just uploaded the video. It will be done processing in a couple minutes, and when it finishes you can find it here:

https://youtu.be/LyFYTVOaJfQ

On Thu, May 17, 2018 at 4:51 PM, Zhitao Li wrote:

> Hi Ben,
>
> Thanks a lot, this is super informative.
>
> One question: will you write a blog/doc on how to generate flamescope
> graphs from either a micro-benchmark, or a real cluster? Also, do you know
> what configuration for compiling should be used to preserve proper debug
> symbols for both Mesos and 3rdparty libraries?

--
Judith Malnick
Community Manager
310-709-1517
Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
Hi Ben,

Thanks a lot, this is super informative.

One question: will you write a blog/doc on how to generate flamescope graphs from either a micro-benchmark, or a real cluster? Also, do you know what configuration for compiling should be used to preserve proper debug symbols for both Mesos and 3rdparty libraries?

On Wed, May 16, 2018 at 5:44 PM, Benjamin Mahler wrote:

> +Judith
>
> There should be a recording. Judith, do you know where they get posted?
>
> Benjamin, glad to hear it's useful, I'll continue doing it!

--
Cheers,

Zhitao Li
Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
+Judith

There should be a recording. Judith, do you know where they get posted?

Benjamin, glad to hear it's useful, I'll continue doing it!

On Wed, May 16, 2018 at 4:41 PM Gilbert Song wrote:

> Do we have the recorded video for this meeting?
Re: [mesos-mail] Re: [Performance WG] Notes from meeting today
Do we have the recorded video for this meeting?

On Wed, May 16, 2018 at 1:54 PM, Benjamin Bannier <benjamin.bann...@mesosphere.io> wrote:

> Hi Ben,
>
> thanks for taking the time to edit and share these detailed notes. Being
> able to asynchronously see the great work folks are doing surfaced is
> great, especially when put into context with thought like here.
>
> Benjamin
Re: [Performance WG] Notes from meeting today
Hi Ben,

thanks for taking the time to edit and share these detailed notes. Being able to asynchronously see the great work folks are doing surfaced is great, especially when put into context with thought like here.

Benjamin

> On May 16, 2018, at 8:06 PM, Benjamin Mahler wrote:
>
> Hi folks,
>
> Here are some notes from the performance meeting today.
>
> (1) First I did a demo of flamescope, you can find it here:
> https://github.com/Netflix/flamescope
>
> It's a very useful tool, hopefully we can make it easier for users to
> generate the data that we can drop into flamescope when reporting any
> performance issues. One of the open questions is how `perf --call-graph
> dwarf` compares to `perf -g` but with mesos compiled with frame pointers. I
> haven't had time to check this yet.
>
> When playing with the tool, it was easy to find some hot spots in the given
> cluster I was looking at (which was not necessarily representative). For
> the agent, jie filed:
>
> https://issues.apache.org/jira/browse/MESOS-8901
>
> And for the master, I noticed that metrics, state json generation (no
> surprise), and a particular spot in the allocator were very expensive.
>
> Metrics we'd like to address via migration to push gauges (Zhitao has
> offered to help with this effort):
>
> https://issues.apache.org/jira/browse/MESOS-8914
>
> The state generation we'd like to address via streaming state into a
> separate actor (and providing filtering as well), this will get further
> investigated / prioritized very soon:
>
> https://issues.apache.org/jira/browse/MESOS-8345
>
> (2) Kapil discussed benchmarks for the long standing "offer starvation"
> issue:
>
> https://issues.apache.org/jira/browse/MESOS-3202
>
> I'll send out an email or document soon with some background on this issue
> as well as our options to address it.
>
> Let me know if you have any questions or feedback!
>
> Ben
[Performance WG] Notes from meeting today
Hi folks,

Here are some notes from the performance meeting today.

(1) First I did a demo of flamescope, you can find it here:
https://github.com/Netflix/flamescope

It's a very useful tool, hopefully we can make it easier for users to generate the data that we can drop into flamescope when reporting any performance issues. One of the open questions is how `perf --call-graph dwarf` compares to `perf -g` but with mesos compiled with frame pointers. I haven't had time to check this yet.

When playing with the tool, it was easy to find some hot spots in the given cluster I was looking at (which was not necessarily representative). For the agent, jie filed:

https://issues.apache.org/jira/browse/MESOS-8901

And for the master, I noticed that metrics, state json generation (no surprise), and a particular spot in the allocator were very expensive.

Metrics we'd like to address via migration to push gauges (Zhitao has offered to help with this effort):

https://issues.apache.org/jira/browse/MESOS-8914

The state generation we'd like to address via streaming state into a separate actor (and providing filtering as well), this will get further investigated / prioritized very soon:

https://issues.apache.org/jira/browse/MESOS-8345

(2) Kapil discussed benchmarks for the long standing "offer starvation" issue:

https://issues.apache.org/jira/browse/MESOS-3202

I'll send out an email or document soon with some background on this issue as well as our options to address it.

Let me know if you have any questions or feedback!

Ben
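One way to approach the open question about `--call-graph dwarf` versus frame pointers would be to record the same process once per unwinding mode and load both stacks files into flamescope side by side. The sketch below only prints the commands for each mode rather than running them. The function name and the output file names are invented for illustration, and the 'fp' run is only meaningful if the binary was built with frame pointers (for example with the compiler flag -fno-omit-frame-pointer), which is an assumption about the build and not something the notes confirm.

```shell
#!/bin/sh
# Hypothetical helper: print one record/convert command pair per
# unwinding mode. 'fp' unwinds via frame pointers, 'dwarf' via debug info.

# callgraph_cmds PID [SECONDS]
callgraph_cmds() {
    pid=${1:-}
    seconds=${2:-60}

    if [ -z "$pid" ]; then
        echo "usage: callgraph_cmds PID [SECONDS]" >&2
        return 2
    fi

    for mode in fp dwarf; do
        # Write each recording to its own perf data file via -o.
        echo "sudo perf record -F 100 --call-graph $mode -o perf.$mode.data -p $pid -- sleep $seconds"
        # Convert that recording to demangled text stacks via -i.
        echo "sudo perf script --header -i perf.$mode.data | c++filt > mesos.$mode.stacks"
    done
}
```

Running the printed commands would leave two files, mesos.fp.stacks and mesos.dwarf.stacks, that can be compared for truncated or missing frames.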