Re: Jon Haddad on Diagnosing Performance Problems in Production
+1 That was a nice talk! I don't know why I haven't come across that video before!

On Tue, Feb 27, 2018 at 9:12 AM, Jonathan Haddad wrote:
> There isn't a ton from that talk I'd consider "wrong" at this point, but
> some of it is a little stale. [...]
Re: Jon Haddad on Diagnosing Performance Problems in Production
There isn't a ton from that talk I'd consider "wrong" at this point, but some of it is a little stale. I always start off looking at system metrics. For a very thorough discussion on the matter, check out Brendan Gregg's USE [1] method. I wrote a blog post of my own about the talk [2] that has screenshots and might be helpful. Generally speaking, know your OS and the tools to examine each component. Learn how to interpret the numbers you see. There's more information than a human can process in a lifetime, but understanding some fundamentals of throughput vs latency & error rates, and how to find out each of those metrics for CPU / memory / network / disk, is a good start.

More recently I did a talk at Data Day Texas and posted the slides on Slideshare [3]. The focus there was more on perf tuning and less on performance troubleshooting, but I guess it's a matter of perspective which point you're at. The tools have changed a little (Prometheus instead of Graphite), and there are some new perf tuning tips like examining your read-ahead and compression settings, generating flame graphs and using tools like YourKit and Java Flight Recorder, and the easiest win of all time, disabling dynamic snitch if your hardware is fast and you want sub-ms p99s. Turn up counter cache if you use counters (it still gets hit on the write path), and row cache is way more effective than people give it credit for under the right workloads.

I've got a blog post in the works on JVM tuning, but for now I reference CASSANDRA-8150 [4] and Blake Eggleston's blog post [5] from back in our days at a small startup.

Lastly, I'm doing a performance tuning series on our blog at The Last Pickle, with the first being on Flame Graphs [6]. I've got about 6 posts in the pipeline, just need to find time to get to them.
Hope this helps,
Jon

[1] http://www.brendangregg.com/usemethod.html
[2] http://rustyrazorblade.com/post/2014/2014-09-18-diagnosing-production/
[3] https://www.slideshare.net/JonHaddad/performance-tuning-86995333
[4] https://issues.apache.org/jira/browse/CASSANDRA-8150
[5] http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html
[6] http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html

On Tue, Feb 27, 2018 at 8:56 AM Michael Shuler wrote:
> On 02/27/2018 10:20 AM, Nicolas Guyomar wrote:
> > Was Jon's blog post
> > https://academy.datastax.com/planet-cassandra/blog/cassandra-summit-recap-diagnosing-problems-in-production
> > relocated somewhere?
>
> https://web.archive.org/web/20160322011022/planetcassandra.org/blog/cassandra-summit-recap-diagnosing-problems-in-production
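As a rough illustration of the throughput vs latency and error rate fundamentals mentioned in Jon's message above, here is a minimal Python sketch. The record format, the function names, and the sample numbers are all assumptions made up for illustration; nothing here comes from the talk or from any Cassandra tooling.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record format)."""
    latency_ms: float
    ok: bool


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_s):
    """Return throughput (req/s), p50 and p99 latency (ms), and error rate for one window."""
    latencies = [r.latency_ms for r in requests]
    errors = sum(1 for r in requests if not r.ok)
    return {
        "throughput_rps": len(requests) / window_s,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(requests),
    }


if __name__ == "__main__":
    # 98 fast successes plus 2 slow failures in a 10-second window (made-up numbers).
    sample = [Request(1.0, True)] * 98 + [Request(250.0, False)] * 2
    print(summarize(sample, window_s=10.0))
```

With this made-up sample the median stays at 1 ms while the p99 jumps to 250 ms, which is one reason tail latency and error rate are worth tracking alongside throughput rather than relying on averages.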
RE: Jon Haddad on Diagnosing Performance Problems in Production
Perhaps Mr. Haddad himself will share it again somewhere; he was kind enough to share it once at DataStax!

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
Sent: Tuesday, February 27, 2018 10:39 AM
To: user@cassandra.apache.org
Subject: RE: Jon Haddad on Diagnosing Performance Problems in Production

Nicolas,

I think you had the link to the other version I was thinking of. I couldn’t find it. [...]
Re: Jon Haddad on Diagnosing Performance Problems in Production
On 02/27/2018 10:20 AM, Nicolas Guyomar wrote:
> Was Jon's blog post
> https://academy.datastax.com/planet-cassandra/blog/cassandra-summit-recap-diagnosing-problems-in-production
> relocated somewhere?

https://web.archive.org/web/20160322011022/planetcassandra.org/blog/cassandra-summit-recap-diagnosing-problems-in-production

--
Kind regards,
Michael
RE: Jon Haddad on Diagnosing Performance Problems in Production
Nicolas,

I think you had the link to the other version I was thinking of. I couldn’t find it. I think it might have gotten taken down; a lot of other stuff seems to be gone too. Maybe it will be back. Maybe they are just redoing stuff. Either way, it’s another sign of Mom and Dad drifting apart – I’m not sure who’s Mom and who’s Dad: DataStax or ASF. Hopefully, for the sake of everyone in the family, they will reconcile. It’s gems like that presentation that will keep us vital.

Kenneth Brotman

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Tuesday, February 27, 2018 8:21 AM
To: user@cassandra.apache.org
Subject: Re: Jon Haddad on Diagnosing Performance Problems in Production

Was Jon's blog post https://academy.datastax.com/planet-cassandra/blog/cassandra-summit-recap-diagnosing-problems-in-production relocated somewhere?

On 27 February 2018 at 16:34, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

One presentation that I hope can get updated is Jon Haddad’s very thorough presentation on Diagnosing Performance Problems in Production. [...]
Re: Jon Haddad on Diagnosing Performance Problems in Production
Was Jon's blog post https://academy.datastax.com/planet-cassandra/blog/cassandra-summit-recap-diagnosing-problems-in-production relocated somewhere?

On 27 February 2018 at 16:34, Kenneth Brotman wrote:
> One presentation that I hope can get updated is Jon Haddad’s very thorough
> presentation on Diagnosing Performance Problems in Production. I’ve seen
> another version somewhere where I believe he says something like “This
> should help you fix 99% of the problems you see.” Seems right.
>
> I’m sure it will be well attended and well viewed for some time. Here’s
> the version I found: https://www.youtube.com/watch?v=2JlUpgsEdN8
>
> If Jon did a new version I’d probably stop and watch it three times right
> now.
>
> If we started with that video inline on the Apache Cassandra web site in
> the troubleshooting section, that would help a lot of people because of the
> quality of the content and the density of the content.
>
> Kenneth Brotman