On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
> So do iostat or iotop show you if / where your disks are working
> hardest? Or is this CPU overhead that's killing performance?

Sorry for the delayed reply. I took a more detailed look at the query plans from our problem query during this incident. There are actually 6 plans, because there were 6 unique queries. I traced one query through our logs and found something really interesting: the first 5 queries all create temp tables, and each of them took upwards of 500ms to run. The final query, however, is a simple select from the last temp table, and that query took 0.035ms!

This really confirms, I think, that the issue somehow had to do with *writing* to the SAN. Of course, this doesn't answer a whole lot, because we had no other apparent issues with write performance at all.

I also provide some graphs below.

7pm-3am on 8/10 (first incidents were around 10:30pm, other incidents ~1am, 2am):

Local Disk IO:
[image: Screen Shot 2017-08-18 at 8.20.06 AM.png]

SAN IO:
[image: Screen Shot 2017-08-18 at 8.16.59 AM.png]

CPU:
[image: Screen Shot 2017-08-18 at 8.20.58 AM.png]

7-9pm on 8/10 (controlled attempts starting a little after 7):

CPU:
[image: Screen Shot 2017-08-18 at 8.43.35 AM.png]

Write IO on SAN:
[image: Screen Shot 2017-08-18 at 8.44.32 AM.png]

Read IO on Local disk:
[image: Screen Shot 2017-08-18 at 8.46.27 AM.png]

Write IO on Local disk:
[image: Screen Shot 2017-08-18 at 8.46.58 AM.png]
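
For anyone who wants to repeat the log tracing described above, here is a minimal sketch of how the comparison could be automated. It is not the script I used. It assumes the server log contains lines of the form "duration: N ms  statement: ..." (as produced when log_min_duration_statement is enabled), and the function name `summarize` and both bucket names are made up for illustration.

```python
import re

# Assumed log shape (hypothetical example line):
#   LOG:  duration: 512.300 ms  statement: CREATE TEMP TABLE t1 AS SELECT ...
DURATION_RE = re.compile(
    r"duration:\s+([\d.]+)\s+ms\s+statement:\s+(.*)", re.IGNORECASE
)

# Statements that create a temp table, i.e. the ones that have to write.
TEMP_RE = re.compile(
    r"^\s*create\s+(?:(?:local|global)\s+)?(?:temp|temporary)\s+table",
    re.IGNORECASE,
)


def summarize(lines):
    """Split logged durations (ms) into temp-table writes and everything else."""
    buckets = {"temp_write": [], "other": []}
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration/statement line (e.g. checkpoint messages)
        millis = float(match.group(1))
        statement = match.group(2)
        kind = "temp_write" if TEMP_RE.match(statement) else "other"
        buckets[kind].append(millis)
    return buckets
```

Feeding it the log lines for one traced request gives the durations of the temp-table statements next to the final select, which makes a write-side slowdown easy to spot across many requests.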