Re: [ceph-users] Huge latency spikes
All three nodes have this status for the SSD mirror, and the controller cache is on for all three:

Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

On 11/18/2018 12:45 AM, Serkan Çoban wrote:
> Is the write cache on the SSDs enabled on all three servers? Can you check them?
> [...]
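(For reference: the policy lines above describe the controller cache. The drive-level write cache behind a PERC/MegaRAID controller can be queried from the OS with the MegaRAID CLI; a rough sketch only, assuming MegaCli64 or perccli is installed -- flag spelling varies between versions:)

    # controller (logical drive) cache policy
    MegaCli64 -LDInfo -LAll -aAll | grep -i 'cache policy'
    # per-drive write cache setting behind the controller
    MegaCli64 -LDGetProp -DskCache -LAll -aAll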
Re: [ceph-users] Huge latency spikes
Is the write cache on the SSDs enabled on all three servers? Can you check them?

On Sun, Nov 18, 2018 at 9:05 AM Alex Litvak wrote:
>
> Raid card for journal disks is a Perc H730 (Megaraid), RAID 1, battery-backed cache is on
>
> Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
> Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
> [...]
Re: [ceph-users] Huge latency spikes
Raid card for journal disks is a Perc H730 (Megaraid), RAID 1, and the battery-backed cache is on:

Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

I have 2 other nodes with older Perc H710 controllers and similar SSDs with slightly higher wear (6.3% vs 5.18%), but from observation they hardly hit 1.5 ms, and only on rare occasions. The cache, RAID, and battery situation is the same.

On 11/17/2018 11:38 PM, Serkan Çoban wrote:
> 10ms w_await for an SSD is too much. How is that SSD connected to the system?
> Is any RAID card installed in this system? What is the RAID mode?
> [...]
Re: [ceph-users] Huge latency spikes
10ms w_await for an SSD is too much. How is that SSD connected to the system? Is any RAID card installed in this system? What is the RAID mode?

On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak wrote:
>
> Here is another snapshot. I wonder if this write I/O wait is too big.
> [...]
Re: [ceph-users] Huge latency spikes
Here is another snapshot. I wonder if this write I/O wait is too big.

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  23.00  0.00   336.00  29.22     0.34      14.74  0.00     14.74    2.87   6.60
dm-15     0.00    0.00    0.00  16.00  0.00   200.00  25.00     0.01      0.75   0.00     0.75     0.75   1.20
dm-16     0.00    0.00    0.00  17.00  0.00   276.00  32.47     0.25      14.94  0.00     14.94    3.35   5.70
dm-17     0.00    0.00    0.00  17.00  0.00   252.00  29.65     0.32      18.65  0.00     18.65    4.00   6.80
dm-18     0.00    0.00    0.00  15.00  0.00   152.00  20.27     0.25      16.80  0.00     16.80    4.07   6.10
dm-19     0.00    0.00    0.00  13.00  0.00   152.00  23.38     0.21      15.92  0.00     15.92    4.85   6.30
dm-20     0.00    0.00    0.00  20.00  0.00   248.00  24.80     0.27      13.60  0.00     13.60    3.25   6.50
dm-21     0.00    0.00    0.00  17.00  0.00   188.00  22.12     0.27      16.00  0.00     16.00    3.59   6.10
dm-22     0.00    0.00    0.00  20.00  0.00   156.00  15.60     0.11      5.55   0.00     5.55     2.95   5.90
dm-24     0.00    0.00    0.00  8.00   0.00   56.00   14.00     0.12      14.62  0.00     14.62    4.75   3.80
dm-25     0.00    0.00    0.00  19.00  0.00   200.00  21.05     0.21      10.89  0.00     10.89    2.74   5.20

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  11.00  0.00   136.00  24.73     0.11      9.73   0.00     9.73     1.82   2.00
dm-15     0.00    0.00    0.00  12.00  0.00   136.00  22.67     0.04      3.75   0.00     3.75     1.08   1.30
dm-16     0.00    0.00    0.00  9.00   0.00   104.00  23.11     0.09      10.44  0.00     10.44    2.44   2.20
dm-17     0.00    0.00    0.00  5.00   0.00   160.00  64.00     0.02      4.00   0.00     4.00     4.00   2.00
dm-18     0.00    0.00    0.00  5.00   0.00   52.00   20.80     0.03      5.80   0.00     5.80     3.60   1.80
dm-19     0.00    0.00    0.00  10.00  0.00   104.00  20.80     0.08      7.90   0.00     7.90     2.10   2.10
dm-20     0.00    0.00    0.00  9.00   0.00   132.00  29.33     0.10      11.22  0.00     11.22    2.56   2.30
dm-21     0.00    0.00    0.00  6.00   0.00   68.00   22.67     0.07      12.33  0.00     12.33    3.83   2.30
dm-22     0.00    0.00    0.00  3.00   0.00   20.00   13.33     0.01      3.67   0.00     3.67     3.67   1.10
dm-24     0.00    0.00    0.00  4.00   0.00   24.00   12.00     0.07      18.00  0.00     18.00    5.25   2.10
dm-25     0.00    0.00    0.00  6.00   0.00   64.00   21.33     0.06      10.33  0.00     10.33    3.67   2.20

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  5.00   0.00   140.00  56.00     0.08      15.20  0.00     15.20    5.40   2.70
dm-15     0.00    0.00    0.00  6.00   0.00   236.00  78.67     0.18      30.67  0.00     30.67    6.83   4.10
dm-16     0.00    0.00    0.00  8.00   0.00   84.00   21.00     0.06      7.25   0.00     7.25     1.62   1.30
dm-17     0.00    0.00    0.00  3.00   0.00   84.00   56.00     0.00      0.33   0.00     0.33     0.33   0.10
dm-18     0.00    0.00    0.00  2.00   0.00   20.00   20.00     0.02      12.00  0.00     12.00    12.00  2.40
dm-19     0.00    0.00    0.00  12.00  0.00   80.00   13.33     0.05      4.00   0.00     4.00     2.33   2.80
dm-20     0.00    0.00    0.00  16.00  0.00   256.00  32.00     0.00      0.06   0.00     0.06     0.06   0.10
dm-21     0.00    0.00    0.00  8.00   0.00   500.00  125.00    0.00      0.12   0.00     0.12     0.12   0.10
dm-22     0.00    0.00    0.00  1.00   0.00   8.00    16.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-24     0.00    0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-25     0.00    0.00    0.00  2.00   0.00   32.00   32.00     0.08      40.00  0.00     40.00    20.50  4.10

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s   avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  10.00  0.00   108.00  21.60     0.11      10.80  0.00     10.80    1.90   1.90
dm-15     0.00    0.00    0.00  5.00   0.00   60.00   24.00     0.03      6.20   0.00     6.20     3.40   1.70
dm-16     0.00    0.00    0.00  6.00   0.00   68.00   22.67     0.00      0.17   0.00     0.17     0.17   0.10
dm-17
Re: [ceph-users] Huge latency spikes
I stand corrected: I looked at the device iostat, but it was partitioned. Here is a more correct picture of what is going on now.

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  19.00  0.00   4116.00   433.26    0.01      0.68   0.00     0.68     0.05   0.10
dm-15     0.00    0.00    0.00  35.00  0.00   8224.00   469.94    0.03      0.86   0.00     0.86     0.06   0.20
dm-16     0.00    0.00    0.00  53.00  0.00   12428.00  468.98    0.11      2.04   0.00     2.04     0.17   0.90
dm-17     0.00    0.00    0.00  43.00  0.00   8344.00   388.09    0.09      2.14   0.00     2.14     0.42   1.80
dm-18     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-19     0.00    0.00    0.00  75.00  0.00   16824.00  448.64    0.08      1.11   0.00     1.11     0.08   0.60
dm-20     0.00    0.00    0.00  70.00  0.00   16452.00  470.06    0.06      0.90   0.00     0.90     0.09   0.60
dm-21     0.00    0.00    0.00  18.00  0.00   4112.00   456.89    0.02      1.00   0.00     1.00     0.11   0.20
dm-22     0.00    0.00    0.00  53.00  0.00   12324.00  465.06    0.06      0.70   0.00     0.70     0.08   0.40
dm-24     0.00    0.00    0.00  18.00  0.00   4272.00   474.67    0.02      1.06   0.00     1.06     0.17   0.30
dm-25     0.00    0.00    0.00  74.00  0.00   16916.00  457.19    0.09      1.26   0.00     1.26     0.18   1.30

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-15     0.00    0.00    0.00  17.00  0.00   4108.00   483.29    0.02      1.00   0.00     1.00     0.06   0.10
dm-16     0.00    0.00    0.00  34.00  0.00   8208.00   482.82    0.03      1.00   0.00     1.00     0.06   0.20
dm-17     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-18     0.00    0.00    0.00  36.00  0.00   8220.00   456.67    0.05      1.33   0.00     1.33     0.08   0.30
dm-19     0.00    0.00    0.00  1.00   0.00   8.00      16.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-20     0.00    0.00    0.00  36.00  0.00   8288.00   460.44    0.05      1.42   0.00     1.42     0.08   0.30
dm-21     0.00    0.00    0.00  34.00  0.00   8208.00   482.82    0.03      1.00   0.00     1.00     0.06   0.20
dm-22     0.00    0.00    0.00  18.00  0.00   4128.00   458.67    0.04      3.22   0.00     3.22     0.17   0.30
dm-24     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-25     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  20.00  0.00   4032.00   403.20    0.00      0.00   0.00     0.00     0.00   0.00
dm-15     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-16     0.00    0.00    0.00  1.00   0.00   20.00     40.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-17     0.00    0.00    0.00  4.00   0.00   28.00     14.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-18     0.00    0.00    0.00  3.00   0.00   36.00     24.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-19     0.00    0.00    0.00  2.00   0.00   20.00     20.00     0.01      2.50   0.00     2.50     2.50   0.50
dm-20     0.00    0.00    0.00  6.00   0.00   96.00     32.00     0.02      3.33   0.00     3.33     2.00   1.20
dm-21     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-22     0.00    0.00    0.00  2.00   0.00   32.00     32.00     0.00      0.00   0.00     0.00     0.00   0.00
dm-24     0.00    0.00    0.00  22.00  0.00   4184.00   380.36    0.10      4.59   0.00     4.59     0.95   2.10
dm-25     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14     0.00    0.00    0.00  8.00   0.00   1928.00   482.00    0.01      1.00   0.00     1.00     0.12   0.10
dm-15     0.00    0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00
dm-16     0.00    0.00    0.00  3.00   0.00
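(A convenient way to watch a partitioned device together with its partitions, rather than only the dm-* devices, is iostat's -p flag, which the same poster uses later in this thread; for example, assuming the journal SSD is /dev/sdb:)

    # extended stats for sdb and all of its partitions, 1-second samples
    iostat -xd -p sdb 1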
Re: [ceph-users] Huge latency spikes
The iostat isn't very helpful because there are not many writes. I'd recommend disabling C-states entirely. I'm not sure it's your problem, but it's good practice, and if your cluster goes as idle as your iostat suggests, it could be the culprit.
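(One common way to disable or limit C-states on Linux is via kernel boot parameters; this is only a sketch, assuming an Intel box using the intel_idle driver, and the exact parameters and their power trade-offs are platform-dependent:)

    # /etc/default/grub -- limit the CPUs to C1 (idle=poll would disallow even C1)
    GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=0 processor.max_cstate=1"
    # regenerate the grub config and reboot, e.g. on EL7:
    grub2-mkconfig -o /boot/grub2/grub.cfg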
[ceph-users] Use SSDs for metadata or for a pool cache?
Hello,

I am building a new cluster with 4 hosts, which have the following configuration:

128 GB RAM
12 SATA HDDs, 8 TB, 7.2k rpm
2 SSDs, 240 GB
2x 10 Gb network

I will use the cluster to store RBD images of VMs. I am thinking of using 2x replication, if it does not get too slow. My question is: using BlueStore (the default since Luminous, right?), should I use the SSDs as a "cache pool" or use the SSDs to store the BlueStore metadata? Or could I use one SSD for metadata and the other for a "cache pool"?

Thank you in advance for your opinions.

Gesiel
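(For the metadata option, the usual BlueStore layout is to give each HDD OSD a small DB/WAL partition or logical volume on the SSD when the OSD is created. A minimal sketch with ceph-volume, where the device names and the 30 GB DB size are assumptions that will differ on your hosts -- /dev/sdb as one data HDD, /dev/sdm as an SSD:)

    # carve an LV on the SSD to hold the RocksDB/WAL for one OSD
    vgcreate ceph-db /dev/sdm
    lvcreate -L 30G -n db-sdb ceph-db
    # create the OSD with data on the HDD and block.db on the SSD LV
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-sdb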
Re: [ceph-users] Huge latency spikes
The plot thickens: I checked C-states, and apparently I am operating in C1 with all CPUs on. Apparently the servers were tuned to use latency-performance:

tuned-adm active
Current active profile: latency-performance

turbostat shows:

Package  Core  CPU  Avg_MHz  %Busy  Bzy_MHz  TSC_MHz  SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7  CoreTmp  PkgTmp  Pkg%pc2  Pkg%pc3  Pkg%pc6  Pkg%pc7  PkgWatt  RAMWatt  PKG_%  RAM_%
-        -     -    22       0.84   2600     2400     0    99.16   0.00    0.00    0.00    49       58      0.00     0.00     0.00     0.00     69.51    17.29    0.00   0.00
0        0     0    39       1.52   2600     2400     0    98.48   0.00    0.00    0.00    48       58      0.00     0.00     0.00     0.00     36.30    8.73     0.00   0.00
0        0     12   15       0.56   2600     2400     0    99.44
0        1     2    47       1.81   2600     2400     0    98.19   0.00    0.00    0.00    49
0        1     14   17       0.66   2600     2400     0    99.34
0        2     4    31       1.20   2600     2400     0    98.80   0.00    0.00    0.00    47
0        2     16   18       0.71   2600     2400     0    99.29
0        3     6    31       1.21   2600     2400     0    98.79   0.00    0.00    0.00    49
0        3     18   39       1.50   2600     2400     0    98.50
0        4     8    33       1.27   2600     2400     0    98.73   0.00    0.00    0.00    46
0        4     20   17       0.64   2600     2400     0    99.36
0        5     10   32       1.23   2600     2400     0    98.77   0.00    0.00    0.00    48
0        5     22   20       0.76   2600     2400     0    99.24
1        0     1    25       0.95   2600     2400     0    99.05   0.00    0.00    0.00    44       52      0.00     0.00     0.00     0.00     33.21    8.56     0.00   0.00
1        0     13   9        0.34   2600     2400     0    99.66
1        1     3    9        0.35   2600     2400     0    99.65   0.00    0.00    0.00    42
1        1     15   11       0.42   2600     2400     0    99.58
1        2     5    30       1.17   2600     2400     0    98.83   0.00    0.00    0.00    46
1        2     17   7        0.28   2600     2400     0    99.72
1        3     7    10       0.40   2600     2400     0    99.60   0.00    0.00    0.00    44
1        3     19   10       0.37   2600     2400     0    99.63
1        4     9    9        0.36   2600     2400     0    99.64   0.00    0.00    0.00    45
1        4     21   7        0.27   2600     2400     0    99.73
1        5     11   12       0.45   2600     2400     0    99.55   0.00    0.00    0.00    45
1        5     23   46       1.76   2600     2400     0    98.24

iostat for the SSD shows:

# iostat -xd -p sdb 1 1000

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.05  26.78  0.20   2299.53  171.42    0.02      0.64   0.11     0.64     0.08   0.20

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  16.00  0.00   392.00   49.00     0.00      0.06   0.00     0.06     0.06   0.10

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  74.00  0.00   880.00   23.78     0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  56.00  0.00   240.00   8.57      0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  44.00  0.00   676.00   30.73     0.00      0.07   0.00     0.07     0.05   0.20

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  10.00  0.00   92.00    18.40     0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  6.00   0.00   84.00    28.00     0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       0.00    0.00    0.00  1.00   0.00   20.00    40.00     0.00      0.00   0.00     0.00     0.00   0.00

Device:   rrqm/s  wrqm/s  r/s   w/s    rkB/s  wkB/s    avgrq-sz  avgqu-sz
Re: [ceph-users] Huge latency spikes
You can check whether C-states are enabled with cat /proc/acpi/processor/info -- look for "power management: yes/no". If they are enabled, then you can check the current C-state of each core: 0 is the CPU's normal operating range, any other state means the processor is in a power-saving mode. See cat /proc/acpi/processor/CPU?/power.

C-states are configured in the BIOS, so a reboot is required to change them. I know that with Dell servers you can trigger the change with omconfig and then issue a reboot for it to take effect. Otherwise you'll need to disable it directly in the BIOS.

As for the SSDs, I would just run iostat and check the iowait. If you see small disk writes causing high iowait, then your SSDs are probably at the end of their life. Ceph journaling is good at destroying SSDs.
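(On newer kernels the /proc/acpi/processor interface may be absent; the cpuidle sysfs tree exposes the same information. A quick check, assuming the intel_idle or acpi_idle driver is in use:)

    # which idle driver the kernel is using
    cat /sys/devices/system/cpu/cpuidle/current_driver
    # idle states available on CPU 0 and how often each has been entered
    grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
    grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/usage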
Re: [ceph-users] Huge latency spikes
John,

Thank you for the suggestions. I looked into the journal SSDs. The drive is close to 3 years old and shows 5.17% wear (352,941 GB written to disk, against a 3.6 PB endurance spec over 5 years). It could be that SMART is not telling the whole story, but that is what I see.

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032  100   100   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032  100   100   000    Old_age  Always  -           29054
 12 Power_Cycle_Count       0x0032  100   100   000    Old_age  Always  -           4
170 Available_Reservd_Space 0x0033  100   100   010    Pre-fail Always  -           0
171 Program_Fail_Count      0x0032  100   100   000    Old_age  Always  -           0
172 Erase_Fail_Count        0x0032  100   100   000    Old_age  Always  -           0
174 Unsafe_Shutdown_Count   0x0032  100   100   000    Old_age  Always  -           3
175 Power_Loss_Cap_Test     0x0033  100   100   010    Pre-fail Always  -           5130 (117 3127)
183 SATA_Downshift_Count    0x0032  100   100   000    Old_age  Always  -           0
184 End-to-End_Error        0x0033  100   100   090    Pre-fail Always  -           0
187 Reported_Uncorrect      0x0032  100   100   000    Old_age  Always  -           0
190 Temperature_Case        0x0022  074   064   000    Old_age  Always  -           26 (Min/Max 23/36)
192 Unsafe_Shutdown_Count   0x0032  100   100   000    Old_age  Always  -           3
194 Temperature_Internal    0x0022  100   100   000    Old_age  Always  -           26
197 Current_Pending_Sector  0x0012  100   100   000    Old_age  Always  -           0
199 CRC_Error_Count         0x003e  100   100   000    Old_age  Always  -           0
225 Host_Writes_32MiB       0x0032  100   100   000    Old_age  Always  -           10518704
226 Workld_Media_Wear_Indic 0x0032  100   100   000    Old_age  Always  -           5304
227 Workld_Host_Reads_Perc  0x0032  100   100   000    Old_age  Always  -           0
228 Workload_Minutes        0x0032  100   100   000    Old_age  Always  -           1743266
232 Available_Reservd_Space 0x0033  100   100   010    Pre-fail Always  -           0
233 Media_Wearout_Indicator 0x0032  095   095   000    Old_age  Always  -           0
234 Thermal_Throttle        0x0032  100   100   000    Old_age  Always  -           0/0
241 Host_Writes_32MiB       0x0032  100   100   000    Old_age  Always  -           10518704
242 Host_Reads_32MiB        0x0032  100   100   000    Old_age  Always  -           6034

SMART Error Log Version: 1
No Errors Logged

How do you look at C-states?

On 11/17/2018 2:37 PM, John Petrini wrote:
> I'd take a look at cstates if it's only happening during periods of low activity.
> [...]
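(As a sanity check on the figure above: the raw Host_Writes_32MiB counter multiplied out matches the quoted lifetime writes, assuming the attribute counts 32 MiB units, which the name suggests and the arithmetic below confirms:)

    # 10518704 x 32 MiB ~= 321 TiB ~= 353 TB, i.e. the ~352,941 GB quoted above
    echo $(( 10518704 * 32 / 1024 / 1024 ))   # prints 321 (TiB)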
Re: [ceph-users] PG auto repair with BlueStore
> Wido den Hollander :
> On 11/15/18 7:51 PM, koukou73gr wrote:
> > Are there any means to notify the administrator that an auto-repair has
> > taken place?
>
> I don't think so. You'll see the cluster go to HEALTH_ERR for a while
> before it turns to HEALTH_OK again after the PG has been repaired.

And I think even this is too much. There is no point in triggering a monitoring system in the middle of the night, when the scrubs are running, just because of some bit rot on a disk. Losing a few bits on disks here and there is a perfectly normal and expected scenario that Ceph can take care of all by itself without raising a health *error*. With auto repair enabled, it certainly doesn't require immediate attention the way the error state indicates. The message in the cluster log should be enough.

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, 16 Nov 2018 at 08:26, Wido den Hollander wrote:
>
> On 11/15/18 7:51 PM, koukou73gr wrote:
> > Are there any means to notify the administrator that an auto-repair has
> > taken place?
>
> I don't think so. You'll see the cluster go to HEALTH_ERR for a while
> before it turns to HEALTH_OK again after the PG has been repaired.
>
> You would have to search the cluster logs to find out that an auto repair
> took place on a Placement Group.
>
> Wido
> [...]
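(For anyone who wants to try it, the knobs under discussion look roughly like this; a sketch only -- ceph config set exists from Mimic onwards, and older releases take the same options in ceph.conf under [osd]:)

    # repair scrub errors automatically, but only up to a small error count
    ceph config set osd osd_scrub_auto_repair true
    ceph config set osd osd_scrub_auto_repair_num_errors 5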
Re: [ceph-users] Huge latency spikes
I'd take a look at C-states if it's only happening during periods of low activity. If your journals are on SSD, you should also check their health. They may have exceeded their write endurance - high apply latency is a telltale sign of this, and you'd see high iowait on those disks.
Re: [ceph-users] Huge latency spikes
I am evaluating BlueStore on a separate cluster. Unfortunately, upgrading this one is out of the question at the moment for multiple reasons. That is why I am trying to find a possible root cause.

On 11/17/2018 2:14 PM, Paul Emmerich wrote:
> Are you running FileStore? (The config options you are using look like a
> FileStore config.)
>
> Try out BlueStore, we've found that it reduces random latency spikes due
> to filesystem weirdness a lot.
>
> Paul
Re: [ceph-users] Huge latency spikes
Are you running FileStore? (The config options you are using look like a FileStore config.)

Try out BlueStore, we've found that it reduces random latency spikes due to filesystem weirdness a lot.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sat, 17 Nov 2018 at 21:07, Alex Litvak wrote:
>
> I am using libvirt for block devices (OpenStack, Proxmox KVM VMs).
> Also I am mounting CephFS inside of VMs and on bare-metal hosts. In
> this case it would be a kernel-based client.
> [...]
Re: [ceph-users] Huge latency spikes
I am using libvirt for block devices (OpenStack, Proxmox KVM VMs). Also, I am mounting CephFS inside of VMs and on bare-metal hosts; in this case it would be a kernel-based client.

From what I can see based on pool stats, the CephFS pools have higher utilization than the block pools during the spikes; however, it is still small.

On 11/17/2018 1:40 PM, Kees Meijs wrote:
> Hi Alex,
>
> What kind of clients do you use? Is it KVM (QEMU) using the NBD driver,
> kernel, or...?
>
> Regards,
> Kees
> [...]
Re: [ceph-users] PG auto repair with BlueStore
While I also believe it to be perfectly safe on a BlueStore cluster (especially since there's osd_scrub_auto_repair_num_errors if there's more wrong than your usual bit rot), we also don't run any cluster with this option at the moment. We had it enabled for some time before we backported the OOM-read-error stuff on some clusters.

But there's a small operational issue with auto repair at the moment: this option will occasionally set the repair flag on a PG without any scrub errors during scrubbing, for some reason, which triggers a health error. We've had a quick look at the code and couldn't figure out how the repair flag gets set in some cases on perfectly healthy PGs. Does it maybe only get set for a very short time while finishing up the scrub, and that's not always picked up in time?

Anyway, a potential work-around for this would be to remove the repair state from the conditions for the PG_DAMAGED warning?

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, 16 Nov 2018 at 08:49, Mark Schouten wrote:
>
> Which, as a user, is very surprising to me too.
> --
> Mark Schouten | Tuxis Internet Engineering
> KvK: 61527076 | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
>
> ----- Original Message -----
> From: Wido den Hollander (w...@42on.com)
> Date: 16-11-2018 08:25
> To: Mark Schouten (m...@tuxis.nl)
> Cc: Ceph Users (ceph-us...@ceph.com)
> Subject: Re: [ceph-users] PG auto repair with BlueStore
>
> On 11/15/18 7:45 PM, Mark Schouten wrote:
> > As a user, I'm very surprised that this isn't a default setting.
>
> That is because you can also have FileStore OSDs in a cluster on which
> such an auto-repair is not safe.
>
> Wido
>
> > Op 15 nov. 2018 om 18:40 heeft Wido den Hollander het volgende geschreven:
> >
> > Hi,
> >
> > This question is actually still outstanding. Is there any good reason to
> > keep auto repair for scrub errors disabled with BlueStore?
> >
> > I couldn't think of a reason when using size=3 and min_size=2, so just
> > wondering.
> >
> > Thanks!
> >
> > Wido
> >
> >> On 8/24/18 8:55 AM, Wido den Hollander wrote:
> >> Hi,
> >>
> >> osd_scrub_auto_repair still defaults to false and I was wondering how we
> >> think about enabling this feature by default.
> >>
> >> Would we say it's safe to enable this with BlueStore?
> >>
> >> Wido
> >> [...]
Re: [ceph-users] Huge latency spikes
Hi Alex,

What kind of clients do you use? Is it KVM (QEMU) using the NBD driver, the kernel client, or...?

Regards,
Kees

On 17-11-18 20:17, Alex Litvak wrote:
> Hello everyone,
>
> I am trying to troubleshoot a cluster exhibiting huge spikes of latency.
> I cannot quite catch it because it happens during light activity
> and randomly affects one OSD node out of 3 in the pool.
> [...]
[ceph-users] Huge latency spikes
Hello everyone,

I am trying to troubleshoot a cluster exhibiting huge spikes of latency. I cannot quite catch it because it happens during light activity and randomly affects one OSD node out of 3 in the pool.

This is a FileStore cluster. I see some OSDs exhibit an applied latency of 400 ms, and the 1-minute load average shoots to 60. Client commit latency with queue shoots to 300 ms, and journal latency (return write ack for client; journal on Intel DC-S3710 SSD) shoots to 40 ms.

op_w_process_latency showed 250 ms, and client read-modify-write operation readable/applied latency jumped to 1.25 s on one of the OSDs.

I rescheduled the scrubbing and deep scrubbing and was watching ceph -w activity, so it is definitely not related.

At the same time the node shows 98% CPU idle, no significant changes in memory utilization, and no errors on the network, with bandwidth utilization between 20 - 50 Mbit on the client and back-end networks.

An OSD node has 12 OSDs (2 TB rust), 2 partitioned SSD journal disks, 32 GB RAM, and dual 6-core / 12-thread CPUs.

This is perhaps the most relevant part of the ceph config:

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[osd]
journal_dio = true
journal_aio = true
osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
osd_journal_size = 2048 ; journal size, in megabytes
osd crush update on start = false
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
osd_op_threads = 5
osd_disk_threads = 4
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 512
osd_pool_default_pgp_num = 512
osd_crush_chooseleaf_type = 1
; osd pool_default_crush_rule = 1
; new options 04.12.2015
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
filestore_fiemap = false
; Reduce impact of scrub (needs cfq on osds)
osd_disk_thread_ioprio_class = "idle"
osd_disk_thread_ioprio_priority = 7
osd_deep_scrub_interval = 1211600
osd_scrub_begin_hour = 19
osd_scrub_end_hour = 4
osd_scrub_sleep = 0.1

[client]
rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = true

The OSD logs and system log at that time show nothing interesting.

Any clue of what to look for in order to diagnose the load / latency spikes would be really appreciated.

Thank you
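(The per-OSD latency counters mentioned above -- op_w_process_latency, apply and journal latency -- can be read from the admin socket on the OSD host, and ceph osd perf gives a cluster-wide summary. A sketch, assuming osd.12 runs on the local node:)

    # dump perf counters for one OSD and pick out the latency entries
    ceph daemon osd.12 perf dump | grep -A 3 -E 'op_w_process_latency|apply_latency|journal_latency'
    # commit/apply latency for every OSD in the cluster
    ceph osd perf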