Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

All 3 nodes have this status for the SSD mirror.  Controller cache is on for all 3.

Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
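
For reference, a rough sketch of how both layers of caching can be checked from the OS, assuming MegaCli and sdparm are available (adapter and device names are examples):

# controller-side cache policy of the virtual drive
MegaCli64 -LDInfo -Lall -aAll | grep -i 'cache policy'

# physical disk cache policy as seen by the controller
MegaCli64 -LDGetProp -DskCache -LAll -aAll

# on-drive write cache reported by the SCSI layer (WCE: 1 means enabled)
sdparm --get=WCE /dev/sdb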


On 11/18/2018 12:45 AM, Serkan Çoban wrote:

Does write cache on SSDs enabled on three servers? Can you check them?
On Sun, Nov 18, 2018 at 9:05 AM Alex Litvak
 wrote:


Raid card for journal disks is Perc H730 (Megaraid), RAID 1, battery back cache 
is on

Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

I have  2 other nodes with older Perc H710 and similar SSDs with slightly 
higher wear (6.3% vs 5.18%) but from observation they hardly hit 1.5 ms on rear 
occasion
Cache, RAID, and battery situation is the same.

On 11/17/2018 11:38 PM, Serkan Çoban wrote:

10ms w_await for SSD is too much. How that SSD is connected to the system? Any 
raid card installed on this system? What is the raid mode?

On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak
 wrote:


Here is another snapshot.  I wonder if this write io wait is too big
Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.00   23.00 0.00   336.0029.22 
0.34   14.740.00   14.74   2.87   6.60
dm-15 0.00 0.000.00   16.00 0.00   200.0025.00 
0.010.750.000.75   0.75   1.20
dm-16 0.00 0.000.00   17.00 0.00   276.0032.47 
0.25   14.940.00   14.94   3.35   5.70
dm-17 0.00 0.000.00   17.00 0.00   252.0029.65 
0.32   18.650.00   18.65   4.00   6.80
dm-18 0.00 0.000.00   15.00 0.00   152.0020.27 
0.25   16.800.00   16.80   4.07   6.10
dm-19 0.00 0.000.00   13.00 0.00   152.0023.38 
0.21   15.920.00   15.92   4.85   6.30
dm-20 0.00 0.000.00   20.00 0.00   248.0024.80 
0.27   13.600.00   13.60   3.25   6.50
dm-21 0.00 0.000.00   17.00 0.00   188.0022.12 
0.27   16.000.00   16.00   3.59   6.10
dm-22 0.00 0.000.00   20.00 0.00   156.0015.60 
0.115.550.005.55   2.95   5.90
dm-24 0.00 0.000.008.00 0.0056.0014.00 
0.12   14.620.00   14.62   4.75   3.80
dm-25 0.00 0.000.00   19.00 0.00   200.0021.05 
0.21   10.890.00   10.89   2.74   5.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.00   11.00 0.00   136.0024.73 
0.119.730.009.73   1.82   2.00
dm-15 0.00 0.000.00   12.00 0.00   136.0022.67 
0.043.750.003.75   1.08   1.30
dm-16 0.00 0.000.009.00 0.00   104.0023.11 
0.09   10.440.00   10.44   2.44   2.20
dm-17 0.00 0.000.005.00 0.00   160.0064.00 
0.024.000.004.00   4.00   2.00
dm-18 0.00 0.000.005.00 0.0052.0020.80 
0.035.800.005.80   3.60   1.80
dm-19 0.00 0.000.00   10.00 0.00   104.0020.80 
0.087.900.007.90   2.10   2.10
dm-20 0.00 0.000.009.00 0.00   132.0029.33 
0.10   11.220.00   11.22   2.56   2.30
dm-21 0.00 0.000.006.00 0.0068.0022.67 
0.07   12.330.00   12.33   3.83   2.30
dm-22 0.00 0.000.003.00 0.0020.0013.33 
0.013.670.003.67   3.67   1.10
dm-24 0.00 0.000.004.00 0.0024.0012.00 
0.07   18.000.00   18.00   5.25   2.10
dm-25 0.00 0.000.006.00 0.0064.0021.33 
0.06   10.330.00   10.33   3.67   2.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.005.00 0.00   140.0056.00 
0.08   15.200.00   15.20   5.40   2.70
dm-15 0.00 0.000.006.00 0.00   236.0078.67 
0.18   30.670.00   30.67   6.83   4.10
dm-16 0.00 0.000.008.00 0.0084.0021.00 
0.067.250.007.25   1.62   1.30
dm-17 0.00 0.000.003.00 0.0084.0056.00 
0.000.330.000.33   0.33   0.10
dm-18 0.00 0.000.002.00 0.0020.0020.00 
0.02   12.000.00   12.00  12.00   2.40
dm-19 0.00 0.000.00   12.00 0.0080.0013.33 
0.054.000.004.00   2.33   

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Serkan Çoban
Is write cache enabled on the SSDs on all three servers? Can you check them?
On Sun, Nov 18, 2018 at 9:05 AM Alex Litvak
 wrote:
>
> Raid card for journal disks is Perc H730 (Megaraid), RAID 1, battery back 
> cache is on
>
> Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad 
> BBU
> Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad 
> BBU
>
> I have  2 other nodes with older Perc H710 and similar SSDs with slightly 
> higher wear (6.3% vs 5.18%) but from observation they hardly hit 1.5 ms on 
> rear occasion
> Cache, RAID, and battery situation is the same.
>
> On 11/17/2018 11:38 PM, Serkan Çoban wrote:
> >> 10ms w_await for SSD is too much. How that SSD is connected to the system? 
> >> Any raid card installed on this system? What is the raid mode?
> > On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak
> >  wrote:
> >>
> >> Here is another snapshot.  I wonder if this write io wait is too big
> >> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> >> avgqu-sz   await r_await w_await  svctm  %util
> >> dm-14 0.00 0.000.00   23.00 0.00   336.0029.22 
> >> 0.34   14.740.00   14.74   2.87   6.60
> >> dm-15 0.00 0.000.00   16.00 0.00   200.0025.00 
> >> 0.010.750.000.75   0.75   1.20
> >> dm-16 0.00 0.000.00   17.00 0.00   276.0032.47 
> >> 0.25   14.940.00   14.94   3.35   5.70
> >> dm-17 0.00 0.000.00   17.00 0.00   252.0029.65 
> >> 0.32   18.650.00   18.65   4.00   6.80
> >> dm-18 0.00 0.000.00   15.00 0.00   152.0020.27 
> >> 0.25   16.800.00   16.80   4.07   6.10
> >> dm-19 0.00 0.000.00   13.00 0.00   152.0023.38 
> >> 0.21   15.920.00   15.92   4.85   6.30
> >> dm-20 0.00 0.000.00   20.00 0.00   248.0024.80 
> >> 0.27   13.600.00   13.60   3.25   6.50
> >> dm-21 0.00 0.000.00   17.00 0.00   188.0022.12 
> >> 0.27   16.000.00   16.00   3.59   6.10
> >> dm-22 0.00 0.000.00   20.00 0.00   156.0015.60 
> >> 0.115.550.005.55   2.95   5.90
> >> dm-24 0.00 0.000.008.00 0.0056.0014.00 
> >> 0.12   14.620.00   14.62   4.75   3.80
> >> dm-25 0.00 0.000.00   19.00 0.00   200.0021.05 
> >> 0.21   10.890.00   10.89   2.74   5.20
> >>
> >> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> >> avgqu-sz   await r_await w_await  svctm  %util
> >> dm-14 0.00 0.000.00   11.00 0.00   136.0024.73 
> >> 0.119.730.009.73   1.82   2.00
> >> dm-15 0.00 0.000.00   12.00 0.00   136.0022.67 
> >> 0.043.750.003.75   1.08   1.30
> >> dm-16 0.00 0.000.009.00 0.00   104.0023.11 
> >> 0.09   10.440.00   10.44   2.44   2.20
> >> dm-17 0.00 0.000.005.00 0.00   160.0064.00 
> >> 0.024.000.004.00   4.00   2.00
> >> dm-18 0.00 0.000.005.00 0.0052.0020.80 
> >> 0.035.800.005.80   3.60   1.80
> >> dm-19 0.00 0.000.00   10.00 0.00   104.0020.80 
> >> 0.087.900.007.90   2.10   2.10
> >> dm-20 0.00 0.000.009.00 0.00   132.0029.33 
> >> 0.10   11.220.00   11.22   2.56   2.30
> >> dm-21 0.00 0.000.006.00 0.0068.0022.67 
> >> 0.07   12.330.00   12.33   3.83   2.30
> >> dm-22 0.00 0.000.003.00 0.0020.0013.33 
> >> 0.013.670.003.67   3.67   1.10
> >> dm-24 0.00 0.000.004.00 0.0024.0012.00 
> >> 0.07   18.000.00   18.00   5.25   2.10
> >> dm-25 0.00 0.000.006.00 0.0064.0021.33 
> >> 0.06   10.330.00   10.33   3.67   2.20
> >>
> >> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> >> avgqu-sz   await r_await w_await  svctm  %util
> >> dm-14 0.00 0.000.005.00 0.00   140.0056.00 
> >> 0.08   15.200.00   15.20   5.40   2.70
> >> dm-15 0.00 0.000.006.00 0.00   236.0078.67 
> >> 0.18   30.670.00   30.67   6.83   4.10
> >> dm-16 0.00 0.000.008.00 0.0084.0021.00 
> >> 0.067.250.007.25   1.62   1.30
> >> dm-17 0.00 0.000.003.00 0.0084.0056.00 
> >> 0.000.330.000.33   0.33   0.10
> >> dm-18 0.00 0.000.002.00 0.0020.0020.00 
> >> 0.02   12.000.00   12.00  12.00   2.40
> >> dm-19 0.00 0.000.00   

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

The RAID card for the journal disks is a Perc H730 (MegaRAID), RAID 1, with
battery-backed cache on.

Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU

I have 2 other nodes with older Perc H710 controllers and similar SSDs with
slightly higher wear (6.3% vs 5.18%), but from observation they hardly hit
1.5 ms, and only on rare occasions.
The cache, RAID, and battery situation is the same.

On 11/17/2018 11:38 PM, Serkan Çoban wrote:

10ms w_await for SSD is too much. How that SSD is connected to the system? Any 
raid card installed on this system? What is the raid mode?

On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak
 wrote:


Here is another snapshot.  I wonder if this write io wait is too big
Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.00   23.00 0.00   336.0029.22 
0.34   14.740.00   14.74   2.87   6.60
dm-15 0.00 0.000.00   16.00 0.00   200.0025.00 
0.010.750.000.75   0.75   1.20
dm-16 0.00 0.000.00   17.00 0.00   276.0032.47 
0.25   14.940.00   14.94   3.35   5.70
dm-17 0.00 0.000.00   17.00 0.00   252.0029.65 
0.32   18.650.00   18.65   4.00   6.80
dm-18 0.00 0.000.00   15.00 0.00   152.0020.27 
0.25   16.800.00   16.80   4.07   6.10
dm-19 0.00 0.000.00   13.00 0.00   152.0023.38 
0.21   15.920.00   15.92   4.85   6.30
dm-20 0.00 0.000.00   20.00 0.00   248.0024.80 
0.27   13.600.00   13.60   3.25   6.50
dm-21 0.00 0.000.00   17.00 0.00   188.0022.12 
0.27   16.000.00   16.00   3.59   6.10
dm-22 0.00 0.000.00   20.00 0.00   156.0015.60 
0.115.550.005.55   2.95   5.90
dm-24 0.00 0.000.008.00 0.0056.0014.00 
0.12   14.620.00   14.62   4.75   3.80
dm-25 0.00 0.000.00   19.00 0.00   200.0021.05 
0.21   10.890.00   10.89   2.74   5.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.00   11.00 0.00   136.0024.73 
0.119.730.009.73   1.82   2.00
dm-15 0.00 0.000.00   12.00 0.00   136.0022.67 
0.043.750.003.75   1.08   1.30
dm-16 0.00 0.000.009.00 0.00   104.0023.11 
0.09   10.440.00   10.44   2.44   2.20
dm-17 0.00 0.000.005.00 0.00   160.0064.00 
0.024.000.004.00   4.00   2.00
dm-18 0.00 0.000.005.00 0.0052.0020.80 
0.035.800.005.80   3.60   1.80
dm-19 0.00 0.000.00   10.00 0.00   104.0020.80 
0.087.900.007.90   2.10   2.10
dm-20 0.00 0.000.009.00 0.00   132.0029.33 
0.10   11.220.00   11.22   2.56   2.30
dm-21 0.00 0.000.006.00 0.0068.0022.67 
0.07   12.330.00   12.33   3.83   2.30
dm-22 0.00 0.000.003.00 0.0020.0013.33 
0.013.670.003.67   3.67   1.10
dm-24 0.00 0.000.004.00 0.0024.0012.00 
0.07   18.000.00   18.00   5.25   2.10
dm-25 0.00 0.000.006.00 0.0064.0021.33 
0.06   10.330.00   10.33   3.67   2.20

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
dm-14 0.00 0.000.005.00 0.00   140.0056.00 
0.08   15.200.00   15.20   5.40   2.70
dm-15 0.00 0.000.006.00 0.00   236.0078.67 
0.18   30.670.00   30.67   6.83   4.10
dm-16 0.00 0.000.008.00 0.0084.0021.00 
0.067.250.007.25   1.62   1.30
dm-17 0.00 0.000.003.00 0.0084.0056.00 
0.000.330.000.33   0.33   0.10
dm-18 0.00 0.000.002.00 0.0020.0020.00 
0.02   12.000.00   12.00  12.00   2.40
dm-19 0.00 0.000.00   12.00 0.0080.0013.33 
0.054.000.004.00   2.33   2.80
dm-20 0.00 0.000.00   16.00 0.00   256.0032.00 
0.000.060.000.06   0.06   0.10
dm-21 0.00 0.000.008.00 0.00   500.00   125.00 
0.000.120.000.12   0.12   0.10
dm-22 0.00 0.000.001.00 0.00 8.0016.00 
0.000.000.000.00   0.00   0.00
dm-24 0.00 0.000.00  

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Serkan Çoban
>10ms w_await for an SSD is too much. How is that SSD connected to the system? Is
>there any raid card installed in this system? What is the raid mode?
On Sun, Nov 18, 2018 at 8:25 AM Alex Litvak
 wrote:
>
> Here is another snapshot.  I wonder if this write io wait is too big
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> dm-14 0.00 0.000.00   23.00 0.00   336.0029.22
>  0.34   14.740.00   14.74   2.87   6.60
> dm-15 0.00 0.000.00   16.00 0.00   200.0025.00
>  0.010.750.000.75   0.75   1.20
> dm-16 0.00 0.000.00   17.00 0.00   276.0032.47
>  0.25   14.940.00   14.94   3.35   5.70
> dm-17 0.00 0.000.00   17.00 0.00   252.0029.65
>  0.32   18.650.00   18.65   4.00   6.80
> dm-18 0.00 0.000.00   15.00 0.00   152.0020.27
>  0.25   16.800.00   16.80   4.07   6.10
> dm-19 0.00 0.000.00   13.00 0.00   152.0023.38
>  0.21   15.920.00   15.92   4.85   6.30
> dm-20 0.00 0.000.00   20.00 0.00   248.0024.80
>  0.27   13.600.00   13.60   3.25   6.50
> dm-21 0.00 0.000.00   17.00 0.00   188.0022.12
>  0.27   16.000.00   16.00   3.59   6.10
> dm-22 0.00 0.000.00   20.00 0.00   156.0015.60
>  0.115.550.005.55   2.95   5.90
> dm-24 0.00 0.000.008.00 0.0056.0014.00
>  0.12   14.620.00   14.62   4.75   3.80
> dm-25 0.00 0.000.00   19.00 0.00   200.0021.05
>  0.21   10.890.00   10.89   2.74   5.20
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> dm-14 0.00 0.000.00   11.00 0.00   136.0024.73
>  0.119.730.009.73   1.82   2.00
> dm-15 0.00 0.000.00   12.00 0.00   136.0022.67
>  0.043.750.003.75   1.08   1.30
> dm-16 0.00 0.000.009.00 0.00   104.0023.11
>  0.09   10.440.00   10.44   2.44   2.20
> dm-17 0.00 0.000.005.00 0.00   160.0064.00
>  0.024.000.004.00   4.00   2.00
> dm-18 0.00 0.000.005.00 0.0052.0020.80
>  0.035.800.005.80   3.60   1.80
> dm-19 0.00 0.000.00   10.00 0.00   104.0020.80
>  0.087.900.007.90   2.10   2.10
> dm-20 0.00 0.000.009.00 0.00   132.0029.33
>  0.10   11.220.00   11.22   2.56   2.30
> dm-21 0.00 0.000.006.00 0.0068.0022.67
>  0.07   12.330.00   12.33   3.83   2.30
> dm-22 0.00 0.000.003.00 0.0020.0013.33
>  0.013.670.003.67   3.67   1.10
> dm-24 0.00 0.000.004.00 0.0024.0012.00
>  0.07   18.000.00   18.00   5.25   2.10
> dm-25 0.00 0.000.006.00 0.0064.0021.33
>  0.06   10.330.00   10.33   3.67   2.20
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> dm-14 0.00 0.000.005.00 0.00   140.0056.00
>  0.08   15.200.00   15.20   5.40   2.70
> dm-15 0.00 0.000.006.00 0.00   236.0078.67
>  0.18   30.670.00   30.67   6.83   4.10
> dm-16 0.00 0.000.008.00 0.0084.0021.00
>  0.067.250.007.25   1.62   1.30
> dm-17 0.00 0.000.003.00 0.0084.0056.00
>  0.000.330.000.33   0.33   0.10
> dm-18 0.00 0.000.002.00 0.0020.0020.00
>  0.02   12.000.00   12.00  12.00   2.40
> dm-19 0.00 0.000.00   12.00 0.0080.0013.33
>  0.054.000.004.00   2.33   2.80
> dm-20 0.00 0.000.00   16.00 0.00   256.0032.00
>  0.000.060.000.06   0.06   0.10
> dm-21 0.00 0.000.008.00 0.00   500.00   125.00
>  0.000.120.000.12   0.12   0.10
> dm-22 0.00 0.000.001.00 0.00 8.0016.00
>  0.000.000.000.00   0.00   0.00
> dm-24 0.00 0.000.000.00 0.00 0.00 0.00
>  0.000.000.000.00   0.00   0.00
> dm-25 0.00 0.000.002.00 0.0032.0032.00
>  0.08   40.000.00   40.00  20.50   4.10
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> dm-14 0.00

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

Here is another snapshot.  I wonder if this write io wait is too big
Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00  23.00   0.00  336.00     29.22      0.34  14.74     0.00    14.74   2.87   6.60
dm-15      0.00    0.00  0.00  16.00   0.00  200.00     25.00      0.01   0.75     0.00     0.75   0.75   1.20
dm-16      0.00    0.00  0.00  17.00   0.00  276.00     32.47      0.25  14.94     0.00    14.94   3.35   5.70
dm-17      0.00    0.00  0.00  17.00   0.00  252.00     29.65      0.32  18.65     0.00    18.65   4.00   6.80
dm-18      0.00    0.00  0.00  15.00   0.00  152.00     20.27      0.25  16.80     0.00    16.80   4.07   6.10
dm-19      0.00    0.00  0.00  13.00   0.00  152.00     23.38      0.21  15.92     0.00    15.92   4.85   6.30
dm-20      0.00    0.00  0.00  20.00   0.00  248.00     24.80      0.27  13.60     0.00    13.60   3.25   6.50
dm-21      0.00    0.00  0.00  17.00   0.00  188.00     22.12      0.27  16.00     0.00    16.00   3.59   6.10
dm-22      0.00    0.00  0.00  20.00   0.00  156.00     15.60      0.11   5.55     0.00     5.55   2.95   5.90
dm-24      0.00    0.00  0.00   8.00   0.00   56.00     14.00      0.12  14.62     0.00    14.62   4.75   3.80
dm-25      0.00    0.00  0.00  19.00   0.00  200.00     21.05      0.21  10.89     0.00    10.89   2.74   5.20

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00  11.00   0.00  136.00     24.73      0.11   9.73     0.00     9.73   1.82   2.00
dm-15      0.00    0.00  0.00  12.00   0.00  136.00     22.67      0.04   3.75     0.00     3.75   1.08   1.30
dm-16      0.00    0.00  0.00   9.00   0.00  104.00     23.11      0.09  10.44     0.00    10.44   2.44   2.20
dm-17      0.00    0.00  0.00   5.00   0.00  160.00     64.00      0.02   4.00     0.00     4.00   4.00   2.00
dm-18      0.00    0.00  0.00   5.00   0.00   52.00     20.80      0.03   5.80     0.00     5.80   3.60   1.80
dm-19      0.00    0.00  0.00  10.00   0.00  104.00     20.80      0.08   7.90     0.00     7.90   2.10   2.10
dm-20      0.00    0.00  0.00   9.00   0.00  132.00     29.33      0.10  11.22     0.00    11.22   2.56   2.30
dm-21      0.00    0.00  0.00   6.00   0.00   68.00     22.67      0.07  12.33     0.00    12.33   3.83   2.30
dm-22      0.00    0.00  0.00   3.00   0.00   20.00     13.33      0.01   3.67     0.00     3.67   3.67   1.10
dm-24      0.00    0.00  0.00   4.00   0.00   24.00     12.00      0.07  18.00     0.00    18.00   5.25   2.10
dm-25      0.00    0.00  0.00   6.00   0.00   64.00     21.33      0.06  10.33     0.00    10.33   3.67   2.20

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00   5.00   0.00  140.00     56.00      0.08  15.20     0.00    15.20   5.40   2.70
dm-15      0.00    0.00  0.00   6.00   0.00  236.00     78.67      0.18  30.67     0.00    30.67   6.83   4.10
dm-16      0.00    0.00  0.00   8.00   0.00   84.00     21.00      0.06   7.25     0.00     7.25   1.62   1.30
dm-17      0.00    0.00  0.00   3.00   0.00   84.00     56.00      0.00   0.33     0.00     0.33   0.33   0.10
dm-18      0.00    0.00  0.00   2.00   0.00   20.00     20.00      0.02  12.00     0.00    12.00  12.00   2.40
dm-19      0.00    0.00  0.00  12.00   0.00   80.00     13.33      0.05   4.00     0.00     4.00   2.33   2.80
dm-20      0.00    0.00  0.00  16.00   0.00  256.00     32.00      0.00   0.06     0.00     0.06   0.06   0.10
dm-21      0.00    0.00  0.00   8.00   0.00  500.00    125.00      0.00   0.12     0.00     0.12   0.12   0.10
dm-22      0.00    0.00  0.00   1.00   0.00    8.00     16.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-24      0.00    0.00  0.00   0.00   0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-25      0.00    0.00  0.00   2.00   0.00   32.00     32.00      0.08  40.00     0.00    40.00  20.50   4.10

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00  10.00   0.00  108.00     21.60      0.11  10.80     0.00    10.80   1.90   1.90
dm-15      0.00    0.00  0.00   5.00   0.00   60.00     24.00      0.03   6.20     0.00     6.20   3.40   1.70
dm-16      0.00    0.00  0.00   6.00   0.00   68.00     22.67      0.00   0.17     0.00     0.17   0.17   0.10
dm-17

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

I stand corrected: I had looked at iostat for the whole device, but it was
partitioned.  Here is a more accurate picture of what is going on now.

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00  19.00   0.00   4116.00    433.26      0.01   0.68     0.00     0.68   0.05   0.10
dm-15      0.00    0.00  0.00  35.00   0.00   8224.00    469.94      0.03   0.86     0.00     0.86   0.06   0.20
dm-16      0.00    0.00  0.00  53.00   0.00  12428.00    468.98      0.11   2.04     0.00     2.04   0.17   0.90
dm-17      0.00    0.00  0.00  43.00   0.00   8344.00    388.09      0.09   2.14     0.00     2.14   0.42   1.80
dm-18      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-19      0.00    0.00  0.00  75.00   0.00  16824.00    448.64      0.08   1.11     0.00     1.11   0.08   0.60
dm-20      0.00    0.00  0.00  70.00   0.00  16452.00    470.06      0.06   0.90     0.00     0.90   0.09   0.60
dm-21      0.00    0.00  0.00  18.00   0.00   4112.00    456.89      0.02   1.00     0.00     1.00   0.11   0.20
dm-22      0.00    0.00  0.00  53.00   0.00  12324.00    465.06      0.06   0.70     0.00     0.70   0.08   0.40
dm-24      0.00    0.00  0.00  18.00   0.00   4272.00    474.67      0.02   1.06     0.00     1.06   0.17   0.30
dm-25      0.00    0.00  0.00  74.00   0.00  16916.00    457.19      0.09   1.26     0.00     1.26   0.18   1.30

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-15      0.00    0.00  0.00  17.00   0.00   4108.00    483.29      0.02   1.00     0.00     1.00   0.06   0.10
dm-16      0.00    0.00  0.00  34.00   0.00   8208.00    482.82      0.03   1.00     0.00     1.00   0.06   0.20
dm-17      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-18      0.00    0.00  0.00  36.00   0.00   8220.00    456.67      0.05   1.33     0.00     1.33   0.08   0.30
dm-19      0.00    0.00  0.00   1.00   0.00      8.00     16.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-20      0.00    0.00  0.00  36.00   0.00   8288.00    460.44      0.05   1.42     0.00     1.42   0.08   0.30
dm-21      0.00    0.00  0.00  34.00   0.00   8208.00    482.82      0.03   1.00     0.00     1.00   0.06   0.20
dm-22      0.00    0.00  0.00  18.00   0.00   4128.00    458.67      0.04   3.22     0.00     3.22   0.17   0.30
dm-24      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-25      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00  20.00   0.00   4032.00    403.20      0.00   0.00     0.00     0.00   0.00   0.00
dm-15      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-16      0.00    0.00  0.00   1.00   0.00     20.00     40.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-17      0.00    0.00  0.00   4.00   0.00     28.00     14.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-18      0.00    0.00  0.00   3.00   0.00     36.00     24.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-19      0.00    0.00  0.00   2.00   0.00     20.00     20.00      0.01   2.50     0.00     2.50   2.50   0.50
dm-20      0.00    0.00  0.00   6.00   0.00     96.00     32.00      0.02   3.33     0.00     3.33   2.00   1.20
dm-21      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-22      0.00    0.00  0.00   2.00   0.00     32.00     32.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-24      0.00    0.00  0.00  22.00   0.00   4184.00    380.36      0.10   4.59     0.00     4.59   0.95   2.10
dm-25      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
dm-14      0.00    0.00  0.00   8.00   0.00   1928.00    482.00      0.01   1.00     0.00     1.00   0.12   0.10
dm-15      0.00    0.00  0.00   0.00   0.00      0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
dm-16      0.00    0.00  0.00   3.00   0.00

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
The iostat isn't very helpful because there are not many writes. I'd
recommend disabling cstates entirely; I'm not sure it's your problem, but it's
good practice, and if your cluster goes as idle as your iostat suggests it
could be the culprit.
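
In case it helps, one way to keep the CPUs out of deep C-states without touching
the BIOS is via kernel boot parameters; a sketch (assumes an Intel CPU using the
intel_idle driver, and a reboot; the grub config path varies by distro):

# /etc/default/grub  (append to the existing GRUB_CMDLINE_LINUX line)
GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=1 processor.max_cstate=1"

# regenerate the grub config, then reboot
grub2-mkconfig -o /boot/grub2/grub.cfg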
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Use SSDs for metadata or for a pool cache?

2018-11-17 Thread Gesiel Galvão Bernardes
Hello,
I am building a new cluster with 4 hosts, each with the following
configuration:

128 GB RAM
12 SATA HDDs, 8 TB, 7.2k rpm
2 SSDs, 240 GB
2x 10 Gb network

I will use the cluster to store RBD images of VMs; I am thinking of using 2x
replication, if it does not get too slow.

My question is: using BlueStore (the default since Luminous, right?), should I
use the SSDs as a "cache pool" or use the SSDs to store the BlueStore
metadata? Or could I use one SSD for metadata and the other for a "cache pool"?
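
For reference, a sketch of the ceph-volume invocation for the metadata-on-SSD
option (one OSD per HDD with its RocksDB/WAL on an SSD partition or LV; device
names are examples):

# BlueStore OSD on an HDD, with block.db on an SSD partition
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdb1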

Thank you in advance for your opinions.

Gesiel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

Plot thickens:

I checked C-states and apparently I am operating in C1 with all CPUs on.
The servers were tuned to use the latency-performance profile:

 tuned-adm active
Current active profile: latency-performance

turbostat shows
Package  Core  CPU  Avg_MHz  %Busy  Bzy_MHz  TSC_MHz  SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7  CoreTmp  PkgTmp  Pkg%pc2  Pkg%pc3  Pkg%pc6  Pkg%pc7  PkgWatt  RAMWatt  PKG_%  RAM_%
-        -     -    22       0.84   2600     2400     0    99.16   0.00    0.00    0.00    49       58      0.00     0.00     0.00     0.00     69.51    17.29    0.00   0.00
0        0     0    39       1.52   2600     2400     0    98.48   0.00    0.00    0.00    48       58      0.00     0.00     0.00     0.00     36.30    8.73     0.00   0.00
0        0     12   15       0.56   2600     2400     0    99.44
0        1     2    47       1.81   2600     2400     0    98.19   0.00    0.00    0.00    49
0        1     14   17       0.66   2600     2400     0    99.34
0        2     4    31       1.20   2600     2400     0    98.80   0.00    0.00    0.00    47
0        2     16   18       0.71   2600     2400     0    99.29
0        3     6    31       1.21   2600     2400     0    98.79   0.00    0.00    0.00    49
0        3     18   39       1.50   2600     2400     0    98.50
0        4     8    33       1.27   2600     2400     0    98.73   0.00    0.00    0.00    46
0        4     20   17       0.64   2600     2400     0    99.36
0        5     10   32       1.23   2600     2400     0    98.77   0.00    0.00    0.00    48
0        5     22   20       0.76   2600     2400     0    99.24
1        0     1    25       0.95   2600     2400     0    99.05   0.00    0.00    0.00    44       52      0.00     0.00     0.00     0.00     33.21    8.56     0.00   0.00
1        0     13   9        0.34   2600     2400     0    99.66
1        1     3    9        0.35   2600     2400     0    99.65   0.00    0.00    0.00    42
1        1     15   11       0.42   2600     2400     0    99.58
1        2     5    30       1.17   2600     2400     0    98.83   0.00    0.00    0.00    46
1        2     17   7        0.28   2600     2400     0    99.72
1        3     7    10       0.40   2600     2400     0    99.60   0.00    0.00    0.00    44
1        3     19   10       0.37   2600     2400     0    99.63
1        4     9    9        0.36   2600     2400     0    99.64   0.00    0.00    0.00    45
1        4     21   7        0.27   2600     2400     0    99.73
1        5     11   12       0.45   2600     2400     0    99.55   0.00    0.00    0.00    45
1        5     23   46       1.76   2600     2400     0    98.24

iostat for ssd shows

# iostat -xd -p sdb 1 1000

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.05  26.78   0.20  2299.53    171.42      0.02   0.64     0.11     0.64   0.08   0.20

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00  16.00   0.00   392.00     49.00      0.00   0.06     0.00     0.06   0.06   0.10

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00  74.00   0.00   880.00     23.78      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00  56.00   0.00   240.00      8.57      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00  44.00   0.00   676.00     30.73      0.00   0.07     0.00     0.07   0.05   0.20

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00  10.00   0.00    92.00     18.40      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00   6.00   0.00    84.00     28.00      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb        0.00    0.00  0.00   1.00   0.00    20.00     40.00      0.00   0.00     0.00     0.00   0.00   0.00

Device:  rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
You can check if cstates are enabled with cat /proc/acpi/processor/info.
Look for power management: yes/no.

If they are enabled then you can check the current cstate of each core. 0
is the CPU's normal operating range, any other state means the processor is
in a power saving mode. cat /proc/acpi/processor/CPU?/power.
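
On newer kernels that no longer expose /proc/acpi/processor, the same
information is available through sysfs and cpupower; a sketch (state numbering
depends on the cpuidle driver in use):

# list the available idle states and how often they are entered
cpupower idle-info
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name

# disable a deep state at runtime (root required, per state and per CPU)
echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state3/disable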

cstates are configured in the BIOS, so a reboot is required to change them.
I know with Dell servers you can trigger the change with omconfig and then
issue a reboot for it to take effect. Otherwise you'll need to disable them
directly in the BIOS.

As for the SSDs, I would just run iostat and check the iowait. If you see
small disk writes causing high iowait then your SSDs are probably at the
end of their life. Ceph journaling is good at destroying SSDs.
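
A quick way to sanity-check the endurance, assuming smartmontools is installed
(drives behind a PERC may need smartctl's -d megaraid,N option; /dev/sdb is an
example):

# wear indicator and total host writes (raw value is in 32 MiB units on Intel DC drives)
smartctl -A /dev/sdb | egrep 'Media_Wearout_Indicator|Host_Writes_32MiB'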
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

John,

Thank you for suggestions:

I looked into the journal SSDs.  Each is close to 3 years old, showing 5.17%
wear (352,941 GB written to disk, against a 3.6 PB endurance spec over 5 years).


It could be that SMART is not telling the whole story, but that is what I see.

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       29054
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       3
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       5130 (117 3127)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   074   064   000    Old_age   Always       -       26 (Min/Max 23/36)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       26
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       10518704
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       5304
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       1743266
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   095   095   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       10518704
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       6034


SMART Error Log Version: 1
No Errors Logged
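
As a rough cross-check, the Host_Writes_32MiB raw value above converts to about
the same total quoted earlier (a back-of-the-envelope sketch):

# 10,518,704 units of 32 MiB ~= 353 TB written
smartctl -A /dev/sdb | awk '/Host_Writes_32MiB/ {printf "%.1f TB written\n", $10 * 32 * 1048576 / 1e12; exit}'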

How do you look at cstates?

On 11/17/2018 2:37 PM, John Petrini wrote:

I'd take a look at cstates if it's only happening during periods of
low activity. If your journals are on SSD you should also check their
health. They may have exceeded their write endurance - high apply
latency is a tell tale sign of this and you'd see high iowait on those
disks.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG auto repair with BlueStore

2018-11-17 Thread Paul Emmerich
> Wido den Hollander :
> On 11/15/18 7:51 PM, koukou73gr wrote:
> > Are there any means to notify the administrator that an auto-repair has
> > taken place?
>
> I don't think so. You'll see the cluster go to HEALTH_ERR for a while
> before it turns to HEALTH_OK again after the PG has been repaired.

and I think even this is too much. There is no point in triggering a monitoring
system in the middle of the night when the scrubs are running just
because of some bit rot on a disk. Losing a few bits on disks here and
there is a perfectly normal and expected scenario that Ceph can take
care of all by itself without raising a health *error*. It
certainly doesn't require immediate attention (with auto repair
enabled) the way the error state suggests.
The message in the cluster log should be enough.
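
For anyone who does want to turn it on, it is a plain OSD config switch; a
minimal sketch (the error cap keeps auto repair from masking anything worse
than the odd flipped bit):

[osd]
osd_scrub_auto_repair = true
osd_scrub_auto_repair_num_errors = 5   ; refuse to auto-repair above this many scrub errors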

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Nov 16, 2018 at 08:26, Wido den Hollander wrote:
>
>
>
> On 11/15/18 7:51 PM, koukou73gr wrote:
> > Are there any means to notify the administrator that an auto-repair has
> > taken place?
>
> I don't think so. You'll see the cluster go to HEALTH_ERR for a while
> before it turns to HEALTH_OK again after the PG has been repaired.
>
> You would have to search the cluster logs to find out that a auto repair
> took place on a Placement Group.
>
> Wido
>
> >
> > -K.
> >
> >
> > On 2018-11-15 20:45, Mark Schouten wrote:
> >> As a user, I’m very surprised that this isn’t a default setting.
> >>
> >> Mark Schouten
> >>
> >>> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the following:
> >>>
> >>> Hi,
> >>>
> >>> This question is actually still outstanding. Is there any good reason to
> >>> keep auto repair for scrub errors disabled with BlueStore?
> >>>
> >>> I couldn't think of a reason when using size=3 and min_size=2, so just
> >>> wondering.
> >>>
> >>> Thanks!
> >>>
> >>> Wido
> >>>
>  On 8/24/18 8:55 AM, Wido den Hollander wrote:
>  Hi,
> 
>  osd_scrub_auto_repair still defaults to false and I was wondering
>  how we
>  think about enabling this feature by default.
> 
>  Would we say it's safe to enable this with BlueStore?
> 
>  Wido
>  ___
>  ceph-users mailing list
>  ceph-users@lists.ceph.com
>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread John Petrini
I'd take a look at cstates if it's only happening during periods of
low activity. If your journals are on SSD you should also check their
health. They may have exceeded their write endurance; high apply
latency is a telltale sign of this and you'd see high iowait on those
disks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak
I am evaluating BlueStore on a separate cluster.  Unfortunately
upgrading this one is out of the question at the moment for multiple
reasons.  That is why I am trying to find a possible root cause.


On 11/17/2018 2:14 PM, Paul Emmerich wrote:

Are you running FileStore? (The config options you are using looks
like a FileStore config)
Try out BlueStore, we've found that it reduces random latency spikes
due to filesystem weirdness a lot.



Paul




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Paul Emmerich
Are you running FileStore? (The config options you are using look
like a FileStore config.)
Try out BlueStore, we've found that it reduces random latency spikes
due to filesystem weirdness a lot.



Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sat, Nov 17, 2018 at 21:07, Alex Litvak wrote:
>
> I am using libvirt for block device (openstack, proxmox KVM VMs)
> Also I am mounting cephfs inside of VMs and on bare metal hosts.  In
> this case it would be a kernel based client.
>
>  From what I can see based on pool stats cephfs pools have higher
> utilization comparing to block pools during the spikes, how ever it is
> still small.
>
> On 11/17/2018 1:40 PM, Kees Meijs wrote:
> > Hi Alex,
> >
> > What kind of clients do you use? Is it KVM (QEMU) using NBD driver,
> > kernel, or...?
> >
> > Regards,
> > Kees
> >
> > On 17-11-18 20:17, Alex Litvak wrote:
> >> Hello everyone,
> >>
> >> I am trying to troubleshoot cluster exhibiting huge spikes of latency.
> >> I cannot quite catch it because it happens during the light activity
> >> and randomly affects one osd node out of 3 in the pool.
> >>
> >> This is a file store.
> >> I see some osds exhibit applied latency  of 400 ms, 1 minute load
> >> average shuts to 60.  Client commit latency with queue shoots to 300ms
> >> and journal latency (return write ack for client) (journal on Intel
> >> DC-S3710 SSD) shoots on 40 ms
> >>
> >> op_w_process_latency showed 250 ms and client read-modify-write
> >> operation readable/applied latency jumped to 1.25 s on one of the OSDs
> >>
> >> I rescheduled the scrubbing and deep scrubbing and was watching ceph
> >> -w activity so it is definitely not related.
> >>
> >> At the same time node shows 98 % cpu idle no significant changes in
> >> memory utilization, no errors on network with bandwidth utilization
> >> between 20 - 50 Mbit on client and back end networks
> >>
> >> OSD node has 12 OSDs (2TB rust) 2 partitioned SSD journal disks, 32 GB
> >> RAM, dial 6 core / 12 thread CPUs
> >>
> >> This is perhaps the most relevant part of ceph config
> >>
> >> debug lockdep = 0/0
> >> debug context = 0/0
> >> debug crush = 0/0
> >> debug buffer = 0/0
> >> debug timer = 0/0
> >> debug journaler = 0/0
> >> debug osd = 0/0
> >> debug optracker = 0/0
> >> debug objclass = 0/0
> >> debug filestore = 0/0
> >> debug journal = 0/0
> >> debug ms = 0/0
> >> debug monc = 0/0
> >> debug tp = 0/0
> >> debug auth = 0/0
> >> debug finisher = 0/0
> >> debug heartbeatmap = 0/0
> >> debug perfcounter = 0/0
> >> debug asok = 0/0
> >> debug throttle = 0/0
> >>
> >> [osd]
> >>  journal_dio = true
> >>  journal_aio = true
> >>  osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
> >>  osd_journal_size = 2048 ; journal size, in megabytes
> >>  osd crush update on start = false
> >>  osd mount options xfs =
> >> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
> >>  osd_op_threads = 5
> >>  osd_disk_threads = 4
> >>  osd_pool_default_size = 2
> >>  osd_pool_default_min_size = 1
> >>  osd_pool_default_pg_num = 512
> >>  osd_pool_default_pgp_num = 512
> >>  osd_crush_chooseleaf_type = 1
> >>  ; osd pool_default_crush_rule = 1
> >>  ; new options 04.12.2015
> >>  filestore_op_threads = 4
> >>  osd_op_num_threads_per_shard = 1
> >>  osd_op_num_shards = 25
> >>  filestore_fd_cache_size = 64
> >>  filestore_fd_cache_shards = 32
> >>  filestore_fiemap = false
> >>  ; Reduce impact of scrub (needs cfq on osds)
> >>  osd_disk_thread_ioprio_class = "idle"
> >>  osd_disk_thread_ioprio_priority = 7
> >>  osd_deep_scrub_interval = 1211600
> >>  osd_scrub_begin_hour = 19
> >>  osd_scrub_end_hour = 4
> >>  osd_scrub_sleep = 0.1
> >> [client]
> >>  rbd_cache = true
> >>  rbd_cache_size = 67108864
> >>  rbd_cache_max_dirty = 50331648
> >>  rbd_cache_target_dirty = 33554432
> >>  rbd_cache_max_dirty_age = 2
> >>  rbd_cache_writethrough_until_flush = true
> >>
> >> OSD logs and system log at that time show nothing interesting.
> >>
> >> Any clue of what to look for in order to diagnose the load / latency
> >> spikes would be really appreciated.
> >>
> >> Thank you
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

I am using libvirt for the block devices (OpenStack, Proxmox KVM VMs).
I am also mounting CephFS inside of VMs and on bare-metal hosts; in
that case it would be a kernel-based client.


From what I can see based on pool stats, the CephFS pools have higher
utilization compared to the block pools during the spikes, however it is
still small.


On 11/17/2018 1:40 PM, Kees Meijs wrote:

Hi Alex,

What kind of clients do you use? Is it KVM (QEMU) using NBD driver,
kernel, or...?

Regards,
Kees

On 17-11-18 20:17, Alex Litvak wrote:

Hello everyone,

I am trying to troubleshoot cluster exhibiting huge spikes of latency.
I cannot quite catch it because it happens during the light activity
and randomly affects one osd node out of 3 in the pool.

This is a file store.
I see some osds exhibit applied latency  of 400 ms, 1 minute load
average shuts to 60.  Client commit latency with queue shoots to 300ms
and journal latency (return write ack for client) (journal on Intel
DC-S3710 SSD) shoots on 40 ms

op_w_process_latency showed 250 ms and client read-modify-write
operation readable/applied latency jumped to 1.25 s on one of the OSDs

I rescheduled the scrubbing and deep scrubbing and was watching ceph
-w activity so it is definitely not related.

At the same time node shows 98 % cpu idle no significant changes in
memory utilization, no errors on network with bandwidth utilization
between 20 - 50 Mbit on client and back end networks

OSD node has 12 OSDs (2TB rust) 2 partitioned SSD journal disks, 32 GB
RAM, dial 6 core / 12 thread CPUs

This is perhaps the most relevant part of ceph config

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[osd]
     journal_dio = true
     journal_aio = true
     osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
     osd_journal_size = 2048 ; journal size, in megabytes
 osd crush update on start = false
     osd mount options xfs =
"rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
     osd_op_threads = 5
     osd_disk_threads = 4
     osd_pool_default_size = 2
     osd_pool_default_min_size = 1
     osd_pool_default_pg_num = 512
     osd_pool_default_pgp_num = 512
     osd_crush_chooseleaf_type = 1
     ; osd pool_default_crush_rule = 1
 ; new options 04.12.2015
 filestore_op_threads = 4
     osd_op_num_threads_per_shard = 1
     osd_op_num_shards = 25
     filestore_fd_cache_size = 64
     filestore_fd_cache_shards = 32
 filestore_fiemap = false
 ; Reduce impact of scrub (needs cfq on osds)
 osd_disk_thread_ioprio_class = "idle"
 osd_disk_thread_ioprio_priority = 7
 osd_deep_scrub_interval = 1211600
     osd_scrub_begin_hour = 19
     osd_scrub_end_hour = 4
     osd_scrub_sleep = 0.1
[client]
 rbd_cache = true
 rbd_cache_size = 67108864
 rbd_cache_max_dirty = 50331648
 rbd_cache_target_dirty = 33554432
 rbd_cache_max_dirty_age = 2
 rbd_cache_writethrough_until_flush = true

OSD logs and system log at that time show nothing interesting.

Any clue of what to look for in order to diagnose the load / latency
spikes would be really appreciated.

Thank you

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG auto repair with BlueStore

2018-11-17 Thread Paul Emmerich
While I also believe it to be perfectly safe on a bluestore cluster
(especially since there's osd_scrub_auto_repair_num_errors if there's
more wrong than your usual bit rot), we also don't run any cluster
with this option at the moment. We had it enabled for some time before
we backported the OOM-read-error stuff on some clusters.

But there's a small operational issue with auto repair at the moment:
this option will occasionally set the repair flag on a PG without any
scrub errors during scrubbing for some reason which triggers a health
error.

We've had a quick look at the code and couldn't figure out how the
repair flag gets set in some cases on perfectly healthy PGs. Does it
maybe only get set for a very short time while finishing up the scrub
and that's not always picked up in time?
Anyways, a potential work-around for this would be to maybe remove the
repair state from the conditions for the PG_DAMAGED warning?

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Nov 16, 2018 at 08:49, Mark Schouten wrote:
>
>
> Which, as a user, is very surprising to me too..
> --
>
> Mark Schouten  | Tuxis Internet Engineering
> KvK: 61527076  | http://www.tuxis.nl/
> T: 0318 200208 | i...@tuxis.nl
>
>
>
>
> - Original Message -
>
>
> From: Wido den Hollander (w...@42on.com)
> Date: 16-11-2018 08:25
> To: Mark Schouten (m...@tuxis.nl)
> Cc: Ceph Users (ceph-us...@ceph.com)
> Subject: Re: [ceph-users] PG auto repair with BlueStore
>
>
> On 11/15/18 7:45 PM, Mark Schouten wrote:
> > As a user, I’m very surprised that this isn’t a default setting.
> >
>
> That is because you can also have FileStore OSDs in a cluster on which
> such a auto-repair is not safe.
>
> Wido
>
> > Mark Schouten
> >
> >> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the following:
> >>
> >> Hi,
> >>
> >> This question is actually still outstanding. Is there any good reason to
> >> keep auto repair for scrub errors disabled with BlueStore?
> >>
> >> I couldn't think of a reason when using size=3 and min_size=2, so just
> >> wondering.
> >>
> >> Thanks!
> >>
> >> Wido
> >>
> >>> On 8/24/18 8:55 AM, Wido den Hollander wrote:
> >>> Hi,
> >>>
> >>> osd_scrub_auto_repair still defaults to false and I was wondering how we
> >>> think about enabling this feature by default.
> >>>
> >>> Would we say it's safe to enable this with BlueStore?
> >>>
> >>> Wido
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge latency spikes

2018-11-17 Thread Kees Meijs
Hi Alex,

What kind of clients do you use? Is it KVM (QEMU) using NBD driver,
kernel, or...?

Regards,
Kees

On 17-11-18 20:17, Alex Litvak wrote:
> Hello everyone,
>
> I am trying to troubleshoot cluster exhibiting huge spikes of latency.
> I cannot quite catch it because it happens during the light activity
> and randomly affects one osd node out of 3 in the pool.
>
> This is a file store.
> I see some osds exhibit applied latency  of 400 ms, 1 minute load
> average shuts to 60.  Client commit latency with queue shoots to 300ms
> and journal latency (return write ack for client) (journal on Intel
> DC-S3710 SSD) shoots on 40 ms
>
> op_w_process_latency showed 250 ms and client read-modify-write
> operation readable/applied latency jumped to 1.25 s on one of the OSDs
>
> I rescheduled the scrubbing and deep scrubbing and was watching ceph
> -w activity so it is definitely not related.
>
> At the same time node shows 98 % cpu idle no significant changes in
> memory utilization, no errors on network with bandwidth utilization
> between 20 - 50 Mbit on client and back end networks
>
> OSD node has 12 OSDs (2TB rust) 2 partitioned SSD journal disks, 32 GB
> RAM, dial 6 core / 12 thread CPUs
>
> This is perhaps the most relevant part of ceph config
>
> debug lockdep = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug buffer = 0/0
> debug timer = 0/0
> debug journaler = 0/0
> debug osd = 0/0
> debug optracker = 0/0
> debug objclass = 0/0
> debug filestore = 0/0
> debug journal = 0/0
> debug ms = 0/0
> debug monc = 0/0
> debug tp = 0/0
> debug auth = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug perfcounter = 0/0
> debug asok = 0/0
> debug throttle = 0/0
>
> [osd]
>     journal_dio = true
>     journal_aio = true
>     osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
>     osd_journal_size = 2048 ; journal size, in megabytes
> osd crush update on start = false
>     osd mount options xfs =
> "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
>     osd_op_threads = 5
>     osd_disk_threads = 4
>     osd_pool_default_size = 2
>     osd_pool_default_min_size = 1
>     osd_pool_default_pg_num = 512
>     osd_pool_default_pgp_num = 512
>     osd_crush_chooseleaf_type = 1
>     ; osd pool_default_crush_rule = 1
> ; new options 04.12.2015
> filestore_op_threads = 4
>     osd_op_num_threads_per_shard = 1
>     osd_op_num_shards = 25
>     filestore_fd_cache_size = 64
>     filestore_fd_cache_shards = 32
> filestore_fiemap = false
> ; Reduce impact of scrub (needs cfq on osds)
> osd_disk_thread_ioprio_class = "idle"
> osd_disk_thread_ioprio_priority = 7
> osd_deep_scrub_interval = 1211600
>     osd_scrub_begin_hour = 19
>     osd_scrub_end_hour = 4
>     osd_scrub_sleep = 0.1
> [client]
> rbd_cache = true
> rbd_cache_size = 67108864
> rbd_cache_max_dirty = 50331648
> rbd_cache_target_dirty = 33554432
> rbd_cache_max_dirty_age = 2
> rbd_cache_writethrough_until_flush = true
>
> OSD logs and system log at that time show nothing interesting.
>
> Any clue of what to look for in order to diagnose the load / latency
> spikes would be really appreciated.
>
> Thank you
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Huge latency spikes

2018-11-17 Thread Alex Litvak

Hello everyone,

I am trying to troubleshoot a cluster exhibiting huge latency spikes.
I cannot quite catch it because it happens during light activity and
randomly affects one OSD node out of the 3 in the pool.


This is a FileStore cluster.
I see some OSDs exhibit an applied latency of 400 ms, and the 1-minute load
average shoots to 60.  Client commit latency with queue shoots to 300 ms
and journal latency (return write ack for client; journal on Intel
DC-S3710 SSD) shoots to 40 ms.


op_w_process_latency showed 250 ms and client read-modify-write 
operation readable/applied latency jumped to 1.25 s on one of the OSDs


I rescheduled the scrubbing and deep scrubbing and was watching ceph -w 
activity so it is definitely not related.


At the same time the node shows 98% CPU idle, no significant changes in
memory utilization, and no errors on the network, with bandwidth utilization
between 20 and 50 Mbit on the client and back-end networks.


The OSD node has 12 OSDs (2 TB rust), 2 partitioned SSD journal disks, 32 GB
RAM, and dual 6-core / 12-thread CPUs.


This is perhaps the most relevant part of ceph config

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[osd]
journal_dio = true
journal_aio = true
osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal
osd_journal_size = 2048 ; journal size, in megabytes
osd crush update on start = false
osd mount options xfs = 
"rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"

osd_op_threads = 5
osd_disk_threads = 4
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 512
osd_pool_default_pgp_num = 512
osd_crush_chooseleaf_type = 1
; osd pool_default_crush_rule = 1
; new options 04.12.2015
filestore_op_threads = 4
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
filestore_fiemap = false
; Reduce impact of scrub (needs cfq on osds)
osd_disk_thread_ioprio_class = "idle"
osd_disk_thread_ioprio_priority = 7
osd_deep_scrub_interval = 1211600
osd_scrub_begin_hour = 19
osd_scrub_end_hour = 4
osd_scrub_sleep = 0.1
[client]
rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 50331648
rbd_cache_target_dirty = 33554432
rbd_cache_max_dirty_age = 2
rbd_cache_writethrough_until_flush = true

OSD logs and system log at that time show nothing interesting.

Any clue of what to look for in order to diagnose the load / latency 
spikes would be really appreciated.
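
For reference, a sketch of the commands that expose the per-OSD latencies
described above while a spike is happening (standard ceph CLI and admin
socket; osd.12 is an example id):

# cluster-wide commit/apply latency per OSD
ceph osd perf

# detailed counters on the affected node (run on the host where osd.12 lives)
ceph daemon osd.12 perf dump | grep -i latency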


Thank you

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com