Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-16 Thread Christian Balzer


Hello,
 
On Tue, 16 Oct 2018 14:09:23 +0100 (BST) Andrei Mikhailovsky wrote:

> Hi Christian,
> 
> 
> - Original Message -
> > From: "Christian Balzer" 
> > To: "ceph-users" 
> > Cc: "Andrei Mikhailovsky" 
> > Sent: Tuesday, 16 October, 2018 08:51:36
> > Subject: Re: [ceph-users] Luminous with osd flapping, slow requests when 
> > deep scrubbing  
> 
> > Hello,
> > 
> > On Mon, 15 Oct 2018 12:26:50 +0100 (BST) Andrei Mikhailovsky wrote:
> >   
> >> Hello,
> >> 
>> I am currently running Luminous 12.2.8 on Ubuntu with 4.15.0-36-generic
>> kernel from the official ubuntu repo. The cluster has 4 mon + osd servers.
>> Each osd server has a total of 9 spinning osds and 1 ssd for the hdd and ssd
>> pools. The hdds are backed by the S3710 ssds for journaling with a ratio of
>> 1:5. The ssd pool osds are not using external journals. Ceph is used as a
>> Primary storage for Cloudstack - all vm disk images are stored on the cluster.
> >>  
> > 
> > For the record, are you seeing the flapping only on HDD pools or with SSD
> > pools as well?
> >   
> 
> 
> I think so; this tends to happen on the HDD pool.
>
Fits the picture and expectations.
 
> 
> 
> > When migrating to Bluestore, did you see this starting to happen before
> > the migration was complete (and just on Bluestore OSDs of course)?
> >   
> 
> 
> Nope, not that I can recall. I did have some issues with performance 
> initially, but I've added a few temp disks to the servers to help with the 
> free space. The cluster was well unhappy when the usage spiked above 90% on 
> some of the osds. After the temp disks were in place, the cluster was back to 
> being happy.
>
Well, that's never a good state indeed. 

> 
> 
> > What's your HW like, in particular RAM? Current output of "free"?  
> 
> Each of the mon/osd servers has 64GB of ram. Currently, the mem usage on one of 
> the servers is as follows (it was restarted 30 mins ago):
> 
> root@arh-ibstorage4-ib:/home/andrei# free -h
>               total        used        free      shared  buff/cache   available
> Mem:            62G         11G         50G         10M        575M         49G
> Swap:           45G          0B         45G
>
Something with a little more uptime would be more relevant, but at 64GB
and 10 OSDs you'll never come close to the amount of caching you had with
filestore when running with default settings.
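
If you want to confirm what your OSDs are actually configured with, something
like this on an OSD node should show it (IIRC the Luminous defaults are 1GB of
bluestore cache per HDD OSD and 3GB per SSD OSD, so roughly 12GB per node in
your case versus the ~50GB of page cache filestore would have been using):

# per-OSD bluestore cache settings
ceph daemon osd.0 config show | grep bluestore_cache
# resident memory of the OSD processes themselves
ps -o rss,cmd -C ceph-osd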

> 
> The servers with 24 hours uptime have a similar picture, but a slightly 
> larger used amount.
> 
But still nowhere near half, let alone all, right?

> > 
> > If you didn't tune your bluestore cache you're likely just using a
> > fraction of the RAM for caching, making things a LOT harder for OSDs to
> > keep up when compared to filestore and the global (per node) page cache.
> >   
> 
> I haven't done any bluestore cache changes at all after moving to the 
> bluestore type. Could you please point me in the right direction?
> 
Google is your friend: searching for "bluestore cache" finds the thread below
as the first hit on this ML. Read it along with the referenced documentation
and the other related threads.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/029449.html
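
For illustration only (size this to your RAM and per-node OSD count, these
numbers are not a recommendation), the relevant knobs in ceph.conf would look
something like:

[osd]
# per-OSD cache; with 10 OSDs and 64GB of RAM, 3-4GB each still leaves
# headroom for the OSD processes themselves and the rest of the system
bluestore_cache_size_hdd = 4294967296
bluestore_cache_size_ssd = 4294967296

I believe the cache size is only picked up at OSD start, so a restart is
needed for it to take effect.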

> 
> > See the various bluestore cache threads here, one quite recently.
> > 
> > If your cluster was close to the brink with filestore, just moving it to
> > bluestore would nicely fit what you're seeing, especially given the high
> > stress and cache-bypassing nature of bluestore deep scrubbing.
> >   
> 
> 
> I have put in place the following config settings in the [global] section:
> 
> 
> # Settings to try to minimise client IO impact / slow requests / osd flapping
> # from scrubbing and snap trimming
> osd_scrub_chunk_min = 1
> osd_scrub_chunk_max = 5
> #osd_scrub_begin_hour = 21
> #osd_scrub_end_hour = 5
> osd_scrub_sleep = 0.1
> osd_scrub_max_interval = 1209600
> osd_scrub_min_interval = 259200
> osd_deep_scrub_interval = 1209600
> osd_deep_scrub_stride = 1048576
> osd_scrub_priority = 1
> osd_snap_trim_priority = 1
> 
> 
> Following the restart of the servers and a few tests manually invoking 6
> deep-scrub processes, I haven't seen any more issues with osd flapping or
> slow requests. I will keep an eye on it over the next few weeks to see if
> the issue is resolved.
>
Yes, tuning deep scrubs way down is an obvious way forward, and with
bluestore they're less relevant to begin with.
Also note that AFAIK with bluestore deep scrub will bypass all caches.
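
If you want to apply the scrub settings without another round of restarts,
injecting them should work for most of these (the values below just mirror the
ones you set; some options may still report that a restart is required):

ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_scrub_chunk_max 5 --osd_deep_scrub_stride 1048576'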

Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-16 Thread Andrei Mikhailovsky
Hi Christian,


- Original Message -
> From: "Christian Balzer" 
> To: "ceph-users" 
> Cc: "Andrei Mikhailovsky" 
> Sent: Tuesday, 16 October, 2018 08:51:36
> Subject: Re: [ceph-users] Luminous with osd flapping, slow requests when deep 
> scrubbing

> Hello,
> 
> On Mon, 15 Oct 2018 12:26:50 +0100 (BST) Andrei Mikhailovsky wrote:
> 
>> Hello,
>> 
>> I am currently running Luminous 12.2.8 on Ubuntu with 4.15.0-36-generic
>> kernel from the official ubuntu repo. The cluster has 4 mon + osd servers.
>> Each osd server has a total of 9 spinning osds and 1 ssd for the hdd and ssd
>> pools. The hdds are backed by the S3710 ssds for journaling with a ratio of
>> 1:5. The ssd pool osds are not using external journals. Ceph is used as a
>> Primary storage for Cloudstack - all vm disk images are stored on the cluster.
>>
> 
> For the record, are you seeing the flapping only on HDD pools or with SSD
> pools as well?
> 


I think so; this tends to happen on the HDD pool.



> When migrating to Bluestore, did you see this starting to happen before
> the migration was complete (and just on Bluestore OSDs of course)?
> 


Nope, not that I can recall. I did have some issues with performance initially, 
but I've added a few temp disks to the servers to help with the free space. The 
cluster was well unhappy when the usage spiked above 90% on some of the osds. 
After the temp disks were in place, the cluster was back to being happy.



> What's your HW like, in particular RAM? Current output of "free"?

Each of the mon/osd servers has 64GB of ram. Currently, the mem usage on one of 
the servers is as follows (it was restarted 30 mins ago):

root@arh-ibstorage4-ib:/home/andrei# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         11G         50G         10M        575M         49G
Swap:           45G          0B         45G


The servers with 24 hours uptime have a similar picture, but a slightly larger 
used amount.

> 
> If you didn't tune your bluestore cache you're likely just using a
> fraction of the RAM for caching, making things a LOT harder for OSDs to
> keep up when compared to filestore and the global (per node) page cache.
> 

I haven't done any bluestore cache changes at all after moving to the bluestore 
type. Could you please point me in the right direction?


> See the various bluestore cache threads here, one quite recently.
> 
> If your cluster was close to the brink with filestore, just moving it to
> bluestore would nicely fit what you're seeing, especially given the high
> stress and cache-bypassing nature of bluestore deep scrubbing.
> 


I have put in place the following config settings in the [global] section:


# Settings to try to minimise client IO impact / slow requests / osd flapping
# from scrubbing and snap trimming
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
#osd_scrub_begin_hour = 21
#osd_scrub_end_hour = 5
osd_scrub_sleep = 0.1
osd_scrub_max_interval = 1209600
osd_scrub_min_interval = 259200
osd_deep_scrub_interval = 1209600
osd_deep_scrub_stride = 1048576
osd_scrub_priority = 1
osd_snap_trim_priority = 1


Following the restart of the servers and a few tests manually invoking 6
deep-scrub processes, I haven't seen any more issues with osd flapping or slow
requests. I will keep an eye on it over the next few weeks to see if the issue
is resolved.
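
(In case it is useful to anyone following the thread: deep scrubs can be
triggered manually per OSD or per PG, along these lines - 5.3f below is just a
placeholder pgid.)

# instruct osd.12 to deep-scrub its PGs
ceph osd deep-scrub 12
# or deep-scrub a single PG
ceph pg deep-scrub 5.3f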



> Regards,
> 
> Christian
>> I have recently migrated all osds to the bluestore, which was a long process
>> with ups and downs, but I am happy to say that the migration is done. During
>> the migration I've disabled the scrubbing (both deep and standard). After
>> reenabling the scrubbing I have noticed the cluster started having a large
>> number of slow requests and poor client IO (to the point of vms stalling for
>> minutes). Further investigation showed that the slow requests happen because
>> of the osds flapping. In a single day my logs have over 1000 entries which
>> report osd going down. This affects random osds. Disabling deep-scrubbing
>> stabilises the cluster, the osds no longer flap and the slow requests
>> disappear. As a short-term solution I've disabled deep scrubbing, but was
>> hoping to fix the issues with your help.
>> 
>> At the moment, I am running the cluster with default settings apart from the
>> following settings:
>> 
>> [global]
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_thread_ioprio_class = idle
>> osd_recovery_op_priority = 1
>> 
>> [osd]
>> debug_ms = 0
>> debug_auth = 0
>> debug_osd = 0
>> debug_bluestore = 0
>> debug_bluefs = 0
>> debug_bdev = 0
>> debug_rocksdb = 0

Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-16 Thread Christian Balzer


Hello,

On Mon, 15 Oct 2018 12:26:50 +0100 (BST) Andrei Mikhailovsky wrote:

> Hello, 
> 
> I am currently running Luminous 12.2.8 on Ubuntu with 4.15.0-36-generic 
> kernel from the official ubuntu repo. The cluster has 4 mon + osd servers. 
> Each osd server has a total of 9 spinning osds and 1 ssd for the hdd and 
> ssd pools. The hdds are backed by the S3710 ssds for journaling with a ratio 
> of 1:5. The ssd pool osds are not using external journals. Ceph is used as a 
> Primary storage for Cloudstack - all vm disk images are stored on the 
> cluster. 
>

For the record, are you seeing the flapping only on HDD pools or with SSD
pools as well?

When migrating to Bluestore, did you see this starting to happen before
the migration was complete (and just on Bluestore OSDs of course)?

What's your HW like, in particular RAM? Current output of "free"?

If you didn't tune your bluestore cache you're likely just using a
fraction of the RAM for caching, making things a LOT harder for OSDs to
keep up when compared to filestore and the global (per node) page cache.

See the various bluestore cache threads here, one quite recently.

If your cluster was close to the brink with filestore, just moving it to
bluestore would nicely fit what you're seeing, especially given the high
stress and cache-bypassing nature of bluestore deep scrubbing.

Regards,

Christian
> I have recently migrated all osds to the bluestore, which was a long process 
> with ups and downs, but I am happy to say that the migration is done. During 
> the migration I've disabled the scrubbing (both deep and standard). After 
> reenabling the scrubbing I have noticed the cluster started having a large 
> number of slow requests and poor client IO (to the point of vms stalling for 
> minutes). Further investigation showed that the slow requests happen because 
> of the osds flapping. In a single day my logs have over 1000 entries which 
> report osd going down. This affects random osds. Disabling deep-scrubbing 
> stabilises the cluster, the osds no longer flap and the slow requests 
> disappear. As a short-term solution I've disabled deep scrubbing, but was 
> hoping to fix the issues with your help. 
> 
> At the moment, I am running the cluster with default settings apart from the 
> following settings: 
> 
> [global] 
> osd_disk_thread_ioprio_priority = 7 
> osd_disk_thread_ioprio_class = idle 
> osd_recovery_op_priority = 1 
> 
> [osd] 
> debug_ms = 0 
> debug_auth = 0 
> debug_osd = 0 
> debug_bluestore = 0 
> debug_bluefs = 0 
> debug_bdev = 0 
> debug_rocksdb = 0 
> 
> 
> Could you share experiences with deep scrubbing of bluestore osds? Are there 
> any options that I should set to make sure the osds are not flapping and the 
> client IO is still available? 
> 
> Thanks 
> 
> Andrei 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Igor Fedotov

Perhaps this is the same issue as indicated here:

https://tracker.ceph.com/issues/36364


Can you check OSD iostat reports for similarities to this ticket, please?
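
Something along these lines while a deep scrub is running should be enough for
a comparison (sdb/sdc are placeholders for the data devices of the affected
OSDs):

# extended per-device stats every 5 seconds; watch %util, await and r/s
iostat -x -d /dev/sdb /dev/sdc 5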

Thanks,
Igor

On 10/15/2018 2:26 PM, Andrei Mikhailovsky wrote:

Hello,

I am currently running Luminous 12.2.8 on Ubuntu with 
4.15.0-36-generic kernel from the official ubuntu repo. The cluster 
has 4 mon + osd servers. Each osd server has a total of 9 spinning 
osds and 1 ssd for the hdd and ssd pools. The hdds are backed by the 
S3710 ssds for journaling with a ratio of 1:5. The ssd pool osds are 
not using external journals. Ceph is used as a Primary storage for 
Cloudstack - all vm disk images are stored on the cluster.


I have recently migrated all osds to the bluestore, which was a long 
process with ups and downs, but I am happy to say that the migration 
is done. During the migration I've disabled the scrubbing (both deep 
and standard). After reenabling the scrubbing I have noticed the 
cluster started having a large number of slow requests and poor client 
IO (to the point of vms stalling for minutes). Further investigation 
showed that the slow requests happen because of the osds flapping. In 
a single day my logs have over 1000 entries which report osd going 
down. This affects random osds. Disabling deep-scrubbing stabilises 
the cluster, the osds no longer flap and the slow requests 
disappear. As a short-term solution I've disabled deep scrubbing, 
but was hoping to fix the issues with your help.


At the moment, I am running the cluster with default settings apart 
from the following settings:


[global]
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
osd_recovery_op_priority = 1

[osd]
debug_ms = 0
debug_auth = 0
debug_osd = 0
debug_bluestore = 0
debug_bluefs = 0
debug_bdev = 0
debug_rocksdb = 0


Could you share experiences with deep scrubbing of bluestore osds? Are 
there any options that I should set to make sure the osds are not 
flapping and the client IO is still available?


Thanks

Andrei



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Eugen Block

Hi Andrei,

we have been using the script from [1] to define the number of PGs to
deep-scrub in parallel. We currently use MAXSCRUBS=4; you could start
with 1 to minimize the performance impact.


And these are the scrub settings from our ceph.conf:

ceph:~ # grep scrub /etc/ceph/ceph.conf
osd_scrub_begin_hour = 0
osd_scrub_end_hour = 7
osd_scrub_sleep = 0.1
osd_deep_scrub_interval = 2419200

The osd_deep_scrub_interval is set to 4 weeks so that it doesn't
interfere with our own schedule defined by the cronjob, which deep-scrubs
a quarter of the PGs four times a week, so that every PG has been
deep-scrubbed within one week.
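
The gist of the approach, as a rough sketch (this is not the script from [1];
it assumes the Luminous JSON layout of "ceph pg dump" and that jq is
installed - pick the PGs with the oldest last deep-scrub timestamps and
deep-scrub MAXSCRUBS of them per cron run):

#!/bin/bash
MAXSCRUBS=4
ceph pg dump --format json 2>/dev/null \
  | jq -r '.pg_stats[] | "\(.last_deep_scrub_stamp)|\(.pgid)"' \
  | sort \
  | head -n "$MAXSCRUBS" \
  | cut -d'|' -f2 \
  | while read -r pg; do
      ceph pg deep-scrub "$pg"
    done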


Regards,
Eugen

[1]  
https://www.formann.de/2015/05/cronjob-to-enable-timed-deep-scrubbing-in-a-ceph-cluster/



Zitat von Andrei Mikhailovsky :


Hello,

I am currently running Luminous 12.2.8 on Ubuntu with  
4.15.0-36-generic kernel from the official ubuntu repo. The cluster  
has 4 mon + osd servers. Each osd server has a total of 9 spinning  
osds and 1 ssd for the hdd and ssd pools. The hdds are backed by the  
S3710 ssds for journaling with a ratio of 1:5. The ssd pool osds  
are not using external journals. Ceph is used as a Primary storage  
for Cloudstack - all vm disk images are stored on the cluster.


I have recently migrated all osds to the bluestore, which was a long  
process with ups and downs, but I am happy to say that the migration  
is done. During the migration I've disabled the scrubbing (both deep  
and standard). After reenabling the scrubbing I have noticed the  
cluster started having a large number of slow requests and poor  
client IO (to the point of vms stalling for minutes). Further  
investigation showed that the slow requests happen because of the  
osds flapping. In a single day my logs have over 1000 entries which  
report osd going down. This affects random osds. Disabling  
deep-scrubbing stabilises the cluster, the osds no longer  
flap and the slow requests disappear. As a short-term solution I've  
disabled deep scrubbing, but was hoping to fix the issues with  
your help.


At the moment, I am running the cluster with default settings apart  
from the following settings:


[global]
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
osd_recovery_op_priority = 1

[osd]
debug_ms = 0
debug_auth = 0
debug_osd = 0
debug_bluestore = 0
debug_bluefs = 0
debug_bdev = 0
debug_rocksdb = 0


Could you share experiences with deep scrubbing of bluestore osds?  
Are there any options that I should set to make sure the osds are  
not flapping and the client IO is still available?


Thanks

Andrei




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Andrei Mikhailovsky
Hello, 

I am currently running Luminous 12.2.8 on Ubuntu with 4.15.0-36-generic kernel 
from the official ubuntu repo. The cluster has 4 mon + osd servers. Each osd 
server has a total of 9 spinning osds and 1 ssd for the hdd and ssd pools. 
The hdds are backed by the S3710 ssds for journaling with a ratio of 1:5. The 
ssd pool osds are not using external journals. Ceph is used as a Primary 
storage for Cloudstack - all vm disk images are stored on the cluster. 

I have recently migrated all osds to the bluestore, which was a long process 
with ups and downs, but I am happy to say that the migration is done. During 
the migration I've disabled the scrubbing (both deep and standard). After 
reenabling the scrubbing I have noticed the cluster started having a large 
number of slow requests and poor client IO (to the point of vms stalling for 
minutes). Further investigation showed that the slow requests happen because of 
the osds flapping. In a single day my logs have over 1000 entries which report 
osd going down. This affects random osds. Disabling deep-scrubbing stabilises 
the cluster, the osds no longer flap and the slow requests disappear. As 
a short-term solution I've disabled deep scrubbing, but was hoping to fix 
the issues with your help. 

At the moment, I am running the cluster with default settings apart from the 
following settings: 

[global] 
osd_disk_thread_ioprio_priority = 7 
osd_disk_thread_ioprio_class = idle 
osd_recovery_op_priority = 1 

[osd] 
debug_ms = 0 
debug_auth = 0 
debug_osd = 0 
debug_bluestore = 0 
debug_bluefs = 0 
debug_bdev = 0 
debug_rocksdb = 0 


Could you share experiences with deep scrubbing of bluestore osds? Are there 
any options that I should set to make sure the osds are not flapping and the 
client IO is still available? 

Thanks 

Andrei 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com