Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Hello,
>>> We have a cluster running Jewel 10.2.2 on Ubuntu 16.04. The cluster is
>>> composed of 12 nodes; each node has 10 OSDs with the journal on disk.
>>>
>>> We have one RBD partition and a RADOS Gateway with 2 data pools, one
>>> replicated, one EC (8+2).
>>>
>>> In the attachment are a few details on our

Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Burkhard Linke
Hi, just an additional comment: you can temporarily disable backfilling and recovery by setting the 'nobackfill' and 'norecover' flags. This will reduce the backfilling traffic and may help the cluster and its OSDs recover. Afterwards you should set the backfill traffic settings to the
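
For reference, the flags mentioned above are toggled with `ceph osd set <flag>` and `ceph osd unset <flag>`. Below is a minimal Python sketch of how one might script that, assuming the `ceph` CLI is on the PATH and an admin keyring is available on the host. The helper names `flag_commands` and `apply_flags` are made up for this illustration and are not part of any Ceph tooling.

```python
import subprocess
from typing import Callable, List, Sequence

# The two cluster-wide flags named in the message above.
FLAGS = ("nobackfill", "norecover")


def flag_commands(action: str, flags: Sequence[str] = FLAGS) -> List[List[str]]:
    """Build the `ceph osd set|unset <flag>` command lines (hypothetical helper)."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


def apply_flags(action: str,
                run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each command in order; `run` can be swapped out for a dry run."""
    for cmd in flag_commands(action):
        run(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    apply_flags("set", run=lambda cmd: print(" ".join(cmd)))
    # Once the OSDs are stable again, the flags would be cleared the same way:
    apply_flags("unset", run=lambda cmd: print(" ".join(cmd)))
```

Leaving `run` at its default executes the commands against the live cluster, so the dry run is the safer way to try this out first. Remember to unset both flags once the OSDs are stable, otherwise recovery stays paused.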