Re: [ceph-users] Cluster hang (deep scrub bug? "waiting for scrub")

2017-11-13 Thread Matteo Dacrema
I’ve seen that only once, and I noticed that there’s a bug fixed in 10.2.10 (http://tracker.ceph.com/issues/20041). Yes, I use snapshots. As far as I can see, in my case the PG had been scrubbing for 20 days, but I only have 7 days of logs, so I’m not able to identify

Re: [ceph-users] Cluster hang (deep scrub bug? "waiting for scrub")

2017-11-10 Thread Peter Maloney
I have often seen a problem where a single OSD stuck in an eternal deep scrub will hang any client trying to connect. Stopping or restarting that single OSD fixes the problem. Do you use snapshots? Here's what the scrub bug looks like (where that many seconds is 14 hours): > ceph daemon

Re: [ceph-users] Cluster hang

2017-11-09 Thread Matteo Dacrema
Update: I noticed that there was a PG that remained scrubbing from the first day I found the issue until I rebooted the node, at which point the problem disappeared. Could this cause the behaviour I described before? > On 09 Nov 2017, at 15:55, Matteo Dacrema > wrote: > >

[ceph-users] Cluster hang

2017-11-09 Thread Matteo Dacrema
Hi all, I’ve experienced a strange issue with my cluster. The cluster is composed of 10 HDD nodes with 20 nodes + 4 journals each, plus 4 SSD nodes with 5 SSDs each. All the nodes are behind 3 monitors and 2 different CRUSH maps. The whole cluster is on 10.2.7. About 20 days ago I started to