Re: [ceph-users] Node crash, filesystem not usable

2018-05-15 Thread Webert de Souza Lima
"osd_peering_wq_threads": "2", > "osd_recovery_thread_suicide_timeout": "300", > "osd_recovery_thread_timeout": "30", > "osd_remove_thread_suicide_timeout": "36000", > "osd_remove_th

Re: [ceph-users] Node crash, filesystem not usable

2018-05-13 Thread Marc Roos
uot;, "osd_recovery_thread_suicide_timeout": "300", "osd_recovery_thread_timeout": "30", "osd_remove_thread_suicide_timeout": "36000", "osd_remove_thread_timeout": "3600", -Original Message--

Re: [ceph-users] Node crash, filesystem not usable

2018-05-11 Thread Webert de Souza Lima
This message seems very concerning: >mds0: Metadata damage detected. For the rest, the cluster still seems to be recovering. You could try to speed things up with ceph tell, like: ceph tell osd.* injectargs --osd_max_backfills=10 ceph tell osd.* injectargs
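A minimal sketch of that kind of recovery tuning; the second injectargs in the quoted message is truncated, so --osd_recovery_max_active below is an assumption rather than what was actually suggested:

    # temporarily allow more parallel backfill/recovery work (example values; revert once healthy)
    ceph tell osd.* injectargs '--osd_max_backfills 10'
    ceph tell osd.* injectargs '--osd_recovery_max_active 10'   # assumed companion setting, not from the original mail

The metadata damage itself can usually be inspected with the MDS "damage ls" admin command (e.g. ceph tell mds.0 damage ls on recent releases) to see what the MDS has flagged.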

Re: [ceph-users] Node crash, filesystem not usable

2018-05-11 Thread Daniel Davidson
Below is the information you were asking for. I think they are size=2, min_size=1. Dan # ceph status cluster 7bffce86-9d7b-4bdf-a9c9-67670e68ca77 health HEALTH_ERR 140 pgs are stuck inactive for more than 300 seconds 64 pgs backfill_wait 76 pgs
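A minimal sketch of commands for confirming the per-pool replication settings and digging into the stuck PGs (not necessarily what was run here):

    # per-pool size/min_size, plus the PGs reported stuck inactive
    ceph osd dump | grep 'pool'
    ceph pg dump_stuck inactive
    ceph health detail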

Re: [ceph-users] Node crash, filesystem not usable

2018-05-11 Thread David Turner
Please share some command outputs that show the state of your cluster. Most notable is `ceph status`, but `ceph osd tree` would also be helpful. What are the sizes of the pools in your cluster? Are they all size=3, min_size=2? On Fri, May 11, 2018 at 12:05 PM Daniel Davidson
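For anyone following along, a short sketch of the commands being asked for here (stock commands only, nothing cluster-specific assumed):

    # overall cluster state, OSD/host layout, and per-pool replication settings
    ceph status
    ceph osd tree
    ceph osd pool ls detail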

[ceph-users] Node crash, filesystem not usable

2018-05-11 Thread Daniel Davidson
Hello, today we had a node crash, and looking at it, it seems there is a problem with the RAID controller, so it is not coming back up, maybe ever. It corrupted the local filesystem for the ceph storage there. The remainder of our storage (10.2.10) cluster is running, and it looks to be