[ceph-users] lost bluestore metadata but still have data
Hi everyone, in the case where I've lost the entire directory below that contains a bluestore OSD's config and metadata, but all the bluestore devices are intact (block, block.db, block.wal), how can I get the OSD up and running again?

I tried to do a ceph-osd --mkfs again, which seemed to regenerate everything OK and got the OSD back to up/in, but all the placement groups assigned to the OSD are stuck stale. Using the admin socket on the OSD to ask it to trigger a scrub on a particular PG gives a result of "Can't find pg". It seems the OSD has no knowledge of the PGs that were assigned to it before; I assume this is because the mkfs operation cleared out state from the block/db devices.

Is there any feasible approach to bring an OSD that has lost its config directory back to life in the future? Thanks!

osd0 # ls -l
total 112
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block -> /dev/disk/by-partuuid/e0b7583c-aa1a-49b9-906b-3580f9f92b9a
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block.db -> /dev/disk/by-partuuid/e68da1b1-b13c-4ca7-8055-884b0cf32a38
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block.wal -> /dev/disk/by-partuuid/5d0589b7-e149-4a4f-9dd6-a5444ef25c72
-rw-r--r--. 1 root root    2 Sep 22 22:26 bluefs
-rw-r--r--. 1 root root   37 Sep 22 22:26 ceph_fsid
-rw-r--r--. 1 root root   37 Sep 22 22:26 fsid
-rw-r--r--. 1 root root   56 Sep 22 22:26 keyring
-rw-r--r--. 1 root root    8 Sep 22 22:26 kv_backend
-rw-r--r--. 1 root root   21 Sep 22 22:26 magic
-rw-r--r--. 1 root root    4 Sep 22 22:26 mkfs_done
-rw-r--r--. 1 root root    6 Sep 22 22:26 ready
srwxr-xr-x. 1 root root    0 Sep 22 22:26 ceph-osd.0.asok
-rw-r--r--. 1 root root 2221 Sep 22 22:26 ceph.config
-rw-r--r--. 1 root root   10 Sep 22 22:26 type
-rw-r--r--. 1 root root    2 Sep 22 22:26 whoami
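For what it's worth, the bluestore block device carries enough label metadata to rebuild this directory without re-running mkfs (which, as seen above, reinitializes the store and loses the PG state). On releases that ship ceph-bluestore-tool, its prime-osd-dir subcommand does exactly that. A rough sketch, assuming osd.0, the default /var/lib/ceph/osd layout, and the device path from the listing above:

    # Stop the OSD so nothing else has the devices open.
    systemctl stop ceph-osd@0

    # Rebuild the metadata directory from the labels on the block device.
    mkdir -p /var/lib/ceph/osd/ceph-0
    ceph-bluestore-tool prime-osd-dir \
        --dev /dev/disk/by-partuuid/e0b7583c-aa1a-49b9-906b-3580f9f92b9a \
        --path /var/lib/ceph/osd/ceph-0

    # Fix ownership and bring the OSD back.
    chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
    systemctl start ceph-osd@0

Depending on the release, the block.db and block.wal symlinks may need to be recreated by hand afterwards. Note this only helps while the on-disk data is still intact; once mkfs has been re-run against the devices, the original PG data is gone.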
[ceph-users] how to troubleshoot "heartbeat_check: no reply" in OSD log
I've got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in). ceph status and ceph osd tree output can be found at:
https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12

In the osd.4 log, I see many entries like these:

2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply from 10.32.0.3:6807 osd.15 ever on either front or back, first ping sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)
2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply from 10.32.0.3:6811 osd.16 ever on either front or back, first ping sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)

From osd.4, those endpoints look reachable:

/ # nc -vz 10.32.0.3 6807
10.32.0.3 (10.32.0.3:6807) open
/ # nc -vz 10.32.0.3 6811
10.32.0.3 (10.32.0.3:6811) open

What else can I look at to determine why most of the OSDs cannot communicate? http://tracker.ceph.com/issues/16092 indicates this behavior is usually a networking or hardware issue; what else can I check on that front? I can turn on extra logging as needed. Thanks!
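One failure mode that nc -vz will not catch is an MTU mismatch: the TCP handshake fits in a small packet and succeeds, while the larger heartbeat messages are silently dropped. A sketch of checks that may help narrow this down (the interface name eth0 and the 8972-byte payload, sized for a 9000 MTU, are assumptions; adjust to your network):

    # Send large packets with fragmentation prohibited; failures here
    # with nc still succeeding point at an MTU mismatch on the path.
    # (8972 = 9000 MTU minus 28 bytes of IP/ICMP headers.)
    ping -M do -s 8972 -c 3 10.32.0.3

    # Compare the configured MTU across hosts.
    ip link show eth0

    # Raise messenger logging on the affected OSD via the admin socket.
    ceph daemon osd.4 config set debug_ms 1

    # Confirm which networks the OSD believes it should be using.
    ceph daemon osd.4 config show | grep -E 'cluster_network|public_network'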
[ceph-users] uncompiled crush map for ceph-rest-api /osd/crush/set
Hi, I have a few questions about the usage of ceph-rest-api's /osd/crush/set. Documentation obtained from GET /api/v0.1/osd:

osd/crush/set PUT "set crush map from input file"

1) Does the input crush map have to be compiled before using this API, or can an uncompiled map be used?
2) Is there anything in the ceph-rest-api API to compile a crush map?

I can successfully set a compiled crush map, but I get an error using an uncompiled one, which seems to indicate it must be compiled first. Below is my attempt with curl to use an uncompiled map; it gets 400 Bad Request: "Error: Failed to parse crushmap: buffer::malformed_input: bad magic number (-22)". Is there a way to do this with an uncompiled map?

curl -iv -XPUT --data-binary "@/tmp/crushmap-uncompiled" -H "Accept: application/json" -H "Content-type: application/octet-stream" '0.0.0.0:53279/api/v0.1/osd/crush/set'
*   Trying 0.0.0.0...
* Connected to 0.0.0.0 (127.0.0.1) port 53279 (#0)
> PUT /api/v0.1/osd/setcrushmap HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 0.0.0.0:53279
> Accept: application/json
> Content-type: application/octet-stream
> Content-Length: 1297
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
HTTP/1.1 100 Continue
* HTTP 1.0, assume close after body
< HTTP/1.0 400 BAD REQUEST
HTTP/1.0 400 BAD REQUEST
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Content-Length: 108
Content-Length: 108
* Server Werkzeug/0.9.6 Python/2.7.9 is not blacklisted
< Server: Werkzeug/0.9.6 Python/2.7.9
Server: Werkzeug/0.9.6 Python/2.7.9
< Date: Wed, 09 Mar 2016 22:45:57 GMT
Date: Wed, 09 Mar 2016 22:45:57 GMT
<
* Closing connection 0
{"status": "Error: Failed to parse crushmap: buffer::malformed_input: bad magic number (-22)", "output": []}

Thanks for any help!
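For what it's worth, the bad magic number error is what the monitor reports when handed a text-format map, which suggests the answer to 1) is that the map must be compiled first; I'm not aware of a compile endpoint in ceph-rest-api itself. A sketch of the usual workflow, compiling locally with crushtool before the PUT (the file names are placeholders):

    # Compile the plain-text map into the binary format the monitor expects.
    crushtool -c /tmp/crushmap-uncompiled -o /tmp/crushmap-compiled

    # Optionally sanity-check the result by decompiling it again.
    crushtool -d /tmp/crushmap-compiled -o /tmp/crushmap-roundtrip

    # PUT the compiled map through ceph-rest-api.
    curl -i -XPUT --data-binary "@/tmp/crushmap-compiled" \
        -H "Accept: application/json" \
        -H "Content-type: application/octet-stream" \
        '0.0.0.0:53279/api/v0.1/osd/crush/set'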