[ceph-users] lost bluestore metadata but still have data

2017-09-22 Thread Jared Watts
Hi everyone, in the case where I’ve lost the entire directory below that 
contains a bluestore OSD’s config and metadata, but all the bluestore devices 
are intact (block, block.db, block.wal), how can I get the OSD up and running 
again?

I tried to do a ceph-osd --mkfs again, which seemed to regenerate everything OK 
and got the OSD back to up/in, but all the placement groups assigned to the OSD 
are stuck stale.  Using the admin socket on the OSD to ask it to trigger a 
scrub on a particular PG gives a result of “Can't find pg ”.
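
A minimal sketch of the checks involved, assuming osd.0 (inferred from the osd0 
directory shown below); these are standard ceph CLI commands:

# which PGs does CRUSH currently map to this OSD?
ceph pg ls-by-osd 0
# which PGs are stuck in the stale state cluster-wide?
ceph pg dump_stuck stale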

It seems the OSD has no knowledge of the PGs that were assigned to it before.  
I assume this is because the mkfs operation cleared out state from the block/db 
devices.

Is there any feasible approach to bring an OSD that’s lost its config back to 
life in the future?  Thanks!

osd0 # ls -l
total 112
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block -> /dev/disk/by-partuuid/e0b7583c-aa1a-49b9-906b-3580f9f92b9a
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block.db -> /dev/disk/by-partuuid/e68da1b1-b13c-4ca7-8055-884b0cf32a38
lrwxrwxrwx. 1 root root   58 Sep 22 22:26 block.wal -> /dev/disk/by-partuuid/5d0589b7-e149-4a4f-9dd6-a5444ef25c72
-rw-r--r--. 1 root root    2 Sep 22 22:26 bluefs
-rw-r--r--. 1 root root   37 Sep 22 22:26 ceph_fsid
-rw-r--r--. 1 root root   37 Sep 22 22:26 fsid
-rw-r--r--. 1 root root   56 Sep 22 22:26 keyring
-rw-r--r--. 1 root root    8 Sep 22 22:26 kv_backend
-rw-r--r--. 1 root root   21 Sep 22 22:26 magic
-rw-r--r--. 1 root root    4 Sep 22 22:26 mkfs_done
-rw-r--r--. 1 root root    6 Sep 22 22:26 ready
srwxr-xr-x. 1 root root    0 Sep 22 22:26 ceph-osd.0.asok
-rw-r--r--. 1 root root 2221 Sep 22 22:26 ceph.config
-rw-r--r--. 1 root root   10 Sep 22 22:26 type
-rw-r--r--. 1 root root    2 Sep 22 22:26 whoami
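
For anyone in the same spot before re-running --mkfs: a hedged sketch of 
regenerating only this small metadata directory from the intact devices, 
assuming a Luminous-or-later ceph-bluestore-tool and an OSD data path of 
/var/lib/ceph/osd/ceph-0 (the path is an assumption; adjust to your layout):

# inspect the label stored on the bluestore device (fsid, osd uuid, etc.)
ceph-bluestore-tool show-label --dev /dev/disk/by-partuuid/e0b7583c-aa1a-49b9-906b-3580f9f92b9a

# rebuild the metadata directory (fsid, whoami, type, symlinks, ...) in place
ceph-bluestore-tool prime-osd-dir \
    --dev /dev/disk/by-partuuid/e0b7583c-aa1a-49b9-906b-3580f9f92b9a \
    --path /var/lib/ceph/osd/ceph-0

# if the keyring file is still missing afterwards, re-export it from the monitors
ceph auth get osd.0 -o /var/lib/ceph/osd/ceph-0/keyring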



[ceph-users] how to troubleshoot "heartbeat_check: no reply" in OSD log

2017-07-27 Thread Jared Watts
I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in).  
ceph status and ceph osd tree output can be found at:
https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12

In the osd.4 log, I see many of these:
2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply 
from 10.32.0.3:6807 osd.15 ever on either front or back, first ping sent 
2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)
2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply 
from 10.32.0.3:6811 osd.16 ever on either front or back, first ping sent 
2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850)

From osd.4, those endpoints look reachable:
/ # nc -vz 10.32.0.3 6807
10.32.0.3 (10.32.0.3:6807) open
/ # nc -vz 10.32.0.3 6811
10.32.0.3 (10.32.0.3:6811) open
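
One hedged cross-check is to compare what osd.15 and osd.16 registered with 
the monitors for their front/back and heartbeat addresses against the 
10.32.0.3 ports in the log:

# roughly: front_addr, back_addr, hb_front_addr, hb_back_addr, hostname
ceph osd metadata 15
ceph osd metadata 16
# quick host/IP/CRUSH-location lookup for an OSD id
ceph osd find 15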

What else can I look at to determine why most of the OSDs cannot communicate?  
http://tracker.ceph.com/issues/16092 indicates this behavior is a networking or 
hardware issue; what else can I check there?  I can turn on extra logging as 
needed.  Thanks!
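
A hedged sketch of the extra logging and one more network check (the debug 
levels and the jumbo-frame payload size are judgment calls, not requirements):

# raise messenger/heartbeat logging on the running daemon, no restart needed
ceph tell osd.4 injectargs '--debug_ms 1 --debug_osd 10'
# nc only proves the TCP port opens; also rule out an MTU mismatch on the path
# (8972 assumes 9000-byte jumbo frames; use 1472 for a standard 1500 MTU)
ping -M do -s 8972 -c 3 10.32.0.3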


[ceph-users] uncompiled crush map for ceph-rest-api /osd/crush/set

2016-03-09 Thread Jared Watts
Hi, I have a few questions about the usage of ceph-rest-api's /osd/crush/set.

Documentation obtained from GET /api/v0.1/osd:

osd/crush/set PUT "set crush map from input file"

1) Does the input crush map have to be compiled before using this API or can an 
uncompiled map be used?
2) Is there anything in the ceph-rest-api API to compile a crush map?

I can successfully set a compiled crush map, but I get an error using an 
uncompiled one, which seems to indicate it must be compiled first.  Below is my 
attempt with curl to use an uncompiled map; it returns 400 BAD REQUEST: 
"Error: Failed to parse crushmap: buffer::malformed_input: bad magic number 
(-22)".

Is there a way to do this with an uncompiled map?


curl -iv -XPUT --data-binary "@/tmp/crushmap-uncompiled" \
  -H "Accept: application/json" -H "Content-type: application/octet-stream" \
  '0.0.0.0:53279/api/v0.1/osd/crush/set'

*   Trying 0.0.0.0...
* Connected to 0.0.0.0 (127.0.0.1) port 53279 (#0)
> PUT /api/v0.1/osd/setcrushmap HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 0.0.0.0:53279
> Accept: application/json
> Content-type: application/octet-stream
> Content-Length: 1297
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
HTTP/1.1 100 Continue
* HTTP 1.0, assume close after body
< HTTP/1.0 400 BAD REQUEST
HTTP/1.0 400 BAD REQUEST
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< Content-Length: 108
Content-Length: 108
* Server Werkzeug/0.9.6 Python/2.7.9 is not blacklisted
< Server: Werkzeug/0.9.6 Python/2.7.9
Server: Werkzeug/0.9.6 Python/2.7.9
< Date: Wed, 09 Mar 2016 22:45:57 GMT
Date: Wed, 09 Mar 2016 22:45:57 GMT
<
* Closing connection 0
{"status": "Error: Failed to parse crushmap: buffer::malformed_input: bad magic number (-22)", "output": []}

Thanks for any help!
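
In case it helps anyone searching later: a hedged sketch of compiling the text 
map locally with crushtool before the PUT (file names are placeholders); 
whether ceph-rest-api itself exposes a compile step is unclear from the help 
output above.

# compile the text (uncompiled) map into the binary format the API parses
crushtool -c /tmp/crushmap-uncompiled -o /tmp/crushmap-compiled
# optional sanity check: decompile the result and eyeball it against the original
crushtool -d /tmp/crushmap-compiled -o /tmp/crushmap-roundtrip

curl -i -XPUT --data-binary "@/tmp/crushmap-compiled" \
  -H "Accept: application/json" -H "Content-type: application/octet-stream" \
  '0.0.0.0:53279/api/v0.1/osd/crush/set'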

