Re: [ceph-users] scrub error: found clone without head
Not yet. I keep it for now.

On Wednesday 22 May 2013 at 15:50 -0700, Samuel Just wrote:
rb.0.15c26.238e1f29
Has that rbd volume been removed? -Sam

On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail.

On Wednesday 22 May 2013 at 12:00 -0700, Samuel Just wrote:
What version are you running? -Sam

On Wed, May 22, 2013 at 11:25 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
Is it enough?

# tail -n500 -f /var/log/ceph/osd.28.log | grep -A5 -B5 'found clone without head'
2013-05-22 15:43:09.308352 7f707dd64700 0 log [INF] : 9.105 scrub ok
2013-05-22 15:44:21.054893 7f707dd64700 0 log [INF] : 9.451 scrub ok
2013-05-22 15:44:52.898784 7f707cd62700 0 log [INF] : 9.784 scrub ok
2013-05-22 15:47:43.148515 7f707cd62700 0 log [INF] : 9.3c3 scrub ok
2013-05-22 15:47:45.717085 7f707dd64700 0 log [INF] : 9.3d0 scrub ok
2013-05-22 15:52:14.573815 7f707dd64700 0 log [ERR] : scrub 3.6b ade3c16b/rb.0.15c26.238e1f29.9221/12d7//3 found clone without head
2013-05-22 15:55:07.230114 7f707d563700 0 log [ERR] : scrub 3.6b 261cc0eb/rb.0.15c26.238e1f29.3671/12d7//3 found clone without head
2013-05-22 15:56:56.456242 7f707d563700 0 log [ERR] : scrub 3.6b b10deaeb/rb.0.15c26.238e1f29.86a2/12d7//3 found clone without head
2013-05-22 15:57:51.667085 7f707dd64700 0 log [ERR] : 3.6b scrub 3 errors
2013-05-22 15:57:55.241224 7f707dd64700 0 log [INF] : 9.450 scrub ok
2013-05-22 15:57:59.800383 7f707cd62700 0 log [INF] : 9.465 scrub ok
2013-05-22 15:59:55.024065 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689000 sd=108 :6803 s=2 pgs=200652 cs=73 l=0).fault with nothing to send, going to standby
2013-05-22 16:01:45.542579 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 74 vs existing 73 state standby
--
2013-05-22 16:29:49.544310 7f707dd64700 0 log [INF] : 9.4eb scrub ok
2013-05-22 16:29:53.190233 7f707dd64700 0 log [INF] : 9.4f4 scrub ok
2013-05-22 16:29:59.478736 7f707dd64700 0 log [INF] : 8.6bb scrub ok
2013-05-22 16:35:12.240246 7f7022770700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689280 sd=99 :6803 s=2 pgs=200667 cs=75 l=0).fault with nothing to send, going to standby
2013-05-22 16:35:19.519019 7f707d563700 0 log [INF] : 8.700 scrub ok
2013-05-22 16:39:15.422532 7f707dd64700 0 log [ERR] : scrub 3.1 b1869301/rb.0.15c26.238e1f29.0836/12d7//3 found clone without head
2013-05-22 16:40:04.995256 7f707cd62700 0 log [ERR] : scrub 3.1 bccad701/rb.0.15c26.238e1f29.9a00/12d7//3 found clone without head
2013-05-22 16:41:07.008717 7f707d563700 0 log [ERR] : scrub 3.1 8a9bec01/rb.0.15c26.238e1f29.9820/12d7//3 found clone without head
2013-05-22 16:41:42.460280 7f707c561700 0 log [ERR] : 3.1 scrub 3 errors
2013-05-22 16:46:12.385678 7f7077735700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.5:6828/31490 pipe(0x2a689c80 sd=137 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 76 vs existing 75 state standby
2013-05-22 16:58:36.079010 7f707661a700 0 -- 192.168.42.3:6803/12142 >> 192.168.42.3:6801/11745 pipe(0x2a689a00 sd=44 :6803 s=0 pgs=0 cs=0 l=0).accept connect_seq 40 vs existing 39 state standby
2013-05-22 16:58:36.798038 7f707d563700 0 log [INF] : 9.50c scrub ok
2013-05-22 16:58:40.104159 7f707c561700 0 log [INF] : 9.526 scrub ok

Note: I have 8 scrub errors like that, on 4 impacted PGs, and all impacted objects belong to the same RBD image (rb.0.15c26.238e1f29).
On Wednesday 22 May 2013 at 11:01 -0700, Samuel Just wrote:
Can you post your ceph.log with the period including all of these errors? -Sam

On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich maha...@bspu.unibel.by wrote:
Olivier Bonvalet writes:

On Monday 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote:
On Tuesday 07 May 2013 at 15:51 +0300, Dzianis Kahanovich wrote:

I have 4 scrub errors (3 PGs, "found clone without head") on one OSD, and they are not repairing. How can I repair them without re-creating the OSD? Right now it would be easy to clean and re-create the OSD, but in theory, with multiple affected OSDs, that could cause data loss.

I have the same problem: 8 objects (4 PGs) with the error "found clone without head". How can I fix that?

Since "pg repair" doesn't handle that kind of error, is there a way to fix it manually? (It's a production cluster.)

Trying to fix it manually, I triggered assertions in the trimming process (the OSD died), and many other troubles. So if you want to keep the cluster running, wait for an answer from the developers, IMHO. About the manual repair attempt: see issue #4937. Also
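For anyone hitting the same thing: the inconsistent PGs can be enumerated from the monitor rather than by grepping OSD logs (a sketch; these are standard commands, nothing specific to this cluster):

# list the PGs that scrub has flagged
ceph health detail | grep inconsistent
# or dump all PG states and filter
ceph pg dump | grep inconsistent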
Re: [ceph-users] mon problems after upgrading to cuttlefish
Hi,

please do not forget to respond to the list (ceph-users@lists.ceph.com). Find my answer below.

On 23.05.2013 17:16, Bryan Stillwell wrote:
This is what I currently have configured:

# Ceph config file
[global]
auth cluster required = none
auth service required = none
# auth client required = cephx

[osd]
osd journal size = 1000
filestore xattr use omap = true
osd mkfs type = xfs
osd mkfs options xfs = noatime

[mon.a]
host = a1
mon addr = 172.24.88.50:6789

[osd.0]
host = b1
devs = /dev/sdb
[osd.1]
host = b1
devs = /dev/sdc
[osd.2]
host = b1
devs = /dev/sdd
[osd.3]
host = b1
devs = /dev/sde
[osd.4]
host = b1
devs = /dev/sdf
[osd.5]
host = b2
devs = /dev/sdb
[osd.6]
host = b2
devs = /dev/sdc
[osd.7]
host = b2
devs = /dev/sdd1
[osd.8]
host = b2
devs = /dev/sde
[osd.9]
host = b2
devs = /dev/sdf
[osd.10]
host = b3
devs = /dev/sdb1
[osd.11]
host = b3
devs = /dev/sdc1
[osd.12]
host = b3
devs = /dev/sdd1
[osd.13]
host = b3
devs = /dev/sde1
[osd.14]
host = b3
devs = /dev/sdf1
[osd.15]
host = b4
devs = /dev/sdb1
[osd.16]
host = b4
devs = /dev/sdc1
[osd.17]
host = b4
devs = /dev/sdd1
[osd.18]
host = b4
devs = /dev/sde1
[osd.19]
host = b4
devs = /dev/sdf1
[osd.20]
host = b1
devs = /dev/sda4
[osd.21]
host = b2
devs = /dev/sda4
[osd.22]
host = b3
devs = /dev/sda4
[osd.23]
host = b4
devs = /dev/sda4

[mds.a]
host = a1

#[client]
# debug ms = 1
# debug client = 20

On Thu, May 23, 2013 at 4:00 AM, Smart Weblications GmbH - Florian Wiessner f.wiess...@smart-weblications.de wrote:
On 23.05.2013 07:45, Bryan Stillwell wrote:

I attempted to upgrade my bobtail cluster to cuttlefish tonight and I believe I'm running into some mon related issues. I did the original install manually instead of with mkcephfs or ceph-deploy, so I think that might have to do with this error:

root@a1:~# ceph-mon -d -c /etc/ceph/ceph.conf
2013-05-22 23:37:29.283975 7f8fb97b3780 0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-mon, pid 5531
IO error: /var/lib/ceph/mon/ceph-admin/store.db/LOCK: No such file or directory
2013-05-22 23:37:29.286534 7f8fb97b3780 1 unable to open monitor store at /var/lib/ceph/mon/ceph-admin
2013-05-22 23:37:29.286544 7f8fb97b3780 1 check for old monitor store format
2013-05-22 23:37:29.286550 7f8fb97b3780 1 store(/var/lib/ceph/mon/ceph-admin) mount
2013-05-22 23:37:29.286559 7f8fb97b3780 1 store(/var/lib/ceph/mon/ceph-admin) basedir /var/lib/ceph/mon/ceph-admin dne
2013-05-22 23:37:29.286564 7f8fb97b3780 -1 unable to mount monitor store: (2) No such file or directory
2013-05-22 23:37:29.286577 7f8fb97b3780 -1 found errors while attempting to convert the monitor store: (2) No such file or directory

root@a1:~# ls -l /var/lib/ceph/mon/
total 4
drwxr-xr-x 15 root root 4096 May 22 23:30 ceph-a

I only have one mon daemon in this cluster as well. I was planning on upgrading it to 3 tonight, but when I try to run most commands they just hang now.
I do see the store.db directory in the ceph-a directory if that helps:

root@a1:~# ls -l /var/lib/ceph/mon/ceph-a/
total 868
drwxr-xr-x 2 root root   4096 May 22 23:30 auth
drwxr-xr-x 2 root root   4096 May 22 23:30 auth_gv
-rw------- 1 root root     37 Feb  4 14:22 cluster_uuid
-rw------- 1 root root      2 May 22 23:30 election_epoch
-rw------- 1 root root    120 Feb  4 14:22 feature_set
-rw------- 1 root root      2 Dec 28 11:35 joined
-rw------- 1 root root     77 May 22 22:30 keyring
-rw------- 1 root root      0 Dec 28 11:35 lock
drwxr-xr-x 2 root root  20480 May 22 23:30 logm
drwxr-xr-x 2 root root  20480 May 22 23:30 logm_gv
-rw------- 1 root root     21 Dec 28 11:35 magic
drwxr-xr-x 2 root root  12288 May 22 23:30 mdsmap
drwxr-xr-x 2 root root  12288 May 22 23:30 mdsmap_gv
drwxr-xr-x 2 root root   4096 Dec 28 11:35 monmap
drwxr-xr-x 2 root root 233472 May 22 23:30 osdmap
drwxr-xr-x 2 root root 237568 May 22 23:30 osdmap_full
drwxr-xr-x 2 root root 253952 May 22 23:30 osdmap_gv
drwxr-xr-x 2 root root  20480 May 22 23:30 pgmap
drwxr-xr-x 2 root root  20480 May 22 23:30 pgmap_gv
drwxr-xr-x 2 root root   4096 May 22 23:36 store.db

what does your ceph.conf look like?
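One guess from the log, not confirmed in this thread: ceph-mon was started without an explicit monitor id, so it went looking for a store named ceph-admin instead of the ceph-a directory that actually exists. If so, passing the id from the [mon.a] section should point it at the right store (a sketch; the exact invocation is an assumption):

# tell ceph-mon which mon it is, matching [mon.a] above
ceph-mon -d -i a -c /etc/ceph/ceph.conf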
[ceph-users] radosgw with nginx
Hi all,

We are trying to run radosgw with nginx. We've found an example (https://gist.github.com/guilhem/4964818) and changed our nginx.conf like below:

http {
    server {
        listen 0.0.0.0:80;
        server_name _;
        access_log off;
        location / {
            fastcgi_pass_header Authorization;
            fastcgi_pass_request_headers on;
            include fastcgi_params;
            fastcgi_keep_conn on;
            fastcgi_pass unix:/tmp/radosgw.sock;
        }
    }
}

But the simplest test gives the following error:

# curl -v http://x.x.x.x/bucket/test.jpg
* About to connect() to x.x.x.x port 80 (#0)
* Trying x.x.x.x ... connected
GET /bucket/test.jpg HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Host: x.x.x.x
Accept: */*
HTTP/1.1 400
Server: nginx/1.1.19
Date: Thu, 23 May 2013 15:34:05 GMT
Content-Type: application/json
Content-Length: 26
Connection: keep-alive
Accept-Ranges: bytes
* Connection #0 to host x.x.x.x left intact
* Closing connection #0
{"Code":"InvalidArgument"}

radosgw logs show these:

2013-05-23 08:34:31.074037 7f0739c33780 20 enqueued request req=0x1e78870
2013-05-23 08:34:31.074044 7f0739c33780 20 RGWWQ:
2013-05-23 08:34:31.074045 7f0739c33780 20 req: 0x1e78870
2013-05-23 08:34:31.074047 7f0739c33780 10 allocated request req=0x1ec6490
2013-05-23 08:34:31.074084 7f0720ce8700 20 dequeued request req=0x1e78870
2013-05-23 08:34:31.074093 7f0720ce8700 20 RGWWQ: empty
2013-05-23 08:34:31.074098 7f0720ce8700 1 ====== starting new request req=0x1e78870 =====
2013-05-23 08:34:31.074140 7f0720ce8700 2 req 4:0.42initializing
2013-05-23 08:34:31.074174 7f0720ce8700 5 nothing to log for operation
2013-05-23 08:34:31.074178 7f0720ce8700 2 req 4:0.80::GET /bucket/test.jpg::http status=400
2013-05-23 08:34:31.074192 7f0720ce8700 1 ====== req done req=0x1e78870 http_status=400 ======

Normally we expect a well-formed 403 (because the request doesn't have an Authorization header), but we get a 400 and cannot figure out why. Thanks in advance.

--
erdem agaoglu
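One way to see which argument radosgw is rejecting (a suggestion, not something from the thread): raise the gateway's debug level and replay the request. The section name client.radosgw.gateway is an assumption about how the gateway is configured:

[client.radosgw.gateway]
debug rgw = 20
debug ms = 1

With that in ceph.conf, restart radosgw and re-run the curl; at this level the request log records the FastCGI environment the web server passed in, which should show whether nginx dropped or mangled one of the variables radosgw expects.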
[ceph-users] ceph-deploy
Hi,

I tried ceph-deploy all day. Found that it has python-setuptools as a dependency. I knew about python-pushy, but is there any other dependency that I'm missing?

The problems I'm getting are as follows:

#ceph-deploy gatherkeys ceph0 ceph1 ceph2

returns the following error:

Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph0', 'ceph1', 'ceph2']

Once I got past this (I don't know why it only works sometimes; I have been following the exact steps as mentioned in the blog), when I try to do

ceph-deploy osd create ceph0:/dev/sda3 ceph1:/dev/sda3 ceph2:/dev/sda3

it gets stuck.

I'm using Ubuntu 13.04 for ceph-deploy and 12.04 for the ceph nodes. I just need to get cuttlefish working and am willing to change the OS if it is required. Please help. :)

Best Regards,
Dewan Shamsul Alam
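For reference, the documented cuttlefish sequence looks roughly like this (a sketch; the hostnames are taken from the message above). gatherkeys can only succeed after the monitors have been created and have formed a quorum, because that is when the bootstrap keyrings are generated:

ceph-deploy new ceph0 ceph1 ceph2
ceph-deploy install ceph0 ceph1 ceph2
ceph-deploy mon create ceph0 ceph1 ceph2
# wait for the mons to reach quorum before this step
ceph-deploy gatherkeys ceph0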
[ceph-users] mkcephfs
Hi,

I had a running ceph cluster on bobtail. It was on 0.56.4. It is my test cluster. I upgraded it to 0.56.6, and now mkcephfs doesn't work with the same working configuration and the following command:

/sbin/mkcephfs -a -c /etc/ceph/ceph.conf

ceph.conf:

[global]
auth supported = none
auth cluster required = none
auth service required = none
auth client required = none

[osd]
osd journal size = 1000
filestore xattr use omap = true
osd mkfs type = btrfs
osd mkfs options btrfs = -m raid0
osd mount options btrfs = rw, noatime

[mon.a]
host = ceph0
mon addr = 192.168.128.10:6789
[mon.b]
host = ceph1
mon addr = 192.168.128.11:6789
[mon.c]
host = ceph2
mon addr = 192.168.128.12:6789

[osd.0]
host = ceph0
devs = /dev/sda3
[osd.1]
host = ceph1
devs = /dev/sda3
[osd.2]
host = ceph2
devs = /dev/sda3

[mds.a]
host = ceph0
[mds.b]
host = ceph1
[mds.c]
host = ceph2

Best Regards,
Dewan Shamsul Alam
Re: [ceph-users] mkcephfs
Can you be more specific? How does it fail? A copy of the actual output would be ideal, thanks!

sage

On Thu, 23 May 2013, Dewan Shamsul Alam wrote:
Hi, I had a running ceph cluster on bobtail. It was on 0.56.4. It is my test cluster. I upgraded it to 0.56.6, now mkcephfs doesn't work with the same working configuration [...]
Re: [ceph-users] FW: About RBD
Thank you very much for your prompt response… So basically I can't use a cluster-aware tool like Microsoft CSV on the RBD, is that correct?

What I am trying to understand is: can I have 2 physical hosts (maybe Dell PowerEdge 2950),

* host1 with VMs #0-10
* host2 with VMs #10-20

with both of these hosts accessing one big LUN or, in this case, a ceph RBD? Can host1 fail all its VMs over to host2 in case that machine has trouble, and still make its resources available to my users? This is very important to us if we really want to explore this new avenue of Ceph.

Thank you,
Yao Mensah
Systems Administrator II OLS Servers
yao.men...@usdoj.gov
(202) 307 0354
MCITP MCSE NT4.0 / 2000-2003 A+

From: Dave Spano [mailto:dsp...@optogenics.com]
Sent: Thursday, May 23, 2013 1:19 PM
To: Mensah, Yao (CIV)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] FW: About RBD

Unless something changed, each RBD needs to be attached to 1 host at a time, like an iSCSI LUN.

Dave Spano
Optogenics

From: Yao Mensah (CIV) yao.men...@usdoj.gov
To: ceph-users@lists.ceph.com
Sent: Thursday, May 23, 2013 1:10:53 PM
Subject: [ceph-users] FW: About RBD

FYI

From: Mensah, Yao (CIV)
Sent: Wednesday, May 22, 2013 5:59 PM
To: 'i...@inktank.com'
Subject: About RBD

Hello,

I was doing some reading on your web site about ceph and what it is capable of. I have one question and maybe you can help on this: Can a ceph RBD be used by 2 physical hosts at the same time? Or, is Ceph RBD CSV (Clustered Shared Volumes) aware?

Thank you,
Yao Mensah
Systems Administrator II OLS Servers
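For context, a sketch of why the one-host-at-a-time rule holds (the image name below is made up): the same RBD image can in fact be mapped on several hosts at once; it is the filesystem on top that decides whether concurrent access is safe.

# host1 and host2 can both map the same image:
rbd map rbd/sharedvol
# but a plain local filesystem (ext4, XFS, NTFS) on /dev/rbd0 must only be
# mounted on one host at a time; simultaneous mounts would need a
# cluster-aware filesystem such as OCFS2 or GFS2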
Re: [ceph-users] mkcephfs
Hi,

This is what I get while building the cluster:

#/sbin/mkcephfs -a -c /etc/ceph/ceph.conf
temp dir is /tmp/mkcephfs.yzl9PFOJYo
preparing monmap in /tmp/mkcephfs.yzl9PFOJYo/monmap
/usr/bin/monmaptool --create --clobber --add a 192.168.128.10:6789 --add b 192.168.128.11:6789 --add c 192.168.128.12:6789 --print /tmp/mkcephfs.yzl9PFOJYo/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.yzl9PFOJYo/monmap
/usr/bin/monmaptool: generated fsid 09136333-16dc-476f-8773-90262ad0b80d
epoch 0
fsid 09136333-16dc-476f-8773-90262ad0b80d
last_changed 2013-05-23 23:36:41.325667
created 2013-05-23 23:36:41.325667
0: 192.168.128.10:6789/0 mon.a
1: 192.168.128.11:6789/0 mon.b
2: 192.168.128.12:6789/0 mon.c
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.yzl9PFOJYo/monmap (3 monitors)
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
=== osd.0 ===
2013-05-23 23:36:41.851599 7fe81f143780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:42.576549 7fe81f143780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:42.577795 7fe81f143780 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-23 23:36:42.918456 7fe81f143780 -1 created object store /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal for osd.0 fsid 09136333-16dc-476f-8773-90262ad0b80d
2013-05-23 23:36:42.918520 7fe81f143780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-0/keyring: can't open /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2013-05-23 23:36:42.918642 7fe81f143780 -1 created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
=== osd.1 ===
pushing conf and monmap to ceph1:/tmp/mkfs.ceph.HWyfcu95hsnB1jxdVqxGJNAOd2u3aj5I
2013-05-23 23:36:12.380573 7ff116b63780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:13.026598 7ff116b63780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:13.037762 7ff116b63780 -1 filestore(/var/lib/ceph/osd/ceph-1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-23 23:36:13.366445 7ff116b63780 -1 created object store /var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal for osd.1 fsid 09136333-16dc-476f-8773-90262ad0b80d
2013-05-23 23:36:13.366510 7ff116b63780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-1/keyring: can't open /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
2013-05-23 23:36:13.366621 7ff116b63780 -1 created new key in keyring /var/lib/ceph/osd/ceph-1/keyring
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
collecting osd.1 key
=== osd.2 ===
pushing conf and monmap to ceph2:/tmp/mkfs.ceph.tNt36unRvZ6lVKmz65OjiOhrpUfsw7xz
2013-05-23 23:36:59.086209 7fe38a955780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:59.610999 7fe38a955780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2013-05-23 23:36:59.623725 7fe38a955780 -1 filestore(/var/lib/ceph/osd/ceph-2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-23 23:36:59.850510 7fe38a955780 -1 created object store /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal for osd.2 fsid 09136333-16dc-476f-8773-90262ad0b80d
2013-05-23 23:36:59.850574 7fe38a955780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-2/keyring: can't open /var/lib/ceph/osd/ceph-2/keyring: (2) No such file or directory
2013-05-23 23:36:59.850688 7fe38a955780 -1 created new key in keyring /var/lib/ceph/osd/ceph-2/keyring
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
collecting osd.2 key
=== mds.a ===
creating private key for mds.a keyring /var/lib/ceph/mds/ceph-a/keyring
creating /var/lib/ceph/mds/ceph-a/keyring
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
=== mds.b ===
pushing conf and monmap to ceph1:/tmp/mkfs.ceph.QDX0IZSEBd3469OT6yw6ISlckxfmO6nu
creating private key for mds.b keyring /var/lib/ceph/mds/ceph-b/keyring
creating /var/lib/ceph/mds/ceph-b/keyring
WARNING: mkcephfs is now deprecated in favour of ceph-deploy. Please see: http://github.com/ceph/ceph-deploy
collecting mds.b key
=== mds.c ===
pushing conf and monmap to ceph2:/tmp/mkfs.ceph.XimfAW4CrJR11rs8IAJhsHn0inBNJdhl
creating private
Re: [ceph-users] RADOS Gateway Configuration
Hey John,

Thanks for the reply. I'll check out that other doc you have there. Just for future reference, do you know where ceph-deploy puts the ceph keyring?

Daniel

On Wed, May 22, 2013 at 7:19 PM, John Wilkins john.wilk...@inktank.com wrote:

Daniel,

It looks like I need to update that portion of the docs too, as it links back to the 5-minute quick start. Once you are up and running with HEALTH_OK on either the 5-minute Quick Start or Quick Ceph Deploy, your storage cluster is running fine. The remaining issues would likely be with authentication, chmod on the files, or with the RGW setup. There's a quick start for RGW, which I had verified here: http://ceph.com/docs/master/start/quick-rgw/. Someone else had a problem with the Rewrite rule on that example, reported here: http://tracker.ceph.com/issues/4608. It's likely I need to run through with specific Ceph and Apache versions. There are also a few additional tips in the configuration section: http://ceph.com/docs/master/radosgw/config/. There is an issue in some cases where keys have forward- or backslash characters, and you may need to regenerate the keys.

On Wed, May 22, 2013 at 4:42 PM, Daniel Curran danielcurra...@gmail.com wrote:

Hello,

I just started using ceph recently and was trying to get the RADOS Gateway working in order to use the Swift-compatible API. I followed the install instructions found here (http://ceph.com/docs/master/start/quick-ceph-deploy/) and got to a point where "ceph health" gives me HEALTH_OK. This is all well and good, but near the end of the radosgw setup (found here: http://ceph.com/docs/master/radosgw/manual-install/) I need to execute the following line:

sudo ceph -k /etc/ceph/ceph.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway

Unfortunately, I don't believe ceph-deploy places the keyring at /etc/ceph/ceph.keyring. I tried to use the one from /var/lib/ceph/bootstrap-osd/ceph.keyring but it was unable to authenticate as client.admin. Is there another location that the keyring needs to be copied from, or am I doing something totally wrong?

I didn't want to be held back, so I restarted and did the manual install from the 5-minute quick start, where I was able to find the keyring. I had more issues almost immediately. I have to execute the following steps to create some users for swift:

radosgw-admin user create --uid=johndoe --display-name="John Doe" --email=j...@example.com
sudo radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
sudo radosgw-admin key create --subuser=johndoe:swift --key-type=swift

The first two gave me output I was expecting, but the very last line had some weirdness that essentially made swift unusable.
The expected output is something along these lines:

{ "user_id": "johndoe",
  "rados_uid": 0,
  "display_name": "John Doe",
  "email": "j...@example.com",
  "suspended": 0,
  "subusers": [
        { "id": "johndoe:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "johndoe",
          "access_key": "QFAMEDSJP5DEKJO0DDXY",
          "secret_key": "iaSFLDVvDdQt6lkNzHyW4fPLZugBAI1g17LO0+87"}],
  "swift_keys": [
        { "user": "johndoe:swift",
          "secret_key": "E9T2rUZNu2gxUjcwUBO8n\/Ev4KX6\/GprEuH4qhu1"}]}

where that last secret key is what we hand the swift CLI, as seen here:

swift -V 1.0 -A http://radosgw.example.com/auth -U johndoe:swift -K E9T2rUZNu2gxUjcwUBO8n\/Ev4KX6\/GprEuH4qhu1 post test

However, my output came out like this:

{ "user_id": "johndoe",
  "display_name": "John Doe",
  "email": "j...@example.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
        { "id": "johndoe:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "johndoe",
          "access_key": "SUEXWVL3WB2Z64CRAG97",
          "secret_key": "C\/jHFJ3wdPv4iJ+aq4JeZ52LEC3OdnhsYEnVkhBP"}],
  "swift_keys": [
        { "user": "johndoe:swift",
          "secret_key": ""}],
  "caps": []}

giving me no swift key to use. I don't believe the key is supposed to be blank, because I tried that and received auth errors (to the best of my ability). I can't tell if this is my fault since I'm new, nor am I able to find a way around it. It looks like there are definitely changes between the version used in the doc and mine, so maybe it's all working as it should but the secret_key for swift lives somewhere else. If anyone knows anything, I'd appreciate it a lot.

Thank you,
Daniel

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
Re: [ceph-users] RADOS Gateway Configuration
It puts it in the same directory where you executed ceph-deploy.

On Thu, May 23, 2013 at 10:57 AM, Daniel Curran danielcurra...@gmail.com wrote:
Hey John, Thanks for the reply. I'll check out that other doc you have there. Just for future reference do you know where ceph-deploy puts the ceph keyring? [...]

--
John Wilkins
Senior Technical
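One thing worth trying for the blank swift secret (a suggestion, not a confirmed fix from this thread): ask radosgw-admin to generate the secret explicitly with --gen-secret, since on some versions key create otherwise leaves it empty:

radosgw-admin key create --subuser=johndoe:swift --key-type=swift --gen-secret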
[ceph-users] ZFS on RBD?
Hi all,

I'm evaluating Ceph, and one of my workloads is a server that provides home directories to end users over both NFS and Samba. I'm looking at whether this could be backed by Ceph-provided storage.

So to test this I built a single-node Ceph instance (Ubuntu precise, ceph.com packages) in a VM and popped a couple of OSDs on it. I then built another VM and used it to mount an RBD from the Ceph node. No problems... it all worked as described in the documentation.

Then I started to look at the filesystem I was using on top of the RBD. I'd tested ext4 without any problems. I'd been testing ZFS (from the stable zfs-native PPA) separately against local storage on the client VM too, so I thought I'd try that on top of the RBD. This is when I hit problems, and the VM panicked (trace at the end of this email).

Now I am just experimenting, so this isn't a huge deal right now. But I'm wondering if this is something that should work? Am I overlooking something? Is it a silly idea to even try it? The trace looks to be in the ZFS code, so if there's a bug that needs fixing it's probably over there rather than in Ceph, but I thought here might be a good starting point for advice.

Thanks in advance everyone,

Tim.

[ 504.644120] divide error: [#1] SMP
[ 504.644298] Modules linked in: coretemp(F) ppdev(F) vmw_balloon(F) microcode(F) psmouse(F) serio_raw(F) parport_pc(F) vmwgfx(F) i2c_piix4(F) mac_hid(F) ttm(F) shpchp(F) drm(F) rbd(F) libceph(F) lp(F) parport(F) zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) floppy(F) e1000(F) mptspi(F) mptscsih(F) mptbase(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
[ 504.646156] CPU 0
[ 504.646234] Pid: 2281, comm: txg_sync Tainted: PF B O 3.8.0-21-generic #32~precise1-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 504.646550] RIP: 0010:[a0258092] [a0258092] spa_history_write+0x82/0x1d0 [zfs]
[ 504.646816] RSP: 0018:88003ae3dab8 EFLAGS: 00010246
[ 504.646940] RAX: RBX: RCX:
[ 504.647091] RDX: RSI: 0020 RDI:
[ 504.647242] RBP: 88003ae3db28 R08: 88003b2afc00 R09: 0002
[ 504.647423] R10: 88003b9a4512 R11: 6d206b6e61742066 R12: 88003add6600
[ 504.647600] R13: 88003cfc2000 R14: 88003d3c9000 R15: 0008
[ 504.647778] FS: () GS:88003fc0() knlGS:
[ 504.647997] CS: 0010 DS: ES: CR0: 8005003b
[ 504.648153] CR2: 7fbc1ef54a38 CR3: 3bf3e000 CR4: 07f0
[ 504.648380] DR0: DR1: DR2:
[ 504.648586] DR3: DR6: 0ff0 DR7: 0400
[ 504.648766] Process txg_sync (pid: 2281, threadinfo 88003ae3c000, task 88003b7c45c0)
[ 504.648990] Stack:
[ 504.649087] 0002 a01e3360 88003b2afc00 88003ae3dba0
[ 504.649461] 88003d3c9000 0008 88003cfc2000 5530ebc2
[ 504.649835] 88003d22ac40 88003d22ac40 88003cfc2000 88003b2afc00
[ 504.650209] Call Trace:
[ 504.650351] [a0258415] spa_history_log_sync+0x235/0x650 [zfs]
[ 504.650554] [a023fdf3] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[ 504.650760] [a0237deb] dsl_pool_sync+0x41b/0x530 [zfs]
[ 504.650953] [a024cfd8] spa_sync+0x3a8/0xa50 [zfs]
[ 504.651117] [810ae6ac] ? ktime_get_ts+0x4c/0xe0
[ 504.651302] [a025de3f] txg_sync_thread+0x2df/0x540 [zfs]
[ 504.651501] [a025db60] ? txg_init+0x250/0x250 [zfs]
[ 504.651676] [a0156c58] thread_generic_wrapper+0x78/0x90 [spl]
[ 504.651856] [a0156be0] ? __thread_create+0x310/0x310 [spl]
[ 504.652029] [8107f000] kthread+0xc0/0xd0
[ 504.652174] [8107ef40] ? flush_kthread_worker+0xb0/0xb0
[ 504.652339] [816facac] ret_from_fork+0x7c/0xb0
[ 504.652492] [8107ef40] ? flush_kthread_worker+0xb0/0xb0
[ 504.652655] Code: 55 b0 48 89 fa 48 29 f2 48 01 c2 48 39 55 b8 0f 82 bc 00 00 00 4c 8b 75 b0 41 bf 08 00 00 00 48 29 c8 31 d2 49 8b b5 70 08 00 00 48 f7 f7 4c 8d 45 c0 4c 89 f7 48 01 ca 48 29 d3 48 83 fb 08 49
[ 504.659810] RIP [a0258092] spa_history_write+0x82/0x1d0 [zfs]
[ 504.660045] RSP 88003ae3dab8
[ 504.660187] ---[ end trace e69c7eee3ba17773 ]---

--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984
[ceph-users] MDS dying on cuttlefish
Hi!

I've got a cluster of two nodes on Ubuntu 12.04 with cuttlefish from the ceph.com repo:

ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)

The MDS process is dying after a while with a stack trace, but I can't understand why. I reproduced the same problem on Debian 7 with the same repository.

-3 2013-05-23 23:00:42.957679 7fa39e28e700 1 -- 10.123.200.189:6800/28919 <== osd.0 10.123.200.188:6802/27665 1 ==== osd_op_reply(5 200. [read 0~0] ack = -2 (No such file or directory)) v4 ==== 111+0+0 (2261481792 0 0) 0x29afe00 con 0x29c4b00
-2 2013-05-23 23:00:42.957780 7fa39e28e700 0 mds.0.journaler(ro) error getting journal off disk
-1 2013-05-23 23:00:42.960974 7fa39e28e700 1 -- 10.123.200.189:6800/28919 <== osd.0 10.123.200.188:6802/27665 2 ==== osd_op_reply(1 mds0_inotable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 112+0+0 (1612134461 0 0) 0x2a1c200 con 0x29c4b00
0 2013-05-23 23:00:42.963326 7fa39e28e700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fa39e28e700 time 2013-05-23 23:00:42.961076
mds/MDSTable.cc: 150: FAILED assert(0)

ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x3bb) [0x6dd2db]
2: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe1b) [0x7275bb]
3: (MDS::handle_core_message(Message*)+0xae7) [0x513c57]
4: (MDS::_dispatch(Message*)+0x33) [0x513d53]
5: (MDS::ms_dispatch(Message*)+0xab) [0x515b3b]
6: (DispatchQueue::entry()+0x393) [0x847ca3]
7: (DispatchQueue::DispatchThread::entry()+0xd) [0x7caeed]
8: (()+0x6b50) [0x7fa3a3376b50]
9: (clone()+0x6d) [0x7fa3a1d24a7d]

Full logs here: http://pastebin.com/C81g5jFd

I can't understand why, and I'd really appreciate a hint. Thanks!

Regards,
Giuseppe
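The two ENOENT replies just before the assert are reads of the MDS journal object and mds0_inotable, so the MDS apparently cannot find its own metadata objects. A quick check (a sketch; "metadata" as the pool name is the default and an assumption here):

# list a few objects from the metadata pool
rados -p metadata ls | head
# the failed read in the log was for mds0_inotable
rados -p metadata stat mds0_inotable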
Re: [ceph-users] scrub error: found clone without head
Do all of the affected PGs share osd.28 as the primary? I think the only recovery is probably to manually remove the orphaned clones.
-Sam

On Thu, May 23, 2013 at 5:00 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
Not yet. I keep it for now. [...]
Re: [ceph-users] scrub error: found clone without head
No:

pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5]

But I suppose that all of these PGs *were* having osd.25 as primary (on the same host), which is the (now disabled) buggy OSD.

Question: is the 12d7 in the object path the snapshot id? If it is, I haven't got any snapshot with this id for the rb.0.15c26.238e1f29 image. So, which files should I remove?

Thanks for your help.

On Thursday 23 May 2013 at 15:17 -0700, Samuel Just wrote:
Do all of the affected PGs share osd.28 as the primary? I think the only recovery is probably to manually remove the orphaned clones. -Sam [...]
Re: [ceph-users] scrub error: found clone without head
Can you send the filenames in the pg directories for those 4 pgs?
-Sam

On Thu, May 23, 2013 at 3:27 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
No:
pg 3.7c is active+clean+inconsistent, acting [24,13,39]
pg 3.6b is active+clean+inconsistent, acting [28,23,5]
pg 3.d is active+clean+inconsistent, acting [29,4,11]
pg 3.1 is active+clean+inconsistent, acting [28,19,5] [...]
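Something along these lines should gather them (a sketch; it assumes the filestore layout of this era, with one directory per PG under current/, and the OSD data paths are guesses derived from the acting sets above):

# on the host carrying osd.28, which is primary for 3.1 and 3.6b:
find /var/lib/ceph/osd/ceph-28/current/3.1_head/ -name '*15c26*'
find /var/lib/ceph/osd/ceph-28/current/3.6b_head/ -name '*15c26*'
# likewise 3.7c on osd.24 and 3.d on osd.29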
Re: [ceph-users] mkcephfs
Hi,

The previous log is based on cuttlefish; this one is based on bobtail. I'm not using cephx, maybe that's what's causing the problem?

temp dir is /tmp/mkcephfs.xf5TsinRsL
preparing monmap in /tmp/mkcephfs.xf5TsinRsL/monmap
/usr/bin/monmaptool --create --clobber --add a 192.168.128.10:6789 --add b 192.168.128.11:6789 --add c 192.168.128.12:6789 --print /tmp/mkcephfs.xf5TsinRsL/monmap
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.xf5TsinRsL/monmap
/usr/bin/monmaptool: generated fsid 1168e717-5db5-488c-a1f7-0b61e7f19138
epoch 0
fsid 1168e717-5db5-488c-a1f7-0b61e7f19138
last_changed 2013-05-24 09:51:15.012839
created 2013-05-24 09:51:15.012839
0: 192.168.128.10:6789/0 mon.a
1: 192.168.128.11:6789/0 mon.b
2: 192.168.128.12:6789/0 mon.c
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.xf5TsinRsL/monmap (3 monitors)
=== osd.0 ===
2013-05-24 09:51:16.247443 7f19b2ce5780 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-24 09:51:16.604586 7f19b2ce5780 -1 created object store /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal for osd.0 fsid 1168e717-5db5-488c-a1f7-0b61e7f19138
2013-05-24 09:51:16.604667 7f19b2ce5780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-0/keyring: can't open /var/lib/ceph/osd/ceph-0/keyring: (2) No such file or directory
2013-05-24 09:51:16.604850 7f19b2ce5780 -1 created new key in keyring /var/lib/ceph/osd/ceph-0/keyring
=== osd.1 ===
pushing conf and monmap to ceph1:/tmp/mkfs.ceph.f0a8d758e9f1a3f32160f67a12149281
2013-05-24 09:50:46.405722 7fd7fa8c5780 -1 filestore(/var/lib/ceph/osd/ceph-1) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-24 09:50:46.750885 7fd7fa8c5780 -1 created object store /var/lib/ceph/osd/ceph-1 journal /var/lib/ceph/osd/ceph-1/journal for osd.1 fsid 1168e717-5db5-488c-a1f7-0b61e7f19138
2013-05-24 09:50:46.750945 7fd7fa8c5780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-1/keyring: can't open /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
2013-05-24 09:50:46.751120 7fd7fa8c5780 -1 created new key in keyring /var/lib/ceph/osd/ceph-1/keyring
collecting osd.1 key
=== osd.2 ===
pushing conf and monmap to ceph2:/tmp/mkfs.ceph.e07be4351777982bb28d1cc7ab52e01b
2013-05-24 09:51:33.623922 7fa231b87780 -1 filestore(/var/lib/ceph/osd/ceph-2) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2013-05-24 09:51:33.859703 7fa231b87780 -1 created object store /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal for osd.2 fsid 1168e717-5db5-488c-a1f7-0b61e7f19138
2013-05-24 09:51:33.859772 7fa231b87780 -1 auth: error reading file: /var/lib/ceph/osd/ceph-2/keyring: can't open /var/lib/ceph/osd/ceph-2/keyring: (2) No such file or directory
2013-05-24 09:51:33.859930 7fa231b87780 -1 created new key in keyring /var/lib/ceph/osd/ceph-2/keyring
collecting osd.2 key
=== mds.a ===
creating private key for mds.a keyring /var/lib/ceph/mds/ceph-a/keyring
creating /var/lib/ceph/mds/ceph-a/keyring
=== mds.b ===
pushing conf and monmap to ceph1:/tmp/mkfs.ceph.5031963c92bcc7e98cd4422ee99d1220
creating private key for mds.b keyring /var/lib/ceph/mds/ceph-b/keyring
creating /var/lib/ceph/mds/ceph-b/keyring
collecting mds.b key
=== mds.c ===
pushing conf and monmap to ceph2:/tmp/mkfs.ceph.a7475d8f01e340fc9e410fd60a3d8a80
creating private key for mds.c keyring /var/lib/ceph/mds/ceph-c/keyring
creating /var/lib/ceph/mds/ceph-c/keyring
collecting mds.c key
Building generic osdmap from /tmp/mkcephfs.xf5TsinRsL/conf
/usr/bin/osdmaptool: osdmap file '/tmp/mkcephfs.xf5TsinRsL/osdmap'
/usr/bin/osdmaptool: writing epoch 1 to /tmp/mkcephfs.xf5TsinRsL/osdmap
Generating admin key at /tmp/mkcephfs.xf5TsinRsL/keyring.admin
creating /tmp/mkcephfs.xf5TsinRsL/keyring.admin
Building initial monitor keyring
added entity mds.a auth auth(auid = 18446744073709551615 key=AQC9455RSHUiFhAAcxXX2jmAnPk69KGQ7rUczA== with 0 caps)
added entity mds.b auth auth(auid = 18446744073709551615 key=AQCe455RcAEhCRAARPQhkIuPvYIXDElerq+zJg== with 0 caps)
added entity mds.c auth auth(auid = 18446744073709551615 key=AQDM455R0KRfEhAAPByEb/CBqaUK68tssLH/ug== with 0 caps)
added entity osd.0 auth auth(auid = 18446744073709551615 key=AQC0455RWKsKJBAANgtn7Xq8g3u1CPXekMCy7g== with 0 caps)
added entity osd.1 auth auth(auid = 18446744073709551615 key=AQCW455RQJ7CLBAAQzY2cAMjFwHLd36s1w8m6g== with 0 caps)
added entity osd.2 auth auth(auid = 18446744073709551615 key=AQDF455RoDM/MxAAaC/sEyCE4xFpeNHCqkyZSA== with 0 caps)
=== mon.a ===
/usr/bin/ceph-mon: created monfs at /var/lib/ceph/mon/ceph-a for mon.a
=== mon.b ===
pushing everything to ceph1
/usr/bin/ceph-mon: created monfs at /var/lib/ceph/mon/ceph-b for mon.b
=== mon.c ===
pushing everything to ceph2
/usr/bin/ceph-mon: created monfs at /var/lib/ceph/mon/ceph-c for mon.c
placing client.admin keyring in /etc/ceph/keyring
2013-05-24 09:51:36.973708 7f774d118700 0 -- :/21213
Re: [ceph-users] ceph-deploy
I just found that

#ceph-deploy gatherkeys ceph0 ceph1 ceph2

works only if I have bobtail; on cuttlefish it can't find ceph.client.admin.keyring. And then when I try this on bobtail, it says:

root@cephdeploy:~/12.04# ceph-deploy osd create ceph0:/dev/sda3 ceph1:/dev/sda3 ceph2:/dev/sda3
ceph-disk: Error: Device is mounted: /dev/sda3
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 22, in <module>
    main()
  File "/usr/lib/pymodules/python2.7/ceph_deploy/cli.py", line 112, in main
    return args.func(args)
  File "/usr/lib/pymodules/python2.7/ceph_deploy/osd.py", line 293, in osd
    prepare(args, cfg, activate_prepared_disk=True)
  File "/usr/lib/pymodules/python2.7/ceph_deploy/osd.py", line 177, in prepare
    dmcrypt_dir=args.dmcrypt_key_dir,
  File "/usr/lib/python2.7/dist-packages/pushy/protocol/proxy.py", line 255, in <lambda>
    (conn.operator(type_, self, args, kwargs))
  File "/usr/lib/python2.7/dist-packages/pushy/protocol/connection.py", line 66, in operator
    return self.send_request(type_, (object, args, kwargs))
  File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 323, in send_request
    return self.__handle(m)
  File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 639, in __handle
    raise e
pushy.protocol.proxy.ExceptionProxy: Command '['ceph-disk-prepare', '--', '/dev/sda3']' returned non-zero exit status 1
root@cephdeploy:~/12.04#

On Thu, May 23, 2013 at 10:49 PM, Dewan Shamsul Alam dewan.sham...@gmail.com wrote:
Hi, I tried ceph-deploy all day. Found that it has python-setuptools as a dependency. I knew about python-pushy. But is there any other dependency that I'm missing? [...]
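The "Device is mounted" error is ceph-disk-prepare refusing to touch a partition that is still mounted. A likely way through (a sketch; it assumes /dev/sda3 holds nothing you still need, since preparing it will reformat it):

# on each node, unmount the partition first
umount /dev/sda3
# then retry from the admin node
ceph-deploy osd create ceph0:/dev/sda3 ceph1:/dev/sda3 ceph2:/dev/sda3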