Re: [ceph-users] PG down incomplete
If you can follow the documentation here:

  http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/
  http://ceph.com/docs/master/rados/troubleshooting/

and provide some additional information, we may be better able to help you. For example, the output of "ceph osd tree" would help us understand the status of your cluster a bit better.

On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet <ceph.l...@daevel.fr> wrote:
> On Wednesday, May 15, 2013 at 00:15 +0200, Olivier Bonvalet wrote:
> > Hi,
> >
> > I have some PGs in state down and/or incomplete on my cluster, because I
> > lost 2 OSDs and a pool had only 2 replicas. So of course that data is lost.
> >
> > My problem now is that I can't get back to a HEALTH_OK status: if I try to
> > remove, read or overwrite the corresponding RBD images, nearly all OSDs
> > hang (well... they don't do anything, and requests stay in a growing
> > queue until production goes down).
> >
> > So, what can I do to remove those corrupt images?
>
> Up. Can nobody help me with this problem?
>
> Thanks,
> Olivier

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
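As a sketch, the commands that usually accompany a report like this (assuming a Cuttlefish-era cluster with the ceph CLI on a monitor node; the PG id below is a placeholder):

  # Per-PG detail behind the HEALTH_WARN/HEALTH_ERR summary
  ceph health detail

  # CRUSH hierarchy with per-OSD up/down status
  ceph osd tree

  # PGs stuck inactive or unclean
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean

  # Peering/recovery state of one problem PG (2.5 is a placeholder id)
  ceph pg 2.5 query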
[ceph-users] Mount error 5 while mounting cephfs
Hi,

I have deployed the ceph object store using ceph-deploy. I tried to mount cephfs and got stuck with this error:

  sudo mount.ceph 192.168.35.82:/ /mnt/mycephfs -o name=admin,secret=AQDa5JJRqLxuOxAA77VljIjaAGWR6mGdL12NUQ==
  mount error 5 = Input/output error

The output of the command "ceph -s":

   health HEALTH_WARN 64 pgs degraded; mds cluster is degraded; mds blade2-qq is laggy
   monmap e1: 1 mons at {blade2-qq=192.168.35.82:6789/0}, election epoch 1, quorum 0 blade2-qq
   osdmap e56: 4 osds: 4 up, 4 in
   pgmap v834: 192 pgs: 128 active+clean, 64 active+clean+degraded; 0 bytes data, 50871 MB used, 437 GB / 513 GB avail
   mdsmap e6457: 1/1/1 up {0=blade2-qq=up:replay(laggy or crashed)}

As it says, the MDS has crashed. I don't see ceph-mds running on the MDS node. I executed "ceph-deploy mds create mds_node", and this starts the ceph-mds daemon on the node, but I see that the daemon crashes again after some time. Kindly help me with this issue.

--
--sridhar
Re: [ceph-users] Mount error 5 while mounting cephfs
Have you tried restarting your MDS server?

http://ceph.com/docs/master/rados/operations/operating/#operating-a-cluster

On Fri, May 17, 2013 at 12:16 AM, Sridhar Mahadevan <msridha...@gmail.com> wrote:
> [quoted message snipped]

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
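For reference, a minimal restart sequence (a sketch: the Upstart form below assumes an Ubuntu/Upstart install with the default cluster name and a daemon id matching the hostname blade2-qq; sysvinit installs would use "sudo /etc/init.d/ceph restart mds.blade2-qq" instead):

  # Restart just the MDS daemon on the MDS node
  sudo restart ceph-mds cluster=ceph id=blade2-qq

  # Then watch the mdsmap line until the MDS moves past up:replay
  ceph -s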
Re: [ceph-users] Mount error 5 while mounting cephfs
Hi,

I did try to restart the MDS server. The logs show the following error:

  [187846.234448] init: ceph-mds (ceph/blade2-qq) main process (15077) killed by ABRT signal
  [187846.234493] init: ceph-mds (ceph/blade2-qq) main process ended, respawning
  [187846.687929] init: ceph-mds (ceph/blade2-qq) main process (15099) killed by ABRT signal
  [187846.687977] init: ceph-mds (ceph/blade2-qq) respawning too fast, stopped

Thanks and Regards

On Fri, May 17, 2013 at 3:33 PM, John Wilkins <john.wilk...@inktank.com> wrote:
> [quoted message snipped]

--
--sridhar
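When the daemon aborts that quickly, the Upstart respawn loop hides the actual assertion. One way to capture it (a sketch; paths assume the default "ceph" cluster name) is to stop the job and run the MDS in the foreground with verbose logging:

  # Stop the respawning Upstart job
  sudo stop ceph-mds cluster=ceph id=blade2-qq

  # Run the MDS in the foreground with MDS and messenger debugging
  sudo ceph-mds -i blade2-qq -f --debug-mds 20 --debug-ms 1

  # Or inspect the crash dump the respawns already wrote
  less /var/log/ceph/ceph-mds.blade2-qq.log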
Re: [ceph-users] Mount error 5 while mounting cephfs
Are you running the MDS in a VM?

On Fri, May 17, 2013 at 12:40 AM, Sridhar Mahadevan <msridha...@gmail.com> wrote:
> [quoted message snipped]

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
Re: [ceph-users] PG down incomplete
Hi,

Thanks for your answer. In fact I have several different problems, which I tried to solve separately:

1) I lost 2 OSDs, and some pools had only 2 replicas, so some data was lost.
2) One monitor refuses the Cuttlefish upgrade, so I only have 4 of 5 monitors running.
3) I have 4 old inconsistent PGs that I can't repair.

So the status:

   health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors; noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
   monmap e7: 5 mons at {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0}, election epoch 2584, quorum 0,1,2,3 a,b,c,e
   osdmap e82502: 50 osds: 48 up, 48 in
   pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean+scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean+scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail; 137KB/s rd, 1852KB/s wr, 199op/s
   mdsmap e1: 0/0/1 up

The tree:

  # id    weight  type name                 up/down  reweight
  -8      14.26   root SSDroot
  -27     8         datacenter SSDrbx2
  -26     8           room SSDs25
  -25     8             net SSD188-165-12
  -24     8               rack SSD25B09
  -23     8                 host lyll
  46      2                   osd.46      up       1
  47      2                   osd.47      up       1
  48      2                   osd.48      up       1
  49      2                   osd.49      up       1
  -10     4.26      datacenter SSDrbx3
  -12     2           room SSDs43
  -13     2             net SSD178-33-122
  -16     2               rack SSD43S01
  -17     2                 host kaino
  42      1                   osd.42      up       1
  43      1                   osd.43      up       1
  -22     2.26        room SSDs45
  -21     2.26          net SSD5-135-138
  -20     2.26            rack SSD45F01
  -19     2.26              host taman
  44      1.13                osd.44      up       1
  45      1.13                osd.45      up       1
  -9      2         datacenter SSDrbx4
  -11     2           room SSDs52
  -14     2             net SSD176-31-226
  -15     2               rack SSD52B09
  -18     2                 host dragan
  40      1                   osd.40      up       1
  41      1                   osd.41      up       1
  -1      33.43   root SASroot
  -100    15.9      datacenter SASrbx1
  -90     15.9        room SASs15
  -72     15.9          net SAS188-165-15
  -40     8               rack SAS15B01
  -3      8                 host brontes
  0       1                   osd.0       up       1
  1       1                   osd.1       up       1
  2       1                   osd.2       up       1
  3       1                   osd.3       up       1
  4       1                   osd.4       up       1
  5       1                   osd.5       up       1
  6       1                   osd.6       up       1
  7       1                   osd.7       up       1
  -41     7.9             rack SAS15B02
  -6      7.9               host alim
  24      1                   osd.24      up       1
  25      1                   osd.25      down     0
  26      1                   osd.26      up       1
  27      1                   osd.27      up       1
  28      1                   osd.28      up       1
  29      1                   osd.29      up       1
  30      1                   osd.30      up       1
  31      0.9                 osd.31      up       1
  -101    17.53     datacenter SASrbx2
  -91     17.53
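For the inconsistent PGs in point 3, the usual commands look like the following (a sketch; 3.1a stands in for a real PG id from the health output, and repair should only be run when the primary's copy is believed good):

  # See which PGs are inconsistent
  ceph health detail | grep inconsistent

  # Re-verify and attempt an automatic repair of one PG
  ceph pg deep-scrub 3.1a
  ceph pg repair 3.1a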
[ceph-users] ceph v0.61, rbd-fuse issue, rbd_list: error %d Numerical result out of range
Hi everyone,

The image files don't show up in the mount point when using the command:

  rbd-fuse -p poolname -c /etc/ceph/ceph.conf /aa

but other pools do display their image files with the same command. In other pools I have also created larger and more numerous images than in this one, and they work fine. How can I track down the issue?

It reports the errors below after enabling the FUSE debug option:

  root@ceph3:/# rbd-fuse -p qa_vol /aa -d
  FUSE library version: 2.8.6
  nullpath_ok: 0
  unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
  INIT: 7.17
  flags=0x047b
  max_readahead=0x0002
     INIT: 7.12
     flags=0x0031
     max_readahead=0x0002
     max_write=0x0002
     unique: 1, success, outsize: 40
  unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
  getattr /
  rbd_list: error %d : Numerical result out of range
     unique: 2, success, outsize: 120
  unique: 3, opcode: OPENDIR (27), nodeid: 1, insize: 48
  opendir flags: 0x98800 /
  rbd_list: error %d : Numerical result out of range
  opendir[0] flags: 0x98800 /
     unique: 3, success, outsize: 32
  unique: 4, opcode: READDIR (28), nodeid: 1, insize: 80
  readdir[0] from 0
     unique: 4, success, outsize: 80
  unique: 5, opcode: READDIR (28), nodeid: 1, insize: 80
     unique: 5, success, outsize: 16
  unique: 6, opcode: RELEASEDIR (29), nodeid: 1, insize: 64
  releasedir[0] flags: 0x0
     unique: 6, success, outsize: 16

Thanks.

Sean Cao
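One way to narrow this down (a sketch, assuming the admin keyring is readable) is to list the pool with the plain rbd tool and look at the size of the listing; librbd's rbd_list() returns ERANGE when the caller's buffer is too small for the image-name list, so a pool with many or long image names is a plausible trigger:

  # Does the pool list correctly outside of rbd-fuse?
  rbd ls -p qa_vol

  # How many images, and how many bytes of names in total?
  rbd ls -p qa_vol | wc -l
  rbd ls -p qa_vol | wc -c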
Re: [ceph-users] ceph-deploy
Thanks Gary. After you threw me those clues I got further, but it still isn't working. It seems there are no i386 python-pushy deb packages in either of those ceph repos. I also attempted using pip and got pushy installed, but the ceph-deploy debs still refused to install.

I built another VM with 64-bit Debian 7 and the packages were found and installed, however there is a compile error during install. Any ideas?

cheers
-Matt

  administrator@ceph-admin:~$ sudo aptitude install ceph-deploy
  The following NEW packages will be installed:
    ceph-deploy python-pushy{a}
  0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
  Need to get 53.3 kB of archives. After unpacking 328 kB will be used.
  Do you want to continue? [Y/n/?]
  Get: 1 http://ceph.com/debian-cuttlefish/ wheezy/main python-pushy amd64 0.5.1-1 [30.9 kB]
  Get: 2 http://ceph.com/debian-cuttlefish/ wheezy/main ceph-deploy all 0.1-1 [22.4 kB]
  Fetched 53.3 kB in 1s (33.2 kB/s)
  Selecting previously unselected package python-pushy.
  (Reading database ... 38969 files and directories currently installed.)
  Unpacking python-pushy (from .../python-pushy_0.5.1-1_amd64.deb) ...
  Selecting previously unselected package ceph-deploy.
  Unpacking ceph-deploy (from .../ceph-deploy_0.1-1_all.deb) ...
  Setting up python-pushy (0.5.1-1) ...
  Setting up ceph-deploy (0.1-1) ...
  Processing triggers for python-support ...
  Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py ...
  SyntaxError: ('invalid syntax', ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py', 44, 26, '    assert {p.basename for p in tmpdir.listdir()} == set()\n'))
  Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py ...
  SyntaxError: ('invalid syntax', ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py', 33, 26, "    assert {p.basename for p in tmpdir.listdir()} == {'ceph.conf'}\n"))

  administrator@ceph-admin:~$ ceph-deploy
  Traceback (most recent call last):
    File "/usr/bin/ceph-deploy", line 19, in <module>
      from ceph_deploy.cli import main
    File "/usr/lib/pymodules/python2.7/ceph_deploy/cli.py", line 1, in <module>
      import pkg_resources
  ImportError: No module named pkg_resources

  administrator@ceph-admin:~$ python
  Python 2.7.3 (default, Jan  2 2013, 13:56:14)
  [GCC 4.7.2] on linux2

On Fri, May 17, 2013 at 9:10 AM, Gary Lowell <glow...@sonic.net> wrote:
> Hi Matt -
>
> Sounds like you installed ceph-deploy by downloading from
> github.com/ceph/ceph-deploy, then running the bootstrap script. We have
> debian packages for ceph-deploy and python-pushy that are included in the
> debian-cuttlefish repo, as well as at
> http://ceph.com/packages/ceph-deploy/debian. You can install python-pushy
> from those locations with apt, or you can install via pip: sudo pip
> install pushy. Let me know if you continue to have problems.
>
> Cheers,
> Gary
>
> On May 16, 2013, at 3:51 PM, Matt Chipman wrote:
>
> > hi, I used ceph-deploy successfully a few days ago but recently
> > reinstalled my admin machine from the same instructions:
> > http://ceph.com/docs/master/rados/deployment/preflight-checklist/
> > Now I'm getting the error below. Then I figured I'd just use the debs,
> > but they are missing the python-pushy dependency. Debian 7.0. Is there
> > any way to solve either issue? thanks
> >
> > administrator@cephadmin:~$ ceph-deploy
> > usage: ceph-deploy [-h] [-v | -q] [-n] [--overwrite-conf] [--cluster NAME] COMMAND ...
> > ceph-deploy: error: too few arguments
> >
> > administrator@cephadmin:~$ ceph-deploy install ceph00
> > Traceback (most recent call last):
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py", line 383, in __init__
> >     self.modules = AutoImporter(self)
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py", line 236, in __init__
> >     remote_compile = self.__client.eval("compile")
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py", line 478, in eval
> >     return self.remote.eval(code, globals, locals)
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py", line 54, in eval
> >     return self.send_request(MessageType.evaluate, args)
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 315, in send_request
> >     m = self.__waitForResponse(handler)
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 420, in __waitForResponse
> >     m = self.__recv()
> >   File "/home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py", line 601, in __recv
> >     m = self.__istream.receive_message()
> >   File
Re: [ceph-users] PG down incomplete
It looks like you have the noout flag set:

   noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
   monmap e7: 5 mons at {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0}, election epoch 2584, quorum 0,1,2,3 a,b,c,e
   osdmap e82502: 50 osds: 48 up, 48 in

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing

If you have down OSDs that don't get marked out, that would certainly cause problems. Have you tried restarting the failed OSDs? What do the logs look like for osd.15 and osd.25?

On Fri, May 17, 2013 at 1:31 AM, Olivier Bonvalet <ceph.l...@daevel.fr> wrote:
> [quoted message snipped]
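For completeness, the flag and daemon handling mentioned above look like this (a sketch; the sysvinit form assumes a Debian-style install, and osd.25 is the down OSD from the tree):

  # Let the cluster mark down OSDs out again
  ceph osd unset noout

  # Try restarting the failed OSD and watch its log for the crash reason
  sudo /etc/init.d/ceph start osd.25
  tail -f /var/log/ceph/ceph-osd.25.log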
Re: [ceph-users] PG down incomplete
Another thing... since your osd.10 is near full, your cluster may be fairly close to capacity for the purposes of rebalancing. Have a look at:

http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space

Maybe we can get some others to look at this. It's not clear to me why the other OSDs crash after you take osd.25 out. It could be capacity, but that shouldn't make them crash. Have you tried adding more OSDs to increase capacity?

On Fri, May 17, 2013 at 11:27 AM, John Wilkins <john.wilk...@inktank.com> wrote:
> [quoted message snipped]
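The thresholds involved can be checked on a running monitor, and the PG-map full ratio can be raised temporarily to let recovery proceed (a sketch; mon.a and the 0.95 value are examples, and the admin socket path assumes a default install):

  # Show the configured nearfull/full ratios
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep full_ratio

  # Temporarily raise the full threshold (example value; lower it back afterwards)
  ceph pg set_full_ratio 0.95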
Re: [ceph-users] ceph-deploy
Hi Matt -

Sorry, I just spotted at the end of your message that you are using python 2.7.3, but the modules are installing into the python2.6 directories. I don't know why that would be happening, and we'll have to dig into it more. Python is tripping over incompatible syntax for some reason.

Cheers,
Gary

On May 17, 2013, at 1:41 PM, Gary Lowell wrote:
> Hi Matt -
>
> I see in the message below that you are using python 2.6. Ceph-deploy may
> have some syntax that is incompatible with that version of python. On
> wheezy we tested with the default python 2.7.3 interpreter. You might try
> using the newer interpreter; we will also do some more testing to see if
> we can get ceph-deploy working with python 2.6.
>
> Cheers,
> Gary
>
> On May 17, 2013, at 6:23 AM, Matt Chipman wrote:
> > [quoted message snipped]
Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum
On Fri, 17 May 2013, Joe Ryner wrote:
> Hi All,
>
> I have had an issue recently while working on my ceph clusters. The
> following issue seems to be true on bobtail and cuttlefish.
>
> I have two production clusters in two different data centers and a test
> cluster. We are using ceph to run virtual machines. I use rbd as block
> devices for sanlock. I am running Fedora 18.
>
> I have been moving monitors around, and in the process I got the cluster
> out of quorum, so ceph stopped responding. During this time I decided to
> reboot a ceph node that performs an rbd map during startup. The system
> boots OK, but the service script that performs the rbd map doesn't
> finish, and eventually the system OOPSes and then finally panics. I was
> able to disable the rbd map during boot and finally got the cluster back
> in quorum, and everything settled down nicely.

What kernel version? Are you using cephx authentication? If you could open a bug at tracker.ceph.com that would be most helpful!

> Question: has anyone seen this crashing/panic behavior? I have seen it
> happen on both of my production clusters. Secondly, the ceph command
> hangs when the cluster is out of quorum; is there a timeout available?

Not currently. You can do this yourself with 'timeout 120 ...' with any recent coreutils.

Thanks-
sage
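A wrapper along the lines Sage suggests (GNU coreutils timeout; exit status 124 means the time limit fired):

  # Give up after 120 seconds instead of blocking forever without quorum
  timeout 120 ceph health || echo "ceph did not respond within 120s"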
Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum
On 05/17/2013 03:49 PM, Joe Ryner wrote:
> Hi All,
>
> I have had an issue recently while working on my ceph clusters. The
> following issue seems to be true on bobtail and cuttlefish. I have two
> production clusters in two different data centers and a test cluster. We
> are using ceph to run virtual machines. I use rbd as block devices for
> sanlock.

Also, do you have any of the information that the kernel might have dumped when it panicked? That might be helpful in identifying the problem.

-Alex

> [rest of quoted message snipped]
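If nothing reached the logs before the panic, the oops text is often lost entirely. Two low-effort ways to capture it (a sketch; the netconsole target address and port are placeholders for a log host on your network):

  # After an oops that didn't fully hang the box, pull the trace from the ring buffer
  dmesg | grep -iE 'rbd|libceph|oops|panic'

  # Stream kernel messages to another machine so even a hard panic is captured
  # (listen on the target with: nc -u -l 6666)
  sudo modprobe netconsole netconsole=@/eth0,6666@192.168.1.10/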
Re: [ceph-users] PG down incomplete
Yes, I set the noout flag to avoid the automatic rebalancing away from osd.25, which crashes all the OSDs on that host (already tried several times).

On Friday, May 17, 2013 at 11:27 -0700, John Wilkins wrote:
> [quoted message snipped]
Re: [ceph-users] ceph-deploy
Great news. There was a patch committed to master last week that added python-setuptools to the dependencies, so the issue shouldn't happen with the next build.

Cheers,
Gary

On May 17, 2013, at 4:47 PM, Matt Chipman wrote:
> Hi Gary,
>
> After a bit of searching on the list I was able to resolve this with
> "aptitude install python-setuptools". It seems it's a missing dependency
> of the ceph-deploy package on wheezy.
>
> thanks for your help
> -Matt
>
> On Sat, May 18, 2013 at 6:54 AM, Gary Lowell <glow...@sonic.net> wrote:
> > [quoted message snipped]
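For anyone landing here with the same symptoms: the SyntaxErrors come from set comprehensions in the test files, which need Python 2.7 (python-support byte-compiles for every installed interpreter, so the python2.6 errors are noise), and the ImportError is the missing setuptools. A minimal fix on wheezy (a sketch; the reinstall step just re-runs the packaging triggers):

  sudo aptitude install python-setuptools
  sudo aptitude reinstall ceph-deploy
  ceph-deploy   # should now print usage instead of the pkg_resources ImportError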