Re: [ceph-users] PG down incomplete

2013-05-17 Thread John Wilkins
If you can follow the documentation here:
http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/  and
http://ceph.com/docs/master/rados/troubleshooting/  to provide some
additional information, we may be better able to help you.

For example, ceph osd tree would help us understand the status of
your cluster a bit better.
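For example, something along these lines usually captures enough to start with (a sketch; substitute one of your down/incomplete placement groups for <pgid>):

ceph health detail            # expands the health summary into per-PG and per-OSD detail
ceph osd tree                 # CRUSH hierarchy, and which OSDs are up/down and in/out
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
ceph pg <pgid> query          # peering/recovery state of one problem PG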

On Thu, May 16, 2013 at 10:32 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 Le mercredi 15 mai 2013 à 00:15 +0200, Olivier Bonvalet a écrit :
 Hi,

 I have some PGs in state down and/or incomplete on my cluster, because I
 lost 2 OSDs and the pool had only 2 replicas. So of course that
 data is lost.

 My problem now is that I can't get back to a HEALTH_OK status: if I try
 to remove, read or overwrite the corresponding RBD images, nearly all OSDs
 hang (well... they stop doing anything and requests pile up in a growing
 queue, until production grinds to a halt).

 So, what can I do to remove those corrupt images?

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 Up. Can nobody help me with this problem?

 Thanks,

 Olivier

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mount error 5 while mounting cephfs

2013-05-17 Thread Sridhar Mahadevan
Hi,

I have deployed the Ceph object store using ceph-deploy.
I tried to mount CephFS and got stuck with this error:

sudo mount.ceph 192.168.35.82:/ /mnt/mycephfs -o name=admin,secret=AQDa5JJRqLxuOxAA77VljIjaAGWR6mGdL12NUQ==

mount error 5 = Input/output error

The output of the command # ceph -s:

   health HEALTH_WARN 64 pgs degraded; mds cluster is degraded; mds blade2-qq is laggy
   monmap e1: 1 mons at {blade2-qq=192.168.35.82:6789/0}, election epoch 1, quorum 0 blade2-qq
   osdmap e56: 4 osds: 4 up, 4 in
   pgmap v834: 192 pgs: 128 active+clean, 64 active+clean+degraded; 0 bytes data, 50871 MB used, 437 GB / 513 GB avail
   mdsmap e6457: 1/1/1 up {0=blade2-qq=up:replay(laggy or crashed)}
As it says, the MDS has crashed. I don't see ceph-mds running on the
MDS node.
I executed ceph-deploy mds create mds_node, and this starts the
ceph-mds daemon on that node, but I see that the ceph-mds daemon crashes
again after some time.

Kindly help me on this issue.

-- 
--sridhar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount error 5 while mounting cephfs

2013-05-17 Thread John Wilkins
Have you tried restarting your MDS server?
http://ceph.com/docs/master/rados/operations/operating/#operating-a-cluster
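As a rough sketch (assuming the mds id is blade2-qq, and depending on whether the node uses sysvinit or upstart):

sudo /etc/init.d/ceph restart mds.blade2-qq     # sysvinit
sudo restart ceph-mds id=blade2-qq              # upstart (Ubuntu)
ceph mds stat                                   # see whether it comes back up:active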

On Fri, May 17, 2013 at 12:16 AM, Sridhar Mahadevan
msridha...@gmail.com wrote:
 Hi,

 I have deployed the ceph object store using ceph-deploy.
 I tried to mount cephfs and I got struck with this error.

 sudo mount.ceph 192.168.35.82:/ /mnt/mycephfs -o
 name=admin,secret=AQDa5JJRqLxuOxAA77VljIjaAGWR6mGdL12NUQ==

 mount error 5 = Input/output error

 The output of the command

 # ceph -s

health HEALTH_WARN 64 pgs degraded; mds cluster is degraded; mds
 blade2-qq is laggy
monmap e1: 1 mons at {blade2-qq=192.168.35.82:6789/0}, election epoch 1,
 quorum 0 blade2-qq
osdmap e56: 4 osds: 4 up, 4 in
pgmap v834: 192 pgs: 128 active+clean, 64 active+clean+degraded; 0 bytes
 data, 50871 MB used, 437 GB / 513 GB avail
mdsmap e6457: 1/1/1 up {0=blade2-qq=up:replay(laggy or crashed)}

 As it says the MDS has crashed. I dont see ceph-mds running in the MDS_Node.
 I executed ceph-deploy mds create mds_node  and this starts the ceph-mds
 daemon in the mds_node, but I see that the ceph-mds daemon crashes after
 sometime.

 Kindly help me on this issue.

 --
 --sridhar

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount error 5 while mounting cephfs

2013-05-17 Thread Sridhar Mahadevan
Hi,
I did try to restart the MDS server. The logs show the following errors:

[187846.234448] init: ceph-mds (ceph/blade2-qq) main process (15077) killed by ABRT signal
[187846.234493] init: ceph-mds (ceph/blade2-qq) main process ended, respawning
[187846.687929] init: ceph-mds (ceph/blade2-qq) main process (15099) killed by ABRT signal
[187846.687977] init: ceph-mds (ceph/blade2-qq) respawning too fast, stopped

Thanks and Regards


On Fri, May 17, 2013 at 3:33 PM, John Wilkins john.wilk...@inktank.comwrote:

 Have you tried restarting your MDS server?
 http://ceph.com/docs/master/rados/operations/operating/#operating-a-cluster

 On Fri, May 17, 2013 at 12:16 AM, Sridhar Mahadevan
 msridha...@gmail.com wrote:
  Hi,
 
  I have deployed the ceph object store using ceph-deploy.
  I tried to mount cephfs and I got struck with this error.
 
  sudo mount.ceph 192.168.35.82:/ /mnt/mycephfs -o
  name=admin,secret=AQDa5JJRqLxuOxAA77VljIjaAGWR6mGdL12NUQ==
 
  mount error 5 = Input/output error
 
  The output of the command
 
  # ceph -s
 
 health HEALTH_WARN 64 pgs degraded; mds cluster is degraded; mds
  blade2-qq is laggy
 monmap e1: 1 mons at {blade2-qq=192.168.35.82:6789/0}, election
 epoch 1,
  quorum 0 blade2-qq
 osdmap e56: 4 osds: 4 up, 4 in
 pgmap v834: 192 pgs: 128 active+clean, 64 active+clean+degraded; 0
 bytes
  data, 50871 MB used, 437 GB / 513 GB avail
 mdsmap e6457: 1/1/1 up {0=blade2-qq=up:replay(laggy or crashed)}
 
  As it says the MDS has crashed. I dont see ceph-mds running in the
 MDS_Node.
  I executed ceph-deploy mds create mds_node  and this starts the
 ceph-mds
  daemon in the mds_node, but I see that the ceph-mds daemon crashes after
  sometime.
 
  Kindly help me on this issue.
 
  --
  --sridhar
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



 --
 John Wilkins
 Senior Technical Writer
 Inktank
 john.wilk...@inktank.com
 (415) 425-9599
 http://inktank.com




-- 
--sridhar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mount error 5 while mounting cephfs

2013-05-17 Thread John Wilkins
Are you running the MDS in a VM?
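If it is being killed by SIGABRT over and over, one way to see why (a sketch, assuming the id blade2-qq on an upstart-based node) is to stop the respawning job and run the daemon in the foreground with verbose logging, then capture the assertion/backtrace it prints:

sudo stop ceph-mds id=blade2-qq
sudo ceph-mds -i blade2-qq -d --debug-mds 20 --debug-ms 1

The same backtrace should also land in /var/log/ceph/ceph-mds.blade2-qq.log; posting it here would help.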

On Fri, May 17, 2013 at 12:40 AM, Sridhar Mahadevan
msridha...@gmail.com wrote:
 Hi,
 I did try to restart the MDS server. The logs show the following error

 [187846.234448] init: ceph-mds (ceph/blade2-qq) main process (15077) killed
 by ABRT signal
 [187846.234493] init: ceph-mds (ceph/blade2-qq) main process ended,
 respawning
 [187846.687929] init: ceph-mds (ceph/blade2-qq) main process (15099) killed
 by ABRT signal
 [187846.687977] init: ceph-mds (ceph/blade2-qq) respawning too fast, stopped


 Thanks and Regards


 On Fri, May 17, 2013 at 3:33 PM, John Wilkins john.wilk...@inktank.com
 wrote:

 Have you tried restarting your MDS server?

 http://ceph.com/docs/master/rados/operations/operating/#operating-a-cluster

 On Fri, May 17, 2013 at 12:16 AM, Sridhar Mahadevan
 msridha...@gmail.com wrote:
  Hi,
 
  I have deployed the ceph object store using ceph-deploy.
  I tried to mount cephfs and I got struck with this error.
 
  sudo mount.ceph 192.168.35.82:/ /mnt/mycephfs -o
  name=admin,secret=AQDa5JJRqLxuOxAA77VljIjaAGWR6mGdL12NUQ==
 
  mount error 5 = Input/output error
 
  The output of the command
 
  # ceph -s
 
 health HEALTH_WARN 64 pgs degraded; mds cluster is degraded; mds
  blade2-qq is laggy
 monmap e1: 1 mons at {blade2-qq=192.168.35.82:6789/0}, election epoch
  1,
  quorum 0 blade2-qq
 osdmap e56: 4 osds: 4 up, 4 in
 pgmap v834: 192 pgs: 128 active+clean, 64 active+clean+degraded; 0
  bytes
  data, 50871 MB used, 437 GB / 513 GB avail
 mdsmap e6457: 1/1/1 up {0=blade2-qq=up:replay(laggy or crashed)}
 
  As it says the MDS has crashed. I dont see ceph-mds running in the
  MDS_Node.
  I executed ceph-deploy mds create mds_node  and this starts the
  ceph-mds
  daemon in the mds_node, but I see that the ceph-mds daemon crashes after
  sometime.
 
  Kindly help me on this issue.
 
  --
  --sridhar
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 



 --
 John Wilkins
 Senior Technical Writer
 Inktank
 john.wilk...@inktank.com
 (415) 425-9599
 http://inktank.com




 --
 --sridhar



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG down incomplete

2013-05-17 Thread Olivier Bonvalet
Hi,

thanks for your answer. In fact I have several different problems, which
I have tried to solve separately:

1) I lost 2 OSDs, and some pools had only 2 replicas, so some data was
lost.
2) One monitor refuses the Cuttlefish upgrade, so I only have 4 of 5
monitors running.
3) I have 4 old inconsistent PGs that I can't repair.


So the status:

   health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors; noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
   monmap e7: 5 mons at {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0}, election epoch 2584, quorum 0,1,2,3 a,b,c,e
   osdmap e82502: 50 osds: 48 up, 48 in
   pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean+scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean+scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail; 137KB/s rd, 1852KB/s wr, 199op/s
   mdsmap e1: 0/0/1 up



The tree:

# id    weight  type name                       up/down reweight
-8      14.26   root SSDroot
-27     8         datacenter SSDrbx2
-26     8           room SSDs25
-25     8             net SSD188-165-12
-24     8               rack SSD25B09
-23     8                 host lyll
46      2                   osd.46              up      1
47      2                   osd.47              up      1
48      2                   osd.48              up      1
49      2                   osd.49              up      1
-10     4.26      datacenter SSDrbx3
-12     2           room SSDs43
-13     2             net SSD178-33-122
-16     2               rack SSD43S01
-17     2                 host kaino
42      1                   osd.42              up      1
43      1                   osd.43              up      1
-22     2.26        room SSDs45
-21     2.26          net SSD5-135-138
-20     2.26            rack SSD45F01
-19     2.26              host taman
44      1.13                osd.44              up      1
45      1.13                osd.45              up      1
-9      2         datacenter SSDrbx4
-11     2           room SSDs52
-14     2             net SSD176-31-226
-15     2               rack SSD52B09
-18     2                 host dragan
40      1                   osd.40              up      1
41      1                   osd.41              up      1
-1      33.43   root SASroot
-100    15.9      datacenter SASrbx1
-90     15.9        room SASs15
-72     15.9          net SAS188-165-15
-40     8               rack SAS15B01
-3      8                 host brontes
0       1                   osd.0               up      1
1       1                   osd.1               up      1
2       1                   osd.2               up      1
3       1                   osd.3               up      1
4       1                   osd.4               up      1
5       1                   osd.5               up      1
6       1                   osd.6               up      1
7       1                   osd.7               up      1
-41     7.9             rack SAS15B02
-6      7.9               host alim
24      1                   osd.24              up      1
25      1                   osd.25              down    0
26      1                   osd.26              up      1
27      1                   osd.27              up      1
28      1                   osd.28              up      1
29      1                   osd.29              up      1
30      1                   osd.30              up      1
31      0.9                 osd.31              up      1
-101    17.53     datacenter SASrbx2
-91     17.53

[ceph-users] ceph v0.61, rbd-fuse issue, rbd_list: error %d Numerical result out of range

2013-05-17 Thread Sean
Hi everyone

The image files don't show up in the mount point when using the command
rbd-fuse -p poolname -c /etc/ceph/ceph.conf /aa

but other pools do display their image files with the same command. In another
pool I also created images that are larger and more numerous than in that pool,
and it works fine.

How can I track down the issue?

It reports the errors below after enabling FUSE debug output.

root@ceph3:/# rbd-fuse -p qa_vol /aa -d
FUSE library version: 2.8.6
nullpath_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INIT: 7.17
flags=0x047b
max_readahead=0x0002
   INIT: 7.12
   flags=0x0031
   max_readahead=0x0002
   max_write=0x0002
   unique: 1, success, outsize: 40
unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
getattr /
rbd_list: error %d
: Numerical result out of range
   unique: 2, success, outsize: 120
unique: 3, opcode: OPENDIR (27), nodeid: 1, insize: 48
opendir flags: 0x98800 /
rbd_list: error %d
: Numerical result out of range
   opendir[0] flags: 0x98800 /
   unique: 3, success, outsize: 32
unique: 4, opcode: READDIR (28), nodeid: 1, insize: 80
readdir[0] from 0
   unique: 4, success, outsize: 80
unique: 5, opcode: READDIR (28), nodeid: 1, insize: 80
   unique: 5, success, outsize: 16
unique: 6, opcode: RELEASEDIR (29), nodeid: 1, insize: 64
releasedir[0] flags: 0x0
   unique: 6, success, outsize: 16
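
For reference: rbd_list() returns -ERANGE when the name buffer handed to it is too small for the pool's image list, so one way to narrow this down (a hedged suggestion) is to check whether the plain rbd tool can list that pool at all, with client-side debugging turned up:

rbd -p qa_vol ls
rbd -p qa_vol ls --debug-rbd 20 --debug-ms 1

If rbd lists the images while rbd-fuse does not, the problem is more likely in how rbd-fuse sizes its buffer than in the pool itself.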


thanks.
Sean Cao


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy

2013-05-17 Thread Matt Chipman
Thanks Gary,

After you threw me those clues I got further, but it still isn't working.
It seems there are no i386 python-pushy .deb packages in either of those
ceph repos. I also attempted using pip and got pushy installed, but the
ceph-deploy debs still refused to install.

I built another VM with 64-bit Debian 7 and the packages were found and
installed; however, there is an error during byte-compilation at install time.

Any ideas?

cheers

-Matt

administrator@ceph-admin:~$ sudo aptitude install ceph-deploy
The following NEW packages will be installed:
  ceph-deploy python-pushy{a}
0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 53.3 kB of archives. After unpacking 328 kB will be used.
Do you want to continue? [Y/n/?]
Get: 1 http://ceph.com/debian-cuttlefish/ wheezy/main python-pushy amd64
0.5.1-1 [30.9 kB]
Get: 2 http://ceph.com/debian-cuttlefish/ wheezy/main ceph-deploy all 0.1-1
[22.4 kB]
Fetched 53.3 kB in 1s (33.2 kB/s)
Selecting previously unselected package python-pushy.
(Reading database ... 38969 files and directories currently installed.)
Unpacking python-pushy (from .../python-pushy_0.5.1-1_amd64.deb) ...
Selecting previously unselected package ceph-deploy.
Unpacking ceph-deploy (from .../ceph-deploy_0.1-1_all.deb) ...
Setting up python-pushy (0.5.1-1) ...
Setting up ceph-deploy (0.1-1) ...
Processing triggers for python-support ...
Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py ...
SyntaxError: ('invalid syntax',
('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py', 44, 26, '
 assert {p.basename for p in tmpdir.listdir()} == set()\n'))

Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py ...
SyntaxError: ('invalid syntax',
('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py', 33, 26, 
   assert {p.basename for p in tmpdir.listdir()} == {'ceph.conf'}\n))


administrator@ceph-admin:~$ ceph-deploy
Traceback (most recent call last):
  File "/usr/bin/ceph-deploy", line 19, in <module>
    from ceph_deploy.cli import main
  File "/usr/lib/pymodules/python2.7/ceph_deploy/cli.py", line 1, in <module>
    import pkg_resources
ImportError: No module named pkg_resources
administrator@ceph-admin:~$ python
Python 2.7.3 (default, Jan  2 2013, 13:56:14)
[GCC 4.7.2] on linux2



On Fri, May 17, 2013 at 9:10 AM, Gary Lowell glow...@sonic.net wrote:

 Hi Matt -

 Sounds like you installed ceph-deploy by downloading from
 github.com/ceph/ceph-deploy, then running the bootstrap script.

 We have debian packages for ceph-deploy and python-pushy that are included
 in the debian-cuttlefish repo, as well as
 http://ceph.com/packages/ceph-deploy/debian.    You can install
 python-pushy from those locations with apt, or you can install via pip:
  sudo pip install pushy.

 Let me know if you continue to have problems.

 Cheers,
 Gary

 On May 16, 2013, at 3:51 PM, Matt Chipman wrote:

 hi,
 I used ceph-deploy successfully a few days ago but recently reinstalled my
 admin machine from the same instructions
 http://ceph.com/docs/master/rados/deployment/preflight-checklist/

 now getting the error below. Then I figured I'd just use the debs but they
 are missing the python-pushy dependancy.  Debian 7.0

 Is there any way to solve either issue?

 thanks

 administrator@cephadmin:~$ ceph-deploy
 usage: ceph-deploy [-h] [-v | -q] [-n] [--overwrite-conf] [--cluster NAME]
COMMAND ...
 ceph-deploy: error: too few arguments
 administrator@cephadmin:~$ ceph-deploy install ceph00
 Traceback (most recent call last):
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
 line 383, in __init__
 self.modules = AutoImporter(self)
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
 line 236, in __init__
 remote_compile = self.__client.eval(compile)
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
 line 478, in eval
 return self.remote.eval(code, globals, locals)
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/connection.py,
 line 54, in eval
 return self.send_request(MessageType.evaluate, args)
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 315, in send_request
 m = self.__waitForResponse(handler)
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 420, in __waitForResponse
 m = self.__recv()
   File
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/protocol/baseconnection.py,
 line 601, in __recv
 m = self.__istream.receive_message()
   File
 

Re: [ceph-users] PG down incomplete

2013-05-17 Thread John Wilkins
It looks like you have the noout flag set:

noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
   monmap e7: 5 mons at
{a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
election epoch 2584, quorum 0,1,2,3 a,b,c,e
   osdmap e82502: 50 osds: 48 up, 48 in

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing

If you have down OSDs that don't get marked out, that would certainly
cause problems. Have you tried restarting the failed OSDs?

What do the logs look like for osd.15 and osd.25?
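
Something along these lines might help pin it down (a sketch; adjust for your init style, and only clear the flag once things are stable again):

sudo /etc/init.d/ceph start osd.25          # sysvinit; or: sudo start ceph-osd id=25  (upstart)
tail -f /var/log/ceph/ceph-osd.25.log       # watch for the assert/backtrace if it dies again
ceph osd unset noout                        # later, once the failed OSDs are back in or removed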

On Fri, May 17, 2013 at 1:31 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 Hi,

 thanks for your answer. In fact I have several different problems, which
 I tried to solve separatly :

 1) I loose 2 OSD, and some pools have only 2 replicas. So some data was
 lost.
 2) One monitor refuse the Cuttlefish upgrade, so I only have 4 of 5
 monitors running.
 3) I have 4 old inconsistent PG that I can't repair.


 So the status :

health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck
 inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors;
 noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
monmap e7: 5 mons at
 {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
  election epoch 2584, quorum 0,1,2,3 a,b,c,e
osdmap e82502: 50 osds: 48 up, 48 in
 pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean
 +scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean
 +scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail;
 137KB/s rd, 1852KB/s wr, 199op/s
mdsmap e1: 0/0/1 up



 The tree :

 # idweight  type name   up/down reweight
 -8  14.26   root SSDroot
 -27 8   datacenter SSDrbx2
 -26 8   room SSDs25
 -25 8   net SSD188-165-12
 -24 8   rack SSD25B09
 -23 8   host lyll
 46  2   osd.46  up
   1
 47  2   osd.47  up
   1
 48  2   osd.48  up
   1
 49  2   osd.49  up
   1
 -10 4.26datacenter SSDrbx3
 -12 2   room SSDs43
 -13 2   net SSD178-33-122
 -16 2   rack SSD43S01
 -17 2   host kaino
 42  1   osd.42  up
   1
 43  1   osd.43  up
   1
 -22 2.26room SSDs45
 -21 2.26net SSD5-135-138
 -20 2.26rack SSD45F01
 -19 2.26host taman
 44  1.13osd.44  up
   1
 45  1.13osd.45  up
   1
 -9  2   datacenter SSDrbx4
 -11 2   room SSDs52
 -14 2   net SSD176-31-226
 -15 2   rack SSD52B09
 -18 2   host dragan
 40  1   osd.40  up
   1
 41  1   osd.41  up
   1
 -1  33.43   root SASroot
 -10015.9datacenter SASrbx1
 -90 15.9room SASs15
 -72 15.9net SAS188-165-15
 -40 8   rack SAS15B01
 -3  8   host brontes
 0   1   osd.0   up
   1
 1   1   osd.1   up
   1
 2   1   osd.2   up
   1
 3   1   osd.3   up
   1
 4   1   osd.4   up
   1
 5   1   osd.5   up
   1
 6   1   osd.6   up
   1
 7   1   osd.7   up
   1
 -41 7.9 rack SAS15B02
 -6  7.9 host alim
 24  1   osd.24  up
   1
 25  1   osd.25  down  
   0
 26 

Re: [ceph-users] PG down incomplete

2013-05-17 Thread John Wilkins
Another thing... since your osd.10 is near full, your cluster may be
fairly close to capacity for the purposes of rebalancing.  Have a look
at:

http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space

Maybe we can get some others to look at this.  It's not clear to me
why the other OSDs crash after you take osd.25 out. It could be
capacity, but that shouldn't make them crash. Have you tried adding more
OSDs to increase capacity?
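
In the meantime, a few commands show where you stand on space (a sketch; the reweight value is only an illustration, and shuffling data around while PGs are incomplete should be done carefully):

ceph health detail          # names the near full OSD and its utilization
rados df                    # per-pool usage
ceph osd reweight 10 0.8    # temporarily shift some data off the near-full osd.10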



On Fri, May 17, 2013 at 11:27 AM, John Wilkins john.wilk...@inktank.com wrote:
 It looks like you have the noout flag set:

 noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
monmap e7: 5 mons at
 {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
 election epoch 2584, quorum 0,1,2,3 a,b,c,e
osdmap e82502: 50 osds: 48 up, 48 in

 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing

 If you have down OSDs that don't get marked out, that would certainly
 cause problems. Have you tried restarting the failed OSDs?

 What do the logs look like for osd.15 and osd.25?

 On Fri, May 17, 2013 at 1:31 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 Hi,

 thanks for your answer. In fact I have several different problems, which
 I tried to solve separatly :

 1) I loose 2 OSD, and some pools have only 2 replicas. So some data was
 lost.
 2) One monitor refuse the Cuttlefish upgrade, so I only have 4 of 5
 monitors running.
 3) I have 4 old inconsistent PG that I can't repair.


 So the status :

health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck
 inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors;
 noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
monmap e7: 5 mons at
 {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
  election epoch 2584, quorum 0,1,2,3 a,b,c,e
osdmap e82502: 50 osds: 48 up, 48 in
 pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean
 +scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean
 +scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail;
 137KB/s rd, 1852KB/s wr, 199op/s
mdsmap e1: 0/0/1 up



 The tree :

 # idweight  type name   up/down reweight
 -8  14.26   root SSDroot
 -27 8   datacenter SSDrbx2
 -26 8   room SSDs25
 -25 8   net SSD188-165-12
 -24 8   rack SSD25B09
 -23 8   host lyll
 46  2   osd.46  up   
1
 47  2   osd.47  up   
1
 48  2   osd.48  up   
1
 49  2   osd.49  up   
1
 -10 4.26datacenter SSDrbx3
 -12 2   room SSDs43
 -13 2   net SSD178-33-122
 -16 2   rack SSD43S01
 -17 2   host kaino
 42  1   osd.42  up   
1
 43  1   osd.43  up   
1
 -22 2.26room SSDs45
 -21 2.26net SSD5-135-138
 -20 2.26rack SSD45F01
 -19 2.26host taman
 44  1.13osd.44  up   
1
 45  1.13osd.45  up   
1
 -9  2   datacenter SSDrbx4
 -11 2   room SSDs52
 -14 2   net SSD176-31-226
 -15 2   rack SSD52B09
 -18 2   host dragan
 40  1   osd.40  up   
1
 41  1   osd.41  up   
1
 -1  33.43   root SASroot
 -10015.9datacenter SASrbx1
 -90 15.9room SASs15
 -72 15.9net SAS188-165-15
 -40 8   rack SAS15B01
 -3  8   host brontes
 0   1   osd.0   up   
1
 1   1   osd.1   up   
1
 2   1   osd.2   up   
1
 3   1   osd.3   up   
1
 

Re: [ceph-users] ceph-deploy

2013-05-17 Thread Gary Lowell
Hi Matt -

Sorry, I just spotted at the end of your message that you are using python
2.7.3, but the modules are installing into the python2.6 directories.  I
don't know why that would be happening, and we'll have to dig into it more.
Python is tripping over incompatible syntax for some reason.
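
One way to see why python-support is byte-compiling for 2.6 on that box (a sketch; these paths are the Debian defaults) is to compare the default interpreter with the versions python-support builds for:

python --version
cat /usr/share/python/debian_defaults   # default and supported Python versions on the system
ls /usr/lib/pymodules/                  # the versions python-support actually built modules for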

Cheers,
Gary

On May 17, 2013, at 1:41 PM, Gary Lowell wrote:

 Hi Matt -
 
 
 I see in the message below that you are using python 2.6.   Ceph-deploy may 
 have some syntax that is incompatible with that version of python.  On wheezy 
 we tested with the default python 2.7.3 interpreter.  You might try using the 
 newer interpreter; we will also do some more testing to see if we can get 
 ceph-deploy working with python 2.6.
 
 Cheers,
 Gary
 
 
 On May 17, 2013, at 6:23 AM, Matt Chipman wrote:
 
 Thanks Gary,
 
 after you throwing me those clues I got furthur but it still isnt working.  
 It seems there are no i386 deb python-pushy packages in either of those 
 ceph repo's.  I also attempted using PIP and got pushy installed but the 
 ceph-deploy debs still refused to install.
 
 I built another VM with 64bit Debian 7 and the packages were found and 
 installed however there is an error on compiling during install.
 
 any ideas?
 
 cheers
 
 -Matt
 
 administrator@ceph-admin:~$ sudo aptitude install ceph-deploy
 The following NEW packages will be installed:
   ceph-deploy python-pushy{a}
 0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
 Need to get 53.3 kB of archives. After unpacking 328 kB will be used.
 Do you want to continue? [Y/n/?]
 Get: 1 http://ceph.com/debian-cuttlefish/ wheezy/main python-pushy amd64 
 0.5.1-1 [30.9 kB]
 Get: 2 http://ceph.com/debian-cuttlefish/ wheezy/main ceph-deploy all 0.1-1 
 [22.4 kB]
 Fetched 53.3 kB in 1s (33.2 kB/s)
 Selecting previously unselected package python-pushy.
 (Reading database ... 38969 files and directories currently installed.)
 Unpacking python-pushy (from .../python-pushy_0.5.1-1_amd64.deb) ...
 Selecting previously unselected package ceph-deploy.
 Unpacking ceph-deploy (from .../ceph-deploy_0.1-1_all.deb) ...
 Setting up python-pushy (0.5.1-1) ...
 Setting up ceph-deploy (0.1-1) ...
 Processing triggers for python-support ...
 Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py ...
 SyntaxError: ('invalid syntax', 
 ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py', 44, 26, '
 assert {p.basename for p in tmpdir.listdir()} == set()\n'))
 
 Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py ...
 SyntaxError: ('invalid syntax', 
 ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py', 33, 26,   
   assert {p.basename for p in tmpdir.listdir()} == {'ceph.conf'}\n))
 
 
 administrator@ceph-admin:~$ ceph-deploy
 Traceback (most recent call last):
   File /usr/bin/ceph-deploy, line 19, in module
 from ceph_deploy.cli import main
   File /usr/lib/pymodules/python2.7/ceph_deploy/cli.py, line 1, in module
 import pkg_resources
 ImportError: No module named pkg_resources
 administrator@ceph-admin:~$ python
 Python 2.7.3 (default, Jan  2 2013, 13:56:14)
 [GCC 4.7.2] on linux2
 
 
 
 On Fri, May 17, 2013 at 9:10 AM, Gary Lowell glow...@sonic.net wrote:
 Hi Matt -
 
 Sounds like you installed ceph-deploy by downloading from 
 github.com/ceph/ceph-deploy, then running the bootstrap script.
 
 We have debian packages for ceph-deploy and python-pushy that are included 
 in the debian-cuttlefish repo, as well as 
 http://ceph.com/packages/ceph-deploy/debian.You can install python-push 
 from those locations with apt, or you can install via pip:  sudo pip 
 python-pushy.
 
 Let me know if you continue to have problems.
 
 Cheers,
 Gary
 
 On May 16, 2013, at 3:51 PM, Matt Chipman wrote:
 
 hi, 
 I used ceph-deploy successfully a few days ago but recently reinstalled my 
 admin machine from the same instructions 
 http://ceph.com/docs/master/rados/deployment/preflight-checklist/
 
 now getting the error below. Then I figured I'd just use the debs but they 
 are missing the python-pushy dependancy.  Debian 7.0
 
 Is there any way to solve either issue? 
 
 thanks
 
 administrator@cephadmin:~$ ceph-deploy
 usage: ceph-deploy [-h] [-v | -q] [-n] [--overwrite-conf] [--cluster NAME]
COMMAND ...
 ceph-deploy: error: too few arguments
 administrator@cephadmin:~$ ceph-deploy install ceph00
 Traceback (most recent call last):
   File 
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
  line 383, in __init__
 self.modules = AutoImporter(self)
   File 
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
  line 236, in __init__
 remote_compile = self.__client.eval(compile)
   File 
 /home/administrator/ceph-deploy/virtualenv/local/lib/python2.7/site-packages/pushy-0.5.1-py2.7.egg/pushy/client.py,
  line 478, in eval
 return 

Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum

2013-05-17 Thread Sage Weil
On Fri, 17 May 2013, Joe Ryner wrote:
 Hi All,
 
 I have had an issue recently while working on my ceph clusters.  The 
 following issue seems to be true on bobtail and cuttlefish.  I have two 
 production clusters in two different data centers and a test cluster.  We are 
 using ceph to run virtual machines.  I use rbd as block devices for sanlock.
 
 I am running Fedora 18.
 
 I have been moving monitors around and in the process I got the cluster 
 out of quorum, so ceph stopped responding.  During this time I decided 
 to reboot a ceph node that performs an rbd map during startup.  The 
 system boots ok but the service script that is performing the rbd map 
 doesn't finish and eventually the system will OOPS and then finally 
 panic.  I was able to disable the rbd map during boot and finally got 
 the cluster back in quorum and everything settled down nicely.

What kernel version?  Are you using cephx authentication?  If you could 
open a bug at tracker.ceph.com that would be most helpful!
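
Roughly this is the information worth attaching (a sketch):

uname -r                            # kernel version
grep -i auth /etc/ceph/ceph.conf    # whether cephx is enabled on the client
dmesg | tail -n 100                 # the oops/backtrace, if it is still in the ring buffer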

 Question, has anyone seen this behavior of crashing/panic?  I have seen this 
 happen on both of my production clusters.
 Secondly, the ceph command hangs when the cluster is out of quorum, is there 
 a timeout available?

Not currently.  You can do this yourself with 'timeout 120 ...' with any 
recent coreutils.
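
For example, the following exits with status 124 if the monitors never answer within two minutes:

timeout 120 ceph health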

Thanks-
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum

2013-05-17 Thread Alex Elder
On 05/17/2013 03:49 PM, Joe Ryner wrote:
 Hi All,
 
 I have had an issue recently while working on my ceph clusters.  The
 following issue seems to be true on bobtail and cuttlefish.  I have
 two production clusters in two different data centers and a test
 cluster.  We are using ceph to run virtual machines.  I use rbd as
 block devices for sanlock.

Also, do you have any of the information that the kernel
might have dumped when it panicked?

That might be helpful in identifying the problem.

-Alex

 I am running Fedora 18.
 
 I have been moving monitors around and in the process I got the
 cluster out of quorum, so ceph stopped responding.  During this time
 I decided to reboot a ceph node that performs an rbd map during
 startup.  The system boots ok but the service script that is
 performing the rbd map doesn't finish and eventually the system will
 OOPS and then finally panic.  I was able to disable the rbd map
 during boot and finally got the cluster back in quorum and everything
 settled down nicely.
 
 Question, has anyone seen this behavior of crashing/panic?  I have
 seen this happen on both of my production clusters. Secondly, the
 ceph command hangs when the cluster is out of quorum, is there a
 timeout available?
 
 Thanks Joe ___ ceph-users
 mailing list ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG down incomplete

2013-05-17 Thread Olivier Bonvalet
Yes, I set the noout flag to avoid the auto balancing of the osd.25,
which will crash all OSD of this host (already tried several times).

Le vendredi 17 mai 2013 à 11:27 -0700, John Wilkins a écrit :
 It looks like you have the noout flag set:
 
 noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
monmap e7: 5 mons at
 {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
 election epoch 2584, quorum 0,1,2,3 a,b,c,e
osdmap e82502: 50 osds: 48 up, 48 in
 
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
 
 If you have down OSDs that don't get marked out, that would certainly
 cause problems. Have you tried restarting the failed OSDs?
 
 What do the logs look like for osd.15 and osd.25?
 
 On Fri, May 17, 2013 at 1:31 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
  Hi,
 
  thanks for your answer. In fact I have several different problems, which
  I tried to solve separatly :
 
  1) I loose 2 OSD, and some pools have only 2 replicas. So some data was
  lost.
  2) One monitor refuse the Cuttlefish upgrade, so I only have 4 of 5
  monitors running.
  3) I have 4 old inconsistent PG that I can't repair.
 
 
  So the status :
 
 health HEALTH_ERR 15 pgs incomplete; 4 pgs inconsistent; 15 pgs stuck
  inactive; 15 pgs stuck unclean; 1 near full osd(s); 19 scrub errors;
  noout flag(s) set; 1 mons down, quorum 0,1,2,3 a,b,c,e
 monmap e7: 5 mons at
  {a=10.0.0.1:6789/0,b=10.0.0.2:6789/0,c=10.0.0.5:6789/0,d=10.0.0.6:6789/0,e=10.0.0.3:6789/0},
   election epoch 2584, quorum 0,1,2,3 a,b,c,e
 osdmap e82502: 50 osds: 48 up, 48 in
  pgmap v12807617: 7824 pgs: 7803 active+clean, 1 active+clean
  +scrubbing, 15 incomplete, 4 active+clean+inconsistent, 1 active+clean
  +scrubbing+deep; 5676 GB data, 18948 GB used, 18315 GB / 37263 GB avail;
  137KB/s rd, 1852KB/s wr, 199op/s
 mdsmap e1: 0/0/1 up
 
 
 
  The tree :
 
  # idweight  type name   up/down reweight
  -8  14.26   root SSDroot
  -27 8   datacenter SSDrbx2
  -26 8   room SSDs25
  -25 8   net SSD188-165-12
  -24 8   rack SSD25B09
  -23 8   host lyll
  46  2   osd.46  up  
  1
  47  2   osd.47  up  
  1
  48  2   osd.48  up  
  1
  49  2   osd.49  up  
  1
  -10 4.26datacenter SSDrbx3
  -12 2   room SSDs43
  -13 2   net SSD178-33-122
  -16 2   rack SSD43S01
  -17 2   host kaino
  42  1   osd.42  up  
  1
  43  1   osd.43  up  
  1
  -22 2.26room SSDs45
  -21 2.26net SSD5-135-138
  -20 2.26rack SSD45F01
  -19 2.26host taman
  44  1.13osd.44  up  
  1
  45  1.13osd.45  up  
  1
  -9  2   datacenter SSDrbx4
  -11 2   room SSDs52
  -14 2   net SSD176-31-226
  -15 2   rack SSD52B09
  -18 2   host dragan
  40  1   osd.40  up  
  1
  41  1   osd.41  up  
  1
  -1  33.43   root SASroot
  -10015.9datacenter SASrbx1
  -90 15.9room SASs15
  -72 15.9net SAS188-165-15
  -40 8   rack SAS15B01
  -3  8   host brontes
  0   1   osd.0   up  
  1
  1   1   osd.1   up  
  1
  2   1   osd.2   up  
  1
  3   1   osd.3   up  
  1
  4   1   osd.4   up  
  1
  5   1   osd.5   up  
  1
  6   1   osd.6   up  
  1
  7   1   osd.7   up  
 

Re: [ceph-users] ceph-deploy

2013-05-17 Thread Gary Lowell
Great news.  There was a patch committed to master last week that added
python-setuptools to the dependencies, so the issue shouldn't happen with the
next build.

Cheers,
Gary

On May 17, 2013, at 4:47 PM, Matt Chipman wrote:

 Hi Gary, 
 
 after a bit of searching on the list I was able to resolve this by aptitude 
 install python-setuptools.
 
 seems it's a missing dependency on wheezy ceph-deploy install.
 
 thanks for your help
 
 -Matt
 
 
 On Sat, May 18, 2013 at 6:54 AM, Gary Lowell glow...@sonic.net wrote:
 Hi Matt -
 
 Sorry, I just spotted at the end of your message that you are using python 
 2.7.3.  But the modules are installing into the python2.6 directories.   I 
 don't know why that would be happening, and we'll have to dig into more.  
 Python is tripping over incompatible syntax for some reason.
 
 Cheers,
 Gary
 
 On May 17, 2013, at 1:41 PM, Gary Lowell wrote:
 
 Hi Matt -
 
 
 I see in the message below that you are using python 2.6.   Ceph-deploy may 
 have some syntax that is incompatible with that version of python.  On 
 wheezy we tested with the default python 2.7.3 interpreter.  You might try 
 using the newer interpreter, we will also do so more testing to see if we 
 can get ceph-deploy working with python 2.6.
 
 Cheers,
 Gary
 
 
 On May 17, 2013, at 6:23 AM, Matt Chipman wrote:
 
 Thanks Gary,
 
 after you throwing me those clues I got furthur but it still isnt working.  
 It seems there are no i386 deb python-pushy packages in either of those 
 ceph repo's.  I also attempted using PIP and got pushy installed but the 
 ceph-deploy debs still refused to install.
 
 I built another VM with 64bit Debian 7 and the packages were found and 
 installed however there is an error on compiling during install.
 
 any ideas?
 
 cheers
 
 -Matt
 
 administrator@ceph-admin:~$ sudo aptitude install ceph-deploy
 The following NEW packages will be installed:
   ceph-deploy python-pushy{a}
 0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
 Need to get 53.3 kB of archives. After unpacking 328 kB will be used.
 Do you want to continue? [Y/n/?]
 Get: 1 http://ceph.com/debian-cuttlefish/ wheezy/main python-pushy amd64 
 0.5.1-1 [30.9 kB]
 Get: 2 http://ceph.com/debian-cuttlefish/ wheezy/main ceph-deploy all 0.1-1 
 [22.4 kB]
 Fetched 53.3 kB in 1s (33.2 kB/s)
 Selecting previously unselected package python-pushy.
 (Reading database ... 38969 files and directories currently installed.)
 Unpacking python-pushy (from .../python-pushy_0.5.1-1_amd64.deb) ...
 Selecting previously unselected package ceph-deploy.
 Unpacking ceph-deploy (from .../ceph-deploy_0.1-1_all.deb) ...
 Setting up python-pushy (0.5.1-1) ...
 Setting up ceph-deploy (0.1-1) ...
 Processing triggers for python-support ...
 Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py ...
 SyntaxError: ('invalid syntax', 
 ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py', 44, 26, '
 assert {p.basename for p in tmpdir.listdir()} == set()\n'))
 
 Compiling /usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py ...
 SyntaxError: ('invalid syntax', 
 ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli_new.py', 33, 26,  
assert {p.basename for p in tmpdir.listdir()} == {'ceph.conf'}\n))
 
 
 administrator@ceph-admin:~$ ceph-deploy
 Traceback (most recent call last):
   File /usr/bin/ceph-deploy, line 19, in module
 from ceph_deploy.cli import main
   File /usr/lib/pymodules/python2.7/ceph_deploy/cli.py, line 1, in 
 module
 import pkg_resources
 ImportError: No module named pkg_resources
 administrator@ceph-admin:~$ python
 Python 2.7.3 (default, Jan  2 2013, 13:56:14)
 [GCC 4.7.2] on linux2
 
 
 
 On Fri, May 17, 2013 at 9:10 AM, Gary Lowell glow...@sonic.net wrote:
 Hi Matt -
 
 Sounds like you installed ceph-deploy by downloading from 
 github.com/ceph/ceph-deploy, then running the bootstrap script.
 
 We have debian packages for ceph-deploy and python-pushy that are included 
 in the debian-cuttlefish repo, as well as 
 http://ceph.com/packages/ceph-deploy/debian.You can install python-push 
 from those locations with apt, or you can install via pip:  sudo pip 
 python-pushy.
 
 Let me know if you continue to have problems.
 
 Cheers,
 Gary
 
 On May 16, 2013, at 3:51 PM, Matt Chipman wrote:
 
 hi, 
 I used ceph-deploy successfully a few days ago but recently reinstalled my 
 admin machine from the same instructions 
 http://ceph.com/docs/master/rados/deployment/preflight-checklist/
 
 now getting the error below. Then I figured I'd just use the debs but they 
 are missing the python-pushy dependancy.  Debian 7.0
 
 Is there any way to solve either issue? 
 
 thanks
 
 administrator@cephadmin:~$ ceph-deploy
 usage: ceph-deploy [-h] [-v | -q] [-n] [--overwrite-conf] [--cluster NAME]
COMMAND ...
 ceph-deploy: error: too few arguments
 administrator@cephadmin:~$ ceph-deploy install ceph00
 Traceback (most recent call last):
   File