[ceph-users] systemctl enable ceph-mon fails in ceph-deploy create initial (no such service)
Hey folks. Running RHEL 7.1 with the stock 3.10.0 kernel and trying to deploy Infernalis. Haven't done this since Firefly, but I used to know what I was doing. My problem is that "ceph-deploy new" and "ceph-deploy install" seem to go well, but "ceph-deploy mon create-initial" reliably fails when starting the ceph-mon service. I attached a full log of the deploy attempt and have pasted a sample of the problem below. The problem seems to be that the ceph-mon service it wants to start doesn't actually exist on the target system. Any ideas? Thanks!

[root@bdcr151 ceph]# ceph-deploy --overwrite-conf mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.28): /usr/bin/ceph-deploy --overwrite-conf mon create-initial
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ]  username : None
[ceph_deploy.cli][INFO ]  verbose : False
[ceph_deploy.cli][INFO ]  overwrite_conf : True
[ceph_deploy.cli][INFO ]  subcommand : create-initial
[ceph_deploy.cli][INFO ]  quiet : False
[ceph_deploy.cli][INFO ]  cd_conf :
[ceph_deploy.cli][INFO ]  cluster : ceph
[ceph_deploy.cli][INFO ]  func :
[ceph_deploy.cli][INFO ]  ceph_conf : None
[ceph_deploy.cli][INFO ]  default_release : False
[ceph_deploy.cli][INFO ]  keyrings : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdcr151 bdcr153 bdcr155
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr151 ...
[bdcr151][DEBUG ] connected to host: bdcr151
[bdcr151][DEBUG ] detect platform information from remote host
[bdcr151][DEBUG ] detect machine type
[bdcr151][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO ] distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[bdcr151][DEBUG ] determining if provided host has same hostname in remote
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] deploying mon to bdcr151
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] remote hostname: bdcr151
[bdcr151][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdcr151][DEBUG ] create the mon path if it does not exist
[bdcr151][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdcr151/done
[bdcr151][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdcr151][DEBUG ] create the init path if it does not exist
[bdcr151][INFO ] Running command: systemctl enable ceph.target
[bdcr151][INFO ] Running command: systemctl enable ceph-mon@bdcr151
[bdcr151][WARNIN] Failed to issue method call: No such file or directory
[bdcr151][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: systemctl enable ceph-mon@bdcr151
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr153 ...
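A footnote for anyone debugging the same failure: systemd's "No such file or directory" on enable usually means the ceph-mon@.service unit template was never installed on the target. A quick diagnostic sketch (the package queries are generic; adjust for what your repo actually installed):

# Does the unit template the deploy expects actually exist?
ls -l /usr/lib/systemd/system/ceph-mon@.service
# Which ceph packages are installed, and did any of them ship systemd units?
rpm -qa 'ceph*'
rpm -ql ceph | grep -i systemd
# If the unit is present but was just installed, reload systemd and retry:
systemctl daemon-reload
systemctl enable ceph-mon@bdcr151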
Re: [ceph-users] MONs not forming quorum
Augh, never mind, firewall problem. Thanks anyway.

From: Gruher, Joseph R
Sent: Thursday, June 11, 2015 10:55 PM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: MONs not forming quorum

Hi folks- I'm trying to deploy 0.94.2 (Hammer) onto CentOS7. [...]
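For anyone hitting the same wall: on CentOS 7 the monitors listen on TCP 6789 and the default firewalld zone blocks it. A minimal sketch, assuming firewalld with the default public zone (the OSD port range is the one the docs recommend opening as well):

firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload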
[ceph-users] MONs not forming quorum
Hi folks- I'm trying to deploy 0.94.2 (Hammer) onto CentOS7. I used to be pretty good at this on Ubuntu, but it has been a while. Anyway, my monitors are not forming quorum and I'm not sure why. They can definitely all ping each other and such. Any thoughts on specific problems in the output below, or just general causes for monitors not forming quorum, or where to get more debug information on what is going wrong? Thanks!!

[root@bdca151 ceph]# ceph-deploy mon create-initial bdca15{0,2,3}
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.25): /bin/ceph-deploy mon create-initial bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] detecting platform for host bdca150 ...
[bdca150][DEBUG ] connected to host: bdca150
[bdca150][DEBUG ] detect platform information from remote host
[bdca150][DEBUG ] detect machine type
[ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.1.1503 Core
[bdca150][DEBUG ] determining if provided host has same hostname in remote
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] deploying mon to bdca150
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] remote hostname: bdca150
[bdca150][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdca150][DEBUG ] create the mon path if it does not exist
[bdca150][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create the monitor keyring file
[bdca150][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i bdca150 --keyring /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] ceph-mon: renaming mon.noname-a 10.1.0.150:6789/0 to mon.bdca150
[bdca150][DEBUG ] ceph-mon: set fsid to 770514ba-65e6-475b-8d43-ad6ee850ead6
[bdca150][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-bdca150 for mon.bdca150
[bdca150][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdca150][DEBUG ] create the init path if it does not exist
[bdca150][DEBUG ] locating the `service` executable...
[bdca150][INFO ] Running command: /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.bdca150
[bdca150][DEBUG ] === mon.bdca150 ===
[bdca150][DEBUG ] Starting Ceph mon.bdca150 on bdca150...
[bdca150][WARNIN] Running as unit run-52328.service.
[bdca150][DEBUG ] Starting ceph-create-keys on bdca150...
[bdca150][INFO ] Running command: systemctl enable ceph
[bdca150][WARNIN] ceph.service is not a native service, redirecting to /sbin/chkconfig.
[bdca150][WARNIN] Executing /sbin/chkconfig ceph on
[bdca150][WARNIN] The unit files have no [Install] section. They are not meant to be enabled
[bdca150][WARNIN] using systemctl.
[bdca150][WARNIN] Possible reasons for having this kind of units are:
[bdca150][WARNIN] 1) A unit may be statically enabled by being symlinked from another unit's
[bdca150][WARNIN]    .wants/ or .requires/ directory.
[bdca150][WARNIN] 2) A unit's purpose may be to act as a helper for some other unit which has
[bdca150][WARNIN]    a requirement dependency on it.
[bdca150][WARNIN] 3) A unit may be started when needed via activation (socket, path, timer,
[bdca150][WARNIN]    D-Bus, udev, scripted systemctl call, ...).
[bdca150][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.bdca150.asok mon_status
[bdca150][DEBUG ]
[bdca150][DEBUG ] status for monitor: mon.bdca150
[bdca150][DEBUG ] {
[bdca150][DEBUG ]   "election_epoch": 0,
[bdca150][DEBUG ]   "extra_probe_peers": [
[bdca150][DEBUG ]     "10.1.0.152:6789/0",
[bdca150][DEBUG ]     "10.1.0.153:6789/0"
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   "monmap": {
[bdca150][DEBUG ]     "created": "0.00",
[bdca150][DEBUG ]     "epoch": 0,
[bdca150][DEBUG ]     "fsid": "770514ba-65e6-475b-8d43-ad6ee850ead6",
[bdca150][DEBUG ]     "modified": "0.00",
[bdca150][DEBUG ]     "mons": [
[bdca150][DEBUG ]       {
[bdca150][DEBUG ]         "addr": "10.1.0.150:6789/0",
[bdca150][DEBUG ]         "name": "bdca150",
[bdca150][DEBUG ]         "rank": 0
[bdca150][DEBUG ]       },
[bdca150][DEBUG ]       {
[bdca150][DEBUG ]         "addr": "0.0.0.0:0/1",
[bdca150][DEBUG ]         "name": "bdca152",
[bdca150][DEBUG ]         "rank": 1
[bdca150][DEBUG ]       },
[bdca150][DEBUG ]       {
[bdca150][DEBUG ]         "addr": "0.0.0.0:0/2",
[bdca150][DEBUG ]         "name": "bdca153",
[bdca150][DEBUG ]         "rank": 2
[bdca150][DEBUG ]       }
[bdca150][DEBUG ]     ]
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   "name": "bdca150",
[bdca150][DEBUG ]   "outside_quorum": [
[bdca150][DEBUG ]     "bdca150"
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   "quorum": [],
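A note on reading that mon_status: the 0.0.0.0:0 addresses for bdca152 and bdca153 mean this monitor has never successfully probed its peers, which points at connectivity (the firewall, as it turned out above) rather than the monmap itself. A quick reachability sketch, assuming nc is installed:

# From bdca150, check the mon port on the peers:
nc -zv bdca152 6789
nc -zv bdca153 6789
# On each mon host, confirm ceph-mon is actually listening:
ss -tlnp | grep 6789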
Re: [ceph-users] Ceph RBD 0.78 Bug or feature?
Hi folks- Was this ever resolved? I'm not finding a resolution in the email chain; apologies if I am missing it. I am experiencing this same problem. The cluster works fine for object traffic, but I can't seem to get rbd to work in 0.78. It worked fine in 0.72.2 for me. Running Ubuntu 13.04 with the 3.12 kernel.

$ rbd create rbd/myimage --size 102400
$ sudo rbd map rbd/myimage
rbd: add failed: (5) Input/output error
$ rbd ls rbd
myimage
$

Thanks, Joe

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ирек Фасихов
Sent: Tuesday, March 25, 2014 1:59 AM
To: Ilya Dryomov
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

Ilya, I set chooseleaf_vary_r to 0, but still cannot map rbd images.

[root@ceph01 cluster]# rbd map rbd/tst
2014-03-25 12:48:14.318167 7f44717f7760 2 auth: KeyRing::load: loaded key file /etc/ceph/ceph.client.admin.keyring
rbd: add failed: (5) Input/output error
[root@ceph01 cluster]# cat /var/log/messages | tail
Mar 25 12:45:06 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:45:06 ceph01 kernel: libceph: mon2 192.168.100.203:6789 session established
Mar 25 12:46:33 ceph01 kernel: libceph: client11240 fsid 10b46114-ac17-404e-99e3-69b34b85c901
Mar 25 12:46:33 ceph01 kernel: libceph: got v 13 cv 11 > 9 of ceph_pg_pool
Mar 25 12:46:33 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:46:33 ceph01 kernel: libceph: mon2 192.168.100.203:6789 session established
Mar 25 12:48:14 ceph01 kernel: libceph: client11313 fsid 10b46114-ac17-404e-99e3-69b34b85c901
Mar 25 12:48:14 ceph01 kernel: libceph: got v 13 cv 11 > 9 of ceph_pg_pool
Mar 25 12:48:14 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:48:14 ceph01 kernel: libceph: mon0 192.168.100.201:6789 session established

I do not really understand this error. The CRUSH map is correct. Thanks.

2014-03-25 12:26 GMT+04:00 Ilya Dryomov ilya.dryo...@inktank.com:
On Tue, Mar 25, 2014 at 8:38 AM, Ирек Фасихов malm...@gmail.com wrote:
Hi, Ilya. I added the files (crushmap and osddump) to a folder in Google Drive.
https://drive.google.com/folderview?id=0BxoNLVWxzOJWX0NLV1kzQ1l3Ymc&usp=sharing

OK, so this has nothing to do with caching. You have chooseleaf_vary_r set to 1 in your crushmap. This is a new crush tunable, which was introduced long after the 3.14 merge window closed. It will be supported starting with 3.15; until then you should be able to do

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

to disable it.

Thanks,
Ilya

--
Regards,
Фасихов Ирек Нургаязович (Irek Fasikhov)
Mobile: +79229045757
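A small verification step to go with Ilya's recipe (not from the original thread): after injecting the modified map, decompile the live crushmap and confirm the tunable really changed before retrying the map:

ceph osd getcrushmap -o /tmp/crush.check
crushtool -d /tmp/crush.check | grep chooseleaf_vary_r   # expect: tunable chooseleaf_vary_r 0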
Re: [ceph-users] Ceph RBD 0.78 Bug or feature?
Meant to include this – what do these messages indicate? All systems have 0.78.

[1301268.557820] Key type ceph registered
[1301268.558524] libceph: loaded (mon/osd proto 15/24)
[1301268.579486] rbd: loaded rbd (rados block device)
[1301268.582364] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301268.582462] libceph: mon1 10.0.0.102:6789 socket error on read
[1301278.589461] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301278.589558] libceph: mon1 10.0.0.102:6789 socket error on read
[1301288.607615] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301288.607713] libceph: mon1 10.0.0.102:6789 socket error on read
[1301298.625873] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301298.625970] libceph: mon1 10.0.0.102:6789 socket error on read
[1301308.643936] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301308.644033] libceph: mon0 10.0.0.101:6789 socket error on read
[1301318.662082] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301318.662179] libceph: mon0 10.0.0.101:6789 socket error on read
[1301449.695232] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301449.695329] libceph: mon0 10.0.0.101:6789 socket error on read
[1301459.716235] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301459.716332] libceph: mon1 10.0.0.102:6789 socket error on read
[1301469.734425] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301469.734523] libceph: mon1 10.0.0.102:6789 socket error on read
[1301479.752603] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301479.752700] libceph: mon1 10.0.0.102:6789 socket error on read
[1301489.770773] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301489.770870] libceph: mon1 10.0.0.102:6789 socket error on read
[1301499.788904] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301499.789001] libceph: mon1 10.0.0.102:6789 socket error on read

$ ceph --version
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
$ ssh mohonpeak01 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
$ ssh mohonpeak02 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
$ ceph health detail
HEALTH_WARN noscrub,nodeep-scrub flag(s) set
noscrub,nodeep-scrub flag(s) set
$ ceph status
cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
 health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
 monmap e1: 2 mons at {mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 10, quorum 0,1 mohonpeak01,mohonpeak02
 osdmap e216: 18 osds: 18 up, 18 in
        flags noscrub,nodeep-scrub
 pgmap v202112: 2784 pgs, 10 pools, 1637 GB data, 427 kobjects
        2439 GB used, 12643 GB / 15083 GB avail
        2784 active+clean

From: Gruher, Joseph R
Sent: Friday, April 04, 2014 11:44 AM
To: 'Ирек Фасихов'; Ilya Dryomov
Cc: ceph-users@lists.ceph.com; Gruher, Joseph R
Subject: RE: [ceph-users] Ceph RBD 0.78 Bug or feature?

Hi folks- Was this ever resolved? [...]
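A footnote on decoding those messages (the arithmetic is mine, not from the thread): the two feature masks differ by exactly one bit, which you can check from the masks alone. Which feature that bit names depends on the feature table for your exact client and server versions; per the next message in the thread, upgrading the client kernel is what actually cleared it.

python -c "print(hex(0x104a042a42 ^ 0x4a042a42))"
# -> 0x1000000000, i.e. feature bit 36 is set on the server but missing in the client kernel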
Re: [ceph-users] Ceph RBD 0.78 Bug or feature?
Aha – upgrade of the kernel from 3.13 to 3.14 appears to have resolved the problem.

Thanks, Joe

From: Gruher, Joseph R
Sent: Friday, April 04, 2014 11:48 AM
To: Ирек Фасихов; Ilya Dryomov
Cc: ceph-users@lists.ceph.com; Gruher, Joseph R
Subject: RE: [ceph-users] Ceph RBD 0.78 Bug or feature?

Meant to include this – what do these messages indicate? All systems have 0.78. [...]
[ceph-users] Erasure Code Setup
Hi Folks- Having a bit of trouble with EC setup on 0.78. Hoping someone can help me out. I've got most of the pieces in place; I think I'm just having a problem with the ruleset.

I am running 0.78:

ceph --version
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

I created a new ruleset:

ceph osd crush rule create-erasure ecruleset

Then I created a new erasure code pool:

ceph osd pool create mycontainers_1 1800 1800 erasure crush_ruleset=ecruleset erasure-code-k=9 erasure-code-m=3

Pool exists:

ceph@joceph-admin01:/etc/ceph$ ceph osd dump
epoch 106
fsid b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
created 2014-03-24 12:06:28.290970
modified 2014-03-24 12:42:59.231381
flags
pool 0 'data' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 84 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 86 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 88 owner 0 flags hashpspool stripe_width 0
pool 4 'mycontainers_2' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 1200 pgp_num 1200 last_change 100 owner 0 flags hashpspool stripe_width 0
pool 5 'mycontainers_3' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1800 pgp_num 1800 last_change 94 owner 0 flags hashpspool stripe_width 0
pool 6 'mycontainers_1' erasure size 12 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 1800 pgp_num 1800 last_change 104 owner 0 flags hashpspool stripe_width 4320

However, the new PGs won't come to a healthy state:

ceph@joceph-admin01:/etc/ceph$ ceph status
cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
 health HEALTH_WARN 1800 pgs incomplete; 1800 pgs stuck inactive; 1800 pgs stuck unclean
 monmap e1: 2 mons at {mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 4, quorum 0,1 mohonpeak01,mohonpeak02
 osdmap e106: 18 osds: 18 up, 18 in
 pgmap v261: 5184 pgs, 7 pools, 0 bytes data, 0 objects
        682 MB used, 15082 GB / 15083 GB avail
        3384 active+clean
        1800 incomplete

I think this is because it is using a failure domain of hosts and I only have 2 hosts (with 9 OSDs on each, for 18 OSDs total). I suspect I need to change the ruleset to use a failure domain of OSD instead of host. This is also mentioned on this page: https://ceph.com/docs/master/dev/erasure-coded-pool/. However, the guidance on that page to adjust it using commands of the form "ceph osd erasure-code-profile set myprofile" is not working for me. As far as I can tell, "ceph osd erasure-code-profile" is not a valid command syntax. Is this documentation correct and up to date for 0.78? Can anyone suggest where I am going wrong? Thanks!

ceph@joceph-admin01:/etc/ceph$ ceph osd erasure-code-profile ls
no valid command found; 10 closest matches:
osd tier add-cache <poolname> <poolname> <int[0-]>
osd tier set-overlay <poolname> <poolname>
osd tier remove-overlay <poolname>
osd tier remove <poolname> <poolname>
osd tier cache-mode <poolname> none|writeback|forward|readonly
osd thrash <int[0-]>
osd tier add <poolname> <poolname> {--force-nonempty}
osd stat
osd reweight-by-utilization {<int[100-]>}
osd pool stats {<name>}
Error EINVAL: invalid command
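For reference, the erasure-code-profile subcommands that page describes landed after 0.78; they ship in Firefly (0.80). Against Firefly, setting an OSD-level failure domain would look roughly like this sketch (the profile and pool names are illustrative):

ceph osd erasure-code-profile set myprofile k=9 m=3 ruleset-failure-domain=osd
ceph osd erasure-code-profile get myprofile
ceph osd pool create mycontainers_1 1800 1800 erasure myprofile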
Re: [ceph-users] erasure coding testing
Great, thanks! I'll watch (hope) for an update later this week. Appreciate the rapid response.

-Joe

From: Ian Colle [mailto:ian.co...@inktank.com]
Sent: Sunday, March 16, 2014 7:22 PM
To: Gruher, Joseph R; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] erasure coding testing

Joe,
We're pushing to get 0.78 out this week, which will allow you to play with EC.

Ian R. Colle
Director of Engineering
Inktank
Delivering the Future of Storage
http://www.linkedin.com/in/ircolle
http://www.twitter.com/ircolle
Cell: +1.303.601.7713
Email: i...@inktank.com

On 3/16/14, 8:11 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:
Hey all- Can anyone tell me, if I install the latest development release (looks like it is 0.77), can I enable and test erasure coding? Or do I have to wait for the actual Firefly release? I don't want to deploy anything for production; basically I just want to do some lab testing to see what kind of CPU loading results from erasure coding. Also, if anyone has any data along those lines already, I would love a pointer to it. Thanks!
-Joe
Re: [ceph-users] Low RBD Performance
-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark Nelson
Sent: Monday, February 03, 2014 6:48 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Low RBD Performance

On 02/03/2014 07:29 PM, Gruher, Joseph R wrote:
Hi folks- I'm having trouble demonstrating reasonable performance of RBDs. [...] Any thoughts appreciated!

Interesting results with the io depth at 1. I haven't seen that behaviour when using libaio, direct=1, and higher io depths. Is this kernel RBD or QEMU/KVM? If it's QEMU/KVM, is it the libvirt driver? Certainly 300 IOPS is low for that kind of setup compared to what we've seen for RBD on other systems (especially with 1x replication). Given that you are seeing more reasonable performance with RGW, I guess I'd look at a couple of things:

- Figure out why fio is reporting queue depth = 1

Yup, I agree, I will work on this and report back. First thought is to try specifying the queue depth in the FIO workload file instead of on the command line.

- Does increasing the num jobs help (ie get concurrency another way)?

I will give this a shot.

- Do you have enough PGs in the RBD pool?

I should; for 16 OSDs and no replication I use 2048 PGs/PGPs (100 * 16 / 1, rounded up to a power of 2).

- Are you using the virtio driver if QEMU/KVM?

No virtualization; clients are bare metal using kernel RBD.
Re: [ceph-users] Low RBD Performance
-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Tuesday, February 04, 2014 9:46 AM
To: Gruher, Joseph R
Cc: Mark Nelson; ceph-users@lists.ceph.com; Ilya Dryomov
Subject: Re: [ceph-users] Low RBD Performance

On Tue, Feb 4, 2014 at 9:29 AM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:
[...]
No virtualization; clients are bare metal using kernel RBD.

I believe that directIO via the kernel client will go all the way to the OSDs and to disk before returning. I imagine that something in the stack is preventing the dispatch from actually happening asynchronously in that case, and the reason you're getting 300 IOPS is because your total RTT is about 3 ms with that code... Ilya, is that assumption of mine correct? One thing that occurs to me is that for direct IO it's fair to use the ack instead of the on-disk response from the OSDs, although that would only help us for people using btrfs.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

Ultimately this seems to be an FIO issue. If I use --iodepth X or --iodepth=X on the FIO command line I always get queue depth 1. After switching to specifying iodepth=X in the body of the FIO workload file, I do get the desired queue depth and I can immediately see performance is much higher (a full re-test is underway; I can share some results when complete if anyone is curious). This seems to have effectively worked around the problem, although I'm still curious why the command line parameters don't have the desired effect. Thanks for the responses!
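A sketch of the job file with the queue depth moved into the file body, per the workaround above (the depth of 32 is illustrative):

[global]
ioengine=libaio
direct=1
iodepth=32
ramp_time=300
runtime=300

[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
stonewall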
Re: [ceph-users] Low RBD Performance
Ultimately this seems to be an FIO issue. If I use --iodepth X or --iodepth=X on the FIO command line I always get queue depth 1. After switching to specifying iodepth=X in the body of the FIO workload file I do get the desired queue depth [...] Thanks for the responses!

Strange! I do most of our testing using the command line parameters as well. What version of fio are you using? Maybe there is a bug. For what it's worth, I'm using --iodepth=X, and fio version 1.59 from the Ubuntu precise repository.

Mark

FIO --version reports 2.0.8. Installed on Ubuntu 13.04 from the default repositories (just did an 'apt-get install fio').
[ceph-users] Low RBD Performance
Hi folks- I'm having trouble demonstrating reasonable performance of RBDs. I'm running Ceph 0.72.2 on Ubuntu 13.04 with the 3.12 kernel. I have four dual-Xeon servers, each with 24GB RAM, an Intel 320 SSD for journals, and four WD 10K RPM SAS drives for OSDs, all connected with an LSI 1078. This is just a lab experiment using scrounged hardware, so everything isn't sized to be a Ceph cluster; it's just what I have lying around, but I should have more than enough CPU and memory resources. Everything is connected with a single 10GbE.

When testing with RBDs from four clients (also running Ubuntu 13.04 with the 3.12 kernel) I am having trouble breaking 300 IOPS on a 4KB random read or write workload (cephx set to none, replication set to one). IO is generated using FIO from four clients, each hosting a single 1TB RBD, and I've experimented with queue depths and increasing the number of RBDs without any benefit. 300 IOPS for a pool of 16 10K RPM HDDs seems quite low, not to mention the journal should provide a good boost on write workloads. When I run a 4KB object write workload in Cosbench I can approach 3500 Obj/Sec, which seems more reasonable.

Sample FIO configuration:

[global]
ioengine=libaio
direct=1
ramp_time=300
runtime=300

[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
stonewall

I use --iodepth=X on the FIO command line to set the queue depth when testing. I notice in the FIO output that, despite the iodepth setting, it seems to be reporting an IO depth of only 1, which would certainly help explain poor performance, but I'm at a loss as to why. I wonder if it could be something specific to RBD behavior, like I need to use a different IO engine to establish queue depth.

IO depths: 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

Any thoughts appreciated!

Thanks,
Joe
[ceph-users] Performance Testing Setup Tricks?
Hi all- I'm creating some scripted performance testing for my Ceph cluster. The part relevant to my questions works like this:

1. Create some pools
2. Create and map some RBDs
3. Write-in the RBDs using DD or FIO
4. Run FIO testing on the RBDs (small block random and large block sequential with varying queue depths and workers)
5. Delete the pools and make some new pools
6. Populate with objects using Cosbench
7. Run Cosbench to measure object read and write performance
8. (repeat for various object sizes)
9. Delete the pools

The whole thing works pretty well as far as generating results. The part I'm hoping to improve is steps 3 and 6, where I'm writing in the RBDs, or where I'm populating objects into the pools, respectively. For any significant amount of data relative to the size of the cluster (which is 16TB now but will probably get bigger) this takes hours and hours. I'm wondering if there is any way to shortcut these preparation steps. For example, for a new RBD, is there any way to tell Ceph to treat it as already written-in or thickly provisioned, and just serve me up whatever junk data is in there when I read from it? Since the RBD sits on objects instead of blocks I'm guessing not, but it doesn't hurt to ask. Similarly, are there any tricks I might investigate for populating junk objects into a pool, which I can then read and write, other than actually writing all the objects in with a tool like Cosbench? There may not be a better approach, but any thoughts are appreciated. Thanks!

-Joe
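For reference, steps 3 and 6 can at least be done with stock tools in one-liners — a sketch, where the device path and pool name are assumptions:

# Step 3: write-in a mapped RBD end to end at a large block size
fio --name=fill --filename=/dev/rbd0 --rw=write --bs=4M --direct=1 --ioengine=libaio --iodepth=16

# Step 6: populate a pool with objects and keep them around for later read tests
rados -p testpool01 bench 300 write --no-cleanup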
Re: [ceph-users] odd performance graph
I don't know how rbd works inside, but I think ceph rbd here returns zeros without a real OSD disk read if the block/sector of the rbd disk is unused. That would explain the graph you see. You can try adding a second rbd image and not format/use it and benchmark this disk, then make a filesystem on it, write some data, and benchmark again...

When performance testing RBDs I generally write in the whole area before doing any testing to avoid this problem. It would be interesting to have confirmation that this is a real concern with Ceph. I know it is in other thin-provisioned storage, for example VMware. Perhaps someone more expert can comment.

Also, is there any way to shortcut the write-in process? Writing in TBs of RBD image can really extend the length of our performance test cycle. It would be great if there were some shortcut to cause Ceph to treat the whole RBD as having already been written, or to just go fetch data from disk on all reads regardless of whether that area had been written, just for testing purposes.
Re: [ceph-users] Minimum requirements for ceph monitors?
For ~$67 you get a mini-ITX motherboard with a soldered-on 17W dual-core 1.8GHz Ivy Bridge-based Celeron (supports SSE4.2 CRC32 instructions!). It has 2 standard DIMM slots so no compromising on memory, on-board gigabit Ethernet, 3 3Gb/s + 1 6Gb/s SATA ports, and a single PCIe slot for an additional NIC. This has the potential to make a very competent low-cost, lowish-power OSD or mon server. The biggest downside is that it doesn't appear to support ECC memory. Some of the newer Atoms appear to, so that might be an option as well.

Yup, the server- and storage-purposed Atoms do support ECC. I think Atom sounds like an interesting fit for OSD servers; the new Avoton SoCs are quite fast, can host up to 64GB of ECC RAM on two channels, and have 4x1GbE or 1x10GbE onboard. Plus six SATA lanes onboard, which would be a nice fit for an OS disk, a journal SSD, and four OSD disks. I have been hoping to track down a few boards and do some testing with Atom myself.

http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz

Would be interested to hear if anyone else has tried such an experiment.
Re: [ceph-users] [ANN] ceph-deploy 1.3.3 released!
Hi Alfredo- Have you looked at adding the ability to specify a proxy on the ceph-deploy command line? Something like:

ceph-deploy install --proxy {http_proxy}

That would then need to run all the remote commands (rpm, curl, wget, etc.) with the proxy. Not sure how complex that would be to implement... just curious.

-Joe

-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alfredo Deza
Sent: Tuesday, November 26, 2013 12:51 PM
To: ceph-devel; ceph-users@lists.ceph.com
Subject: [ceph-users] [ANN] ceph-deploy 1.3.3 released!

Hi All,

There is a new release of ceph-deploy, the easy deployment tool for Ceph.

The most important (non-bug) change for this release is the ability to specify repository mirrors when installing ceph. This can be done with environment variables or flags in the `install` subcommand.

Full documentation on that feature can be found in the new location for docs: http://ceph.com/ceph-deploy/docs/install.html#behind-firewall

The complete changelog can be found here: http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you update!

Thanks,
Alfredo
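In the meantime, the mirror support announced below can serve a similar purpose when nodes can't reach ceph.com directly — a sketch based on the linked install docs (the mirror host is a placeholder, and the exact flag names are worth confirming against those docs):

ceph-deploy install \
  --repo-url http://mirror.example.com/rpm-emperor/el6 \
  --gpg-url http://mirror.example.com/release.asc \
  node1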
Re: [ceph-users] ceph-deploy problems on CentOS-6.4
Those aren't really errors; when ceph-deploy runs commands on the host, anything that gets printed to stderr as a result is relayed back through ceph-deploy with the [ERROR] tag. If you look at the content of the errors, it is just the output of the commands that were run in the step beforehand. This seems to confuse a ton of people; I wonder if ceph-deploy wouldn't be better off labeling this content as something like [OUTPUT] or [RESPONSE].

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gautam Saxena
Sent: Friday, November 22, 2013 10:48 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy problems on CentOS-6.4

I'm also getting similar problems, although in my installation, even though there are errors, it seems to finish. (I'm using CentOS 6.4 and the Emperor release, and I added the defaults for http and https to the sudoers file for the ia1 node, though I didn't do so for the ia2 and ia3 nodes.) So is everything OK? If so, why are there error statements? Here is an excerpt of the logs:

Command that I executed:

ceph-deploy install ia1 ia2 ia3

First portion of the log:

[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy install ia1 ia2 ia3
[ceph_deploy.install][DEBUG ] Installing stable version emperor on cluster ceph hosts ia1 ia2 ia3
[ceph_deploy.install][DEBUG ] Detecting platform for host ia1 ...
[ia1][DEBUG ] connected to host: ia1
[ia1][DEBUG ] detect platform information from remote host
[ia1][DEBUG ] detect machine type
[ceph_deploy.install][INFO ] Distro info: CentOS 6.4 Final
[ia1][INFO ] installing ceph on ia1
[ia1][INFO ] Running command: sudo yum -y -q install wget
[ia1][DEBUG ] Package wget-1.12-1.8.el6.x86_64 already installed and latest version
[ia1][INFO ] adding EPEL repository
[ia1][INFO ] Running command: sudo wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[ia1][ERROR ] --2013-11-22 13:40:52-- http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[ia1][ERROR ] Resolving dl.fedoraproject.org... 209.132.181.23, 209.132.181.24, 209.132.181.25, ...
[ia1][ERROR ] Connecting to dl.fedoraproject.org|209.132.181.23|:80... connected.
[ia1][ERROR ] HTTP request sent, awaiting response... 200 OK
[ia1][ERROR ] Length: 14540 (14K) [application/x-rpm]
[ia1][ERROR ] Saving to: `epel-release-6-8.noarch.rpm.1'
[ia1][ERROR ]
[ia1][ERROR ] 0K .......... .... 100% 158K=0.09s
[ia1][ERROR ]
[ia1][ERROR ] 2013-11-22 13:40:52 (158 KB/s) - `epel-release-6-8.noarch.rpm.1' saved [14540/14540]
[ia1][ERROR ]
[ia1][INFO ] Running command: sudo rpm -Uvh --replacepkgs epel-release-6*.rpm
[ia1][ERROR ] warning: epel-release-6-8.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
[ia1][DEBUG ] Preparing... ##################
[ia1][DEBUG ] epel-release ##################
[ia1][INFO ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[ia1][INFO ] Running command: sudo rpm -Uvh --replacepkgs http://ceph.com/rpm-emperor/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ia1][DEBUG ] Retrieving http://ceph.com/rpm-emperor/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ia1][DEBUG ] Preparing... ##################
[ia1][DEBUG ] ceph-release ##################
[ia1][INFO ] Running command: sudo yum -y -q install ceph
[ia1][ERROR ] warning: rpmts_HdrFromFdno: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
[ia1][ERROR ] Importing GPG key 0x0608B895:
[ia1][ERROR ] Userid : EPEL (6) e...@fedoraproject.org
[ia1][ERROR ] Package: epel-release-6-8.noarch (installed)
[ia1][ERROR ] From : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
[ia1][ERROR ] Warning: RPMDB altered outside of yum.
[ia1][INFO ] Running command: sudo ceph --version
[ia1][DEBUG ] ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de)
Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry
-----Original Message-----
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Wednesday, November 20, 2013 7:17 AM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry

On Mon, Nov 18, 2013 at 1:12 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:
[...]

I am trying to replicate this but somehow failing... in what state were the drives? E.g., did you have any partitions before starting, or was this like a new drive out of the box that was put in there? So far I can only see it sometimes in 13.04 and not anywhere else.

Looking at the ceph-deploy disk list output I captured at the time (see below), it seems to be reporting partition data on the drives (/dev/sdd1 exists in this example). The last thing I did with the drives prior to deploying Emperor was some tests to baseline their performance with FIO, including a fair amount of write activity to the raw devices. As a result I would expect their initial state to have basically been junk data. Prior to that FIO testing the disks would have been OSD disks in a Dumpling cluster.

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ]  /dev/sda2 other
[joceph02][DEBUG ]  /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd :
[joceph02][DEBUG ]  /dev/sdd1 other
[joceph02][DEBUG ] /dev/sde :
[joceph02][DEBUG ]  /dev/sde1 other
[joceph02][DEBUG ] /dev/sdf :
[joceph02][DEBUG ]  /dev/sdf1 other

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdd
[joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header
[joceph02][ERROR ] from backup!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table
[joceph02][ERROR ] instead of main partition table!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk!
[joceph02][ERROR ]
[joceph02][ERROR ] Invalid partition data!
[joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems!
[joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
[joceph02][ERROR ] Traceback (most recent call last):
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py", line 68, in run
[joceph02][ERROR ]     reporting(conn, result, timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py", line 13, in reporting
[joceph02][ERROR ]     received = result.receive(timeout)
[joceph02][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py", line 455, in receive
[joceph02][ERROR ]     raise self._getremoteerror() or EOFError()
[joceph02][ERROR ] RemoteError: Traceback (most recent call last):
[joceph02][ERROR ]   File "<string>", line 806, in executetask
[joceph02][ERROR ]   File "<remote exec>", line 35, in _remote_run
[joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
[joceph02][ERROR ]
[joceph02][ERROR ]
[ceph_deploy
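A workaround sketch, not from the thread's resolution: the sgdisk complaints come from stale GPT structures (a valid backup header at the end of the disk, left over from the old Dumpling OSD partitions, with no matching main header), so clearing both GPT copies by hand before running disk zap should let the first pass succeed. This destroys all data on the disk:

# Zero the first MiB (MBR + primary GPT) and the last MiB (backup GPT) of /dev/sdd:
dd if=/dev/zero of=/dev/sdd bs=1M count=1
dd if=/dev/zero of=/dev/sdd bs=1M count=1 seek=$(( $(blockdev --getsz /dev/sdd) / 2048 - 1 ))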
Re: [ceph-users] Size of RBD images
So is there any size limit on RBD images? I had a failure this morning mounting a 1TB RBD. Deleting now (why does it take so long to delete if it was never even mapped, much less written to?) and will retry with smaller images. See output below. This is 0.72 on Ubuntu 13.04 with the 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
        size 1024 GB in 262144 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1770.6b8b4567
        format: 1
ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied
ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ rados df
pool name    category  KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
data         -         0   0        0       0         0        0   0      0   0
metadata     -         0   0        0       0         0        0   0      0   0
rbd          -         120 0 0 10788
testpool01   -         0   0        0       0         0        0   0      0   0
testpool02   -         0   0        0       0         0        0   0      0   0
testpool03   -         0   0        0       0         0        0   0      0   0
testpool04   -         0   0        0       0         0        0   0      0   0
total used  23287851602
total avail  9218978040
total space 11547763200
ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd
ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory
ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
        size 1024 GB in 262144 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.1770.6b8b4567
        format: 1
Re: [ceph-users] Size of RBD images
-Original Message- From: Gruher, Joseph R Sent: Tuesday, November 19, 2013 12:24 PM To: 'Wolfgang Hennerbichler'; Bernhard Glomm Cc: ceph-users@lists.ceph.com Subject: RE: [ceph-users] Size of RBD images So is there any size limit on RBD images? I had a failure this morning mounting 1TB RBD. Deleting now (why does it take so long to delete if it was never even mapped, much less written to?) and will retry with smaller images. See output below. This is 0.72 on Ubuntu 13.04 with 3.12 kernel. ceph@joceph-client01:~$ rbd info testrbd rbd image 'testrbd': size 1024 GB in 262144 objects order 22 (4096 kB objects) block_name_prefix: rb.0.1770.6b8b4567 format: 1 ceph@joceph-client01:~$ rbd map testrbd -p testpool01 rbd: add failed: (13) Permission denied ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01 rbd: add failed: (2) No such file or directory ceph@joceph-client01:/etc/ceph$ rados df pool name category KB objects clones degraded unfound rdrd KB wrwr KB data- 000 0 0000 0 metadata- 000 0 0000 0 rbd - 120 0 0 1078 8 testpool01 - 000 0 0000 0 testpool02 - 000 0 0000 0 testpool03 - 000 0 0000 0 testpool04 - 000 0 0000 0 total used 23287851602 total avail 9218978040 total space11547763200 ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01 rbd: add failed: (2) No such file or directory ceph@joceph-client01:/etc/ceph$ rbd info testrbd rbd image 'testrbd': size 1024 GB in 262144 objects order 22 (4096 kB objects) block_name_prefix: rb.0.1770.6b8b4567 format: 1 I think I figured out where I went wrong here. I had thought if you didn't specify the pool on the 'rbd create' command line you could then later map to any pool. In retrospect that probably doesn't make a lot of sense and it appears if you don't specify the pool at the create step it just defaults to the rbd pool. See example below. ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 --pool testpool01 ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01 ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6 ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01 rbd: add failed: (2) No such file or directory ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd ceph@joceph-client01:/etc/ceph$ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
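So the takeaway seems to be: if the image isn't meant for the default 'rbd' pool, name the pool at every step. A minimal sketch (pool and image names are examples):

  rbd create --pool testpool01 --size 1048576 testimage
  rbd ls testpool01
  sudo modprobe rbd
  sudo rbd map testimage --pool testpool01
  sudo rbd unmap /dev/rbd/testpool01/testimage

Once the map succeeds the device node should show up under /dev/rbd/{pool}/{image}.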
Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry
-Original Message- From: Alfredo Deza [mailto:alfredo.d...@inktank.com] Sent: Monday, November 18, 2013 6:34 AM To: Gruher, Joseph R Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry I went ahead and created a ticket to track this, if you have any new input, please make sure you add to the actual ticket: http://tracker.ceph.com/issues/6793 Thanks for reporting the problem! Will do! I should be bringing up a few different cluster configurations on this hardware (we're doing some Ceph performance testing) so I may be able to reproduce again and get more details. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy disk zap fails but succeeds on retry
Using ceph-deploy 1.3.2 with ceph 0.72.1. Ceph-deploy disk zap will fail and exit with error, but then on retry will succeed. This is repeatable as I go through each of the OSD disks in my cluster. See output below. I am guessing the first attempt to run changes something about the initial state of the disk which then allows the second run to complete, but if it can be changed to where it will complete, why doesn't the first run just do that? The main negative effect is this causes a compact command like ceph-deploy disk zap joceph0{1,2,3,4}:/dev/sd{b,c,d,e,f} to fail and exit without running through all the targets. I did not encounter this in the previous release of ceph and ceph-deploy (dumpling and 1.2.7?) but I can't say for sure my disks were in the same initial state when running ceph-deploy on that release. Would this be a bug, or expected behavior? ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02 [joceph02][DEBUG ] connected to host: joceph02 [joceph02][DEBUG ] detect platform information from remote host [joceph02][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [joceph02][DEBUG ] zeroing last few blocks of device [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc [joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; regenerating main header [joceph02][ERROR ] from backup! [joceph02][ERROR ] [joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup partition table [joceph02][ERROR ] instead of main partition table! [joceph02][ERROR ] [joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the disk! [joceph02][ERROR ] [joceph02][ERROR ] Invalid partition data! [joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out! [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or [joceph02][DEBUG ] other utilities. [joceph02][DEBUG ] Information: Creating fresh partition table; will override earlier problems! [joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override. 
[joceph02][ERROR ] Traceback (most recent call last): [joceph02][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [joceph02][ERROR ] reporting(conn, result, timeout) [joceph02][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [joceph02][ERROR ] received = result.receive(timeout) [joceph02][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [joceph02][ERROR ] raise self._getremoteerror() or EOFError() [joceph02][ERROR ] RemoteError: Traceback (most recent call last): [joceph02][ERROR ] File string, line 806, in executetask [joceph02][ERROR ] File , line 35, in _remote_run [joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3 [joceph02][ERROR ] [joceph02][ERROR ] [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap joceph02:/dev/sdc [ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02 [joceph02][DEBUG ] connected to host: joceph02 [joceph02][DEBUG ] detect platform information from remote host [joceph02][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [joceph02][DEBUG ] zeroing last few blocks of device [joceph02][INFO ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- /dev/sdc [joceph02][DEBUG ] Creating new GPT entries. [joceph02][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or [joceph02][DEBUG ] other utilities. [joceph02][DEBUG ] The operation has completed successfully. ceph@joceph-admin01:/etc/ceph$ Here's some additional output with a disk-list executed in between zaps: ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02 [ceph_deploy.cli][INFO ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list joceph02 [joceph02][DEBUG ] connected to host: joceph02 [joceph02][DEBUG ] detect platform information from remote host [joceph02][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [ceph_deploy.osd][DEBUG ] Listing disks on joceph02... [joceph02][INFO ] Running command: sudo ceph-disk list [joceph02][DEBUG ] /dev/sda : [joceph02][DEBUG ] /dev/sda1 other, ext4, mounted on / [joceph02][DEBUG ] /dev/sda2 other [joceph02][DEBUG ] /dev/sda5 swap, swap [joceph02][DEBUG ] /dev/sdb other, unknown [joceph02][DEBUG ] /dev/sdc other, unknown [joceph02][DEBUG ] /dev/sdd : [joceph02][DEBUG ] /dev/sdd1 other
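For anyone trying to reproduce this, the GPT state can be inspected on the OSD host between attempts; I'd expect sgdisk's verify mode to show the same primary/backup mismatch that makes the first zap exit with status 3 (device name is an example):

  sudo sgdisk --print /dev/sdc
  sudo sgdisk --verify /dev/sdc

My guess at the mechanism: the first --zap-all run repairs the damaged primary header from the backup but exits non-zero while doing so, leaving a sane table that the second run can then clear cleanly.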
Re: [ceph-users] ceph cluster performance
-Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Dinu Vlad Sent: Thursday, November 07, 2013 3:30 AM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster performance In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. If your SSD capacity is well in excess of your journal capacity requirements you could consider overprovisioning the SSD. Overprovisioning should increase SSD performance and lifetime. This achieves the same effect as trim to some degree (lets the SSD better understand what cells have real data and which can be treated as free). I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
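As a concrete sketch of the overprovisioning idea: on a 240 GB SSD that only needs 3 x 10 GB journals, trim the whole drive once and then partition only a fraction of it, leaving the rest unallocated as permanent spare area. Untested commands, device name is an example, and blkdiscard needs a reasonably recent util-linux:

  sudo blkdiscard /dev/sdg                      # mark every cell as free once
  sudo parted -s /dev/sdg mklabel gpt
  sudo parted -s /dev/sdg mkpart journal-0 1MiB 10GiB
  sudo parted -s /dev/sdg mkpart journal-1 10GiB 20GiB
  sudo parted -s /dev/sdg mkpart journal-2 20GiB 30GiB
  # the remaining ~210 GB is never written, so the controller can use it freely

The drive can't tell 'never written' from 'trimmed', so the unallocated tail behaves like a standing trim regardless of how busy the journals are.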
Re: [ceph-users] please help me.problem with my ceph
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ??
Sent: Wednesday, November 06, 2013 10:04 PM
To: ceph-users
Subject: [ceph-users] please help me. problem with my ceph
1. I have installed ceph with one mon/mds and one osd. When I use 'ceph -s', there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)
I would think this is because Ceph defaults to a replication level of 2 and you only have one OSD (nowhere to write a second copy), so you are degraded. You could add a second OSD, or perhaps set the replication level to 1.
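If this is just a single-OSD test box, dropping the replication factor on the default pools should clear the warning. A sketch, assuming the stock data/metadata/rbd pools:

  ceph osd pool set data size 1
  ceph osd pool set metadata size 1
  ceph osd pool set rbd size 1

Adding a second OSD is the better fix if the hardware is there, since a one-copy pool has no redundancy at all.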
[ceph-users] Ceph Block Storage QoS
Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
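I'm not aware of anything like this inside Ceph itself today. One place it can be done now is at the hypervisor: QEMU/libvirt can throttle each virtual disk, which gets per-VM service levels even though the RBD back end stays best-effort. A sketch of the libvirt side (the numbers are made up):

  <disk type='network' device='disk'>
    ...
    <iotune>
      <total_bytes_sec>52428800</total_bytes_sec>  <!-- ~50 MB/s cap -->
      <total_iops_sec>500</total_iops_sec>         <!-- 500 IOPS cap -->
    </iotune>
  </disk>

In OpenStack terms that corresponds to QoS limits applied per volume/flavor at the compute layer rather than anything Ceph-side.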
Re: [ceph-users] radosgw fails to start
Sorry to bump this, but does anyone have any idea what could be wrong here? To resummarize, radosgw fails to start. Debug output seems to indicate it is complaining about the keyring, but the keyring is present and readable, and other Ceph functions which require the keyring can success. So why can't radosgw start? Details below. Thanks! -Original Message- From: Gruher, Joseph R Sent: Friday, November 01, 2013 11:50 AM To: Gruher, Joseph R Subject: RE: radosgw fails to start Adding some debug arguments has generated output which I believe indicates the problem is my keyring is missing, but the keyring seems to be here. Why would this complain about the keyring and fail to start? [ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d --debug-rgw 20 --debug- ms 1 start 2013-11-01 10:59:47.015332 7f83978e4820 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 18760 2013-11-01 10:59:47.015338 7f83978e4820 -1 WARNING: libcurl doesn't support curl_multi_wait() 2013-11-01 10:59:47.015340 7f83978e4820 -1 WARNING: cross zone / region transfer performance may be affected 2013-11-01 10:59:47.018707 7f83978e4820 1 -- :/0 messenger.start 2013-11-01 10:59:47.018773 7f83978e4820 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2013-11-01 10:59:47.018774 7f83978e4820 0 librados: client.admin initialization error (2) No such file or directory 2013-11-01 10:59:47.018788 7f83978e4820 1 -- :/1018760 mark_down_all 2013-11-01 10:59:47.018932 7f83978e4820 1 -- :/1018760 shutdown complete. 2013-11-01 10:59:47.018967 7f83978e4820 -1 Couldn't init storage provider (RADOS) [ceph@joceph08 ceph]$ sudo service ceph-radosgw status /usr/bin/radosgw is not running. [ceph@joceph08 ceph]$ pwd /etc/ceph [ceph@joceph08 ceph]$ ls ceph.client.admin.keyring ceph.conf keyring.radosgw.gateway rbdmap [ceph@joceph08 ceph]$ cat ceph.client.admin.keyring [client.admin] key = AQCYyHJSCFH3BBAA472q80qrAiIIVbvJfK/47A== [ceph@joceph08 ceph]$ cat keyring.radosgw.gateway [client.radosgw.gateway] key = AQBh6nNS0Cu3HxAAMxLsbEYZ3pEbwEBajQb1WA== caps mon = allow rw caps osd = allow rwx [ceph@joceph08 ceph]$ cat ceph.conf [client.radosgw.joceph08] host = joceph08 log_file = /var/log/ceph/radosgw.log keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock [global] auth_service_required = cephx filestore_xattr_use_omap = true auth_client_required = cephx auth_cluster_required = cephx mon_host = 10.23.37.142,10.23.37.145,10.23.37.161,10.23.37.165 osd_journal_size = 1024 mon_initial_members = joceph01, joceph02, joceph03, joceph04 fsid = 74d808db-aaa7-41d2-8a84-7d590327a3c7 By the way, I can run other commands on the node which I think must require the keyring. they succeed. 
[ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug-rgw 20 --debug-ms 1 start
2013-11-01 11:45:07.935483 7ff2e2f11820 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 19265
2013-11-01 11:45:07.935488 7ff2e2f11820 -1 WARNING: libcurl doesn't support curl_multi_wait()
2013-11-01 11:45:07.935489 7ff2e2f11820 -1 WARNING: cross zone / region transfer performance may be affected
2013-11-01 11:45:07.938719 7ff2e2f11820 1 -- :/0 messenger.start
2013-11-01 11:45:07.938817 7ff2e2f11820 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2013-11-01 11:45:07.938818 7ff2e2f11820 0 librados: client.admin initialization error (2) No such file or directory
2013-11-01 11:45:07.938832 7ff2e2f11820 1 -- :/1019265 mark_down_all
2013-11-01 11:45:07.939150 7ff2e2f11820 1 -- :/1019265 shutdown complete.
2013-11-01 11:45:07.939219 7ff2e2f11820 -1 Couldn't init storage provider (RADOS)
[ceph@joceph08 ceph]$ rados df
pool name     category   KB   objects   clones   degraded   unfound   rd   rd KB   wr   wr KB
data          -          0    0         0        0          0         0    0       0    0
metadata      -          0    0         0        0          0         0    0       0    0
rbd           -          0    0         0        0          0         0    0       0    0
total used    630648     0
total avail   11714822792
total space   11715453440
[ceph@joceph08 ceph]$ ceph status
cluster 74d808db-aaa7-41d2-8a84-7d590327a3c7
health HEALTH_OK
monmap e1: 4 mons at {joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.23.37.161:6789/0,joceph04=10.23.37.165:6789/0}, election epoch 8, quorum 0,1,2,3 joceph01,joceph02,joceph03,joceph04
osdmap e88: 16 osds: 16 up, 16 in
pgmap v1402: 2400 pgs: 2400 active+clean; 0 bytes data, 615 MB used, 11172 GB / 11172 GB avail
mdsmap e1: 0/0/1 up
Re: [ceph-users] radosgw fails to start
-Original Message- From: Yehuda Sadeh [mailto:yeh...@inktank.com] Sent: Monday, November 04, 2013 12:40 PM To: Gruher, Joseph R Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] radosgw fails to start Not sure why you're able to run the 'rados' and 'ceph' command, and not 'radosgw', just note that the former two don't connect to the osds, whereas the latter does, so it might fail on a different level. You're using the default client.admin as the user for radosgw, but your ceph.conf file doesn't have a section for it and all the relevant configurables are under client.radosgw.gateway. Try fixing that first. Yehuda Thanks for the hint. Adding the section below seems to have addressed the problem. For some reason I didn't have to do this on my previous cluster but it seems to need it here. [client.admin] keyring = /etc/ceph/ceph.client.admin.keyring Now I am failing with a new problem, probably something to do with how I set up Apache, I think, this seems to be some kind of FastCGI error: 2013-11-04 13:05:48.354547 7f1cd6f5d820 0 ERROR: FCGX_Accept_r returned -88 Full output: http://pastebin.com/gyhQnrgP ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
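For what it's worth, an alternative to giving client.admin its own keyring section is to run the gateway under the identity the key was actually created for, so it never touches the admin keyring. I believe something like this works:

  sudo /usr/bin/radosgw -d -n client.radosgw.gateway --debug-rgw 20 --debug-ms 1

With -n (--name) the daemon authenticates as client.radosgw.gateway and picks up the keyring and other options from that section of ceph.conf instead of defaulting to client.admin.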
Re: [ceph-users] radosgw fails to start
Adding some debug arguments has generated output which I believe indicates the problem is my keyring is missing, but the keyring seems to be here. Why would this complain about the keyring and fail to start? [ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d --debug-rgw 20 --debug-ms 1 start 2013-11-01 10:59:47.015332 7f83978e4820 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 18760 2013-11-01 10:59:47.015338 7f83978e4820 -1 WARNING: libcurl doesn't support curl_multi_wait() 2013-11-01 10:59:47.015340 7f83978e4820 -1 WARNING: cross zone / region transfer performance may be affected 2013-11-01 10:59:47.018707 7f83978e4820 1 -- :/0 messenger.start 2013-11-01 10:59:47.018773 7f83978e4820 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication 2013-11-01 10:59:47.018774 7f83978e4820 0 librados: client.admin initialization error (2) No such file or directory 2013-11-01 10:59:47.018788 7f83978e4820 1 -- :/1018760 mark_down_all 2013-11-01 10:59:47.018932 7f83978e4820 1 -- :/1018760 shutdown complete. 2013-11-01 10:59:47.018967 7f83978e4820 -1 Couldn't init storage provider (RADOS) [ceph@joceph08 ceph]$ sudo service ceph-radosgw status /usr/bin/radosgw is not running. [ceph@joceph08 ceph]$ pwd /etc/ceph [ceph@joceph08 ceph]$ ls ceph.client.admin.keyring ceph.conf keyring.radosgw.gateway rbdmap [ceph@joceph08 ceph]$ cat ceph.client.admin.keyring [client.admin] key = AQCYyHJSCFH3BBAA472q80qrAiIIVbvJfK/47A== [ceph@joceph08 ceph]$ cat keyring.radosgw.gateway [client.radosgw.gateway] key = AQBh6nNS0Cu3HxAAMxLsbEYZ3pEbwEBajQb1WA== caps mon = allow rw caps osd = allow rwx [ceph@joceph08 ceph]$ cat ceph.conf [client.radosgw.joceph08] host = joceph08 log_file = /var/log/ceph/radosgw.log keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock [global] auth_service_required = cephx filestore_xattr_use_omap = true auth_client_required = cephx auth_cluster_required = cephx mon_host = 10.23.37.142,10.23.37.145,10.23.37.161,10.23.37.165 osd_journal_size = 1024 mon_initial_members = joceph01, joceph02, joceph03, joceph04 fsid = 74d808db-aaa7-41d2-8a84-7d590327a3c7 From: Gruher, Joseph R Sent: Wednesday, October 30, 2013 12:24 PM To: ceph-users@lists.ceph.com Subject: radosgw fails to start, leaves no clues why Hi all- Trying to set up object storage on CentOS. I've done this successfully on Ubuntu but I'm having some trouble on CentOS. I think I have everything configured but when I try to start the radosgw service it reports starting, but then the status is not running, with no helpful output as to why on the console or in the radosgw log. I once experienced a similar problem in Ubuntu when the hostname was incorrect in ceph.conf but that doesn't seem to be the issue here. Not sure where to go next. Any suggestions what could be the problem? Thanks! [ceph@joceph08 ceph]$ sudo service httpd restart Stopping httpd:[ OK ] Starting httpd:[ OK ] [ceph@joceph08 ceph]$ cat ceph.conf [joceph08.radosgw.gateway] keyring = /etc/ceph/keyring.radosgw.gateway rgw_dns_name = joceph08 host = joceph08 log_file = /var/log/ceph/radosgw.log rgw_socket_path = /tmp/radosgw.sock [global] filestore_xattr_use_omap = true mon_host = 10.23.37.142,10.23.37.145,10.23.37.161 osd_journal_size = 1024 mon_initial_members = joceph01, joceph02, joceph03 auth_supported = cephx fsid = 721ea513-e84c-48df-9c8f-f1d9e602b810 [ceph@joceph08 ceph]$ sudo service ceph-radosgw start Starting radosgw instance(s)... 
[ceph@joceph08 ceph]$ sudo service ceph-radosgw status /usr/bin/radosgw is not running. [ceph@joceph08 ceph]$ sudo cat /var/log/ceph/radosgw.log [ceph@joceph08 ceph]$ [ceph@joceph08 ceph]$ sudo cat /etc/ceph/keyring.radosgw.gateway [client.radosgw.gateway] key = AQDbUnFSIGT2BxAA5rz9I1HHIG/LJx+XCYot1w== caps mon = allow rw caps osd = allow rwx [ceph@joceph08 ceph]$ ceph status cluster 721ea513-e84c-48df-9c8f-f1d9e602b810 health HEALTH_OK monmap e1: 3 mons at {joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.23.37.161:6789/0}, election epoch 8, quorum 0,1,2 joceph01,joceph02,joceph03 osdmap e119: 16 osds: 16 up, 16 in pgmap v1383: 3200 pgs: 3200 active+clean; 219 GB data, 411 GB used, 10760 GB / 11172 GB avail mdsmap e1: 0/0/1 up ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw fails to start
-Original Message- From: Derek Yarnell [mailto:de...@umiacs.umd.edu] Sent: Friday, November 01, 2013 12:20 PM To: Gruher, Joseph R; ceph-users@lists.ceph.com Subject: Re: [ceph-users] radosgw fails to start On 11/1/13, 2:07 PM, Gruher, Joseph R wrote: Adding some debug arguments has generated output which I believe indicates the problem is my keyring is missing, but the keyring seems to be here. Why would this complain about the keyring and fail to start? Hi, Are you sure the user you are starting radosgw has the permission to read the keyring file? Thanks, derek Thanks for the suggestion. Yup, it should be readable, first of all I'm starting radosgw with sudo, so root should be able to read anything, plus I set the file to be readable by all users just in case. Problem persists... ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] radosgw fails to start, leaves no clues why
Hi all- Trying to set up object storage on CentOS. I've done this successfully on Ubuntu but I'm having some trouble on CentOS. I think I have everything configured but when I try to start the radosgw service it reports starting, but then the status is not running, with no helpful output as to why on the console or in the radosgw log. I once experienced a similar problem in Ubuntu when the hostname was incorrect in ceph.conf but that doesn't seem to be the issue here. Not sure where to go next. Any suggestions what could be the problem? Thanks! [ceph@joceph08 ceph]$ sudo service httpd restart Stopping httpd:[ OK ] Starting httpd:[ OK ] [ceph@joceph08 ceph]$ cat ceph.conf [joceph08.radosgw.gateway] keyring = /etc/ceph/keyring.radosgw.gateway rgw_dns_name = joceph08 host = joceph08 log_file = /var/log/ceph/radosgw.log rgw_socket_path = /tmp/radosgw.sock [global] filestore_xattr_use_omap = true mon_host = 10.23.37.142,10.23.37.145,10.23.37.161 osd_journal_size = 1024 mon_initial_members = joceph01, joceph02, joceph03 auth_supported = cephx fsid = 721ea513-e84c-48df-9c8f-f1d9e602b810 [ceph@joceph08 ceph]$ sudo service ceph-radosgw start Starting radosgw instance(s)... [ceph@joceph08 ceph]$ sudo service ceph-radosgw status /usr/bin/radosgw is not running. [ceph@joceph08 ceph]$ sudo cat /var/log/ceph/radosgw.log [ceph@joceph08 ceph]$ [ceph@joceph08 ceph]$ sudo cat /etc/ceph/keyring.radosgw.gateway [client.radosgw.gateway] key = AQDbUnFSIGT2BxAA5rz9I1HHIG/LJx+XCYot1w== caps mon = allow rw caps osd = allow rwx [ceph@joceph08 ceph]$ ceph status cluster 721ea513-e84c-48df-9c8f-f1d9e602b810 health HEALTH_OK monmap e1: 3 mons at {joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.23.37.161:6789/0}, election epoch 8, quorum 0,1,2 joceph01,joceph02,joceph03 osdmap e119: 16 osds: 16 up, 16 in pgmap v1383: 3200 pgs: 3200 active+clean; 219 GB data, 411 GB used, 10760 GB / 11172 GB avail mdsmap e1: 0/0/1 up ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
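One thing that stands out in the ceph.conf above: the gateway section is named [joceph08.radosgw.gateway]. As far as I know the radosgw init script only looks for sections whose names start with client.radosgw., so with that spelling it finds nothing to start and exits quietly, which would match the empty log. Worth trying a section along these lines, matching the name in the keyring:

  [client.radosgw.gateway]
  host = joceph08
  keyring = /etc/ceph/keyring.radosgw.gateway
  rgw_socket_path = /tmp/radosgw.sock
  log_file = /var/log/ceph/radosgw.log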
Re: [ceph-users] Red Hat clients
I have CentOS 6.4 running with the 3.11.6 kernel from elrepo and it includes the rbd module. I think you could make the same update on RHEL 6.4 and get rbd. From there it is very simple to mount an rbd device. Here are a few notes on what I did.
Update kernel:
sudo rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
sudo yum -y update
sudo yum -y --enablerepo=elrepo-kernel install kernel-ml
sudo vim /boot/grub/menu.lst (update default to zero)
reboot
Create rbd device:
rbd create {name} --size {size_in_MB}
sudo modprobe rbd
sudo rbd map {name} --pool {pool_name}
The device appears at /dev/rbd/{pool_name}/{name}.
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of alistair.whit...@barclays.com
Sent: Wednesday, October 30, 2013 11:48 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Red Hat clients
Now that my ceph cluster seems to be happy and stable, I have been looking at different ways of using it: object, block and file. Object is relatively easy and I will use different ones to test with Ceph. When I look at block, I'm getting the impression from a lot of Googling that deploying clients on Red Hat to connect to a Ceph cluster can be complex. As I understand it, the rbd module is not currently in the Red Hat kernel (and I am not allowed to make changes to our standard kernel, as is suggested in places as a possible solution). Does this mean I can't connect a Red Hat machine to Ceph as a block client?
Re: [ceph-users] ceph-deploy problems on CentOS-6.4
If you are behind a proxy try configuring the wget proxy through /etc/wgetrc. I had a similar problem where I could complete wget commands manually but they would fail in ceph-deploy until I configured the wget proxy in that manner. From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra Sent: Tuesday, October 29, 2013 9:51 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] ceph-deploy problems on CentOS-6.4 Hi All, I am a newbie to ceph. I am installing ceph (dumpling release) using ceph-deploy (issued from my admin node) on one monitor and two OSD nodes running CentOS 6.4 (64-bit) using followed instructions in the link below: http://ceph.com/docs/master/start/quick-ceph-deploy/ My setup looks exactly like the diagram. I followed pre-flight instructions exacty as outlined in the link below: http://ceph.com/docs/master/start/quick-start-preflight/ The ceph-deploy takes forever and then throws up the following error: 2013-10-28 17:32:35,903 [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy new ceph-node1-mon-centos-6-4 2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Creating new cluster named ceph 2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Resolving host ceph-node1-mon-centos-6-4 2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor ceph-node1-mon-centos-6-4 at 10.12.0.70 2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-node1-mon-centos-6-4'] 2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor addrs are ['10.12.0.70'] 2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Creating a random mon key... 2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf... 2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring... 2013-10-28 17:33:10,287 [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy install ceph-node1-mon-centos-6-4 ceph-node2-osd0-centos-6-4 ceph-admin-node-centos-6-4 2013-10-28 17:33:10,287 [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts ceph-node1-mon-centos-6-4 ceph-node2-osd0-centos-6-4 ceph-admin-node-centos-6-4 2013-10-28 17:33:10,288 [ceph_deploy.install][DEBUG ] Detecting platform for host ceph-node1-mon-centos-6-4 ... 
2013-10-28 17:33:10,288 [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection without sudo 2013-10-28 17:33:10,626 [ceph_deploy.install][INFO ] Distro info: CentOS 6.4 Final 2013-10-28 17:33:10,626 [ceph-node1-mon-centos-6-4][INFO ] installing ceph on ceph-node1-mon-centos-6-4 2013-10-28 17:33:10,633 [ceph-node1-mon-centos-6-4][INFO ] adding EPEL repository 2013-10-28 17:33:10,633 [ceph-node1-mon-centos-6-4][INFO ] Running command: wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm 2013-10-28 19:20:35,893 [ceph-node1-mon-centos-6-4][ERROR ] Traceback (most recent call last): 2013-10-28 19:20:35,894 [ceph-node1-mon-centos-6-4][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py, line 77, in install_epel 2013-10-28 19:20:35,899 [ceph-node1-mon-centos-6-4][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py, line 10, in inner 2013-10-28 19:20:35,900 [ceph-node1-mon-centos-6-4][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line 6, in remote_call 2013-10-28 19:20:35,902 [ceph-node1-mon-centos-6-4][ERROR ] File /usr/lib64/python2.6/subprocess.py, line 502, in check_call 2013-10-28 19:20:35,903 [ceph-node1-mon-centos-6-4][ERROR ] raise CalledProcessError(retcode, cmd) 2013-10-28 19:20:35,904 [ceph-node1-mon-centos-6-4][ERROR ] CalledProcessError: Command '['wget', 'http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm']' returned non-zero exit status 4 2013-10-28 19:20:35,911 [ceph-node1-mon-centos-6-4][ERROR ] --2013-10-28 17:33:10-- http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm 2013-10-28 19:20:35,911 [ceph-node1-mon-centos-6-4][ERROR ] Resolving dl.fedoraproject.org... 209.132.181.25, 209.132.181.26, 209.132.181.27, ... 2013-10-28 19:20:35,912 [ceph-node1-mon-centos-6-4][ERROR ] Connecting to dl.fedoraproject.orghttp://dl.fedoraproject.org/|209.132.181.25|:80... failed: Connection timed out. 2013-10-28 19:20:35,912 [ceph-node1-mon-centos-6-4][ERROR ] Connecting to dl.fedoraproject.orghttp://dl.fedoraproject.org/|209.132.181.26|:80... failed: Connection timed out. Interestingly, wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm; on each nodes (1 mon and 2 OSDs) succeeds without any problem. I have tried everything many times with root user, ceph user etc. but it fails every time! It is very frustrating! Has anyone else experienced the same or similar problem? Thanks a lot in advance! Nar This message contains information which may be confidential and/or privileged.
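The wget proxy settings that worked for me look roughly like this in /etc/wgetrc (proxy host and port are placeholders for whatever your site uses):

  use_proxy = on
  http_proxy = http://proxy.example.com:8080/
  https_proxy = http://proxy.example.com:8080/
  ftp_proxy = http://proxy.example.com:8080/

The reason manual wget works while ceph-deploy fails is likely that ceph-deploy runs wget in a context where the http_proxy shell environment variables aren't inherited, so the config file is the reliable place to set them.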
Re: [ceph-users] Ceph-deploy, sudo and proxies
Try configuring the curl proxy in /root/.curlrc. I had a similar problem earlier this week. Overall I have to be sure to set all these proxies individually for ceph-deploy to work on CentOS (Ubuntu is easier): Curl: /root/.curlrc rpm: /root/.rpmmacros wget: /etc/wgetrc yum: /etc/yum.conf -Joe From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of alistair.whit...@barclays.com Sent: Friday, October 25, 2013 10:26 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] Ceph-deploy, sudo and proxies I have an interesting problem I was hoping someone could help with. My Red Hat servers are configured to use proxies to access the internet. I have managed to successfully add the Ceph repo install ceph-deploy on the admin node and create the cluster. All ceph nodes are no password sudo tested and I have made sure that the proxy settings are kept when trying an 'rpm' command using sudo. All other preflight checks are completed with ceph being the default login user etc. So, when I run the ceph-deploy install ceph-node command from the admin node, I get the following error: ceph@ldtdsr02se17 PROD $ ceph-deploy install ldtdsr02se18 [ceph_deploy.cli][INFO ] Invoked (1.2.7): /usr/bin/ceph-deploy install ldtdsr02se18 [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts ldtdsr02se18 [ceph_deploy.install][DEBUG ] Detecting platform for host ldtdsr02se18 ... [ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo [ceph_deploy.install][INFO ] Distro info: RedHatEnterpriseServer 6.4 Santiago [ldtdsr02se18][INFO ] installing ceph on ldtdsr02se18 [ldtdsr02se18][INFO ] Running command: su -c 'rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;' [ldtdsr02se18][ERROR ] Traceback (most recent call last): [ldtdsr02se18][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py, line 23, in install [ldtdsr02se18][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py, line 10, in inner [ldtdsr02se18][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line 6, in remote_call [ldtdsr02se18][ERROR ] File /usr/lib64/python2.6/subprocess.py, line 502, in check_call [ldtdsr02se18][ERROR ] raise CalledProcessError(retcode, cmd) [ldtdsr02se18][ERROR ] CalledProcessError: Command '['su -c \'rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']' returned non-zero exit status 1 [ldtdsr02se18][ERROR ] curl: (7) couldn't connect to host [ldtdsr02se18][ERROR ] error: https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: import read failed(2). [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: su -c 'rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;' Note that it uses sudo as it should and then complains about not being able to connect. When I run the exact same command on the ceph node itself as the ceph user, it works without any errors. This implies that the authentication is in place between ceph and root, and the proxy settings are correct. Yet, it fails to work when initiated from the admin node via ceph-deploy. Any ideas what might be going on here? I should add that I looked at the github page about using the -no-adjust-repos flag but my version of ceph-deploy says it is an invalid flag... 
Please help,
Alistair
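In case it helps, the curl half of the same fix is a single line in /root/.curlrc (placeholder proxy again):

  proxy = http://proxy.example.com:8080

That should cover the rpm --import step above, which fetches the release key over https via curl and otherwise dies with 'couldn't connect to host' exactly as shown.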
[ceph-users] Default PGs
Should osd_pool_default_pg_num and osd_pool_default_pgp_num apply to the default pools? I put them in ceph.conf before creating any OSDs but after bringing up the OSDs the default pools are using a value of 64. Ceph.conf contains these lines in [global]: osd_pool_default_pgp_num = 800 osd_pool_default_pg_num = 800 After creating and activating OSDs: [ceph@joceph05 ceph]$ ceph osd pool get data pg_num pg_num: 64 [ceph@joceph05 ceph]$ ceph osd pool get data pgp_num pgp_num: 64 [ceph@joceph05 ceph]$ ceph osd dump pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 I have ceph-deploy 1.2.7 and ceph 0.67.4 on CentOS 6.4 with 3.11.6 kernel. Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
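As a workaround, the placement group counts on the existing default pools can be raised after the fact. A sketch with the numbers from above, repeated for metadata and rbd:

  ceph osd pool set data pg_num 800
  ceph osd pool set data pgp_num 800

pg_num has to be raised first, since pgp_num is not allowed to exceed pg_num.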
[ceph-users] rbd client module in centos 6.4
Hi all, I have CentOS 6.4 with 3.11.6 kernel running (built from latest stable on kernel.org) and I cannot load the rbd client module. Should I have to do anything to enable/install it? Shouldn't it be present in this kernel? [ceph@joceph05 /]$ cat /etc/centos-release CentOS release 6.4 (Final) [ceph@joceph05 /]$ uname -a Linux joceph05.jf.intel.com 3.11.6 #1 SMP Mon Oct 21 17:23:07 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux [ceph@joceph05 /]$ modprobe rbd FATAL: Module rbd not found. [ceph@joceph05 /]$ Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
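If the kernel was built from source, it's worth checking whether the rbd driver was actually enabled in the build configuration; I don't believe it is selected by default. Assuming the build config was installed alongside the kernel:

  grep -E 'CONFIG_BLK_DEV_RBD|CONFIG_CEPH' /boot/config-3.11.6

If CONFIG_BLK_DEV_RBD is unset, rbd.ko was never built; it lives under Device Drivers -> Block devices -> 'Rados block device (RBD)' in menuconfig and the kernel would need to be rebuilt with it enabled.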
Re: [ceph-users] OSD journal size
Speculating, but it seems possible that the ':' in the path is problematic, since that is also the separator between disk and journal (HOST:DISK:JOURNAL)? Perhaps if you enclose in ''s or or use /dev/disk/by-id? -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Shain Miley Sent: Wednesday, October 23, 2013 1:55 PM To: Alfredo Deza Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD journal size O.K...I found the help section in 1.2.7 that talks about using paths...however I still cannot get this to work: root@hqceph1:/usr/local/ceph-install-1# ceph-deploy osd prepare hqosd1:/dev/disk/by-path/pci-:02:00.0-scsi-0:2:1:0 usage: ceph-deploy osd [-h] [--zap-disk] [--fs-type FS_TYPE] [--dmcrypt] [--dmcrypt-key-dir KEYDIR] SUBCOMMAND HOST:DISK[:JOURNAL] [HOST:DISK[:JOURNAL] ...] ceph-deploy osd: error: argument HOST:DISK[:JOURNAL]: must be in form HOST:DISK[:JOURNAL] is '/dev/disk/by-path' names supported...or am I doing something wrong? Thanks, Shain Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 From: ceph-users-boun...@lists.ceph.com [ceph-users- boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org] Sent: Wednesday, October 23, 2013 4:19 PM To: Alfredo Deza Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD journal size Alfredo, Do you know what version of ceph-deploy has this updated functionality I just updated to 1.2.7 and it does not appear to include it. Thanks, Shain Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 From: ceph-users-boun...@lists.ceph.com [ceph-users- boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org] Sent: Monday, October 21, 2013 6:13 PM To: Alfredo Deza Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD journal size Alfredo, Thanks a lot for the info. I'll make sure I have an updated version of ceph-deploy and give it another shot. Shain Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 From: Alfredo Deza [alfredo.d...@inktank.com] Sent: Monday, October 21, 2013 2:03 PM To: Shain Miley Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD journal size On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley smi...@npr.org wrote: Hi, We have been testing a ceph cluster with the following specs: 3 Mon's 72 OSD's spread across 6 Dell R-720xd servers 4 TB SAS drives 4 bonded 10 GigE NIC ports per server 64 GB of RAM Up until this point we have been running tests using the default journal size of '1024'. Before we start to place production data on the cluster I was want to clear up the following questions I have: 1)Is there a more appropriate journal size for my setup given the specs listed above? 2)According to this link: http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11 CERN is using '/dev/disk/by-path' for their OSD's. Does ceph-deploy currently support setting up OSD's using this method? Indeed it does! 
`ceph-deploy osd --help` got updated recently to demonstrate how this needs to be done (an extra step is involved): For paths, first prepare and then activate: ceph-deploy osd prepare {osd-node-name}:/path/to/osd ceph-deploy osd activate {osd-node-name}:/path/to/osd Thanks, Shain Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Client Timeout on Rados Gateway
Thanks for the reply. This eventually resolved itself when I upgraded the client kernel from the Ubuntu Server 12.04.2 default to the 3.6.10 kernel. Not sure if there is a good causal explanation there or if it might be a coincidence. I did see the kernel recommendations in the docs but I had assumed those just applied to the Ceph machines and not clients - perhaps that is a bad assumption. Anyway, it works now, so I guess the next steps are to try moving the client back to the public network and to re-enable authentication and see if it works or if I still have an issue there. With regard to versions: ceph@cephtest06:/etc/ceph$ ceph-mon --version ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a) ceph@cephtest06:/etc/ceph$ uname -a Linux cephtest06 3.6.10-030610-generic #201212101650 SMP Mon Dec 10 21:51:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux ceph@cephclient01:~/cos$ rados --version ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a) ceph@cephclient01:~/cos$ uname -a Linux cephclient01 3.6.10-030610-generic #201212101650 SMP Mon Dec 10 21:51:40 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Thanks, Joe -Original Message- From: Gregory Farnum [mailto:g...@inktank.com] Sent: Monday, October 07, 2013 1:27 PM To: Gruher, Joseph R Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Client Timeout on Rados Gateway The ping tests you're running are connecting to different interfaces (10.23.37.175) than those you specify in the mon_hosts option (10.0.0.2, 10.0.0.3, 10.0.0.4). The client needs to be able to connect to the specified address; I'm guessing it's not routable from outside that network? The error you're getting once you put it inside the network is more interesting. What version of the Ceph packages do you have installed there, and what's installed on the monitors? (run ceph-mon --version on the monitor, and rados --version on the client, and it'll output.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Oct 1, 2013 at 12:45 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote: Hello- I've set up a rados gateway but I'm having trouble accessing it from clients. I can access it using rados command line just fine from any system in my ceph deployment, including my monitors and OSDs, the gateway system, and even the admin system I used to run ceph-deploy. However, when I set up a client outside the ceph nodes I get a timeout error as shown at the bottom of the output pasted below. I've turned off authentication for the moment to simplify things. Systems are able to resolve names and reach each other via ping. Any thoughts on what could be the issue here or how to debug? The failure: ceph@cephclient01:/etc/ceph$ rados df 2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting): authenticate timed out after 30 2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin authentication error (110) Connection timed out couldn't connect to cluster! error -110 ceph@cephclient01:/etc/ceph$ sudo rados df 2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting): authenticate timed out after 30 2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin authentication error (110) Connection timed out couldn't connect to cluster! 
error -110 ceph@cephclient01:/etc/ceph$ Some details from the client: ceph@cephclient01:/etc/ceph$ pwd /etc/ceph ceph@cephclient01:/etc/ceph$ ls ceph.client.admin.keyring ceph.conf keyring.radosgw.gateway ceph@cephclient01:/etc/ceph$ cat ceph.conf [global] fsid = a45e6e54-70ef-4470-91db-2152965deec5 mon_initial_members = cephtest02, cephtest03, cephtest04 mon_host = 10.0.0.2,10.0.0.3,10.0.0.4 osd_journal_size = 1024 filestore_xattr_use_omap = true auth_cluster_required = none #cephx auth_service_required = none #cephx auth_client_required = none #cephx [client.radosgw.gateway] host = cephtest06 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log ceph@cephclient01:/etc/ceph$ ping cephtest06 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data. 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.216 ms 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.209 ms ^C --- cephtest06.jf.intel.com ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data. 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.223 ms 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.242 ms ^C --- cephtest06.jf.intel.com ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms
Re: [ceph-users] Client Timeout on Rados Gateway
Could you clarify something for me... I have a cluster network (10.0.0.x) and a public network (10.23.37.x). All the Ceph machines have one interface on each network and clients (when configured normally) would only be on the public network. My ceph.conf uses 10.0.0.x IPs for the monitors but as you mention below this can cause a problem for the client reaching the monitor since it is not on that network. This could cause the rados command to fail? What is the solution to that problem? It doesn't seem like ceph.conf should use the public IPs for the monitor, don't we want those to be on the private network? And the client wouldn't normally have access to the private network. Is this really just an issue with accuss using rados, as swift or rbd would not need to access the monitors? -Original Message- From: Gregory Farnum [mailto:g...@inktank.com] Sent: Monday, October 07, 2013 1:27 PM To: Gruher, Joseph R Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Client Timeout on Rados Gateway The ping tests you're running are connecting to different interfaces (10.23.37.175) than those you specify in the mon_hosts option (10.0.0.2, 10.0.0.3, 10.0.0.4). The client needs to be able to connect to the specified address; I'm guessing it's not routable from outside that network? The error you're getting once you put it inside the network is more interesting. What version of the Ceph packages do you have installed there, and what's installed on the monitors? (run ceph-mon --version on the monitor, and rados --version on the client, and it'll output.) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Oct 1, 2013 at 12:45 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote: Hello- I've set up a rados gateway but I'm having trouble accessing it from clients. I can access it using rados command line just fine from any system in my ceph deployment, including my monitors and OSDs, the gateway system, and even the admin system I used to run ceph-deploy. However, when I set up a client outside the ceph nodes I get a timeout error as shown at the bottom of the output pasted below. I've turned off authentication for the moment to simplify things. Systems are able to resolve names and reach each other via ping. Any thoughts on what could be the issue here or how to debug? The failure: ceph@cephclient01:/etc/ceph$ rados df 2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting): authenticate timed out after 30 2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin authentication error (110) Connection timed out couldn't connect to cluster! error -110 ceph@cephclient01:/etc/ceph$ sudo rados df 2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting): authenticate timed out after 30 2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin authentication error (110) Connection timed out couldn't connect to cluster! 
error -110 ceph@cephclient01:/etc/ceph$ Some details from the client: ceph@cephclient01:/etc/ceph$ pwd /etc/ceph ceph@cephclient01:/etc/ceph$ ls ceph.client.admin.keyring ceph.conf keyring.radosgw.gateway ceph@cephclient01:/etc/ceph$ cat ceph.conf [global] fsid = a45e6e54-70ef-4470-91db-2152965deec5 mon_initial_members = cephtest02, cephtest03, cephtest04 mon_host = 10.0.0.2,10.0.0.3,10.0.0.4 osd_journal_size = 1024 filestore_xattr_use_omap = true auth_cluster_required = none #cephx auth_service_required = none #cephx auth_client_required = none #cephx [client.radosgw.gateway] host = cephtest06 keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path = /tmp/radosgw.sock log_file = /var/log/ceph/radosgw.log ceph@cephclient01:/etc/ceph$ ping cephtest06 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data. 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.216 ms 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.209 ms ^C --- cephtest06.jf.intel.com ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data. 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.223 ms 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.242 ms ^C --- cephtest06.jf.intel.com ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.223/0.232/0.242/0.017 ms I did try putting the client on the 10.0.0.x network to see if that would affect behavior but that just seemed to introduce a new problem: ceph@cephclient01:/etc/ceph$ rados df 2013-10-01 21:37:29.439410 7f60d2a43700 failed to decode message of type 59 v1: buffer::end_of_buffer 2013
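My understanding, in case it helps anyone searching later: the monitors belong on the public network, and the cluster network carries only OSD-to-OSD traffic (replication, recovery, heartbeats). All clients, whether rados, rbd, or radosgw, talk to the monitors and OSDs over the public network, so mon_host should list the public addresses. A sketch using the subnets from this setup (monitor addresses are placeholders):

  [global]
  public network = 10.23.37.0/24
  cluster network = 10.0.0.0/24
  mon_host = 10.23.37.x,10.23.37.y,10.23.37.z

With that layout the client never needs a leg on 10.0.0.x at all.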
Re: [ceph-users] Newbie question
Along the lines of this thread, if I have OSD(s) on rotational HDD(s), but have the journal(s) going to an SSD, I am curious about the best procedure for replacing the SSD should it fail. -Joe
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scottix
Sent: Wednesday, October 02, 2013 10:37 AM
To: Andy Paluch
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Newbie question
I actually am looking for a similar answer. If 1 osd = 1 HDD, in dumpling it will relocate the data for me after the timeout, which is great. If I just want to replace the osd with an unformatted new HDD, what is the procedure? One method that has worked for me is to remove it from the crush map, then re-add the osd drive to the cluster. This works but seems like a lot of overhead just to replace a single drive. Is there a better way to do this?
On Wed, Oct 2, 2013 at 8:10 AM, Andy Paluch a...@webguyz.net wrote:
What happens when a drive goes bad in ceph and has to be replaced (at the physical level)? In the RAID world you pop out the bad disk and stick a new one in, and the controller takes care of getting it back into the system. With what I've been reading so far, it's probably going to be a mess to do this with ceph and involve a lot of low-level Linux tweaking to remove and replace the disk that failed. Not a big Linux guy so was wondering if anyone can point to any docs on how to recover from a bad disk in a ceph node. Thanks
--
Follow Me: @Scottix (http://www.twitter.com/scottix)
http://about.me/scottix
scot...@gmail.com
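The sequence I'd expect to work for a planned SSD swap, untested here, with the OSD id and device as examples: flush the journal before pulling the drive, then recreate it afterwards:

  sudo service ceph stop osd.2
  sudo ceph-osd -i 2 --flush-journal
  # swap the SSD, recreate the journal partition, and fix the
  # symlink at /var/lib/ceph/osd/ceph-2/journal if needed
  sudo ceph-osd -i 2 --mkjournal
  sudo service ceph start osd.2

If the SSD fails outright instead, the flush step is moot and any un-flushed writes are gone, so the affected OSDs would likely need to be rebuilt rather than just re-journaled.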
Re: [ceph-users] question on setup ssh documentation
On my system my user is named ceph so I modified /home/ceph/.ssh/config. That seemed to work fine for me. ~/ is shorthand for your user's home folder. I think SSH will default to the current username so if you just use the same username everywhere this may not even be necessary. My file: ceph@cephtest01:/etc/ceph$ cat /home/ceph/.ssh/config Host cephtest02 Hostname cephtest02.jf.intel.com User ceph Host cephtest03 Hostname cephtest03.jf.intel.com User ceph Host cephtest04 Hostname cephtest04.jf.intel.com User ceph Host cephtest05 Hostname cephtest05.jf.intel.com User ceph Host cephtest06 Hostname cephtest06.jf.intel.com User ceph ceph@cephtest01:/etc/ceph$ From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nimish Patel Sent: Wednesday, October 02, 2013 11:19 AM To: ceph-us...@ceph.com Subject: [ceph-users] question on setup ssh documentation On this web page http://ceph.com/docs/master/start/quick-start-preflight/ where it says Modify your ~/.ssh/config file of your admin node so that it defaults to logging in as the user you created when no username is specified. Which config file do I change? I am using Ubuntu server 13.04. 1.Which files do I modify? /etc/ssh/ssh_config or /etc/ssh/sshd_config ? 2.Am I supposed to see a config file in /root/.ssh? 3.Am I supposed to see a config file in /home/ceph/.ssh? Thanks for the help! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph.conf with multiple rados gateways
Can anyone provide me a sample ceph.conf with multiple rados gateways? I must not be configuring it correctly and I can't seem to Google up an example or find one in the docs. Thanks! -Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
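Lacking a better reference, here is a sketch of what I'd expect: one [client.radosgw.*] section per gateway host, each with its own key, socket, and log, pushed out in a single shared ceph.conf (hostnames and paths are placeholders):

  [client.radosgw.gw01]
  host = gw01
  keyring = /etc/ceph/keyring.radosgw.gw01
  rgw_socket_path = /tmp/radosgw.gw01.sock
  log_file = /var/log/ceph/radosgw.gw01.log

  [client.radosgw.gw02]
  host = gw02
  keyring = /etc/ceph/keyring.radosgw.gw02
  rgw_socket_path = /tmp/radosgw.gw02.sock
  log_file = /var/log/ceph/radosgw.gw02.log

The host line is what lets each node's init script pick out its own section from the shared file.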
[ceph-users] Client Timeout on Rados Gateway
Hello-

I've set up a rados gateway but I'm having trouble accessing it from clients. I can access it using the rados command line just fine from any system in my ceph deployment, including my monitors and OSDs, the gateway system, and even the admin system I used to run ceph-deploy. However, when I set up a client outside the ceph nodes I get a timeout error, as shown at the bottom of the output pasted below. I've turned off authentication for the moment to simplify things. Systems are able to resolve names and reach each other via ping. Any thoughts on what could be the issue here, or how to debug?

The failure:

ceph@cephclient01:/etc/ceph$ rados df
2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting): authenticate timed out after 30
2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin authentication error (110) Connection timed out
couldn't connect to cluster! error -110
ceph@cephclient01:/etc/ceph$ sudo rados df
2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting): authenticate timed out after 30
2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin authentication error (110) Connection timed out
couldn't connect to cluster! error -110
ceph@cephclient01:/etc/ceph$

Some details from the client:

ceph@cephclient01:/etc/ceph$ pwd
/etc/ceph
ceph@cephclient01:/etc/ceph$ ls
ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway
ceph@cephclient01:/etc/ceph$ cat ceph.conf
[global]
fsid = a45e6e54-70ef-4470-91db-2152965deec5
mon_initial_members = cephtest02, cephtest03, cephtest04
mon_host = 10.0.0.2,10.0.0.3,10.0.0.4
osd_journal_size = 1024
filestore_xattr_use_omap = true
auth_cluster_required = none #cephx
auth_service_required = none #cephx
auth_client_required = none #cephx

[client.radosgw.gateway]
host = cephtest06
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log

ceph@cephclient01:/etc/ceph$ ping cephtest06
PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.216 ms
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.209 ms
^C
--- cephtest06.jf.intel.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms
ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com
PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 time=0.223 ms
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 time=0.242 ms
^C
--- cephtest06.jf.intel.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.223/0.232/0.242/0.017 ms

I did try putting the client on the 10.0.0.x network to see if that would affect behavior, but that just seemed to introduce a new problem:

ceph@cephclient01:/etc/ceph$ rados df
2013-10-01 21:37:29.439410 7f60d2a43700 failed to decode message of type 59 v1: buffer::end_of_buffer
2013-10-01 21:37:29.439583 7f60d4a47700 monclient: hunting for new mon
ceph@cephclient01:/etc/ceph$ ceph -m 10.0.0.2 -s
2013-10-01 21:37:42.341480 7f61eacd5700 monclient: hunting for new mon
2013-10-01 21:37:45.341024 7f61eacd5700 monclient: hunting for new mon
2013-10-01 21:37:45.343274 7f61eacd5700 monclient: hunting for new mon
ceph@cephclient01:/etc/ceph$ ceph health
2013-10-01 21:39:52.833560 mon - [health]
2013-10-01 21:39:52.834671 mon.0 - 'unparseable JSON health' (-22)
ceph@cephclient01:/etc/ceph$

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
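One check the thread never shows: the timeouts above happen while librados is authenticating to the monitors, and ping only proves ICMP is allowed. Verifying that the client can open a TCP connection to the monitor port would rule out a host firewall. A sketch, assuming netcat is installed and the default monitor port 6789:

# from the client, test TCP reachability to each monitor in mon_host
nc -zv 10.0.0.2 6789
nc -zv 10.0.0.3 6789
nc -zv 10.0.0.4 6789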
Re: [ceph-users] failure starting radosgw after setting up object storage
-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gruher, Joseph R
Sent: Monday, September 30, 2013 10:27 AM
To: Yehuda Sadeh
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] failure starting radosgw after setting up object storage

-----Original Message-----
From: Yehuda Sadeh [mailto:yeh...@inktank.com]
Sent: Friday, September 27, 2013 9:30 AM
To: Gruher, Joseph R

ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log
2013-09-25 14:03:01.235760 7f713d79d780 0 ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid 13187
2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support curl_multi_wait()
2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region transfer performance may be affected
2013-09-25 14:03:01.245786 7f713d79d780 0 librados: client.radosgw.gateway authentication error (1) Operation not permitted
2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage provider (RADOS)

This means that the radosgw process cannot connect to the cluster due to user / key setup. Make sure that the user for radosgw exists, and that the ceph keyring file (on the radosgw side) has the correct credentials set.

Yehuda

Thanks for the response. I will look into these. Is it possible you could provide more detail on how to check these? Sorry, still fairly new to Ceph (and object storage in general). Thanks!

I went back through the setup steps again, this time using this guide (http://ceph.com/docs/master/radosgw/manual-install/) instead of this guide (http://ceph.com/docs/next/start/quick-rgw/). Now I can start radosgw on this OSD successfully. I notice this guide has me install a radosgw-agent package, which was not installed before, and I wonder if this could be the difference. Should that package be installed to be able to start radosgw, or should it not be required?

I didn't make many other changes between the working and failing configurations. The only other change I really made was to create a gateway user; I had not done that step before. In both guides that step is done after starting radosgw, so I wouldn't think it would have been the key to allowing it to start, unless both guides are broken in that respect.

My other OSD still returns nothing when I try to start radosgw, so I'm not sure what the problem is there:

ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start
ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest05:/etc/ceph$ sudo cat /var/log/ceph/radosgw.log
ceph@cephtest05:/etc/ceph$

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
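The thread never spells out how to perform the check Yehuda suggests. A sketch of one way to do it, using the client.radosgw.gateway name from this thread and run on a node with admin credentials:

# show the user and key as the cluster knows them (errors out if the user doesn't exist)
sudo ceph auth get client.radosgw.gateway
# compare against the key the gateway process is actually presenting
sudo cat /etc/ceph/keyring.radosgw.gateway

If the user is missing or the keys differ, radosgw fails with exactly the "authentication error (1) Operation not permitted" shown above.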
[ceph-users] failure starting radosgw after setting up object storage
Hi all-

I am following the object storage quick start guide. I have a cluster with two OSDs and have followed the steps on both. Both are failing to start radosgw, but each in a different manner. All the previous steps in the quick start guide appeared to complete successfully. Any tips on how to debug from here? Thanks!

OSD1:

ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start
ceph@cephtest05:/etc/ceph$
ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest05:/etc/ceph$
ceph@cephtest05:/etc/ceph$ cat /var/log/ceph/radosgw.log
ceph@cephtest05:/etc/ceph$

OSD2:

ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw start
Starting client.radosgw.gateway...
2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support curl_multi_wait()
2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region transfer performance may be affected
ceph@cephtest06:/etc/ceph$
ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest06:/etc/ceph$
ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log
2013-09-25 14:03:01.235760 7f713d79d780 0 ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid 13187
2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support curl_multi_wait()
2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region transfer performance may be affected
2013-09-25 14:03:01.245786 7f713d79d780 0 librados: client.radosgw.gateway authentication error (1) Operation not permitted
2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage provider (RADOS)
ceph@cephtest06:/etc/ceph$

For reference, I think cluster health is OK:

ceph@cephtest06:/etc/ceph$ sudo ceph status
  cluster a45e6e54-70ef-4470-91db-2152965deec5
   health HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04
   monmap e1: 3 mons at {cephtest02=10.0.0.2:6789/0,cephtest03=10.0.0.3:6789/0,cephtest04=10.0.0.4:6789/0}, election epoch 6, quorum 0,1,2 cephtest02,cephtest03,cephtest04
   osdmap e9: 2 osds: 2 up, 2 in
   pgmap v439: 192 pgs: 192 active+clean; 0 bytes data, 72548 KB used, 1998 GB / 1999 GB avail
   mdsmap e1: 0/0/1 up
ceph@cephtest06:/etc/ceph$ sudo ceph health
HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
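When the init script exits silently and the log stays empty, as on cephtest05 above, running the daemon in the foreground usually surfaces the error. A sketch using the standard ceph daemon flags; the debug levels are illustrative:

# run radosgw in the foreground, logging to stderr, with verbose gateway and messenger output
sudo radosgw -d -n client.radosgw.gateway --debug-rgw 20 --debug-ms 1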
Re: [ceph-users] monitor deployment during quick start
Sorry, not trying to repost or bump my thread, but I think I can restate my question here for better clarity. I am confused about the --cluster argument used when ceph-deploy mon create invokes ceph-mon on the target system. I always get a failure at this point when running ceph-deploy mon create, and this then halts the whole ceph quick start process.

Here is the line where ceph-deploy mon create fails:

[cephtest02][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring

Running the same command manually on the target system gives an error. As far as I can tell from the man page, the built-in help, and the website (http://ceph.com/docs/next/man/8/ceph-mon/), it seems --cluster is not a valid argument for ceph-mon? Is this a problem in ceph-deploy? Does this work for anyone else?

ceph@cephtest02:~$ sudo ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
too many arguments: [--cluster,ceph]
usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
  --debug_mon n
        debug monitor level (e.g. 10)
  --mkfs
        build fresh monitor fs
  --conf/-c     Read configuration from the given configuration file
  -d            Run in foreground, log to stderr.
  -f            Run in foreground, log to usual location.
  --id/-i       set ID portion of my name
  --name/-n     set name (TYPE.ID)
  --version     show version and quit
  --debug_ms N  set message debug level (e.g. 1)
ceph@cephtest02:~$

Can anyone clarify whether --cluster is a supported argument for ceph-mon? Thanks!

Here's the more complete output from the admin system when this fails:

ceph@cephtest01:/my-cluster$ ceph-deploy --overwrite-conf mon create cephtest02
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cephtest02
[ceph_deploy.mon][DEBUG ] detecting platform for host cephtest02 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.mon][INFO ] distro info: Ubuntu 12.04 precise
[cephtest02][DEBUG ] determining if provided host has same hostname in remote
[cephtest02][DEBUG ] deploying mon to cephtest02
[cephtest02][DEBUG ] remote hostname: cephtest02
[cephtest02][INFO ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephtest02][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][INFO ] create the monitor keyring file
[cephtest02][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][ERROR ] Traceback (most recent call last):
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/hosts/common.py, line 72, in mon_create
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in inner
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in remote_call
[cephtest02][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in check_call
[cephtest02][ERROR ]     raise CalledProcessError(retcode, cmd)
[cephtest02][ERROR ] CalledProcessError: Command '['ceph-mon', '--cluster', 'ceph', '--mkfs', '-i', 'cephtest02', '--keyring', '/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring']' returned non-zero exit status 1
[cephtest02][INFO ]   --conf/-c     Read configuration from the given configuration file
[cephtest02][INFO ]   -d            Run in foreground, log to stderr.
[cephtest02][INFO ]   -f            Run in foreground, log to usual location.
[cephtest02][INFO ]   --id/-i       set ID portion of my name
[cephtest02][INFO ]   --name/-n     set name (TYPE.ID)
[cephtest02][INFO ]   --version     show version and quit
[cephtest02][INFO ]   --debug_ms N
[cephtest02][INFO ]         set message debug level (e.g. 1)
[cephtest02][ERROR ] too many arguments: [--cluster,ceph]
[cephtest02][ERROR ] usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
[cephtest02][ERROR ]   --debug_mon n
[cephtest02][ERROR ]         debug monitor level (e.g. 10)
[cephtest02][ERROR ]   --mkfs
[cephtest02][ERROR ]         build fresh monitor fs
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors
ceph@cephtest01:/my-cluster$

-Joe

-----Original Message-----
From: Gruher, Joseph R
Sent: Thursday, September 19, 2013 11:14 AM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: monitor deployment during quick start

Could someone make a quick clarification on the quick start guide for me? On this page: http://ceph.com/docs/next/start/quick-ceph-deploy/. After I do ceph-deploy new to a system, is that system then a monitor from that point forward?
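The list never answers this directly, but one plausible explanation (an assumption, not something confirmed in the thread) is a package version mismatch: --cluster was only understood by newer ceph daemons, so if the target ended up with an older ceph package than ceph-deploy expects, ceph-mon would reject the argument exactly as shown. A quick sanity check:

# compare what is actually installed on the target with the admin node's expectations
ssh cephtest02 ceph-mon --version
ssh cephtest02 ceph --version
ceph-deploy --version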
Re: [ceph-users] ceph-deploy not including sudo?
-----Original Message-----
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]

Can you try running ceph-deploy *without* sudo?

Ah, OK, sure. Without sudo I end up hung here again:

ceph@cephtest01:~$ ceph-deploy install cephtest03 cephtest04 cephtest05 cephtest06
<cut>
[cephtest03][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

BUT if I then add the --no-adjust-repos switch that was suggested, we finally run to completion! Thanks for the help! On to the next step...

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] monitor deployment during quick start
Could someone make a quick clarification on the quick start guide for me? On this page: http://ceph.com/docs/next/start/quick-ceph-deploy/. After I do ceph-deploy new to a system, is that system then a monitor from that point forward? Or do I then have to do ceph-deploy mon create to that same system before it is really a monitor? Regardless of the combinations of systems I try, I seem to get a failure at the "add a monitor" step.

Should this be a correct sequence?

ceph@cephtest01:~$ ceph-deploy new cephtest02
ceph@cephtest01:~$ ceph-deploy install --no-adjust-repos cephtest02 cephtest03 cephtest04
ceph@cephtest01:~$ ceph-deploy mon create cephtest02

Here is the failure I get:

ceph@cephtest01:~$ ceph-deploy mon create cephtest02
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cephtest02
[ceph_deploy.mon][DEBUG ] detecting platform for host cephtest02 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.mon][INFO ] distro info: Ubuntu 12.04 precise
[cephtest02][DEBUG ] determining if provided host has same hostname in remote
[cephtest02][DEBUG ] deploying mon to cephtest02
[cephtest02][DEBUG ] remote hostname: cephtest02
[cephtest02][INFO ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephtest02][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][INFO ] create the monitor keyring file
[cephtest02][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][ERROR ] Traceback (most recent call last):
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/hosts/common.py, line 72, in mon_create
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in inner
[cephtest02][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in remote_call
[cephtest02][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in check_call
[cephtest02][ERROR ]     raise CalledProcessError(retcode, cmd)
[cephtest02][ERROR ] CalledProcessError: Command '['ceph-mon', '--cluster', 'ceph', '--mkfs', '-i', 'cephtest02', '--keyring', '/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring']' returned non-zero exit status 1
[cephtest02][INFO ]   --conf/-c     Read configuration from the given configuration file
[cephtest02][INFO ]   -d            Run in foreground, log to stderr.
[cephtest02][INFO ]   -f            Run in foreground, log to usual location.
[cephtest02][INFO ]   --id/-i       set ID portion of my name
[cephtest02][INFO ]   --name/-n     set name (TYPE.ID)
[cephtest02][INFO ]   --version     show version and quit
[cephtest02][INFO ]   --debug_ms N
[cephtest02][INFO ]         set message debug level (e.g. 1)
[cephtest02][ERROR ] too many arguments: [--cluster,ceph]
[cephtest02][ERROR ] usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
[cephtest02][ERROR ]   --debug_mon n
[cephtest02][ERROR ]         debug monitor level (e.g. 10)
[cephtest02][ERROR ]   --mkfs
[cephtest02][ERROR ]         build fresh monitor fs
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors

Trying to run the failing command myself:

ceph@cephtest01:~$ ssh cephtest02 sudo ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
  --conf/-c     Read configuration from the given configuration file
  -d            Run in foreground, log to stderr.
  -f            Run in foreground, log to usual location.
  --id/-i       set ID portion of my name
  --name/-n     set name (TYPE.ID)
  --version     show version and quit
  --debug_ms N  set message debug level (e.g. 1)
too many arguments: [--cluster,ceph]
usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
  --debug_mon n
        debug monitor level (e.g. 10)
  --mkfs
        build fresh monitor fs

Not clear if I should be using the same system from ceph-deploy new for ceph-deploy mon, but the same thing happens either way:

ceph@cephtest01:~$ ssh cephtest03 sudo ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
  --conf/-c     Read configuration from the given configuration file
  -d            Run in foreground, log to stderr.
  -f            Run in foreground, log to usual location.
  --id/-i       set ID portion of my name
  --name/-n     set name (TYPE.ID)
  --version     show version and quit
  --debug_ms N  set message debug level (e.g. 1)
too many arguments: [--cluster,ceph]
Re: [ceph-users] OSD and Journal Files
-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mike Dawson

you need to understand losing an SSD will cause the loss of ALL of the OSDs which had their journal on the failed SSD.

First, you probably don't want RAID1 for the journal SSDs. It isn't particularly needed for resiliency and certainly isn't beneficial from a throughput perspective.

Sorry, can you clarify this further for me? If losing the SSD would cause losing all the OSDs journaling on it, why would you not want to RAID it?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] problem with ceph-deploy hanging
-----Original Message-----
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]

Again, in this next coming release, you will be able to tell ceph-deploy to just install the packages without mangling your repos (or installing keys)

Updated to the new ceph-deploy release 1.2.6 today, but I still see the hang at the same point. Can you provide some more detail on your comment about running ceph-deploy without installing keys / mangling repos (install packages only)? How? Thanks!!

joe@cephtest01:~$ su ceph
Password:
$ sudo ceph-deploy --version
1.2.6
$ sudo ceph-deploy -v install cephtest01 cephtest02 cephtest03
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts cephtest01 cephtest02 cephtest03
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest01 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
[ceph_deploy.install][INFO ] Distro info: Ubuntu 12.04 precise
[cephtest01][INFO ] installing ceph on cephtest01
[cephtest01][INFO ] Running command: env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
[cephtest01][INFO ] Reading package lists...
[cephtest01][INFO ] Building dependency tree...
[cephtest01][INFO ] Reading state information...
[cephtest01][INFO ] ca-certificates is already the newest version.
[cephtest01][INFO ] 0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
[cephtest01][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

(system hangs here indefinitely)

As noted before, this command succeeds when run manually, so it's unclear why ceph-deploy is hanging...

$ wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
OK

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
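For readers landing here: the switch being described is --no-adjust-repos, confirmed elsewhere in this archive as the fix for this hang. It tells ceph-deploy to install the packages while leaving apt sources and keys untouched:

# install ceph without editing apt repos or importing release keys
ceph-deploy install --no-adjust-repos cephtest01 cephtest02 cephtest03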
[ceph-users] ceph-deploy not including sudo?
Using latest ceph-deploy:

ceph@cephtest01:/my-cluster$ sudo ceph-deploy --version
1.2.6

I get this failure:

ceph@cephtest01:/my-cluster$ sudo ceph-deploy install cephtest03 cephtest04 cephtest05 cephtest06
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts cephtest03 cephtest04 cephtest05 cephtest06
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest03 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection without sudo
[ceph_deploy.install][INFO ] Distro info: Ubuntu 12.04 precise
[cephtest03][INFO ] installing ceph on cephtest03
[cephtest03][INFO ] Running command: env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
[cephtest03][ERROR ] Traceback (most recent call last):
[cephtest03][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/hosts/debian/install.py, line 26, in install
[cephtest03][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in inner
[cephtest03][ERROR ]   File /usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in remote_call
[cephtest03][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in check_call
[cephtest03][ERROR ]     raise CalledProcessError(retcode, cmd)
[cephtest03][ERROR ] CalledProcessError: Command '['env', 'DEBIAN_FRONTEND=noninteractive', 'apt-get', '-q', 'install', '--assume-yes', 'ca-certificates']' returned non-zero exit status 100
[cephtest03][ERROR ] E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
[cephtest03][ERROR ] E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates

This failure seems to imply ceph-deploy is not prefacing remote (SSH) commands to other systems with sudo? For example, this command as shown in the ceph-deploy output fails:

ceph@cephtest01:/my-cluster$ ssh cephtest03 env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

But with the sudo added it works:

ceph@cephtest01:/my-cluster$ ssh cephtest03 sudo env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
Reading package lists...
Building dependency tree...
Reading state information...
ca-certificates is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
ceph@cephtest01:/my-cluster$

Thanks,
Joe

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] problem with ceph-deploy hanging
-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gilles Mocellin

So you can add something like this in all ceph nodes' /etc/sudoers (use visudo):

Defaults env_keep += "http_proxy https_proxy ftp_proxy no_proxy"

Hope it can help.

Thanks for the suggestion! However, this change had no effect on the problem.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] problem with ceph-deploy hanging
-----Original Message-----
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Subject: Re: [ceph-users] problem with ceph-deploy hanging

ceph-deploy will use the user as you are currently executing. That is why, if you are calling ceph-deploy as root, it will log in remotely as root. So by a different user, I mean something like user `ceph` executing ceph-deploy (yes, that same user needs to exist remotely too, with correct permissions).

This is interesting. Since the preflight has us set up passwordless SSH with a default ceph user, I assumed it didn't really matter what user I was logged in as on the admin system. Good to know. Unfortunately, logging in as my ceph user on the admin system (with a matching user on the target system) does not affect my result. The ceph-deploy install still hangs here:

[cephtest02][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

It has been suggested that this could be due to our firewall. I have the proxies configured in /etc/environment, and when I run a wget myself (as the ceph user, either directly on cephtest02 or via an SSH command to cephtest02 from the admin system) it resolves the proxy and succeeds. Is there any reason the wget might behave differently when run by ceph-deploy and fail to resolve the proxy? Is there anywhere I might need to set proxy information besides /etc/environment? Or, any other thoughts on how to debug this further? Thanks!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
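On the question of where else proxy settings might live: wget also reads its own configuration from /etc/wgetrc or ~/.wgetrc, which applies even in the non-login contexts ceph-deploy runs commands from, where /etc/environment may not have been applied. A sketch, with a placeholder proxy URL:

# /etc/wgetrc (or ~/.wgetrc for the remote user); proxy host and port are illustrative
use_proxy = on
http_proxy = http://proxy.example.com:8080
https_proxy = http://proxy.example.com:8080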
Re: [ceph-users] problem with ceph-deploy hanging
From: Gruher, Joseph R

From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
On Fri, Sep 13, 2013 at 5:06 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:

root@cephtest01:~# ssh cephtest02 wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
gpg: no valid OpenPGP data found.

This is clearly part of the problem. Can you try getting to this with something other than wget (e.g. curl)?

OK, I am seeing the problem here after turning off quiet mode on wget. You can see in the wget output that part of the URL is lost when executing the command over SSH. However, I'm still unsure how to fix this; I've tried a number of ways of enclosing the command and this keeps happening.

SSH command leads to an incomplete URL and returns a web page (note the URL truncated at ceph.git):

root@cephtest01:~# ssh cephtest02 sudo wget -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:06-- https://ceph.com/git/?p=ceph.git

When run locally, the complete URL returns the PGP key:

root@cephtest02:/# wget -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:30-- https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

I was able to show the wget command does succeed if properly formatted (you have to double-enclose it in quotes, as SSH strips the outer set), as does the apt-key add if prefaced with a sudo. So I'm still stuck on the problem of ceph-deploy hanging at the point shown below. Any tips on how to debug further? Has anyone else experienced a similar problem? Is it possible to enable any additional output from ceph-deploy? Is there any documentation on how to deploy without using ceph-deploy install? Thanks!

Here's where it hangs:

root@cephtest01:~# ceph-deploy install cephtest02 cephtest03 cephtest04
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts cephtest02 cephtest03 cephtest04
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest02 ...
[ceph_deploy.install][INFO ] Distro info: Ubuntu 12.04 precise
[cephtest02][INFO ] installing ceph on cephtest02
[cephtest02][INFO ] Running command: env DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
[cephtest02][INFO ] Reading package lists...
[cephtest02][INFO ] Building dependency tree...
[cephtest02][INFO ] Reading state information...
[cephtest02][INFO ] ca-certificates is already the newest version.
[cephtest02][INFO ] 0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
[cephtest02][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

Here's the command it seems to be hanging on, which succeeds when manually run on the command line:

root@cephtest01:~# ssh cephtest02 wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -
OK
root@cephtest01:~#

Thanks,
Joe

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
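What's happening above is shell behavior rather than a wget bug: ssh concatenates its arguments and hands them to a remote shell, so the single quotes are consumed by the local shell and the now-unprotected ';' in the URL terminates the remote command at ceph.git. The double-enclosing mentioned above looks roughly like this (a sketch of the workaround, not ceph-deploy's actual fix):

# outer double quotes survive the local shell; inner single quotes protect the ';' on the remote side
ssh cephtest02 "wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -"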
[ceph-users] problem with ceph user
Hello all-

I'm setting up a new Ceph cluster (my first time - just a lab experiment, not for production) by following the docs on the ceph.com website. The preflight checklist went fine: I installed and updated Ubuntu 12.04.2, set up my user, set up passwordless SSH, etc. I ran ceph-deploy new without any apparent issues. However, when I run ceph-deploy install it hangs at this point:

[cephtest02][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

It looks to me like it is failing on the apt-key add command. If I log directly into the cephtest02 host as my ceph user and try to run apt-key add, it fails:

$ apt-key add
ERROR: This command can only be used by root.

It works if I include a sudo:

$ sudo apt-key add
gpg: can't open `': No such file or directory

So I assume the problem is my ceph user doesn't have the right permissions? I set up the ceph user by following the instructions in the preflight checklist (http://ceph.com/docs/master/start/quick-start-preflight/):

root@cephtest02:/# cat /etc/sudoers.d/ceph
ceph ALL = (root) NOPASSWD:ALL
root@cephtest02:/# ls -l /etc/sudoers.d/ceph
-r--r----- 1 root root 31 Sep 12 15:45 /etc/sudoers.d/ceph

$ sudo -l
Matching Defaults entries for ceph on this host:
    env_reset, secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin
User ceph may run the following commands on this host:
    (root) NOPASSWD: ALL

Can anyone tell me where I'm going wrong here, or in general how to give the ceph user the appropriate permissions? Or is this a ceph-deploy problem, in that it is not including the sudo?

Thanks,
Joe

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
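A generic check, not something raised in the thread, to rule out the sudoers drop-in being ignored: files under /etc/sudoers.d must parse cleanly, should be mode 0440, and must not contain a '.' in the filename, or sudo skips them. A sketch:

# validate the syntax of the drop-in without risking lockout
sudo visudo -c -f /etc/sudoers.d/ceph
# ensure the expected permissions
sudo chmod 0440 /etc/sudoers.d/ceph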
[ceph-users] problem with ceph-deploy hanging
(The start of this message is the page the wget returned: not a PGP key, but the text of the ceph.git gitweb summary page, including its shortlog entries, tags listing (v0.67.3, v0.68, v0.56.7, v0.67.2, ...), heads listing (remove-hadoop-shim, wip-5862, next, master, ...), and RSS/Atom links.)

root@cephtest01:~#

Is this URL wrong, or is the data at the URL incorrect?

Thanks,
Joe

From: Gruher, Joseph R
Sent: Friday, September 13, 2013 1:17 PM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: problem with ceph user

Hello all-

I'm setting up a new Ceph cluster (my first time - just a lab experiment, not for production) by following the docs on the ceph.com website. The preflight checklist went fine: I installed and updated Ubuntu 12.04.2, set up my user, set up passwordless SSH, etc. I ran ceph-deploy new without any apparent issues. However, when I run ceph-deploy install it hangs at this point:

[cephtest02][INFO ] Running command: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -

It looks to me like it is failing on the apt-key add command. If I log directly into the cephtest02 host as my ceph user and try to run apt-key add, it fails:

$ apt-key add
ERROR: This command can only be used by root.

It works if I include a sudo:

$ sudo apt-key add
gpg: can't open `': No such file or directory

So I assume the problem is my ceph user doesn't have the right permissions? I set up the ceph user by following the instructions in the preflight checklist (http://ceph.com/docs/master/start/quick-start-preflight/):

root@cephtest02:/# cat /etc/sudoers.d/ceph
ceph ALL = (root) NOPASSWD:ALL
root@cephtest02:/# ls -l /etc/sudoers.d/ceph
-r--r----- 1 root root 31 Sep 12 15:45 /etc/sudoers.d/ceph

$ sudo -l
Matching Defaults entries for ceph on this host:
    env_reset, secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin
User ceph may run the following commands on this host:
    (root) NOPASSWD: ALL

Can anyone tell me where I'm going wrong here, or in general how to give the ceph user the appropriate permissions? Or is this a ceph-deploy problem, in that it is not including the sudo?

Thanks,
Joe

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] problem with ceph-deploy hanging
-----Original Message-----
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Friday, September 13, 2013 3:17 PM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] problem with ceph-deploy hanging

On Fri, Sep 13, 2013 at 5:06 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:

root@cephtest01:~# ssh cephtest02 wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
gpg: no valid OpenPGP data found.

This is clearly part of the problem. Can you try getting to this with something other than wget (e.g. curl)?

OK, I am seeing the problem here after turning off quiet mode on wget. You can see in the wget output that part of the URL is lost when executing the command over SSH. However, I'm still unsure how to fix this; I've tried a number of ways of enclosing the command and this keeps happening.

SSH command leads to an incomplete URL and returns a web page (note the URL truncated at ceph.git):

root@cephtest01:~# ssh cephtest02 sudo wget -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:06-- https://ceph.com/git/?p=ceph.git

When run locally, the complete URL returns the PGP key:

root@cephtest02:/# wget -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:30-- https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com