[ceph-users] systemctl enable ceph-mon fails in ceph-deploy create initial (no such service)

2015-12-02 Thread Gruher, Joseph R
Hey folks.  Running RHEL7.1 with stock 3.10.0 kernel and trying to deploy 
Infernalis.  Haven't done this since Firefly but I used to know what I was 
doing.  My problem is "ceph-deploy new" and "ceph-deploy install" seem to go 
well but "ceph-deploy mon create-initial" reliably fails when starting the 
ceph-mon service.  I attached a full log of the deploy attempt and have pasted 
a sample of the problem below.  Problem seems to be that the ceph-mon service 
it wants to start doesn't actually exist on the target system.  Any ideas?  
Thanks!

[root@bdcr151 ceph]# ceph-deploy --overwrite-conf mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.28): /usr/bin/ceph-deploy 
--overwrite-conf mon create-initial
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: True
[ceph_deploy.cli][INFO  ]  subcommand: create-initial
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 

[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  keyrings  : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdcr151 bdcr153 
bdcr155
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr151 ...
[bdcr151][DEBUG ] connected to host: bdcr151
[bdcr151][DEBUG ] detect platform information from remote host
[bdcr151][DEBUG ] detect machine type
[bdcr151][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: Red Hat Enterprise Linux Server 7.1 Maipo
[bdcr151][DEBUG ] determining if provided host has same hostname in remote
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] deploying mon to bdcr151
[bdcr151][DEBUG ] get remote short hostname
[bdcr151][DEBUG ] remote hostname: bdcr151
[bdcr151][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdcr151][DEBUG ] create the mon path if it does not exist
[bdcr151][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdcr151/done
[bdcr151][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdcr151][DEBUG ] create the init path if it does not exist
[bdcr151][INFO  ] Running command: systemctl enable ceph.target
[bdcr151][INFO  ] Running command: systemctl enable ceph-mon@bdcr151
[bdcr151][WARNIN] Failed to issue method call: No such file or directory
[bdcr151][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.mon][ERROR ] Failed to execute command: systemctl enable 
ceph-mon@bdcr151
[ceph_deploy.mon][DEBUG ] detecting platform for host bdcr153 ...
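For anyone debugging the same failure, a quick way to confirm what the target node actually has before re-running (a rough sketch; the unit paths assume the standard RHEL 7 RPM layout, and the package queries are just the obvious ones to try):

# run on the mon host (bdcr151 here) to see whether the unit files exist at all
ls -l /usr/lib/systemd/system/ceph.target /usr/lib/systemd/system/ceph-mon@.service
systemctl list-unit-files 'ceph*'
# and check which installed package, if any, should have provided them
rpm -qa | grep -i ceph
rpm -qf /usr/lib/systemd/system/ceph-mon@.service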


ceph-deployment.log
Description: ceph-deployment.log
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MONs not forming quorum

2015-06-12 Thread Gruher, Joseph R
Augh, never mind, firewall problem.  Thanks anyway.
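For anyone who hits the same thing: the monitors simply couldn't reach each other on their ports. On CentOS 7 with firewalld the fix is roughly the following (a minimal sketch, assuming the default zone; 6789/tcp is the mon port and 6800-7300/tcp covers the OSD range):

# run on each Ceph node, then re-check quorum
sudo firewall-cmd --permanent --add-port=6789/tcp
sudo firewall-cmd --permanent --add-port=6800-7300/tcp
sudo firewall-cmd --reload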

From: Gruher, Joseph R
Sent: Thursday, June 11, 2015 10:55 PM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: MONs not forming quorum

Hi folks-

I'm trying to deploy 0.94.2 (Hammer) onto CentOS7.  I used to be pretty good at 
this on Ubuntu but it has been a while.  Anyway, my monitors are not forming 
quorum, and I'm not sure why.  They can definitely all ping each other and 
such.  Any thoughts on specific problems in the output below, or just general 
causes for monitors not forming quorum, or where to get more debug information 
on what is going wrong?  Thanks!!

[root@bdca151 ceph]# ceph-deploy mon create-initial bdca15{0,2,3}
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy mon create-initial 
bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdca150 bdca152 
bdca153
[ceph_deploy.mon][DEBUG ] detecting platform for host bdca150 ...
[bdca150][DEBUG ] connected to host: bdca150
[bdca150][DEBUG ] detect platform information from remote host
[bdca150][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.1.1503 Core
[bdca150][DEBUG ] determining if provided host has same hostname in remote
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] deploying mon to bdca150
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] remote hostname: bdca150
[bdca150][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdca150][DEBUG ] create the mon path if it does not exist
[bdca150][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create the monitor keyring file
[bdca150][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i bdca150 
--keyring /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] ceph-mon: renaming mon.noname-a 10.1.0.150:6789/0 to 
mon.bdca150
[bdca150][DEBUG ] ceph-mon: set fsid to 770514ba-65e6-475b-8d43-ad6ee850ead6
[bdca150][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-bdca150 for 
mon.bdca150
[bdca150][INFO  ] unlinking keyring file 
/var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdca150][DEBUG ] create the init path if it does not exist
[bdca150][DEBUG ] locating the `service` executable...
[bdca150][INFO  ] Running command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.bdca150
[bdca150][DEBUG ] === mon.bdca150 ===
[bdca150][DEBUG ] Starting Ceph mon.bdca150 on bdca150...
[bdca150][WARNIN] Running as unit run-52328.service.
[bdca150][DEBUG ] Starting ceph-create-keys on bdca150...
[bdca150][INFO  ] Running command: systemctl enable ceph
[bdca150][WARNIN] ceph.service is not a native service, redirecting to 
/sbin/chkconfig.
[bdca150][WARNIN] Executing /sbin/chkconfig ceph on
[bdca150][WARNIN] The unit files have no [Install] section. They are not meant 
to be enabled
[bdca150][WARNIN] using systemctl.
[bdca150][WARNIN] Possible reasons for having this kind of units are:
[bdca150][WARNIN] 1) A unit may be statically enabled by being symlinked from 
another unit's
[bdca150][WARNIN].wants/ or .requires/ directory.
[bdca150][WARNIN] 2) A unit's purpose may be to act as a helper for some other 
unit which has
[bdca150][WARNIN]a requirement dependency on it.
[bdca150][WARNIN] 3) A unit may be started when needed via activation (socket, 
path, timer,
[bdca150][WARNIN]D-Bus, udev, scripted systemctl call, ...).
[bdca150][INFO  ] Running command: ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.bdca150.asok mon_status
[bdca150][DEBUG ] 

[bdca150][DEBUG ] status for monitor: mon.bdca150
[bdca150][DEBUG ] {
[bdca150][DEBUG ]   election_epoch: 0,
[bdca150][DEBUG ]   extra_probe_peers: [
[bdca150][DEBUG ] 10.1.0.152:6789/0,
[bdca150][DEBUG ] 10.1.0.153:6789/0
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   monmap: {
[bdca150][DEBUG ] created: 0.00,
[bdca150][DEBUG ] epoch: 0,
[bdca150][DEBUG ] fsid: 770514ba-65e6-475b-8d43-ad6ee850ead6,
[bdca150][DEBUG ] modified: 0.00,
[bdca150][DEBUG ] mons: [
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 10.1.0.150:6789/0,
[bdca150][DEBUG ] name: bdca150,
[bdca150][DEBUG ] rank: 0
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 0.0.0.0:0/1,
[bdca150][DEBUG ] name: bdca152,
[bdca150][DEBUG ] rank: 1
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 0.0.0.0:0/2,
[bdca150][DEBUG ] name: bdca153,
[bdca150][DEBUG ] rank: 2
[bdca150][DEBUG ]   }
[bdca150][DEBUG

[ceph-users] MONs not forming quorum

2015-06-11 Thread Gruher, Joseph R
Hi folks-

I'm trying to deploy 0.94.2 (Hammer) onto CentOS7.  I used to be pretty good at 
this on Ubuntu but it has been a while.  Anyway, my monitors are not forming 
quorum, and I'm not sure why.  They can definitely all ping each other and 
such.  Any thoughts on specific problems in the output below, or just general 
causes for monitors not forming quorum, or where to get more debug information 
on what is going wrong?  Thanks!!

[root@bdca151 ceph]# ceph-deploy mon create-initial bdca15{0,2,3}
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy mon create-initial 
bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdca150 bdca152 
bdca153
[ceph_deploy.mon][DEBUG ] detecting platform for host bdca150 ...
[bdca150][DEBUG ] connected to host: bdca150
[bdca150][DEBUG ] detect platform information from remote host
[bdca150][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.1.1503 Core
[bdca150][DEBUG ] determining if provided host has same hostname in remote
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] deploying mon to bdca150
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] remote hostname: bdca150
[bdca150][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdca150][DEBUG ] create the mon path if it does not exist
[bdca150][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create the monitor keyring file
[bdca150][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i bdca150 
--keyring /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] ceph-mon: renaming mon.noname-a 10.1.0.150:6789/0 to 
mon.bdca150
[bdca150][DEBUG ] ceph-mon: set fsid to 770514ba-65e6-475b-8d43-ad6ee850ead6
[bdca150][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-bdca150 for 
mon.bdca150
[bdca150][INFO  ] unlinking keyring file 
/var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdca150][DEBUG ] create the init path if it does not exist
[bdca150][DEBUG ] locating the `service` executable...
[bdca150][INFO  ] Running command: /usr/sbin/service ceph -c 
/etc/ceph/ceph.conf start mon.bdca150
[bdca150][DEBUG ] === mon.bdca150 ===
[bdca150][DEBUG ] Starting Ceph mon.bdca150 on bdca150...
[bdca150][WARNIN] Running as unit run-52328.service.
[bdca150][DEBUG ] Starting ceph-create-keys on bdca150...
[bdca150][INFO  ] Running command: systemctl enable ceph
[bdca150][WARNIN] ceph.service is not a native service, redirecting to 
/sbin/chkconfig.
[bdca150][WARNIN] Executing /sbin/chkconfig ceph on
[bdca150][WARNIN] The unit files have no [Install] section. They are not meant 
to be enabled
[bdca150][WARNIN] using systemctl.
[bdca150][WARNIN] Possible reasons for having this kind of units are:
[bdca150][WARNIN] 1) A unit may be statically enabled by being symlinked from 
another unit's
[bdca150][WARNIN].wants/ or .requires/ directory.
[bdca150][WARNIN] 2) A unit's purpose may be to act as a helper for some other 
unit which has
[bdca150][WARNIN]a requirement dependency on it.
[bdca150][WARNIN] 3) A unit may be started when needed via activation (socket, 
path, timer,
[bdca150][WARNIN]D-Bus, udev, scripted systemctl call, ...).
[bdca150][INFO  ] Running command: ceph --cluster=ceph --admin-daemon 
/var/run/ceph/ceph-mon.bdca150.asok mon_status
[bdca150][DEBUG ] 

[bdca150][DEBUG ] status for monitor: mon.bdca150
[bdca150][DEBUG ] {
[bdca150][DEBUG ]   election_epoch: 0,
[bdca150][DEBUG ]   extra_probe_peers: [
[bdca150][DEBUG ] 10.1.0.152:6789/0,
[bdca150][DEBUG ] 10.1.0.153:6789/0
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   monmap: {
[bdca150][DEBUG ] created: 0.00,
[bdca150][DEBUG ] epoch: 0,
[bdca150][DEBUG ] fsid: 770514ba-65e6-475b-8d43-ad6ee850ead6,
[bdca150][DEBUG ] modified: 0.00,
[bdca150][DEBUG ] mons: [
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 10.1.0.150:6789/0,
[bdca150][DEBUG ] name: bdca150,
[bdca150][DEBUG ] rank: 0
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 0.0.0.0:0/1,
[bdca150][DEBUG ] name: bdca152,
[bdca150][DEBUG ] rank: 1
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   {
[bdca150][DEBUG ] addr: 0.0.0.0:0/2,
[bdca150][DEBUG ] name: bdca153,
[bdca150][DEBUG ] rank: 2
[bdca150][DEBUG ]   }
[bdca150][DEBUG ] ]
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   name: bdca150,
[bdca150][DEBUG ]   outside_quorum: [
[bdca150][DEBUG ] bdca150
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   quorum: [],
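The 0.0.0.0:0 peer addresses and the empty quorum list above are what you see while the other monitors haven't managed to join (which, per the follow-up reply, turned out to be a firewall issue). A quick reachability check from each mon host looks roughly like this (a sketch, using the host names from the output above):

# from each mon, verify the other mons answer on the monitor port
for h in bdca150 bdca152 bdca153; do nc -zv -w 2 $h 6789; done
# and ask the local mon daemon directly for its view
sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status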

Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

2014-04-04 Thread Gruher, Joseph R
Hi folks-

Was this ever resolved?  I’m not finding a resolution in the email chain, 
apologies if I am missing it.  I am experiencing this same problem.  Cluster 
works fine for object traffic, can’t seem to get rbd to work in 0.78.  Worked 
fine in 0.72.2 for me.  Running Ubuntu 13.04 with 3.12 kernel.

$ rbd create rbd/myimage --size 102400
$ sudo rbd map rbd/myimage
rbd: add failed: (5) Input/output error

$ rbd ls rbd
myimage
$

Thanks,
Joe

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ирек Фасихов
Sent: Tuesday, March 25, 2014 1:59 AM
To: Ilya Dryomov
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

Ilya, I set chooseleaf_vary_r to 0, but I still cannot map RBD images.

[root@ceph01 cluster]# rbd map rbd/tst
2014-03-25 12:48:14.318167 7f44717f7760  2 auth: KeyRing::load: loaded key file 
/etc/ceph/ceph.client.admin.keyring
rbd: add failed: (5) Input/output error

[root@ceph01 cluster]# cat /var/log/messages | tail
Mar 25 12:45:06 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:45:06 ceph01 kernel: libceph: mon2 192.168.100.203:6789 session established
Mar 25 12:46:33 ceph01 kernel: libceph: client11240 fsid 10b46114-ac17-404e-99e3-69b34b85c901
Mar 25 12:46:33 ceph01 kernel: libceph: got v 13 cv 11 > 9 of ceph_pg_pool
Mar 25 12:46:33 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:46:33 ceph01 kernel: libceph: mon2 192.168.100.203:6789 session established
Mar 25 12:48:14 ceph01 kernel: libceph: client11313 fsid 10b46114-ac17-404e-99e3-69b34b85c901
Mar 25 12:48:14 ceph01 kernel: libceph: got v 13 cv 11 > 9 of ceph_pg_pool
Mar 25 12:48:14 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:48:14 ceph01 kernel: libceph: mon0 192.168.100.201:6789 session established

I do not really understand this error. The CRUSH map looks correct.

Thanks.


2014-03-25 12:26 GMT+04:00 Ilya Dryomov ilya.dryo...@inktank.com:
On Tue, Mar 25, 2014 at 8:38 AM, Ирек Фасихов malm...@gmail.com wrote:
 Hi, Ilya.

 I added the files(crushd and osddump) to a folder in GoogleDrive.

 https://drive.google.com/folderview?id=0BxoNLVWxzOJWX0NLV1kzQ1l3Ymcusp=sharing
OK, so this has nothing to do with caching.  You have chooseleaf_vary_r
set to 1 in your crushmap.  This is a new CRUSH tunable, which was
introduced long after the 3.14 merge window closed.  It will be supported
starting with 3.15; until then you should be able to do

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

to disable it.
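To see the tunable for yourself before and after, the map can also be decompiled to text (a quick sketch; the temp file names are just examples):

crushtool -d /tmp/crush -o /tmp/crush.txt     # decompile the map grabbed above
grep ^tunable /tmp/crush.txt                  # chooseleaf_vary_r shows up here when it is set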

Thanks,

Ilya



--
Best regards, Фасихов Ирек Нургаязович
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

2014-04-04 Thread Gruher, Joseph R
Meant to include this – what do these messages indicate?  All systems have 0.78.

[1301268.557820] Key type ceph registered
[1301268.558524] libceph: loaded (mon/osd proto 15/24)
[1301268.579486] rbd: loaded rbd (rados block device)
[1301268.582364] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301268.582462] libceph: mon1 10.0.0.102:6789 socket error on read
[1301278.589461] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301278.589558] libceph: mon1 10.0.0.102:6789 socket error on read
[1301288.607615] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301288.607713] libceph: mon1 10.0.0.102:6789 socket error on read
[1301298.625873] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301298.625970] libceph: mon1 10.0.0.102:6789 socket error on read
[1301308.643936] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301308.644033] libceph: mon0 10.0.0.101:6789 socket error on read
[1301318.662082] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301318.662179] libceph: mon0 10.0.0.101:6789 socket error on read
[1301449.695232] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301449.695329] libceph: mon0 10.0.0.101:6789 socket error on read
[1301459.716235] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301459.716332] libceph: mon1 10.0.0.102:6789 socket error on read
[1301469.734425] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301469.734523] libceph: mon1 10.0.0.102:6789 socket error on read
[1301479.752603] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301479.752700] libceph: mon1 10.0.0.102:6789 socket error on read
[1301489.770773] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301489.770870] libceph: mon1 10.0.0.102:6789 socket error on read
[1301499.788904] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301499.789001] libceph: mon1 10.0.0.102:6789 socket error on read

$ ceph --version
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ssh mohonpeak01 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ssh mohonpeak02 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ceph health detail
HEALTH_WARN noscrub,nodeep-scrub flag(s) set
noscrub,nodeep-scrub flag(s) set

$ ceph status
cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
 health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
 monmap e1: 2 mons at 
{mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 
10, quorum 0,1 mohonpeak01,mohonpeak02
 osdmap e216: 18 osds: 18 up, 18 in
flags noscrub,nodeep-scrub
  pgmap v202112: 2784 pgs, 10 pools, 1637 GB data, 427 kobjects
2439 GB used, 12643 GB / 15083 GB avail
2784 active+clean


From: Gruher, Joseph R
Sent: Friday, April 04, 2014 11:44 AM
To: 'Ирек Фасихов'; Ilya Dryomov
Cc: ceph-users@lists.ceph.com; Gruher, Joseph R
Subject: RE: [ceph-users] Ceph RBD 0.78 Bug or feature?

Hi folks-

Was this ever resolved?  I’m not finding a resolution in the email chain, 
apologies if I am missing it.  I am experiencing this same problem.  Cluster 
works fine for object traffic, can’t seem to get rbd to work in 0.78.  Worked 
fine in 0.72.2 for me.  Running Ubuntu 13.04 with 3.12 kernel.

$ rbd create rbd/myimage --size 102400
$ sudo rbd map rbd/myimage
rbd: add failed: (5) Input/output error

$ rbd ls rbd
myimage
$

Thanks,
Joe

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ирек Фасихов
Sent: Tuesday, March 25, 2014 1:59 AM
To: Ilya Dryomov
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

Ilya, I set chooseleaf_vary_r to 0, but I still cannot map RBD images.

[root@ceph01 cluster]# rbd map rbd/tst
2014-03-25 12:48:14.318167 7f44717f7760  2 auth: KeyRing::load: loaded key file 
/etc/ceph/ceph.client.admin.keyring
rbd: add failed: (5) Input/output error

[root@ceph01 cluster]# cat /var/log/messages | tail
Mar 25 12:45:06 ceph01 kernel: libceph: osdc handle_map corrupt msg
Mar 25 12:45:06 ceph01 kernel: libceph: mon2 192.168.100.203:6789 session established
Mar 25 12:46:33 ceph01 kernel: libceph: client11240 fsid 10b46114-ac17-404e-99e3-69b34b85c901
Mar 25 12:46:33 ceph01 kernel: libceph: got v 13 cv 11 > 9 of ceph_pg_pool
Mar 25 12:46:33 ceph01 kernel: libceph

Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

2014-04-04 Thread Gruher, Joseph R
Aha – upgrade of kernel from 3.13 to 3.14 appears to have resolved the problem.

Thanks,
Joe

From: Gruher, Joseph R
Sent: Friday, April 04, 2014 11:48 AM
To: Ирек Фасихов; Ilya Dryomov
Cc: ceph-users@lists.ceph.com; Gruher, Joseph R
Subject: RE: [ceph-users] Ceph RBD 0.78 Bug or feature?

Meant to include this – what do these messages indicate?  All systems have 0.78.

[1301268.557820] Key type ceph registered
[1301268.558524] libceph: loaded (mon/osd proto 15/24)
[1301268.579486] rbd: loaded rbd (rados block device)
[1301268.582364] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301268.582462] libceph: mon1 10.0.0.102:6789 socket error on read
[1301278.589461] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301278.589558] libceph: mon1 10.0.0.102:6789 socket error on read
[1301288.607615] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301288.607713] libceph: mon1 10.0.0.102:6789 socket error on read
[1301298.625873] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301298.625970] libceph: mon1 10.0.0.102:6789 socket error on read
[1301308.643936] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301308.644033] libceph: mon0 10.0.0.101:6789 socket error on read
[1301318.662082] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301318.662179] libceph: mon0 10.0.0.101:6789 socket error on read
[1301449.695232] libceph: mon0 10.0.0.101:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301449.695329] libceph: mon0 10.0.0.101:6789 socket error on read
[1301459.716235] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301459.716332] libceph: mon1 10.0.0.102:6789 socket error on read
[1301469.734425] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301469.734523] libceph: mon1 10.0.0.102:6789 socket error on read
[1301479.752603] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301479.752700] libceph: mon1 10.0.0.102:6789 socket error on read
[1301489.770773] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301489.770870] libceph: mon1 10.0.0.102:6789 socket error on read
[1301499.788904] libceph: mon1 10.0.0.102:6789 feature set mismatch, my 4a042a42 < server's 104a042a42, missing 10
[1301499.789001] libceph: mon1 10.0.0.102:6789 socket error on read

$ ceph --version
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ssh mohonpeak01 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ssh mohonpeak02 'ceph --version'
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

$ ceph health detail
HEALTH_WARN noscrub,nodeep-scrub flag(s) set
noscrub,nodeep-scrub flag(s) set

$ ceph status
cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
 health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
 monmap e1: 2 mons at 
{mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 
10, quorum 0,1 mohonpeak01,mohonpeak02
 osdmap e216: 18 osds: 18 up, 18 in
flags noscrub,nodeep-scrub
  pgmap v202112: 2784 pgs, 10 pools, 1637 GB data, 427 kobjects
2439 GB used, 12643 GB / 15083 GB avail
2784 active+clean


From: Gruher, Joseph R
Sent: Friday, April 04, 2014 11:44 AM
To: 'Ирек Фасихов'; Ilya Dryomov
Cc: ceph-users@lists.ceph.com; Gruher, Joseph R
Subject: RE: [ceph-users] Ceph RBD 0.78 Bug or feature?

Hi folks-

Was this ever resolved?  I’m not finding a resolution in the email chain, 
apologies if I am missing it.  I am experiencing this same problem.  Cluster 
works fine for object traffic, can’t seem to get rbd to work in 0.78.  Worked 
fine in 0.72.2 for me.  Running Ubuntu 13.04 with 3.12 kernel.

$ rbd create rbd/myimage --size 102400
$ sudo rbd map rbd/myimage
rbd: add failed: (5) Input/output error

$ rbd ls rbd
myimage
$

Thanks,
Joe

From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ирек Фасихов
Sent: Tuesday, March 25, 2014 1:59 AM
To: Ilya Dryomov
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph RBD 0.78 Bug or feature?

Ilya, I set chooseleaf_vary_r to 0, but I still cannot map RBD images.

[root@ceph01 cluster]# rbd map rbd/tst
2014-03-25 12:48:14.318167 7f44717f7760  2 auth: KeyRing::load: loaded key file 
/etc/ceph/ceph.client.admin.keyring
rbd: add failed: (5) Input/output error

[root@ceph01 cluster]# cat /var/log/messages

[ceph-users] Erasure Code Setup

2014-03-24 Thread Gruher, Joseph R
Hi Folks-

Having a bit of trouble with EC setup on 0.78.  Hoping someone can help me out. 
 I've got most of the pieces in place, I think I'm just having a problem with 
the ruleset.

I am running 0.78:
ceph --version
ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)

I created a new ruleset:
ceph osd crush rule create-erasure ecruleset

Then I created a new erasure code pool:
ceph osd pool create mycontainers_1 1800 1800 erasure crush_ruleset=ecruleset 
erasure-code-k=9 erasure-code-m=3

Pool exists:
ceph@joceph-admin01:/etc/ceph$ ceph osd dump
epoch 106
fsid b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
created 2014-03-24 12:06:28.290970
modified 2014-03-24 12:42:59.231381
flags
pool 0 'data' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 128 pgp_num 128 last_change 84 owner 0 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 1 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 86 owner 0 flags hashpspool 
stripe_width 0
pool 2 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 128 pgp_num 128 last_change 88 owner 0 flags hashpspool stripe_width 0
pool 4 'mycontainers_2' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 1200 pgp_num 1200 last_change 100 owner 0 flags 
hashpspool stripe_width 0
pool 5 'mycontainers_3' replicated size 1 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 1800 pgp_num 1800 last_change 94 owner 0 flags 
hashpspool stripe_width 0
pool 6 'mycontainers_1' erasure size 12 min_size 1 crush_ruleset 1 object_hash 
rjenkins pg_num 1800 pgp_num 1800 last_change 104 owner 0 flags hashpspool 
stripe_width 4320

However, the new PGs won't come to a healthy state:
ceph@joceph-admin01:/etc/ceph$ ceph status
cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
 health HEALTH_WARN 1800 pgs incomplete; 1800 pgs stuck inactive; 1800 pgs 
stuck unclean
 monmap e1: 2 mons at 
{mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 
4, quorum 0,1 mohonpeak01,mohonpeak02
 osdmap e106: 18 osds: 18 up, 18 in
  pgmap v261: 5184 pgs, 7 pools, 0 bytes data, 0 objects
682 MB used, 15082 GB / 15083 GB avail
3384 active+clean
1800 incomplete

I think this is because it is using a failure domain of hosts and I only have 2 
hosts (with 9 OSDs on each for 18 OSDs total).  I suspect I need to change the 
ruleset to use a failure domain of OSD instead of host.  This is also mentioned 
on this page: https://ceph.com/docs/master/dev/erasure-coded-pool/.

However, the guidance on that page to adjust it using commands of the form
"ceph osd erasure-code-profile set myprofile" is not working for me.  As far as
I can tell, "ceph osd erasure-code-profile" does not seem to be valid command
syntax.  Is this documentation correct and up to date for 0.78?  Can anyone
suggest where I am going wrong?  Thanks!

ceph@joceph-admin01:/etc/ceph$ ceph osd erasure-code-profile ls
no valid command found; 10 closest matches:
osd tier add-cache poolname poolname int[0-]
osd tier set-overlay poolname poolname
osd tier remove-overlay poolname
osd tier remove poolname poolname
osd tier cache-mode poolname none|writeback|forward|readonly
osd thrash int[0-]
osd tier add poolname poolname {--force-nonempty}
osd stat
osd reweight-by-utilization {int[100-]}
osd pool stats {name}
Error EINVAL: invalid command
ceph@joceph-admin01:/etc/ceph$
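For anyone landing here from a search: the erasure-code-profile commands arrived in the Firefly development series, so when they are rejected like this the usual suspects are the master docs describing something newer than the installed release, or a version mismatch between the ceph CLI and the monitors. Once both sides understand the command, the profile-based flow looks roughly like the sketch below (profile and pool names are just examples, and the option names should be checked against the docs for your exact release):

ceph --version                                                       # check the CLI issuing the command
ceph osd erasure-code-profile set myprofile k=9 m=3 ruleset-failure-domain=osd
ceph osd erasure-code-profile get myprofile
ceph osd pool create mycontainers_ec 1800 1800 erasure myprofile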
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure coding testing

2014-03-16 Thread Gruher, Joseph R
Great, thanks!  I'll watch (hope) for an update later this week.  Appreciate 
the rapid response.

-Joe

From: Ian Colle [mailto:ian.co...@inktank.com]
Sent: Sunday, March 16, 2014 7:22 PM
To: Gruher, Joseph R; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] erasure coding testing

Joe,

We're pushing to get 0.78 out this week, which will allow you to play with EC.

Ian R. Colle
Director of Engineering
Inktank
Delivering the Future of Storage
http://www.linkedin.com/in/ircolle
http://www.twitter.com/ircolle
Cell: +1.303.601.7713
Email: i...@inktank.com

On 3/16/14, 8:11 PM, Gruher, Joseph R joseph.r.gru...@intel.com wrote:

Hey all-

Can anyone tell me, if I install the latest development release (looks like it 
is 0.77) can I enable and test erasure coding?  Or do I have to wait for the 
actual Firefly release?  I don't want to deploy anything for production, 
basically I just want to do some lab testing to see what kind of CPU loading 
results from erasure coding.  Also, if anyone has any data along those lines 
already, would love a pointer to it.  Thanks!

-Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Low RBD Performance

2014-02-04 Thread Gruher, Joseph R


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Mark Nelson
Sent: Monday, February 03, 2014 6:48 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Low RBD Performance

On 02/03/2014 07:29 PM, Gruher, Joseph R wrote:
 Hi folks-

 I'm having trouble demonstrating reasonable performance of RBDs.  I'm
 running Ceph 0.72.2 on Ubuntu 13.04 with the 3.12 kernel.  I have four
 dual-Xeon servers, each with 24GB RAM, and an Intel 320 SSD for
 journals and four WD 10K RPM SAS drives for OSDs, all connected with
 an LSI 1078.  This is just a lab experiment using scrounged hardware
 so everything isn't sized to be a Ceph cluster, it's just what I have
 lying around, but I should have more than enough CPU and memory
resources.
 Everything is connected with a single 10GbE.

 When testing with RBDs from four clients (also running Ubuntu 13.04
 with
 3.12 kernel) I am having trouble breaking 300 IOPS on a 4KB random
 read or write workload (cephx set to none, replication set to one).
 IO is generated using FIO from four clients, each hosting a single 1TB
 RBD, and I've experimented with queue depths and increasing the number
 of RBDs without any benefit.  300 IOPS for a pool of 16 10K RPM HDDs
 seems quite low, not to mention the journal should provide a good
 boost on write workloads.  When I run a 4KB object write workload in
 Cosbench I can approach 3500 Obj/Sec which seems more reasonable.

 Sample FIO configuration:

 [global]

 ioengine=libaio

 direct=1

 ramp_time=300

 runtime=300

 [4k-rw]

 description=4k-rw

 filename=/dev/rbd1

 rw=randwrite

 bs=4k

 stonewall

 I use --iodepth=X on the FIO command line to set the queue depth when
 testing.

 I notice in the FIO output despite the iodepth setting it seems to be
 reporting an IO depth of only 1, which would certainly help explain
 poor performance, but I'm at a loss as to why, I wonder if it could be
 something specific to RBD behavior, like I need to use a different IO
 engine to establish queue depth.

 IO depths: 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%

 Any thoughts appreciated!

Interesting results with the io depth at 1.  I haven't seen that behaviour when
using libaio, direct=1, and higher io depths.  Is this kernel RBD or QEMU/KVM?
If it's QEMU/KVM, is it the libvirt driver?

Certainly 300 IOPS is low for that kind of setup compared to what we've seen
for RBD on other systems (especially with 1x replication).  Given that you are
seeing more reasonable performance with RGW, I guess I'd look at a couple
things:

- Figure out why fio is reporting queue depth = 1

Yup, I agree, I will work on this and report back.  First thought is to try 
specifying the queue depth in the FIO workload file instead of on the command 
line.

- Does increasing the num jobs help (ie get concurrency another way)?

I will give this a shot.

- Do you have enough PGs in the RBD pool?

I should, for 16 OSDs and no replication I use 2048 PGs/PGPs (100 * 16 / 1 
rounded up to power of 2).

- Are you using the virtio driver if QEMU/KVM?

No virtualization, clients are bare metal using kernel RBD.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Low RBD Performance

2014-02-04 Thread Gruher, Joseph R


-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Tuesday, February 04, 2014 9:46 AM
To: Gruher, Joseph R
Cc: Mark Nelson; ceph-users@lists.ceph.com; Ilya Dryomov
Subject: Re: [ceph-users] Low RBD Performance

On Tue, Feb 4, 2014 at 9:29 AM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Mark Nelson
Sent: Monday, February 03, 2014 6:48 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Low RBD Performance

On 02/03/2014 07:29 PM, Gruher, Joseph R wrote:
 Hi folks-

 I'm having trouble demonstrating reasonable performance of RBDs.
 I'm running Ceph 0.72.2 on Ubuntu 13.04 with the 3.12 kernel.  I
 have four dual-Xeon servers, each with 24GB RAM, and an Intel 320
 SSD for journals and four WD 10K RPM SAS drives for OSDs, all
 connected with an LSI 1078.  This is just a lab experiment using
 scrounged hardware so everything isn't sized to be a Ceph cluster,
 it's just what I have lying around, but I should have more than
 enough CPU and memory
resources.
 Everything is connected with a single 10GbE.

 When testing with RBDs from four clients (also running Ubuntu 13.04
 with
 3.12 kernel) I am having trouble breaking 300 IOPS on a 4KB random
 read or write workload (cephx set to none, replication set to one).
 IO is generated using FIO from four clients, each hosting a single
 1TB RBD, and I've experimented with queue depths and increasing the
 number of RBDs without any benefit.  300 IOPS for a pool of 16 10K
 RPM HDDs seems quite low, not to mention the journal should provide
 a good boost on write workloads.  When I run a 4KB object write
 workload in Cosbench I can approach 3500 Obj/Sec which seems more
reasonable.

 Sample FIO configuration:

 [global]

 ioengine=libaio

 direct=1

 ramp_time=300

 runtime=300

 [4k-rw]

 description=4k-rw

 filename=/dev/rbd1

 rw=randwrite

 bs=4k

 stonewall

 I use --iodepth=X on the FIO command line to set the queue depth
 when testing.

 I notice in the FIO output despite the iodepth setting it seems to
 be reporting an IO depth of only 1, which would certainly help
 explain poor performance, but I'm at a loss as to why, I wonder if
 it could be something specific to RBD behavior, like I need to use a
 different IO engine to establish queue depth.

 IO depths: 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%

 Any thoughts appreciated!

Interesting results with the io depth at 1.  I haven't seen that
behaviour when using libaio, direct=1, and higher io depths.  Is this kernel
RBD or QEMU/KVM?
If it's QEMU/KVM, is it the libvirt driver?

Certainly 300 IOPS is low for that kind of setup compared to what
we've seen for RBD on other systems (especially with 1x replication).
Given that you are seeing more reasonable performance with RGW, I
guess I'd look at a couple
things:

- Figure out why fio is reporting queue depth = 1

 Yup, I agree, I will work on this and report back.  First thought is to try
specifying the queue depth in the FIO workload file instead of on the
command line.

- Does increasing the num jobs help (ie get concurrency another way)?

 I will give this a shot.

- Do you have enough PGs in the RBD pool?

 I should, for 16 OSDs and no replication I use 2048 PGs/PGPs (100 * 16 / 1
rounded up to power of 2).

- Are you using the virtio driver if QEMU/KVM?

 No virtualization, clients are bare metal using kernel RBD.

I believe that directIO via the kernel client will go all the way to the OSDs 
and
to disk before returning. I imagine that something in the stack is preventing
the dispatch from actually happening asynchronously in that case, and the
reason you're getting 300 IOPS is because your total RTT is about 3 ms with
that code...

Ilya, is that assumption of mine correct? One thing that occurs to me is that 
for
direct IO it's fair to use the ack instead of on-disk response from the OSDs,
although that would only help us for people using btrfs.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

Ultimately this seems to be an FIO issue.  If I use --iodepth X or 
--iodepth=X on the FIO command line I always get queue depth 1.  After 
switching to specifying iodepth=X in the body of the FIO workload file I do 
get the desired queue depth and I can immediately see performance is much 
higher (a full re-test is underway, I can share some results when complete if 
anyone is curious).  This seems to have effectively worked around the problem, 
although I'm still curious why the command line parameters don't have the 
desired effect.  Thanks for the responses!
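For reference, the job file variant that took effect for me looks roughly like this (iodepth moved into the file, here under [global], though it can equally go in the job section; 32 is just an example value):

[global]
ioengine=libaio
direct=1
iodepth=32
ramp_time=300
runtime=300

[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
stonewall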
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Low RBD Performance

2014-02-04 Thread Gruher, Joseph R
 Ultimately this seems to be an FIO issue.  If I use --iodepth X or --
iodepth=X on the FIO command line I always get queue depth 1.  After
switching to specifying iodepth=X in the body of the FIO workload file I do
get the desired queue depth and I can immediately see performance is much
higher (a full re-test is underway, I can share some results when complete if
anyone is curious).  This seems to have effectively worked around the
problem, although I'm still curious why the command line parameters don't
have the desired effect.  Thanks for the responses!


Strange!  I do most of our testing using the command line parameters as well.
What version of fio are you using?  Maybe there is a bug.  For what it's worth,
I'm using --iodepth=X, and fio version 1.59 from the Ubuntu precise
repository.

Mark

FIO --version reports 2.0.8.  Installed on Ubuntu 13.04 from the default 
repositories (just did an 'apt-get install fio').

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Low RBD Performance

2014-02-03 Thread Gruher, Joseph R
Hi folks-

I'm having trouble demonstrating reasonable performance of RBDs.  I'm running 
Ceph 0.72.2 on Ubuntu 13.04 with the 3.12 kernel.  I have four dual-Xeon 
servers, each with 24GB RAM, and an Intel 320 SSD for journals and four WD 10K 
RPM SAS drives for OSDs, all connected with an LSI 1078.  This is just a lab 
experiment using scrounged hardware so everything isn't sized to be a Ceph 
cluster, it's just what I have lying around, but I should have more than enough 
CPU and memory resources.  Everything is connected with a single 10GbE.

When testing with RBDs from four clients (also running Ubuntu 13.04 with 3.12 
kernel) I am having trouble breaking 300 IOPS on a 4KB random read or write 
workload (cephx set to none, replication set to one).  IO is generated using 
FIO from four clients, each hosting a single 1TB RBD, and I've experimented 
with queue depths and increasing the number of RBDs without any benefit.  300 
IOPS for a pool of 16 10K RPM HDDs seems quite low, not to mention the journal 
should provide a good boost on write workloads.  When I run a 4KB object write 
workload in Cosbench I can approach 3500 Obj/Sec which seems more reasonable.

Sample FIO configuration:

[global]
ioengine=libaio
direct=1
ramp_time=300
runtime=300
[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
stonewall

I use --iodepth=X on the FIO command line to set the queue depth when testing.

I notice in the FIO output despite the iodepth setting it seems to be reporting 
an IO depth of only 1, which would certainly help explain poor performance, but 
I'm at a loss as to why, I wonder if it could be something specific to RBD 
behavior, like I need to use a different IO engine to establish queue depth.

IO depths: 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

Any thoughts appreciated!

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Performance Testing Setup Tricks?

2014-01-23 Thread Gruher, Joseph R
Hi all-

I'm creating some scripted performance testing for my Ceph cluster.  The part 
relevant to my questions works like this:

1.   Create some pools

2.   Create and map some RBDs

3.   Write-in the RBDs using DD or FIO

4.   Run FIO testing on the RBDs (small block random and large block 
sequential with varying queue depths and workers)

5.   Delete the pools and make some new pools

6.   Populate with objects using Cosbench

7.   Run Cosbench to measure object read and write performance

8.   (repeat for various object sizes)

9.   Delete the pools

The whole thing works pretty well as far as generating results.  The part I'm
hoping to improve is steps 3 and 6, where I'm writing in the RBDs, or where I'm 
populating objects to the pools, respectively.  For any significant amount of 
data relative to the size of the cluster (which is 16TB now but will probably 
get bigger) this takes hours and hours and hours.  I'm wondering if there is 
any way to shortcut these preparation steps.  For example, for a new RBD, is 
there any way to tell Ceph to treat it as already written-in or thickly 
provisioned, and just serve me up whatever junk data is in there when I read 
from it?  Since the RBD sits on objects instead of blocks I'm guessing not but 
it doesn't hurt to ask.  Similarly, are there any tricks I might investigate 
for populating junk objects into a pool, which I can then read and write, other 
than actually writing all the objects in with a tool like Cosbench?

There may not be a better approach, but any thoughts are appreciated.  Thanks!
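For concreteness, step 3 above is just a full sequential fill of each mapped device, along the lines of the sketch below (the device name is an example, and this of course destroys whatever is on the RBD):

sudo dd if=/dev/zero of=/dev/rbd1 bs=4M oflag=direct
# or the fio equivalent:
sudo fio --name=fill --filename=/dev/rbd1 --rw=write --bs=4M --ioengine=libaio --iodepth=16 --direct=1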

-Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] odd performance graph

2013-12-02 Thread Gruher, Joseph R
I don't know how rbd works inside, but I think Ceph RBD here returns zeros
without a real OSD disk read if the block/sector of the RBD is unused. That
would explain the graph you see. You can try adding a second RBD image,
leave it unformatted/unused and benchmark that disk, then make a filesystem
on it, write some data, and benchmark again...


When performance testing RBDs I generally write in the whole area before doing 
any testing to avoid this problem.  It would be interesting to have 
confirmation this is a real concern with Ceph.  I know it is in other thin 
provisioned storage, for example, VMWare.  Perhaps someone more expert can 
comment.

Also, is there any way to shortcut the write-in process?  Writing in TBs of RBD 
image can really extend the length of our performance test cycle.  It would be 
great if there was some shortcut to cause Ceph to treat the whole RBD as having 
already been written, or just go fetch data from disk on all reads regardless 
of whether that area had been written, just for testing purposes.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimum requirements for ceph monitors?

2013-11-27 Thread Gruher, Joseph R
For ~$67 you get a mini-itx motherboard with a soldered on 17W dual core
1.8GHz ivy-bridge based Celeron (supports SSE4.2 CRC32 instructions!).
It has 2 standard DIMM slots so no compromising on memory, on-board gigabit
ethernet, 3 3Gb/s + 1 6Gb/s SATA, and a single PCIe slot for an additional
NIC.
This has the potential to make a very competent low cost, lowish power OSD
or mon server.  The biggest downside is that it doesn't appear to support ECC
memory.  Some of the newer Atoms appear to, so that might be an option as
well.

Yup, the server- and storage-purposed Atoms do support ECC.  I think Atom sounds
like an interesting fit for OSD servers: the new Avoton SoCs are quite fast, can
host up to 64GB of ECC RAM on two channels, and have 4x1GbE or 1x10GbE onboard.
Plus six SATA lanes onboard, which would be a nice fit for an OS disk, a journal
SSD, and four OSD disks.  I have been hoping to track down a few boards and do
some testing with Atom myself.

http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz
 

Would be interested to hear if anyone else has tried such an experiment.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [ANN] ceph-deploy 1.3.3 released!

2013-11-26 Thread Gruher, Joseph R
Hi Alfredo-

Have you looked at adding the ability to specify a proxy on the ceph-deploy 
command line?  Something like:

ceph-deploy install --proxy {http_proxy}

That would then need to run all the remote commands (rpm, curl, wget, etc) with 
the proxy.  Not sure how complex that would be to implement... just curious.
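In the meantime, a workable stopgap is to set the proxy where the remote commands will already pick it up; roughly, on each target node (the proxy host below is just a placeholder):

# yum/rpm read yum.conf, wget reads wgetrc
echo 'proxy=http://proxy.example.com:8080' | sudo tee -a /etc/yum.conf
printf 'use_proxy=on\nhttp_proxy=http://proxy.example.com:8080\nhttps_proxy=http://proxy.example.com:8080\n' | sudo tee -a /etc/wgetrc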

-Joe

-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Alfredo Deza
Sent: Tuesday, November 26, 2013 12:51 PM
To: ceph-devel; ceph-users@lists.ceph.com
Subject: [ceph-users] [ANN] ceph-deploy 1.3.3 released!

Hi All,

There is a new release of ceph-deploy, the easy deployment tool for Ceph.

The most important (non-bug) change for this release is the ability to specify
repository mirrors when installing ceph. This can be done with environment
variables or flags in the `install` subcommand.

Full documentation on that feature can be found in the new location for docs:

http://ceph.com/ceph-deploy/docs/install.html#behind-firewall

The complete changelog can be found here:
http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you update!

Thanks,


Alfredo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy problems on CentOS-6.4

2013-11-22 Thread Gruher, Joseph R
Those aren't really errors: when ceph-deploy runs commands on the host, anything
that gets printed to stderr as a result is relayed back through ceph-deploy
with the [ERROR] tag.  If you look at the content of the errors, it is just the
output of the commands that were run in the step beforehand.

This seems to confuse a ton of people; I wonder if ceph-deploy wouldn't be
better off labeling this content as something like [OUTPUT] or [RESPONSE].

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gautam Saxena
Sent: Friday, November 22, 2013 10:48 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy problems on CentOS-6.4

I'm also getting similar problems, although in my installation, even though
there are errors, it seems to finish. (I'm using CentOS 6.4 and the Emperor
release, and I added the defaults for http and https to the sudoers file for the
ia1 node, though I didn't do so for the ia2 and ia3 nodes.) So is everything ok?
If so, why are there error statements? Here is an excerpt of the logs:

command that I executed -- ceph-deploy install ia1 ia2 ia3

First portion of the log --

[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy install ia1 ia2 
ia3
[ceph_deploy.install][DEBUG ] Installing stable version emperor on cluster ceph 
hosts ia1 ia2 ia3
[ceph_deploy.install][DEBUG ] Detecting platform for host ia1 ...
[ia1][DEBUG ] connected to host: ia1
[ia1][DEBUG ] detect platform information from remote host
[ia1][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: CentOS 6.4 Final
[ia1][INFO  ] installing ceph on ia1
[ia1][INFO  ] Running command: sudo yum -y -q install wget
[ia1][DEBUG ] Package wget-1.12-1.8.el6.x86_64 already installed and latest 
version
[ia1][INFO  ] adding EPEL repository
[ia1][INFO  ] Running command: sudo wget 
http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[ia1][ERROR ] --2013-11-22 13:40:52--  
http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[ia1][ERROR ] Resolving dl.fedoraproject.org... 209.132.181.23, 209.132.181.24, 
209.132.181.25, ...
[ia1][ERROR ] Connecting to dl.fedoraproject.org|209.132.181.23|:80... connected.
[ia1][ERROR ] HTTP request sent, awaiting response... 200 OK
[ia1][ERROR ] Length: 14540 (14K) [application/x-rpm]
[ia1][ERROR ] Saving to: `epel-release-6-8.noarch.rpm.1'
[ia1][ERROR ]
[ia1][ERROR ]  0K ..    
100%  158K=0.09s
[ia1][ERROR ]
[ia1][ERROR ] 2013-11-22 13:40:52 (158 KB/s) - `epel-release-6-8.noarch.rpm.1' 
saved [14540/14540]
[ia1][ERROR ]
[ia1][INFO  ] Running command: sudo rpm -Uvh --replacepkgs epel-release-6*.rpm
[ia1][ERROR ] warning: epel-release-6-8.noarch.rpm: Header V3 RSA/SHA256 
Signature, key ID 0608b895: NOKEY
[ia1][DEBUG ] Preparing...
##
[ia1][DEBUG ] epel-release
##
[ia1][INFO  ] Running command: sudo rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[ia1][INFO  ] Running command: sudo rpm -Uvh --replacepkgs 
http://ceph.com/rpm-emperor/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ia1][DEBUG ] Retrieving 
http://ceph.com/rpm-emperor/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[ia1][DEBUG ] Preparing...
##
[ia1][DEBUG ] ceph-release
##
[ia1][INFO  ] Running command: sudo yum -y -q install ceph
[ia1][ERROR ] warning: rpmts_HdrFromFdno: Header V3 RSA/SHA256 Signature, key 
ID 0608b895: NOKEY
[ia1][ERROR ] Importing GPG key 0x0608B895:
[ia1][ERROR ]  Userid : EPEL (6) e...@fedoraproject.org
[ia1][ERROR ]  Package: epel-release-6-8.noarch (installed)
[ia1][ERROR ]  From   : /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
[ia1][ERROR ] Warning: RPMDB altered outside of yum.
[ia1][INFO  ] Running command: sudo ceph --version
[ia1][DEBUG ] ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry

2013-11-20 Thread Gruher, Joseph R


-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Wednesday, November 20, 2013 7:17 AM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry

On Mon, Nov 18, 2013 at 1:12 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:

-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Monday, November 18, 2013 6:34 AM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy disk zap fails but succeeds on
retry

I went ahead and created a ticket to track this, if you have any new
input, please make sure you add to the actual ticket:
http://tracker.ceph.com/issues/6793

Thanks for reporting the problem!


 Will do!  I should be bringing up a few different cluster configurations on 
 this
hardware (we're doing some Ceph performance testing) so I may be able to
reproduce again and get more details.

I am trying to replicate this but somehow failing... in what state were the
drives, e.g. did you have any partitions before starting? or was this like a 
new
drive out of the box that was put in there?

So far I can only see it sometimes in 13.04 and not anywhere else

Looking at the ceph-deploy disk list output I captured at the time (see below) 
it seems to be reporting partition data on the drives (/dev/sdd1 exists in this 
example).  The last thing I did with the drives prior to deploying Emperor was 
some tests to baseline their performance with FIO, including a fair amount of 
write activity to the raw devices.  As a result I would expect their initial 
state to have basically been junk data.  Prior to that FIO testing the disks 
would have been OSD disks in a Dumpling cluster.

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list 
joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO  ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ]  /dev/sda2 other
[joceph02][DEBUG ]  /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd :
[joceph02][DEBUG ]  /dev/sdd1 other
[joceph02][DEBUG ] /dev/sde :
[joceph02][DEBUG ]  /dev/sde1 other
[joceph02][DEBUG ] /dev/sdf :
[joceph02][DEBUG ]  /dev/sdf1 other

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdd
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap 
joceph02:/dev/sdd
[ceph_deploy.osd][DEBUG ] zapping /dev/sdd on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO  ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- 
/dev/sdd
[joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; 
regenerating main header
[joceph02][ERROR ] from backup!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup 
partition table
[joceph02][ERROR ] instead of main partition table!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the 
disk!
[joceph02][ERROR ]
[joceph02][ERROR ] Invalid partition data!
[joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the 
disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] Information: Creating fresh partition table; will override 
earlier problems!
[joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
[joceph02][ERROR ] Traceback (most recent call last):
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, 
in run
[joceph02][ERROR ] reporting(conn, result, timeout)
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in 
reporting
[joceph02][ERROR ] received = result.receive(timeout)
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py,
 line 455, in receive
[joceph02][ERROR ] raise self._getremoteerror() or EOFError()
[joceph02][ERROR ] RemoteError: Traceback (most recent call last):
[joceph02][ERROR ]   File string, line 806, in executetask
[joceph02][ERROR ]   File , line 35, in _remote_run
[joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
[joceph02][ERROR ]
[joceph02][ERROR ]
[ceph_deploy

Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R
So is there any size limit on RBD images?  I had a failure this morning 
mounting 1TB RBD.  Deleting now (why does it take so long to delete if it was 
never even mapped, much less written to?) and will retry with smaller images.  
See output below.  This is 0.72 on Ubuntu 13.04 with 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name   category KB  objects   clones 
degraded  unfound   rdrd KB   wrwr KB
data-  000  
  0   00000
metadata-  000  
  0   00000
rbd -  120  
  0   0   10788
testpool01  -  000  
  0   00000
testpool02  -  000  
  0   00000
testpool03  -  000  
  0   00000
testpool04  -  000  
  0   00000
  total used  23287851602
  total avail 9218978040
  total space11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Size of RBD images

2013-11-19 Thread Gruher, Joseph R


-Original Message-
From: Gruher, Joseph R
Sent: Tuesday, November 19, 2013 12:24 PM
To: 'Wolfgang Hennerbichler'; Bernhard Glomm
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Size of RBD images

So is there any size limit on RBD images?  I had a failure this morning 
mounting
1TB RBD.  Deleting now (why does it take so long to delete if it was never even
mapped, much less written to?) and will retry with smaller images.  See
output below.  This is 0.72 on Ubuntu 13.04 with 3.12 kernel.

ceph@joceph-client01:~$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1

ceph@joceph-client01:~$ rbd map testrbd -p testpool01
rbd: add failed: (13) Permission denied

ceph@joceph-client01:~$ sudo rbd map testrbd -p testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rados df
pool name   category KB  objects   clones 
degraded  unfound
rdrd KB   wrwr KB
data-  000 
   0   0000
0
metadata-  000 
   0   0000
0
rbd -  120 
   0   0   1078
8
testpool01  -  000 
   0   0000
0
testpool02  -  000 
   0   0000
0
testpool03  -  000 
   0   0000
0
testpool04  -  000 
   0   0000
0
  total used  23287851602
  total avail 9218978040
  total space11547763200

ceph@joceph-client01:/etc/ceph$ sudo modprobe rbd

ceph@joceph-client01:/etc/ceph$ sudo rbd map testrbd --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ rbd info testrbd
rbd image 'testrbd':
size 1024 GB in 262144 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.1770.6b8b4567
format: 1


I think I figured out where I went wrong here.  I had thought that if you 
didn't specify the pool on the 'rbd create' command line you could then map the 
image into any pool later.  In retrospect that doesn't make a lot of sense, and 
it appears that if you don't specify the pool at the create step the image just 
lands in the default 'rbd' pool.  See the example below.

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage5 
--pool testpool01
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage5 --pool testpool01

ceph@joceph-client01:/etc/ceph$ sudo rbd create --size 1048576 testimage6
ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool testpool01
rbd: add failed: (2) No such file or directory

ceph@joceph-client01:/etc/ceph$ sudo rbd map testimage6 --pool rbd
ceph@joceph-client01:/etc/ceph$
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry

2013-11-18 Thread Gruher, Joseph R

-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Monday, November 18, 2013 6:34 AM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-deploy disk zap fails but succeeds on retry

I went ahead and created a ticket to track this, if you have any new input,
please make sure you add to the actual ticket:
http://tracker.ceph.com/issues/6793

Thanks for reporting the problem!


Will do!  I should be bringing up a few different cluster configurations on 
this hardware (we're doing some Ceph performance testing) so I may be able to 
reproduce again and get more details.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy disk zap fails but succeeds on retry

2013-11-15 Thread Gruher, Joseph R
Using ceph-deploy 1.3.2 with ceph 0.72.1.  Ceph-deploy disk zap will fail and 
exit with error, but then on retry will succeed.  This is repeatable as I go 
through each of the OSD disks in my cluster.  See output below.

I am guessing the first run changes something about the initial state of the 
disk which then allows the second run to complete.  But if the disk can be put 
into a state where the zap succeeds, why doesn't the first run just do that 
itself?

The main negative effect is that this causes a compact command like ceph-deploy 
disk zap joceph0{1,2,3,4}:/dev/sd{b,c,d,e,f} to fail and exit without running 
through all of the targets.

I did not encounter this in the previous release of ceph and ceph-deploy 
(dumpling and 1.2.7?) but I can't say for sure my disks were in the same 
initial state when running ceph-deploy on that release.

Would this be a bug, or expected behavior?
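
As a possible workaround (only lightly tested on my end, and the device name 
below is just an example), pre-clearing the disk by hand on the OSD node before 
invoking ceph-deploy might leave it in a state where the first zap pass 
succeeds:

# run on the OSD node; destroys all partition data on /dev/sdc
sudo sgdisk --zap-all /dev/sdc
# optionally wipe the first few MB as well so no stale metadata survives
sudo dd if=/dev/zero of=/dev/sdc bs=1M count=10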

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap 
joceph02:/dev/sdc
[ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO  ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- 
/dev/sdc
[joceph02][ERROR ] Caution: invalid main GPT header, but valid backup; 
regenerating main header
[joceph02][ERROR ] from backup!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! Main partition table CRC mismatch! Loaded backup 
partition table
[joceph02][ERROR ] instead of main partition table!
[joceph02][ERROR ]
[joceph02][ERROR ] Warning! One or more CRCs don't match. You should repair the 
disk!
[joceph02][ERROR ]
[joceph02][ERROR ] Invalid partition data!
[joceph02][DEBUG ] Caution! After loading partitions, the CRC doesn't check out!
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the 
disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] Information: Creating fresh partition table; will override 
earlier problems!
[joceph02][DEBUG ] Non-GPT disk; not saving changes. Use -g to override.
[joceph02][ERROR ] Traceback (most recent call last):
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, 
in run
[joceph02][ERROR ] reporting(conn, result, timeout)
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in 
reporting
[joceph02][ERROR ] received = result.receive(timeout)
[joceph02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py,
 line 455, in receive
[joceph02][ERROR ] raise self._getremoteerror() or EOFError()
[joceph02][ERROR ] RemoteError: Traceback (most recent call last):
[joceph02][ERROR ]   File string, line 806, in executetask
[joceph02][ERROR ]   File , line 35, in _remote_run
[joceph02][ERROR ] RuntimeError: command returned non-zero exit status: 3
[joceph02][ERROR ]
[joceph02][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: sgdisk --zap-all 
--clear --mbrtogpt -- /dev/sdc


ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk zap joceph02:/dev/sdc
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy disk zap 
joceph02:/dev/sdc
[ceph_deploy.osd][DEBUG ] zapping /dev/sdc on joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 13.04 raring
[joceph02][DEBUG ] zeroing last few blocks of device
[joceph02][INFO  ] Running command: sudo sgdisk --zap-all --clear --mbrtogpt -- 
/dev/sdc
[joceph02][DEBUG ] Creating new GPT entries.
[joceph02][DEBUG ] GPT data structures destroyed! You may now partition the 
disk using fdisk or
[joceph02][DEBUG ] other utilities.
[joceph02][DEBUG ] The operation has completed successfully.
ceph@joceph-admin01:/etc/ceph$



Here's some additional output with a disk-list executed in between zaps:

ceph@joceph-admin01:/etc/ceph$ ceph-deploy disk list joceph02
[ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy disk list 
joceph02
[joceph02][DEBUG ] connected to host: joceph02
[joceph02][DEBUG ] detect platform information from remote host
[joceph02][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 13.04 raring
[ceph_deploy.osd][DEBUG ] Listing disks on joceph02...
[joceph02][INFO  ] Running command: sudo ceph-disk list
[joceph02][DEBUG ] /dev/sda :
[joceph02][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[joceph02][DEBUG ]  /dev/sda2 other
[joceph02][DEBUG ]  /dev/sda5 swap, swap
[joceph02][DEBUG ] /dev/sdb other, unknown
[joceph02][DEBUG ] /dev/sdc other, unknown
[joceph02][DEBUG ] /dev/sdd :
[joceph02][DEBUG ]  /dev/sdd1 other

Re: [ceph-users] ceph cluster performance

2013-11-07 Thread Gruher, Joseph R
-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Dinu Vlad
Sent: Thursday, November 07, 2013 3:30 AM
To: ja...@peacon.co.uk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph cluster performance

In this case however, the SSDs were only used for journals and I don't know if
ceph-osd sends TRIM to the drive in the process of journaling over a block
device. They were also under-subscribed, with just 3 x 10G partitions out of
240 GB raw capacity. I did a manual trim, but it hasn't changed anything.

If your SSD capacity is well in excess of your journal capacity requirements 
you could consider overprovisioning the SSD.  Overprovisioning should increase 
SSD performance and lifetime, and it achieves much the same effect as trim (it 
lets the SSD know which cells hold real data and which can be treated as free). 
I do wonder how effective trim would be on a Ceph journal area: if the journal 
empties and is then trimmed, the next write cycle should be faster, but if the 
journal is active all the time the benefit would be lost almost immediately, 
since those cells receive new data right away and go back to an untrimmed state 
until the next trim occurs.
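
As a rough sketch of what I mean by overprovisioning (the device name, sizes 
and the use of ATA secure erase here are example assumptions, not a 
recommendation):

# secure-erase first so the controller knows every cell is free
# (drive must support ATA security and must not be in the frozen state)
sudo hdparm --user-master u --security-set-pass p /dev/sdg
sudo hdparm --user-master u --security-erase p /dev/sdg
# then partition only part of the device, e.g. 3 x 10GB journal partitions
# on a 240GB SSD, leaving the rest of the capacity unpartitioned as spare area
sudo sgdisk -n 1:0:+10G -n 2:0:+10G -n 3:0:+10G /dev/sdg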

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] please help me.problem with my ceph

2013-11-07 Thread Gruher, Joseph R
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of ??
Sent: Wednesday, November 06, 2013 10:04 PM
To: ceph-users
Subject: [ceph-users] please help me.problem with my ceph

1.  I have installed ceph with one mon/mds and one osd.  When I use 'ceph -s' 
there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; 
recovery 21/42 degraded (50.000%)

I would think this is because Ceph defaults to a replication level of 2 and you 
only have one OSD (nowhere to write a second copy) so you are degraded?  You 
could add a second OSD or perhaps you could set the replication level to 1?
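
If you do want to keep running with a single OSD for testing, something along 
these lines should drop the replica count on the default pools (size 1 means no 
redundancy at all, so only for throwaway test setups):

ceph osd pool set data size 1
ceph osd pool set metadata size 1
ceph osd pool set rbd size 1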
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Block Storage QoS

2013-11-07 Thread Gruher, Joseph R
Is there any plan to implement some kind of QoS in Ceph?  Say I want to provide 
service level assurance to my OpenStack VMs and I might have to throttle 
bandwidth to some to provide adequate bandwidth to others - is anything like 
that planned for Ceph?  Generally with regard to block storage (rbds), not 
object or filesystem.

Or is there already a better way to do this elsewhere in the OpenStack cloud?

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails to start

2013-11-04 Thread Gruher, Joseph R
Sorry to bump this, but does anyone have any idea what could be wrong here?

To re-summarize, radosgw fails to start.  Debug output seems to indicate it is 
complaining about the keyring, but the keyring is present and readable, and 
other Ceph functions which require the keyring succeed.  So why can't radosgw 
start?  Details below.

Thanks!

-Original Message-
From: Gruher, Joseph R
Sent: Friday, November 01, 2013 11:50 AM
To: Gruher, Joseph R
Subject: RE: radosgw fails to start

Adding some debug arguments has generated output which I believe
indicates the problem is my keyring is missing, but the keyring seems
to be here.  Why would this complain about the keyring and fail to start?

[ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d --debug-rgw 20 --debug-
ms 1 start
2013-11-01 10:59:47.015332 7f83978e4820  0 ceph version 0.67.4
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 18760
2013-11-01 10:59:47.015338 7f83978e4820 -1 WARNING: libcurl doesn't
support
curl_multi_wait()
2013-11-01 10:59:47.015340 7f83978e4820 -1 WARNING: cross zone / region
transfer performance may be affected
2013-11-01 10:59:47.018707 7f83978e4820  1 -- :/0 messenger.start
2013-11-01 10:59:47.018773 7f83978e4820 -1 monclient(hunting): ERROR:
missing keyring, cannot use cephx for authentication
2013-11-01 10:59:47.018774 7f83978e4820  0 librados: client.admin
initialization error (2) No such file or directory
2013-11-01 10:59:47.018788 7f83978e4820  1 -- :/1018760 mark_down_all
2013-11-01 10:59:47.018932 7f83978e4820  1 -- :/1018760 shutdown
complete.
2013-11-01 10:59:47.018967 7f83978e4820 -1 Couldn't init storage
provider
(RADOS)

[ceph@joceph08 ceph]$ sudo service ceph-radosgw status
/usr/bin/radosgw
is not running.

[ceph@joceph08 ceph]$ pwd
/etc/ceph

[ceph@joceph08 ceph]$ ls
ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway  rbdmap

[ceph@joceph08 ceph]$ cat ceph.client.admin.keyring [client.admin]
    key = AQCYyHJSCFH3BBAA472q80qrAiIIVbvJfK/47A==

[ceph@joceph08 ceph]$ cat keyring.radosgw.gateway
[client.radosgw.gateway]
    key = AQBh6nNS0Cu3HxAAMxLsbEYZ3pEbwEBajQb1WA==
    caps mon = allow rw
    caps osd = allow rwx

[ceph@joceph08 ceph]$ cat ceph.conf
[client.radosgw.joceph08]
host = joceph08
log_file = /var/log/ceph/radosgw.log
keyring = /etc/ceph/keyring.radosgw.gateway rgw_socket_path =
/tmp/radosgw.sock

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.23.37.142,10.23.37.145,10.23.37.161,10.23.37.165
osd_journal_size = 1024
mon_initial_members = joceph01, joceph02, joceph03, joceph04 fsid =
74d808db-aaa7-41d2-8a84-7d590327a3c7

By the way, I can run other commands on the node which I think must require 
the keyring, and they succeed.

[ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --
debug-rgw 20 --debug-ms 1 start
2013-11-01 11:45:07.935483 7ff2e2f11820  0 ceph version 0.67.4
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 19265
2013-11-01 11:45:07.935488 7ff2e2f11820 -1 WARNING: libcurl doesn't support
curl_multi_wait()
2013-11-01 11:45:07.935489 7ff2e2f11820 -1 WARNING: cross zone / region
transfer performance may be affected
2013-11-01 11:45:07.938719 7ff2e2f11820  1 -- :/0 messenger.start
2013-11-01 11:45:07.938817 7ff2e2f11820 -1 monclient(hunting): ERROR:
missing keyring, cannot use cephx for authentication
2013-11-01 11:45:07.938818 7ff2e2f11820  0 librados: client.admin 
initialization
error (2) No such file or directory
2013-11-01 11:45:07.938832 7ff2e2f11820  1 -- :/1019265 mark_down_all
2013-11-01 11:45:07.939150 7ff2e2f11820  1 -- :/1019265 shutdown complete.
2013-11-01 11:45:07.939219 7ff2e2f11820 -1 Couldn't init storage provider
(RADOS)

[ceph@joceph08 ceph]$ rados df
pool name   category KB  objects   clones 
degraded  unfound
rdrd KB   wrwr KB
data-  000 
   0   0000
0
metadata-  000 
   0   0000
0
rbd -  000 
   0   0000
0
  total used  6306480
  total avail11714822792
  total space11715453440

[ceph@joceph08 ceph]$ ceph status
  cluster 74d808db-aaa7-41d2-8a84-7d590327a3c7
   health HEALTH_OK
   monmap e1: 4 mons at
{joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.2
3.37.161:6789/0,joceph04=10.23.37.165:6789/0}, election epoch 8, quorum
0,1,2,3 joceph01,joceph02,joceph03,joceph04
   osdmap e88: 16 osds: 16 up, 16 in
pgmap v1402: 2400 pgs: 2400 active+clean; 0 bytes data, 615 MB used, 11172
GB / 11172 GB avail
   mdsmap e1: 0/0/1 up

Re: [ceph-users] radosgw fails to start

2013-11-04 Thread Gruher, Joseph R
-Original Message-
From: Yehuda Sadeh [mailto:yeh...@inktank.com]
Sent: Monday, November 04, 2013 12:40 PM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] radosgw fails to start

Not sure why you're able to run the 'rados' and 'ceph' command, and not
'radosgw', just note that the former two don't connect to the osds, whereas
the latter does, so it might fail on a different level.
You're using the default client.admin as the user for radosgw, but your
ceph.conf file doesn't have a section for it and all the relevant configurables
are under client.radosgw.gateway. Try fixing that first.

Yehuda


Thanks for the hint.  Adding the section below seems to have addressed the 
problem.  For some reason I didn't have to do this on my previous cluster but 
it seems to need it here.

[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring

Now I am failing with a new problem, probably something to do with how I set 
up Apache; it seems to be some kind of FastCGI error:
2013-11-04 13:05:48.354547 7f1cd6f5d820  0 ERROR: FCGX_Accept_r returned -88

Full output: http://pastebin.com/gyhQnrgP 
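
The -88 looks like errno 88 (ENOTSOCK), which I read as the FastCGI accept not 
happening on a real socket, i.e. the socket handoff between Apache/mod_fastcgi 
and radosgw probably isn't wired up right on my end.  For reference, the sort 
of Apache config the docs describe is roughly the following (the ServerName and 
paths are placeholders, not my actual setup):

FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
<VirtualHost *:80>
    ServerName gateway.example.com
    DocumentRoot /var/www
    RewriteEngine On
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
</VirtualHost>

The -socket path there has to match rgw_socket_path in ceph.conf.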
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails to start

2013-11-01 Thread Gruher, Joseph R
Adding some debug arguments has generated output which I believe indicates the 
problem is my keyring is missing, but the keyring seems to be here.  Why would 
this complain about the keyring and fail to start?

[ceph@joceph08 ceph]$ sudo /usr/bin/radosgw -d --debug-rgw 20 --debug-ms 1 start
2013-11-01 10:59:47.015332 7f83978e4820  0 ceph version 0.67.4 
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process radosgw, pid 18760
2013-11-01 10:59:47.015338 7f83978e4820 -1 WARNING: libcurl doesn't support 
curl_multi_wait()
2013-11-01 10:59:47.015340 7f83978e4820 -1 WARNING: cross zone / region 
transfer performance may be affected
2013-11-01 10:59:47.018707 7f83978e4820  1 -- :/0 messenger.start
2013-11-01 10:59:47.018773 7f83978e4820 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication
2013-11-01 10:59:47.018774 7f83978e4820  0 librados: client.admin 
initialization error (2) No such file or directory
2013-11-01 10:59:47.018788 7f83978e4820  1 -- :/1018760 mark_down_all
2013-11-01 10:59:47.018932 7f83978e4820  1 -- :/1018760 shutdown complete.
2013-11-01 10:59:47.018967 7f83978e4820 -1 Couldn't init storage provider 
(RADOS)

[ceph@joceph08 ceph]$ sudo service ceph-radosgw status
/usr/bin/radosgw is not running.

[ceph@joceph08 ceph]$ pwd
/etc/ceph

[ceph@joceph08 ceph]$ ls
ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway  rbdmap

[ceph@joceph08 ceph]$ cat ceph.client.admin.keyring
[client.admin]
key = AQCYyHJSCFH3BBAA472q80qrAiIIVbvJfK/47A==

[ceph@joceph08 ceph]$ cat keyring.radosgw.gateway
[client.radosgw.gateway]
key = AQBh6nNS0Cu3HxAAMxLsbEYZ3pEbwEBajQb1WA==
caps mon = allow rw
caps osd = allow rwx

[ceph@joceph08 ceph]$ cat ceph.conf
[client.radosgw.joceph08]
host = joceph08
log_file = /var/log/ceph/radosgw.log
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock

[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 10.23.37.142,10.23.37.145,10.23.37.161,10.23.37.165
osd_journal_size = 1024
mon_initial_members = joceph01, joceph02, joceph03, joceph04
fsid = 74d808db-aaa7-41d2-8a84-7d590327a3c7


From: Gruher, Joseph R
Sent: Wednesday, October 30, 2013 12:24 PM
To: ceph-users@lists.ceph.com
Subject: radosgw fails to start, leaves no clues why

Hi all-

Trying to set up object storage on CentOS.  I've done this successfully on 
Ubuntu but I'm having some trouble on CentOS.  I think I have everything 
configured but when I try to start the radosgw service it reports starting, but 
then the status is not running, with no helpful output as to why on the console 
or in the radosgw log.  I once experienced a similar problem in Ubuntu when the 
hostname was incorrect in ceph.conf but that doesn't seem to be the issue here. 
 Not sure where to go next.  Any suggestions what could be the problem?  Thanks!

[ceph@joceph08 ceph]$ sudo service httpd restart
Stopping httpd:[  OK  ]
Starting httpd:[  OK  ]

[ceph@joceph08 ceph]$ cat ceph.conf
[joceph08.radosgw.gateway]
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_dns_name = joceph08
host = joceph08
log_file = /var/log/ceph/radosgw.log
rgw_socket_path = /tmp/radosgw.sock
[global]
filestore_xattr_use_omap = true
mon_host = 10.23.37.142,10.23.37.145,10.23.37.161
osd_journal_size = 1024
mon_initial_members = joceph01, joceph02, joceph03
auth_supported = cephx
fsid = 721ea513-e84c-48df-9c8f-f1d9e602b810

[ceph@joceph08 ceph]$ sudo service ceph-radosgw start
Starting radosgw instance(s)...

[ceph@joceph08 ceph]$ sudo service ceph-radosgw status
/usr/bin/radosgw is not running.

[ceph@joceph08 ceph]$ sudo cat /var/log/ceph/radosgw.log
[ceph@joceph08 ceph]$

[ceph@joceph08 ceph]$ sudo cat /etc/ceph/keyring.radosgw.gateway
[client.radosgw.gateway]
key = AQDbUnFSIGT2BxAA5rz9I1HHIG/LJx+XCYot1w==
caps mon = allow rw
caps osd = allow rwx

[ceph@joceph08 ceph]$ ceph status
  cluster 721ea513-e84c-48df-9c8f-f1d9e602b810
   health HEALTH_OK
   monmap e1: 3 mons at 
{joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.23.37.161:6789/0},
 election epoch 8, quorum 0,1,2 joceph01,joceph02,joceph03
   osdmap e119: 16 osds: 16 up, 16 in
pgmap v1383: 3200 pgs: 3200 active+clean; 219 GB data, 411 GB used, 10760 
GB / 11172 GB avail
   mdsmap e1: 0/0/1 up
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw fails to start

2013-11-01 Thread Gruher, Joseph R
-Original Message-
From: Derek Yarnell [mailto:de...@umiacs.umd.edu]
Sent: Friday, November 01, 2013 12:20 PM
To: Gruher, Joseph R; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] radosgw fails to start

On 11/1/13, 2:07 PM, Gruher, Joseph R wrote:
 Adding some debug arguments has generated output which I believe
indicates the problem is my keyring is missing, but the keyring seems to be
here.  Why would this complain about the keyring and fail to start?

Hi,

Are you sure the user you are starting radosgw has the permission to read the
keyring file?

Thanks,
derek

Thanks for the suggestion.  Yup, it should be readable.  First of all I'm 
starting radosgw with sudo, so root should be able to read anything; plus I set 
the file to be readable by all users just in case.  The problem persists...
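
For anyone following along, the checks here boil down to something like this 
(paths are the ones from my earlier mail):

ls -l /etc/ceph/ceph.client.admin.keyring /etc/ceph/keyring.radosgw.gateway
# confirm librados itself can authenticate with the admin keyring
sudo ceph --keyring /etc/ceph/ceph.client.admin.keyring -s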
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw fails to start, leaves no clues why

2013-10-30 Thread Gruher, Joseph R
Hi all-

Trying to set up object storage on CentOS.  I've done this successfully on 
Ubuntu but I'm having some trouble on CentOS.  I think I have everything 
configured but when I try to start the radosgw service it reports starting, but 
then the status is not running, with no helpful output as to why on the console 
or in the radosgw log.  I once experienced a similar problem in Ubuntu when the 
hostname was incorrect in ceph.conf but that doesn't seem to be the issue here. 
 Not sure where to go next.  Any suggestions what could be the problem?  Thanks!

[ceph@joceph08 ceph]$ sudo service httpd restart
Stopping httpd:[  OK  ]
Starting httpd:[  OK  ]

[ceph@joceph08 ceph]$ cat ceph.conf
[joceph08.radosgw.gateway]
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_dns_name = joceph08
host = joceph08
log_file = /var/log/ceph/radosgw.log
rgw_socket_path = /tmp/radosgw.sock
[global]
filestore_xattr_use_omap = true
mon_host = 10.23.37.142,10.23.37.145,10.23.37.161
osd_journal_size = 1024
mon_initial_members = joceph01, joceph02, joceph03
auth_supported = cephx
fsid = 721ea513-e84c-48df-9c8f-f1d9e602b810

[ceph@joceph08 ceph]$ sudo service ceph-radosgw start
Starting radosgw instance(s)...

[ceph@joceph08 ceph]$ sudo service ceph-radosgw status
/usr/bin/radosgw is not running.

[ceph@joceph08 ceph]$ sudo cat /var/log/ceph/radosgw.log
[ceph@joceph08 ceph]$

[ceph@joceph08 ceph]$ sudo cat /etc/ceph/keyring.radosgw.gateway
[client.radosgw.gateway]
key = AQDbUnFSIGT2BxAA5rz9I1HHIG/LJx+XCYot1w==
caps mon = allow rw
caps osd = allow rwx

[ceph@joceph08 ceph]$ ceph status
  cluster 721ea513-e84c-48df-9c8f-f1d9e602b810
   health HEALTH_OK
   monmap e1: 3 mons at 
{joceph01=10.23.37.142:6789/0,joceph02=10.23.37.145:6789/0,joceph03=10.23.37.161:6789/0},
 election epoch 8, quorum 0,1,2 joceph01,joceph02,joceph03
   osdmap e119: 16 osds: 16 up, 16 in
pgmap v1383: 3200 pgs: 3200 active+clean; 219 GB data, 411 GB used, 10760 
GB / 11172 GB avail
   mdsmap e1: 0/0/1 up
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Red Hat clients

2013-10-30 Thread Gruher, Joseph R
I have CentOS 6.4 running with the 3.11.6 kernel from elrepo and it includes 
the rbd module.  I think you could make the same update on RHEL 6.4 and get 
rbd.  From there it is very simple to mount an rbd device.  Here are a few 
notes on what I did.

Update kernel:
sudo rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
sudo yum -y update
sudo yum -y --enablerepo=elrepo-kernel install kernel-ml
sudo vim /boot/grub/menu.lst (update default to zero)
reboot

Create rbd device:
rbd create {name} --size {size_in_MB}
sudo modprobe rbd
sudo rbd map {name} --pool {pool_name}
Device appears at /dev/rbd/rbd/name


From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
alistair.whit...@barclays.com
Sent: Wednesday, October 30, 2013 11:48 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Red Hat clients

Now that my ceph cluster seems to be happy and stable, I have been looking at 
different ways of using it.   Object, block and file.

Object is relatively easy and I will use different ones to test with Ceph.

When I look at block, I'm getting the impression from a lot of Googling that 
deploying clients on Red Hat to connect to a Ceph cluster can be complex.   As 
I understand it, the rbd module is not currently in the Red Hat kernel (and I 
am not allowed to make changes to our standard kernel as is suggested in places 
as a possible solution).  Does this mean I can't connect a Red Hat machine to 
Ceph as a block client?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy problems on CentOS-6.4

2013-10-29 Thread Gruher, Joseph R
If you are behind a proxy, try configuring the wget proxy through /etc/wgetrc. 
I had a similar problem where wget commands completed fine when run manually 
but would fail under ceph-deploy until I configured the wget proxy that way.
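
For reference, the wget proxy settings are just a couple of lines in 
/etc/wgetrc (the proxy host and port below are placeholders):

use_proxy = on
http_proxy = http://proxy.example.com:8080/
https_proxy = http://proxy.example.com:8080/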

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra
Sent: Tuesday, October 29, 2013 9:51 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph-deploy problems on CentOS-6.4

Hi All,

I am a newbie to ceph.  I am installing ceph (dumpling release) using 
ceph-deploy (issued from my admin node) on one monitor and two OSD nodes 
running CentOS 6.4 (64-bit), following the instructions in the link below:

http://ceph.com/docs/master/start/quick-ceph-deploy/

My setup looks exactly like the diagram.  I followed the pre-flight 
instructions exactly as outlined in the link below:

http://ceph.com/docs/master/start/quick-start-preflight/

The ceph-deploy takes forever and then throws up the following error:

2013-10-28 17:32:35,903 [ceph_deploy.cli][INFO  ] Invoked (1.2.7): 
/usr/bin/ceph-deploy new ceph-node1-mon-centos-6-4
2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Creating new cluster named 
ceph
2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Resolving host 
ceph-node1-mon-centos-6-4
2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor 
ceph-node1-mon-centos-6-4 at 10.12.0.70
2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor initial members are 
['ceph-node1-mon-centos-6-4']
2013-10-28 17:32:35,904 [ceph_deploy.new][DEBUG ] Monitor addrs are 
['10.12.0.70']
2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Creating a random mon key...
2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Writing initial config to 
ceph.conf...
2013-10-28 17:32:35,905 [ceph_deploy.new][DEBUG ] Writing monitor keyring to 
ceph.mon.keyring...
2013-10-28 17:33:10,287 [ceph_deploy.cli][INFO  ] Invoked (1.2.7): 
/usr/bin/ceph-deploy install ceph-node1-mon-centos-6-4 
ceph-node2-osd0-centos-6-4 ceph-admin-node-centos-6-4
2013-10-28 17:33:10,287 [ceph_deploy.install][DEBUG ] Installing stable version 
dumpling on cluster ceph hosts ceph-node1-mon-centos-6-4 
ceph-node2-osd0-centos-6-4 ceph-admin-node-centos-6-4
2013-10-28 17:33:10,288 [ceph_deploy.install][DEBUG ] Detecting platform for 
host ceph-node1-mon-centos-6-4 ...
2013-10-28 17:33:10,288 [ceph_deploy.sudo_pushy][DEBUG ] will use a remote 
connection without sudo
2013-10-28 17:33:10,626 [ceph_deploy.install][INFO  ] Distro info: CentOS 6.4 
Final
2013-10-28 17:33:10,626 [ceph-node1-mon-centos-6-4][INFO  ] installing ceph on 
ceph-node1-mon-centos-6-4
2013-10-28 17:33:10,633 [ceph-node1-mon-centos-6-4][INFO  ] adding EPEL 
repository
2013-10-28 17:33:10,633 [ceph-node1-mon-centos-6-4][INFO  ] Running command: 
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
2013-10-28 19:20:35,893 [ceph-node1-mon-centos-6-4][ERROR ] Traceback (most 
recent call last):
2013-10-28 19:20:35,894 [ceph-node1-mon-centos-6-4][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py, line 
77, in install_epel
2013-10-28 19:20:35,899 [ceph-node1-mon-centos-6-4][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py, line 10, in 
inner
2013-10-28 19:20:35,900 [ceph-node1-mon-centos-6-4][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line 6, in 
remote_call
2013-10-28 19:20:35,902 [ceph-node1-mon-centos-6-4][ERROR ]   File 
/usr/lib64/python2.6/subprocess.py, line 502, in check_call
2013-10-28 19:20:35,903 [ceph-node1-mon-centos-6-4][ERROR ] raise 
CalledProcessError(retcode, cmd)
2013-10-28 19:20:35,904 [ceph-node1-mon-centos-6-4][ERROR ] CalledProcessError: 
Command '['wget', 
'http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm']' 
returned non-zero exit status 4
2013-10-28 19:20:35,911 [ceph-node1-mon-centos-6-4][ERROR ] --2013-10-28 
17:33:10--  
http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
2013-10-28 19:20:35,911 [ceph-node1-mon-centos-6-4][ERROR ] Resolving 
dl.fedoraproject.org... 209.132.181.25, 209.132.181.26, 209.132.181.27, ...
2013-10-28 19:20:35,912 [ceph-node1-mon-centos-6-4][ERROR ] Connecting to 
dl.fedoraproject.org|209.132.181.25|:80... failed: Connection timed out.
2013-10-28 19:20:35,912 [ceph-node1-mon-centos-6-4][ERROR ] Connecting to 
dl.fedoraproject.org|209.132.181.26|:80... failed: Connection timed out.

Interestingly, running wget 
http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm 
directly on each node (1 mon and 2 OSDs) succeeds without any problem.  I have 
tried everything many times as the root user, the ceph user, etc., but it fails 
every time!  It is very frustrating!

Has anyone else experienced the same or similar problem?

Thanks a lot in advance!
Nar


This message contains information which may be confidential and/or privileged. 

Re: [ceph-users] Ceph-deploy, sudo and proxies

2013-10-25 Thread Gruher, Joseph R
Try configuring the curl proxy in /root/.curlrc.  I had a similar problem 
earlier this week.

Overall I had to set all of these proxies individually for ceph-deploy to work 
on CentOS (Ubuntu is easier); rough example settings follow the list:
Curl: /root/.curlrc
rpm: /root/.rpmmacros
wget: /etc/wgetrc
yum: /etc/yum.conf
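
A rough sketch of what goes in each file (the proxy host and port are 
placeholders and will obviously differ per site):

# /root/.curlrc
proxy = http://proxy.example.com:8080

# /root/.rpmmacros
%_httpproxy proxy.example.com
%_httpport 8080

# /etc/wgetrc
use_proxy = on
http_proxy = http://proxy.example.com:8080/

# /etc/yum.conf (in the [main] section)
proxy=http://proxy.example.com:8080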

-Joe

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
alistair.whit...@barclays.com
Sent: Friday, October 25, 2013 10:26 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph-deploy, sudo and proxies

I have an interesting problem I was hoping someone could help with.

My Red Hat servers are configured to use proxies to access the internet.  I 
have managed to successfully add the Ceph repo, install ceph-deploy on the 
admin node and create the cluster.  All ceph nodes are set up for passwordless 
sudo, and I have made sure that the proxy settings are kept when running an 
'rpm' command under sudo.  All other preflight checks are completed, with ceph 
being the default login user etc.

So, when I run the ceph-deploy install ceph-node command from the admin node, 
I get the following error:

ceph@ldtdsr02se17 PROD $ ceph-deploy install ldtdsr02se18
[ceph_deploy.cli][INFO  ] Invoked (1.2.7): /usr/bin/ceph-deploy install 
ldtdsr02se18
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts ldtdsr02se18
[ceph_deploy.install][DEBUG ] Detecting platform for host ldtdsr02se18 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.install][INFO  ] Distro info: RedHatEnterpriseServer 6.4 Santiago
[ldtdsr02se18][INFO  ] installing ceph on ldtdsr02se18
[ldtdsr02se18][INFO  ] Running command: su -c 'rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'
[ldtdsr02se18][ERROR ] Traceback (most recent call last):
[ldtdsr02se18][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/hosts/centos/install.py, line 
23, in install
[ldtdsr02se18][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py, line 10, in 
inner
[ldtdsr02se18][ERROR ]   File 
/usr/lib/python2.6/site-packages/ceph_deploy/util/wrappers.py, line 6, in 
remote_call
[ldtdsr02se18][ERROR ]   File /usr/lib64/python2.6/subprocess.py, line 502, 
in check_call
[ldtdsr02se18][ERROR ] raise CalledProcessError(retcode, cmd)
[ldtdsr02se18][ERROR ] CalledProcessError: Command '['su -c \'rpm --import 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc\'']' 
returned non-zero exit status 1
[ldtdsr02se18][ERROR ] curl: (7) couldn't connect to host
[ldtdsr02se18][ERROR ] error: 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc: import read 
failed(2).
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: su -c 'rpm 
--import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc;'

Note that it uses sudo as it should and then complains about not being able to 
connect.  When I run the exact same command on the ceph node itself as the ceph 
user, it works without any errors.  This implies that the authentication is in 
place between ceph and root, and that the proxy settings are correct.  Yet it 
fails when initiated from the admin node via ceph-deploy.

Any ideas what might be going on here?  I should add that I looked at the 
github page about using the --no-adjust-repos flag, but my version of 
ceph-deploy says it is an invalid flag...

Please help
Alistair


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Default PGs

2013-10-23 Thread Gruher, Joseph R
Should osd_pool_default_pg_num and osd_pool_default_pgp_num apply to the 
default pools?  I put them in ceph.conf before creating any OSDs but after 
bringing up the OSDs the default pools are using a value of 64.

Ceph.conf contains these lines in [global]:
osd_pool_default_pgp_num = 800
osd_pool_default_pg_num = 800

After creating and activating OSDs:

[ceph@joceph05 ceph]$ ceph osd pool get data pg_num
pg_num: 64
[ceph@joceph05 ceph]$ ceph osd pool get data pgp_num
pgp_num: 64

[ceph@joceph05 ceph]$ ceph osd dump

pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 owner 0
pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
64 pgp_num 64 last_change 1 owner 0

I have ceph-deploy 1.2.7 and ceph 0.67.4 on CentOS 6.4 with 3.11.6 kernel.
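
In the meantime I assume the pre-created pools can simply be bumped by hand, 
something like:

ceph osd pool set data pg_num 800
ceph osd pool set data pgp_num 800

(and likewise for metadata and rbd), but I'd still like to understand why the 
defaults in ceph.conf were not picked up.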

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd client module in centos 6.4

2013-10-23 Thread Gruher, Joseph R
Hi all,

I have CentOS 6.4 running with a 3.11.6 kernel (built from the latest stable 
release on kernel.org) and I cannot load the rbd client module.  Do I have to 
do anything to enable or install it?  Shouldn't it be present in this kernel?

[ceph@joceph05 /]$ cat /etc/centos-release
CentOS release 6.4 (Final)

[ceph@joceph05 /]$ uname -a
Linux joceph05.jf.intel.com 3.11.6 #1 SMP Mon Oct 21 17:23:07 PDT 2013 x86_64 
x86_64 x86_64 GNU/Linux

[ceph@joceph05 /]$ modprobe rbd
FATAL: Module rbd not found.
[ceph@joceph05 /]$

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD journal size

2013-10-23 Thread Gruher, Joseph R
Speculating, but it seems possible that the ':' in the path is problematic, 
since that is also the separator between disk and journal (HOST:DISK:JOURNAL)?

Perhaps it would work if you enclose the path in quotes, or use 
/dev/disk/by-id?
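
For example, something like this (untested, and the by-id string is just a 
placeholder for whatever the real device reports):

ceph-deploy osd prepare hqosd1:/dev/disk/by-id/scsi-EXAMPLE-DEVICE-ID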

-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Shain Miley
Sent: Wednesday, October 23, 2013 1:55 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

O.K...I found the help section in 1.2.7 that talks about using paths...however 
I
still cannot get this to work:


root@hqceph1:/usr/local/ceph-install-1# ceph-deploy osd prepare
hqosd1:/dev/disk/by-path/pci-:02:00.0-scsi-0:2:1:0

usage: ceph-deploy osd [-h] [--zap-disk] [--fs-type FS_TYPE] [--dmcrypt]
   [--dmcrypt-key-dir KEYDIR]
   SUBCOMMAND HOST:DISK[:JOURNAL] [HOST:DISK[:JOURNAL]
   ...]
ceph-deploy osd: error: argument HOST:DISK[:JOURNAL]: must be in form
HOST:DISK[:JOURNAL]


is '/dev/disk/by-path' names supported...or am I doing something wrong?

Thanks,

Shain



Shain Miley | Manager of Systems and Infrastructure, Digital Media |
smi...@npr.org | 202.513.3649


From: ceph-users-boun...@lists.ceph.com [ceph-users-
boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
Sent: Wednesday, October 23, 2013 4:19 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

Alfredo,

Do you know what version of ceph-deploy has this updated functionality

I just updated to 1.2.7 and it does not appear to include it.

Thanks,

Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media |
smi...@npr.org | 202.513.3649


From: ceph-users-boun...@lists.ceph.com [ceph-users-
boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
Sent: Monday, October 21, 2013 6:13 PM
To: Alfredo Deza
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

Alfredo,

Thanks a lot for the info.

I'll make sure I have an updated version of ceph-deploy and give it another
shot.

Shain
Shain Miley | Manager of Systems and Infrastructure, Digital Media |
smi...@npr.org | 202.513.3649


From: Alfredo Deza [alfredo.d...@inktank.com]
Sent: Monday, October 21, 2013 2:03 PM
To: Shain Miley
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] OSD journal size

On Mon, Oct 21, 2013 at 1:21 PM, Shain Miley smi...@npr.org wrote:
 Hi,

 We have been testing a ceph cluster with the following specs:

 3 Mon's
 72 OSD's spread across 6 Dell R-720xd servers
 4 TB SAS drives
 4 bonded 10 GigE NIC ports per server
 64 GB of RAM

 Up until this point we have been running tests using the default
 journal size of '1024'.
 Before we start to place production data on the cluster I was want to
 clear up the following questions I have:

 1)Is there a more appropriate journal size for my setup given the
 specs listed above?

 2)According to this link:

 http://www.slideshare.net/Inktank_Ceph/cern-ceph-day-london-2013/11

 CERN is using  '/dev/disk/by-path' for their OSD's.

 Does ceph-deploy currently support setting up OSD's using this method?

Indeed it does!

`ceph-deploy osd --help` got updated recently to demonstrate how this needs
to be done (an extra step is involved):

For paths, first prepare and then activate:

ceph-deploy osd prepare {osd-node-name}:/path/to/osd
ceph-deploy osd activate {osd-node-name}:/path/to/osd




 Thanks,

 Shain

 Shain Miley | Manager of Systems and Infrastructure, Digital Media |
 smi...@npr.org | 202.513.3649

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Client Timeout on Rados Gateway

2013-10-07 Thread Gruher, Joseph R
Thanks for the reply.  This eventually resolved itself when I upgraded the 
client kernel from the Ubuntu Server 12.04.2 default to the 3.6.10 kernel.  Not 
sure if there is a good causal explanation there or if it might be a 
coincidence.  I did see the kernel recommendations in the docs but I had 
assumed those just applied to the Ceph machines and not clients - perhaps that 
is a bad assumption.  Anyway, it works now, so I guess the next steps are to 
try moving the client back to the public network and to re-enable 
authentication and see if it works or if I still have an issue there.

With regard to versions:

ceph@cephtest06:/etc/ceph$ ceph-mon --version
ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)

ceph@cephtest06:/etc/ceph$ uname -a
Linux cephtest06 3.6.10-030610-generic #201212101650 SMP Mon Dec 10 21:51:40 
UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

ceph@cephclient01:~/cos$ rados --version
ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)

ceph@cephclient01:~/cos$ uname -a
Linux cephclient01 3.6.10-030610-generic #201212101650 SMP Mon Dec 10 21:51:40 
UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Thanks,
Joe

-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, October 07, 2013 1:27 PM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Client Timeout on Rados Gateway

The ping tests you're running are connecting to different interfaces
(10.23.37.175) than those you specify in the mon_hosts option (10.0.0.2,
10.0.0.3, 10.0.0.4). The client needs to be able to connect to the specified
address; I'm guessing it's not routable from outside that network?

The error you're getting once you put it inside the network is more
interesting. What version of the Ceph packages do you have installed there,
and what's installed on the monitors? (run ceph-mon --version
on the monitor, and rados --version on the client, and it'll
output.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Oct 1, 2013 at 12:45 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:
 Hello-



 I've set up a rados gateway but I'm having trouble accessing it from
 clients.  I can access it using rados command line just fine from any
 system in my ceph deployment, including my monitors and OSDs, the
 gateway system, and even the admin system I used to run ceph-deploy.
 However, when I set up a client outside the ceph nodes I get a timeout
 error as shown at the bottom of the output pasted below.  I've turned
 off authentication for the moment to simplify things.  Systems are
 able to resolve names and reach each other via ping.  Any thoughts on what
could be the issue here or how to debug?



 The failure:



 ceph@cephclient01:/etc/ceph$ rados df

 2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting):
 authenticate timed out after 30

 2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin
 authentication error (110) Connection timed out

 couldn't connect to cluster! error -110





 ceph@cephclient01:/etc/ceph$ sudo rados df

 2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting):
 authenticate timed out after 30

 2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin
 authentication error (110) Connection timed out

 couldn't connect to cluster! error -110

 ceph@cephclient01:/etc/ceph$





 Some details from the client:



 ceph@cephclient01:/etc/ceph$ pwd

 /etc/ceph





 ceph@cephclient01:/etc/ceph$ ls

 ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway





 ceph@cephclient01:/etc/ceph$ cat ceph.conf

 [global]

 fsid = a45e6e54-70ef-4470-91db-2152965deec5

 mon_initial_members = cephtest02, cephtest03, cephtest04

 mon_host = 10.0.0.2,10.0.0.3,10.0.0.4

 osd_journal_size = 1024

 filestore_xattr_use_omap = true

 auth_cluster_required = none #cephx

 auth_service_required = none #cephx

 auth_client_required = none #cephx



 [client.radosgw.gateway]

 host = cephtest06

 keyring = /etc/ceph/keyring.radosgw.gateway

 rgw_socket_path = /tmp/radosgw.sock

 log_file = /var/log/ceph/radosgw.log





 ceph@cephclient01:/etc/ceph$ ping cephtest06

 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1
 ttl=64
 time=0.216 ms

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2
 ttl=64
 time=0.209 ms

 ^C

 --- cephtest06.jf.intel.com ping statistics ---

 2 packets transmitted, 2 received, 0% packet loss, time 999ms

 rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms





 ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com

 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1
 ttl=64
 time=0.223 ms

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2
 ttl=64
 time=0.242 ms

 ^C

 --- cephtest06.jf.intel.com ping statistics ---

 2 packets transmitted, 2 received, 0% packet loss, time 999ms

Re: [ceph-users] Client Timeout on Rados Gateway

2013-10-07 Thread Gruher, Joseph R
Could you clarify something for me... I have a cluster network (10.0.0.x) and 
a public network (10.23.37.x).  All the Ceph machines have one interface on 
each network, and clients (when configured normally) would only be on the 
public network.  My ceph.conf uses the 10.0.0.x IPs for the monitors, but as 
you mention below this can cause a problem for the client reaching the 
monitors, since the client is not on that network.  Could this cause the rados 
command to fail?  What is the solution to that problem?  It doesn't seem like 
ceph.conf should use the public IPs for the monitors; don't we want those to be 
on the private network?  And the client wouldn't normally have access to the 
private network.  Is this really just an issue with access using rados, since 
swift or rbd would not need to access the monitors?
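
For reference, a stripped-down sketch of the split I'm describing; the subnet 
masks, and the choice to put the monitor addresses on the public side, are my 
assumptions here rather than anything I've confirmed:

[global]
# clients and monitors talk over the public network,
# OSD replication traffic uses the cluster network
public network = 10.23.37.0/24
cluster network = 10.0.0.0/24
# monitor addresses would then need to be reachable by clients,
# i.e. on the public network
mon_host = 10.23.37.x,10.23.37.y,10.23.37.z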



-Original Message-
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, October 07, 2013 1:27 PM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Client Timeout on Rados Gateway

The ping tests you're running are connecting to different interfaces
(10.23.37.175) than those you specify in the mon_hosts option (10.0.0.2,
10.0.0.3, 10.0.0.4). The client needs to be able to connect to the specified
address; I'm guessing it's not routable from outside that network?

The error you're getting once you put it inside the network is more
interesting. What version of the Ceph packages do you have installed there,
and what's installed on the monitors? (run ceph-mon --version
on the monitor, and rados --version on the client, and it'll
output.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Oct 1, 2013 at 12:45 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:
 Hello-



 I've set up a rados gateway but I'm having trouble accessing it from
 clients.  I can access it using rados command line just fine from any
 system in my ceph deployment, including my monitors and OSDs, the
 gateway system, and even the admin system I used to run ceph-deploy.
 However, when I set up a client outside the ceph nodes I get a timeout
 error as shown at the bottom of the output pasted below.  I've turned
 off authentication for the moment to simplify things.  Systems are
 able to resolve names and reach each other via ping.  Any thoughts on what
could be the issue here or how to debug?



 The failure:



 ceph@cephclient01:/etc/ceph$ rados df

 2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting):
 authenticate timed out after 30

 2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin
 authentication error (110) Connection timed out

 couldn't connect to cluster! error -110





 ceph@cephclient01:/etc/ceph$ sudo rados df

 2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting):
 authenticate timed out after 30

 2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin
 authentication error (110) Connection timed out

 couldn't connect to cluster! error -110

 ceph@cephclient01:/etc/ceph$





 Some details from the client:



 ceph@cephclient01:/etc/ceph$ pwd

 /etc/ceph





 ceph@cephclient01:/etc/ceph$ ls

 ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway





 ceph@cephclient01:/etc/ceph$ cat ceph.conf

 [global]

 fsid = a45e6e54-70ef-4470-91db-2152965deec5

 mon_initial_members = cephtest02, cephtest03, cephtest04

 mon_host = 10.0.0.2,10.0.0.3,10.0.0.4

 osd_journal_size = 1024

 filestore_xattr_use_omap = true

 auth_cluster_required = none #cephx

 auth_service_required = none #cephx

 auth_client_required = none #cephx



 [client.radosgw.gateway]

 host = cephtest06

 keyring = /etc/ceph/keyring.radosgw.gateway

 rgw_socket_path = /tmp/radosgw.sock

 log_file = /var/log/ceph/radosgw.log





 ceph@cephclient01:/etc/ceph$ ping cephtest06

 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1
 ttl=64
 time=0.216 ms

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2
 ttl=64
 time=0.209 ms

 ^C

 --- cephtest06.jf.intel.com ping statistics ---

 2 packets transmitted, 2 received, 0% packet loss, time 999ms

 rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms





 ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com

 PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1
 ttl=64
 time=0.223 ms

 64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2
 ttl=64
 time=0.242 ms

 ^C

 --- cephtest06.jf.intel.com ping statistics ---

 2 packets transmitted, 2 received, 0% packet loss, time 999ms

 rtt min/avg/max/mdev = 0.223/0.232/0.242/0.017 ms





 I did try putting the client on the 10.0.0.x network to see if that
 would affect behavior but that just seemed to introduce a new problem:



 ceph@cephclient01:/etc/ceph$ rados df

 2013-10-01 21:37:29.439410 7f60d2a43700 failed to decode message of
 type 59
 v1: buffer::end_of_buffer

 2013

Re: [ceph-users] Newbie question

2013-10-02 Thread Gruher, Joseph R
Along the lines of this thread, if I have OSD(s) on rotational HDD(s), but have 
the journal(s) going to an SSD, I am curious about the best procedure for 
replacing the SSD should it fail.
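
For a planned swap, the rough sequence I would expect (an untested guess on my 
part) is to stop each OSD journaling to that SSD, flush its journal, replace 
the SSD and recreate the journal partitions, rebuild the journals, and start 
the OSDs again.  Something like:

sudo service ceph stop osd.12        # osd id is just an example
sudo ceph-osd -i 12 --flush-journal
# replace the SSD and recreate the journal partition(s), then:
sudo ceph-osd -i 12 --mkjournal
sudo service ceph start osd.12

A failed (dead) SSD is presumably messier, since there is no journal left to 
flush.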

-Joe

From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Scottix
Sent: Wednesday, October 02, 2013 10:37 AM
To: Andy Paluch
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Newbie question

I actually am looking for a similar answer.  If 1 osd = 1 HDD, in dumpling it 
will relocate the data for me after the timeout, which is great.  If I just 
want to replace the osd with an unformatted new HDD, what is the procedure?

One method that has worked for me is to remove it from the crush map and then 
re-add the osd drive to the cluster.  This works but seems like a lot of 
overhead just to replace a single drive.  Is there a better way to do this?

On Wed, Oct 2, 2013 at 8:10 AM, Andy Paluch 
a...@webguyz.netmailto:a...@webguyz.net wrote:
What happens when a drive goes bad in ceph and has to be replaced (at the 
physical level)?  In the RAID world you pop out the bad disk, stick a new one 
in, and the controller takes care of getting it back into the system.  From 
what I've been reading so far, it's probably going to be a mess to do this with 
ceph and involve a lot of low-level Linux tweaking to remove and replace the 
disk that failed.  I'm not a big Linux guy, so I was wondering if anyone can 
point to any docs on how to recover from a bad disk in a ceph node.

Thanks


___
ceph-users mailing list
ceph-users@lists.ceph.commailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Follow Me: @Scottixhttp://www.twitter.com/scottix
http://about.me/scottix
scot...@gmail.commailto:scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] question on setup ssh documentation

2013-10-02 Thread Gruher, Joseph R
On my system my user is named ceph so I modified /home/ceph/.ssh/config.  
That seemed to work fine for me.  ~/ is shorthand for your user's home folder.

I think SSH will default to the current username so if you just use the same 
username everywhere this may not even be necessary.

My file:

ceph@cephtest01:/etc/ceph$ cat /home/ceph/.ssh/config
Host cephtest02
  Hostname cephtest02.jf.intel.com
  User ceph

Host cephtest03
  Hostname cephtest03.jf.intel.com
  User ceph

Host cephtest04
  Hostname cephtest04.jf.intel.com
  User ceph

Host cephtest05
  Hostname cephtest05.jf.intel.com
  User ceph

Host cephtest06
  Hostname cephtest06.jf.intel.com
  User ceph

ceph@cephtest01:/etc/ceph$


From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nimish Patel
Sent: Wednesday, October 02, 2013 11:19 AM
To: ceph-us...@ceph.com
Subject: [ceph-users] question on setup ssh documentation

On this web page http://ceph.com/docs/master/start/quick-start-preflight/ where 
it says "Modify your ~/.ssh/config file of your admin node so that it defaults 
to logging in as the user you created when no username is specified", which 
config file do I change?

I am using Ubuntu server 13.04.

1. Which files do I modify? /etc/ssh/ssh_config or /etc/ssh/sshd_config?

2. Am I supposed to see a config file in /root/.ssh?

3. Am I supposed to see a config file in /home/ceph/.ssh?

Thanks for the help!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph.conf with multiple rados gateways

2013-10-02 Thread Gruher, Joseph R
Can anyone provide me a sample ceph.conf with multiple rados gateways?  I must 
not be configuring it correctly and I can't seem to Google up an example or 
find one in the docs.  Thanks!
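
What I have been guessing at is simply repeating the per-instance section once per gateway, something like the sketch below (the instance names, hosts and paths are placeholders I made up), but I don't know if that's the intended layout:

[client.radosgw.gw1]
host = gatewayhost1
keyring = /etc/ceph/keyring.radosgw.gw1
rgw_socket_path = /tmp/radosgw.gw1.sock
log_file = /var/log/ceph/radosgw.gw1.log

[client.radosgw.gw2]
host = gatewayhost2
keyring = /etc/ceph/keyring.radosgw.gw2
rgw_socket_path = /tmp/radosgw.gw2.sock
log_file = /var/log/ceph/radosgw.gw2.log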

-Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Client Timeout on Rados Gateway

2013-10-01 Thread Gruher, Joseph R
Hello-

I've set up a rados gateway but I'm having trouble accessing it from clients.  
I can access it using rados command line just fine from any system in my ceph 
deployment, including my monitors and OSDs, the gateway system, and even the 
admin system I used to run ceph-deploy.  However, when I set up a client 
outside the ceph nodes I get a timeout error as shown at the bottom of the 
output pasted below.  I've turned off authentication for the moment to simplify 
things.  Systems are able to resolve names and reach each other via ping.  Any 
thoughts on what could be the issue here or how to debug?

The failure:

ceph@cephclient01:/etc/ceph$ rados df
2013-10-01 19:57:07.488970 7fd381db0780 monclient(hunting): authenticate timed 
out after 30
2013-10-01 19:57:07.489174 7fd381db0780 librados: client.admin authentication 
error (110) Connection timed out
couldn't connect to cluster! error -110


ceph@cephclient01:/etc/ceph$ sudo rados df
2013-10-01 19:57:44.461273 7fb6712d5780 monclient(hunting): authenticate timed 
out after 30
2013-10-01 19:57:44.461440 7fb6712d5780 librados: client.admin authentication 
error (110) Connection timed out
couldn't connect to cluster! error -110
ceph@cephclient01:/etc/ceph$


Some details from the client:

ceph@cephclient01:/etc/ceph$ pwd
/etc/ceph


ceph@cephclient01:/etc/ceph$ ls
ceph.client.admin.keyring  ceph.conf  keyring.radosgw.gateway


ceph@cephclient01:/etc/ceph$ cat ceph.conf
[global]
fsid = a45e6e54-70ef-4470-91db-2152965deec5
mon_initial_members = cephtest02, cephtest03, cephtest04
mon_host = 10.0.0.2,10.0.0.3,10.0.0.4
osd_journal_size = 1024
filestore_xattr_use_omap = true
auth_cluster_required = none #cephx
auth_service_required = none #cephx
auth_client_required = none #cephx

[client.radosgw.gateway]
host = cephtest06
keyring = /etc/ceph/keyring.radosgw.gateway
rgw_socket_path = /tmp/radosgw.sock
log_file = /var/log/ceph/radosgw.log


ceph@cephclient01:/etc/ceph$ ping cephtest06
PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 
time=0.216 ms
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 
time=0.209 ms
^C
--- cephtest06.jf.intel.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.209/0.212/0.216/0.015 ms


ceph@cephclient01:/etc/ceph$ ping cephtest06.jf.intel.com
PING cephtest06.jf.intel.com (10.23.37.175) 56(84) bytes of data.
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=1 ttl=64 
time=0.223 ms
64 bytes from cephtest06.jf.intel.com (10.23.37.175): icmp_req=2 ttl=64 
time=0.242 ms
^C
--- cephtest06.jf.intel.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.223/0.232/0.242/0.017 ms


I did try putting the client on the 10.0.0.x network to see if that would 
affect behavior but that just seemed to introduce a new problem:

ceph@cephclient01:/etc/ceph$ rados df
2013-10-01 21:37:29.439410 7f60d2a43700 failed to decode message of type 59 v1: 
buffer::end_of_buffer
2013-10-01 21:37:29.439583 7f60d4a47700 monclient: hunting for new mon

ceph@cephclient01:/etc/ceph$ ceph -m 10.0.0.2 -s
2013-10-01 21:37:42.341480 7f61eacd5700 monclient: hunting for new mon
2013-10-01 21:37:45.341024 7f61eacd5700 monclient: hunting for new mon
2013-10-01 21:37:45.343274 7f61eacd5700 monclient: hunting for new mon

ceph@cephclient01:/etc/ceph$ ceph health
2013-10-01 21:39:52.833560 mon - [health]
2013-10-01 21:39:52.834671 mon.0 - 'unparseable JSON health' (-22)
ceph@cephclient01:/etc/ceph$
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failure starting radosgw after setting up object storage

2013-09-30 Thread Gruher, Joseph R


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Gruher, Joseph R
Sent: Monday, September 30, 2013 10:27 AM
To: Yehuda Sadeh
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] failure starting radosgw after setting up object
storage
-Original Message-
From: Yehuda Sadeh [mailto:yeh...@inktank.com]
Sent: Friday, September 27, 2013 9:30 AM
To: Gruher, Joseph R
 ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log

 2013-09-25 14:03:01.235760 7f713d79d780  0 ceph version 0.67.3
 (408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid
 13187

 2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't
 support
 curl_multi_wait()

 2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone /
 region transfer performance may be affected

 2013-09-25 14:03:01.245786 7f713d79d780  0 librados:
 client.radosgw.gateway authentication error (1) Operation not
 permitted

 2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage
 provider
 (RADOS)


This means that the radosgw process cannot connect to the cluster due
to user / key set up. Make sure that the user for radosgw exists, and
that the ceph keyring file (on the radosgw side) has the correct credentials
set.


Yehuda


Thanks for the response.  I will look into those.  Is it possible you could provide 
more detail on how to check them?  Sorry, I'm still fairly new to Ceph (and object 
storage in general).  Thanks!
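
Following up on my own question, I believe the checks amount to something like the following (assuming the user is client.radosgw.gateway as in my ceph.conf; the capability strings are my best guess and may need adjusting):

ceph auth list                          # confirm client.radosgw.gateway is known to the cluster
ceph auth get-or-create client.radosgw.gateway osd 'allow rwx' mon 'allow rw' -o /etc/ceph/keyring.radosgw.gateway
cat /etc/ceph/keyring.radosgw.gateway   # check the keyring on the gateway host holds the same key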

I went back through the setup steps again, this time using this guide 
(http://ceph.com/docs/master/radosgw/manual-install/) instead of this guide 
(http://ceph.com/docs/next/start/quick-rgw/).  Now I can start radosgw on this 
OSD successfully.  I notice this guide has me install a radosgw-agent package, 
which was not installed before, and I wonder if this could be the difference.  
Should that package be installed to be able to start radosgw or should it not 
be required?  

I didn't make many other changes between the working and failing configuration. 
 The only other change I really made was to create a gateway user.  I had not 
done that step before.  In both guides that step is done after starting 
radosgw, so I wouldn't think that would have been the key to allowing it to 
start, unless the guides are both broken in that respect.

My other OSD still returns nothing when I try to start radosgw, so I'm not sure 
what the problem is there.

ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start
ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest05:/etc/ceph$ sudo cat /var/log/ceph/radosgw.log
ceph@cephtest05:/etc/ceph$
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] failure starting radosgw after setting up object storage

2013-09-25 Thread Gruher, Joseph R
Hi all-

I am following the object storage quick start guide.  I have a cluster with two 
OSDs and have followed the steps on both.  Both are failing to start radosgw 
but each in a different manner.  All the previous steps in the quick start 
guide appeared to complete successfully.  Any tips on how to debug from here?  
Thanks!


OSD1:

ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw start
ceph@cephtest05:/etc/ceph$

ceph@cephtest05:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest05:/etc/ceph$

ceph@cephtest05:/etc/ceph$ cat /var/log/ceph/radosgw.log
ceph@cephtest05:/etc/ceph$


OSD2:

ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw start
Starting client.radosgw.gateway...
2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support 
curl_multi_wait()
2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region 
transfer performance may be affected
ceph@cephtest06:/etc/ceph$

ceph@cephtest06:/etc/ceph$ sudo /etc/init.d/radosgw status
/usr/bin/radosgw is not running.
ceph@cephtest06:/etc/ceph$

ceph@cephtest06:/etc/ceph$ cat /var/log/ceph/radosgw.log
2013-09-25 14:03:01.235760 7f713d79d780  0 ceph version 0.67.3 
(408cd61584c72c0d97b774b3d8f95c6b1b06341a), process radosgw, pid 13187
2013-09-25 14:03:01.235789 7f713d79d780 -1 WARNING: libcurl doesn't support 
curl_multi_wait()
2013-09-25 14:03:01.235797 7f713d79d780 -1 WARNING: cross zone / region 
transfer performance may be affected
2013-09-25 14:03:01.245786 7f713d79d780  0 librados: client.radosgw.gateway 
authentication error (1) Operation not permitted
2013-09-25 14:03:01.246526 7f713d79d780 -1 Couldn't init storage provider 
(RADOS)
ceph@cephtest06:/etc/ceph$


For reference, I think cluster health is OK:

ceph@cephtest06:/etc/ceph$ sudo ceph status
  cluster a45e6e54-70ef-4470-91db-2152965deec5
   health HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04
   monmap e1: 3 mons at 
{cephtest02=10.0.0.2:6789/0,cephtest03=10.0.0.3:6789/0,cephtest04=10.0.0.4:6789/0},
 election epoch 6, quorum 0,1,2 cephtest02,cephtest03,cephtest04
   osdmap e9: 2 osds: 2 up, 2 in
pgmap v439: 192 pgs: 192 active+clean; 0 bytes data, 72548 KB used, 1998 GB 
/ 1999 GB avail
   mdsmap e1: 0/0/1 up

ceph@cephtest06:/etc/ceph$ sudo ceph health
HEALTH_WARN clock skew detected on mon.cephtest03, mon.cephtest04
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor deployment during quick start

2013-09-20 Thread Gruher, Joseph R
Sorry, not trying to repost or bump my thread, but I think I can restate my 
question here with better clarity.  I am confused about the --cluster 
argument used when ceph-deploy mon create invokes ceph-mon on the target 
system.  I always get a failure at this point when running ceph-deploy mon 
create, and this then halts the whole ceph quick start process.

Here is the line where ceph-deploy mon create fails:
[cephtest02][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i 
cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring

Running the same command manually on the target system gives an error.  As far 
as I can tell from the man page, the built-in help, and the website 
(http://ceph.com/docs/next/man/8/ceph-mon/), it seems --cluster is not a valid 
argument for ceph-mon?  Is this a problem in ceph-deploy?  Does this work for 
anyone else?

ceph@cephtest02:~$ sudo ceph-mon --cluster ceph --mkfs -i cephtest02 --keyring 
/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
too many arguments: [--cluster,ceph]
usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
  --debug_mon n
debug monitor level (e.g. 10)
  --mkfs
build fresh monitor fs
--conf/-cRead configuration from the given configuration file
-d   Run in foreground, log to stderr.
-f   Run in foreground, log to usual location.
--id/-i  set ID portion of my name
--name/-nset name (TYPE.ID)
--versionshow version and quit

   --debug_ms N
set message debug level (e.g. 1)
ceph@cephtest02:~$

Can anyone clarify if --cluster is a supported argument for ceph-mon?

Thanks!

Here's the more complete output from the admin system when this fails:

ceph@cephtest01:/my-cluster$ ceph-deploy --overwrite-conf mon create cephtest02
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cephtest02
[ceph_deploy.mon][DEBUG ] detecting platform for host cephtest02 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
[cephtest02][DEBUG ] determining if provided host has same hostname in remote
[cephtest02][DEBUG ] deploying mon to cephtest02
[cephtest02][DEBUG ] remote hostname: cephtest02
[cephtest02][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephtest02][DEBUG ] checking for done path: 
/var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][DEBUG ] done path does not exist: 
/var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][INFO  ] create the monitor keyring file
[cephtest02][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i 
cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][ERROR ] Traceback (most recent call last):
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/hosts/common.py, line 72, in 
mon_create
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in 
inner
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in 
remote_call
[cephtest02][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in 
check_call
[cephtest02][ERROR ] raise CalledProcessError(retcode, cmd)
[cephtest02][ERROR ] CalledProcessError: Command '['ceph-mon', '--cluster', 
'ceph', '--mkfs', '-i', 'cephtest02', '--keyring', 
'/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring']' returned non-zero exit status 
1
[cephtest02][INFO  ] --conf/-cRead configuration from the given 
configuration file
[cephtest02][INFO  ] -d   Run in foreground, log to stderr.
[cephtest02][INFO  ] -f   Run in foreground, log to usual location.
[cephtest02][INFO  ] --id/-i  set ID portion of my name
[cephtest02][INFO  ] --name/-nset name (TYPE.ID)
[cephtest02][INFO  ] --versionshow version and quit
[cephtest02][INFO  ]--debug_ms N
[cephtest02][INFO  ] set message debug level (e.g. 1)
[cephtest02][ERROR ] too many arguments: [--cluster,ceph]
[cephtest02][ERROR ] usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
[cephtest02][ERROR ]   --debug_mon n
[cephtest02][ERROR ] debug monitor level (e.g. 10)
[cephtest02][ERROR ]   --mkfs
[cephtest02][ERROR ] build fresh monitor fs
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph 
--mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors

ceph@cephtest01:/my-cluster$

-Joe

-Original Message-
From: Gruher, Joseph R
Sent: Thursday, September 19, 2013 11:14 AM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: monitor deployment during quick start

Could someone make a quick clarification on the quick start guide for me?  On
this page: http://ceph.com/docs/next/start/quick-ceph-deploy/.  After I do
ceph

Re: [ceph-users] ceph-deploy not including sudo?

2013-09-19 Thread Gruher, Joseph R
-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]

Can you try running ceph-deploy *without* sudo ?


Ah, OK, sure.  Without sudo I end up hung here again:

ceph@cephtest01:~$ ceph-deploy install cephtest03 cephtest04 cephtest05 
cephtest06
cut
[cephtest03][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

BUT if I then add the --no-adjust-repos switch that was suggested, the install 
finally runs to completion!

Thanks for the help!  On to the next step...
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] monitor deployment during quick start

2013-09-19 Thread Gruher, Joseph R
Could someone make a quick clarification on the quick start guide for me?  On 
this page: http://ceph.com/docs/next/start/quick-ceph-deploy/.  After I do 
"ceph-deploy new" to a system, is that system then a monitor from that point 
forward?  Or do I then have to do "ceph-deploy mon create" to that same system 
before it is really a monitor?

Regardless of the combinations of systems I try, I seem to get a failure at the 
"add a monitor" step.  Is the following the correct sequence?
ceph@cephtest01:~$ ceph-deploy new cephtest02
ceph@cephtest01:~$ ceph-deploy install --no-adjust-repos cephtest02 
cephtest03 cephtest04
ceph@cephtest01:~$ ceph-deploy mon create cephtest02

Here is the failure I get:

ceph@cephtest01:~$ ceph-deploy mon create cephtest02
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts cephtest02
[ceph_deploy.mon][DEBUG ] detecting platform for host cephtest02 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection with sudo
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 12.04 precise
[cephtest02][DEBUG ] determining if provided host has same hostname in remote
[cephtest02][DEBUG ] deploying mon to cephtest02
[cephtest02][DEBUG ] remote hostname: cephtest02
[cephtest02][INFO  ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephtest02][DEBUG ] checking for done path: 
/var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][DEBUG ] done path does not exist: 
/var/lib/ceph/mon/ceph-cephtest02/done
[cephtest02][INFO  ] creating keyring file: 
/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][INFO  ] create the monitor keyring file
[cephtest02][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i 
cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[cephtest02][ERROR ] Traceback (most recent call last):
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/hosts/common.py, line 72, in 
mon_create
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in 
inner
[cephtest02][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in 
remote_call
[cephtest02][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in 
check_call
[cephtest02][ERROR ] raise CalledProcessError(retcode, cmd)
[cephtest02][ERROR ] CalledProcessError: Command '['ceph-mon', '--cluster', 
'ceph', '--mkfs', '-i', 'cephtest02', '--keyring', 
'/var/lib/ceph/tmp/ceph-cephtest02.mon.keyring']' returned non-zero exit status 
1
[cephtest02][INFO  ] --conf/-cRead configuration from the given 
configuration file
[cephtest02][INFO  ] -d   Run in foreground, log to stderr.
[cephtest02][INFO  ] -f   Run in foreground, log to usual location.
[cephtest02][INFO  ] --id/-i  set ID portion of my name
[cephtest02][INFO  ] --name/-nset name (TYPE.ID)
[cephtest02][INFO  ] --versionshow version and quit
[cephtest02][INFO  ]--debug_ms N
[cephtest02][INFO  ] set message debug level (e.g. 1)
[cephtest02][ERROR ] too many arguments: [--cluster,ceph]
[cephtest02][ERROR ] usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
[cephtest02][ERROR ]   --debug_mon n
[cephtest02][ERROR ] debug monitor level (e.g. 10)
[cephtest02][ERROR ]   --mkfs
[cephtest02][ERROR ] build fresh monitor fs
[ceph_deploy.mon][ERROR ] Failed to execute command: ceph-mon --cluster ceph 
--mkfs -i cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
[ceph_deploy][ERROR ] GenericError: Failed to create 1 monitors


Trying to run the failing command myself:

ceph@cephtest01:~$ ssh cephtest02 sudo ceph-mon --cluster ceph --mkfs -i 
cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
--conf/-cRead configuration from the given configuration file
-d   Run in foreground, log to stderr.
-f   Run in foreground, log to usual location.
--id/-i  set ID portion of my name
--name/-nset name (TYPE.ID)
--versionshow version and quit

   --debug_ms N
set message debug level (e.g. 1)
too many arguments: [--cluster,ceph]
usage: ceph-mon -i monid [--mon-data=pathtodata] [flags]
  --debug_mon n
debug monitor level (e.g. 10)
  --mkfs
build fresh monitor fs


It's not clear to me whether I should be using the same system from ceph-deploy 
new for ceph-deploy mon, but the same thing happens either way:

ceph@cephtest01:~$ ssh cephtest03 sudo ceph-mon --cluster ceph --mkfs -i 
cephtest02 --keyring /var/lib/ceph/tmp/ceph-cephtest02.mon.keyring
--conf/-cRead configuration from the given configuration file
-d   Run in foreground, log to stderr.
-f   Run in foreground, log to usual location.
--id/-i  set ID portion of my name
--name/-nset name (TYPE.ID)
--versionshow version and quit

   --debug_ms N
set message debug level (e.g. 1)
too many arguments: [--cluster,ceph]

Re: [ceph-users] OSD and Journal Files

2013-09-18 Thread Gruher, Joseph R


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Mike Dawson
 
 you need to understand losing an SSD will cause
the loss of ALL of the OSDs which had their journal on the failed SSD.

First, you probably don't want RAID1 for the journal SSDs. It isn't 
particularly
needed for resiliency and certainly isn't beneficial from a throughput
perspective.

Sorry, can you clarify this further for me?  If losing the SSD would cause 
losing all the OSDs journaling on it, why would you not want to RAID it?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-18 Thread Gruher, Joseph R
-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]

Again, in this next coming release, you will be able to tell
ceph-deploy to just install the packages without mangling your repos
(or installing keys)


Updated to new ceph-deploy release 1.2.6 today but I still see the hang at the 
same point.  Can you provide some more detail on your comment about running 
ceph-deploy without installing keys / mangling repos (install packages only)?  
How?  Thanks!!
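
If it's just a flag (--no-adjust-repos is the one I've seen referenced on the list), I'm guessing the invocation would be something along these lines, though I haven't confirmed it yet:

ceph-deploy install --no-adjust-repos cephtest01 cephtest02 cephtest03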

joe@cephtest01:~$ su ceph
Password:
$ sudo ceph-deploy --version
1.2.6
$ sudo ceph-deploy -v install cephtest01 cephtest02 cephtest03
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts cephtest01 cephtest02 cephtest03
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest01 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a local connection without sudo
[ceph_deploy.install][INFO  ] Distro info: Ubuntu 12.04 precise
[cephtest01][INFO  ] installing ceph on cephtest01
[cephtest01][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive 
apt-get -q install --assume-yes ca-certificates
[cephtest01][INFO  ] Reading package lists...
[cephtest01][INFO  ] Building dependency tree...
[cephtest01][INFO  ] Reading state information...
[cephtest01][INFO  ] ca-certificates is already the newest version.
[cephtest01][INFO  ] 0 upgraded, 0 newly installed, 0 to remove and 2 not 
upgraded.
[cephtest01][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

(system hangs here indefinitely)


As noted before, this command succeeds, so it's unclear why ceph-deploy is hanging...

$ wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo 
apt-key add -
OK
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy not including sudo?

2013-09-18 Thread Gruher, Joseph R
Using latest ceph-deploy:
ceph@cephtest01:/my-cluster$ sudo ceph-deploy --version
1.2.6

I get this failure:

ceph@cephtest01:/my-cluster$ sudo ceph-deploy install cephtest03 cephtest04 
cephtest05 cephtest06
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts cephtest03 cephtest04 cephtest05 cephtest06
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest03 ...
[ceph_deploy.sudo_pushy][DEBUG ] will use a remote connection without sudo
[ceph_deploy.install][INFO  ] Distro info: Ubuntu 12.04 precise
[cephtest03][INFO  ] installing ceph on cephtest03
[cephtest03][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive 
apt-get -q install --assume-yes ca-certificates
[cephtest03][ERROR ] Traceback (most recent call last):
[cephtest03][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/hosts/debian/install.py, line 
26, in install
[cephtest03][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py, line 10, in 
inner
[cephtest03][ERROR ]   File 
/usr/lib/python2.7/dist-packages/ceph_deploy/util/wrappers.py, line 6, in 
remote_call
[cephtest03][ERROR ]   File /usr/lib/python2.7/subprocess.py, line 511, in 
check_call
[cephtest03][ERROR ] raise CalledProcessError(retcode, cmd)
[cephtest03][ERROR ] CalledProcessError: Command '['env', 
'DEBIAN_FRONTEND=noninteractive', 'apt-get', '-q', 'install', '--assume-yes', 
'ca-certificates']' returned non-zero exit status 100
[cephtest03][ERROR ] E: Could not open lock file /var/lib/dpkg/lock - open (13: 
Permission denied)
[cephtest03][ERROR ] E: Unable to lock the administration directory 
(/var/lib/dpkg/), are you root?
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env 
DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates

This failure seems to imply that ceph-deploy is not prefacing remote (SSH) 
commands to other systems with sudo?  For example, this command as shown in the 
ceph-deploy output fails:

ceph@cephtest01:/my-cluster$ ssh cephtest03 env DEBIAN_FRONTEND=noninteractive 
apt-get -q install --assume-yes ca-certificates
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

But with the sudo added it works:

ceph@cephtest01:/my-cluster$ ssh cephtest03 sudo env 
DEBIAN_FRONTEND=noninteractive apt-get -q install --assume-yes ca-certificates
Reading package lists...
Building dependency tree...
Reading state information...
ca-certificates is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
ceph@cephtest01:/my-cluster$

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-17 Thread Gruher, Joseph R


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Gilles Mocellin

So you can add something like this in all ceph nodes' /etc/sudoers (use
visudo) :

Defaults env_keep += http_proxy https_proxy ftp_proxy no_proxy

Hope it can help.


Thanks for the suggestion!  However, no effect on the problem from this change.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-16 Thread Gruher, Joseph R
-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Subject: Re: [ceph-users] problem with ceph-deploy hanging

ceph-deploy will use the user as you are currently executing. That is why, if
you are calling ceph-deploy as root, it will log in remotely as root.

So by a different user, I mean, something like, user `ceph` executing ceph-
deploy (yes, that same user needs to exist remotely too with correct
permissions)

This is interesting.  Since the preflight has us set up passwordless SSH with a 
default ceph user I assumed it didn't really matter what user I was logged in 
as on the admin system.  Good to know.

Unfortunately, logging in as my ceph user on the admin system (with a matching 
user on the target system) does not affect my result.  The ceph-deploy 
install still hangs here:

[cephtest02][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

It has been suggested that this could be due to our firewall.  I have the 
proxies configured in /etc/environment and when I run a wget myself (as the 
ceph user, either directly on cephtest02 or via SSH command to cephtest02 from 
the admin system) it resolves the proxy and succeeds.  Is there any reason the 
wget might behave differently when run by ceph-deploy and fail to resolve the 
proxy?  Is there anywhere I might need to set proxy information besides 
/etc/environment?
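
The only other places I can think of are apt's own configuration and wgetrc, e.g. something like the following (the proxy host and port are placeholders), but I don't know whether ceph-deploy would pick either of those up:

# /etc/apt/apt.conf.d/95proxy
Acquire::http::Proxy "http://proxy.example.com:911/";
Acquire::https::Proxy "http://proxy.example.com:911/";

# /etc/wgetrc (or ~/.wgetrc)
use_proxy = on
http_proxy = http://proxy.example.com:911/
https_proxy = http://proxy.example.com:911/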

Or, any other thoughts on how to debug this further?

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-15 Thread Gruher, Joseph R

From: Gruher, Joseph R
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
On Fri, Sep 13, 2013 at 5:06 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:

 root@cephtest01:~# ssh cephtest02 wget -q -O-
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' |
 apt-key add -

 gpg: no valid OpenPGP data found.


This is clearly part of the problem. Can you try getting to this with
something other than wget (e.g. curl) ?

OK, I am seeing the problem here after turning off quiet mode on wget.  You
can see in the wget output that part of the URL is lost when executing the
command over SSH.  However, I'm still unsure how to fix this; I've tried a
number of ways of enclosing the command and this keeps happening.

SSH command leads to incomplete URL and returns web page (note URL
truncated at ceph.git):

root@cephtest01:~# ssh cephtest02 sudo wget -O-
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:06--  https://ceph.com/git/?p=ceph.git

When run locally complete URL returns PGP key:

root@cephtest02:/# wget -O-
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:30--
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

I was able to show that the wget command does succeed if properly formatted 
(you have to double-enclose it in quotes, since SSH strips the outer set), as 
does the apt-key add if prefaced with sudo.
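
Concretely, the form that works is to wrap the entire remote command in an outer set of double quotes so the semicolons and the pipe survive the trip to the remote shell, along these lines:

ssh cephtest02 "sudo wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo apt-key add -"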

So, I'm still stuck on the problem of ceph-deploy hanging at the point shown 
below.  Any tips on how to debug further?  Has anyone else experienced a 
similar problem?  Is it possible to enable any additional output from 
ceph-deploy?  Is there any documentation on how to deploy without using 
ceph-deploy install?  Thanks!

Here's where it hangs:

root@cephtest01:~# ceph-deploy install cephtest02 cephtest03 cephtest04 
[ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster 
ceph hosts cephtest02 cephtest03 cephtest04
[ceph_deploy.install][DEBUG ] Detecting platform for host cephtest02 ...
[ceph_deploy.install][INFO  ] Distro info: Ubuntu 12.04 precise
[cephtest02][INFO  ] installing ceph on cephtest02
[cephtest02][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive 
apt-get -q install --assume-yes ca-certificates
[cephtest02][INFO  ] Reading package lists...
[cephtest02][INFO  ] Building dependency tree...
[cephtest02][INFO  ] Reading state information...
[cephtest02][INFO  ] ca-certificates is already the newest version.
[cephtest02][INFO  ] 0 upgraded, 0 newly installed, 0 to remove and 4 not 
upgraded.
[cephtest02][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

Here's the command it seems to be hanging on, which succeeds when manually run 
on the command line:

root@cephtest01:~# ssh cephtest02 wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | sudo 
apt-key add -
OK
root@cephtest01:~#

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem with ceph user

2013-09-13 Thread Gruher, Joseph R
Hello all-

I'm setting up a new Ceph cluster (my first time - just a lab experiment, not 
for production) by following the docs on the ceph.com website.  The preflight 
checklist went fine, I installed and updated Ubuntu 12.04.2, set up my user and 
set up passwordless SSH, etc.  I ran ceph-deploy new without any apparent 
issues.  However, when I run ceph-deploy install it hangs at this point:

[cephtest02][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

It looks to me like it is failing on the apt-key add command.  If I log 
directly into the cephtest02 host as my ceph user and try to run apt-key add 
it fails:

$ apt-key add
ERROR: This command can only be used by root.

It works if I include a sudo:

$ sudo apt-key add
gpg: can't open `': No such file or directory

So I assume the problem is my ceph user doesn't have the right permissions?  I 
set up the ceph user by following the instructions in the preflight checklist 
(http://ceph.com/docs/master/start/quick-start-preflight/):

root@cephtest02:/# cat /etc/sudoers.d/ceph
ceph ALL = (root) NOPASSWD:ALL

root@cephtest02:/# ls -l /etc/sudoers.d/ceph
-r--r- 1 root root 31 Sep 12 15:45 /etc/sudoers.d/ceph

$ sudo -l
Matching Defaults entries for ceph on this host:
env_reset,

secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin

User ceph may run the following commands on this host:
(root) NOPASSWD: ALL

Can anyone tell me where I'm going wrong here, or in general how to give the 
ceph user the appropriate permissions?  Or is this a ceph-deploy problem that 
it is not including the sudo?

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] problem with ceph-deploy hanging

2013-09-13 Thread Gruher, Joseph R
[The output pasted here was the gitweb summary page for ceph.git (its commit shortlog, tags, and heads listings) rather than the release key; the full page dump is omitted.]
root@cephtest01:~#

Is this URL wrong, or is the data at the URL incorrect?

Thanks,
Joe

From: Gruher, Joseph R
Sent: Friday, September 13, 2013 1:17 PM
To: ceph-users@lists.ceph.com
Cc: Gruher, Joseph R
Subject: problem with ceph user

Hello all-

I'm setting up a new Ceph cluster (my first time - just a lab experiment, not 
for production) by following the docs on the ceph.com website.  The preflight 
checklist went fine, I installed and updated Ubuntu 12.04.2, set up my user and 
set up passwordless SSH, etc.  I ran ceph-deploy new without any apparent 
issues.  However, when I run ceph-deploy install it hangs at this point:

[cephtest02][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

It looks to me like it is failing on the apt-key add command.  If I log 
directly into the cephtest02 host as my ceph user and try to run apt-key add 
it fails:

$ apt-key add
ERROR: This command can only be used by root.

It works if I include a sudo:

$ sudo apt-key add
gpg: can't open `': No such file or directory

So I assume the problem is my ceph user doesn't have the right permissions?  I 
set up the ceph user by following the instructions in the preflight checklist 
(http://ceph.com/docs/master/start/quick-start-preflight/):

root@cephtest02:/# cat /etc/sudoers.d/ceph
ceph ALL = (root) NOPASSWD:ALL

root@cephtest02:/# ls -l /etc/sudoers.d/ceph
-r--r- 1 root root 31 Sep 12 15:45 /etc/sudoers.d/ceph

$ sudo -l
Matching Defaults entries for ceph on this host:
env_reset,

secure_path=/usr/local/sbin\:/usr/local/bin\:/usr/sbin\:/usr/bin\:/sbin\:/bin

User ceph may run the following commands on this host:
(root) NOPASSWD: ALL

Can anyone tell me where I'm going wrong here, or in general how to give the 
ceph user the appropriate permissions?  Or is this a ceph-deploy problem that 
it is not including the sudo?

Thanks,
Joe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-13 Thread Gruher, Joseph R


-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Sent: Friday, September 13, 2013 3:17 PM
To: Gruher, Joseph R
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] problem with ceph-deploy hanging

On Fri, Sep 13, 2013 at 5:06 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:

 root@cephtest01:~# ssh cephtest02 wget -q -O-
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' |
 apt-key add -

 gpg: no valid OpenPGP data found.


This is clearly part of the problem. Can you try getting to this with something
other than wget (e.g. curl) ?

OK, I am seeing the problem here after turning off quiet mode on wget.  You can 
see in the wget output that part of the URL is lost when executing the command 
over SSH.  However, I'm still unsure how to fix this; I've tried a number of 
ways of enclosing the command and this keeps happening.

SSH command leads to incomplete URL and returns web page (note URL truncated at 
ceph.git):

root@cephtest01:~# ssh cephtest02 sudo wget -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:06--  https://ceph.com/git/?p=ceph.git

When run locally complete URL returns PGP key:

root@cephtest02:/# wget -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
--2013-09-13 16:37:30--  
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com