[ceph-users] failed to populate the monitor daemon(s) with the monitor map and keyring.

2014-06-09 Thread jiangdahui
My PC had problems with the quick install, so I followed the Installation (Manual)
guide instead. But at the step "Populate the monitor daemon(s) with the monitor
map and keyring", an error occurred and the output is:

IO error: /var/lib/ceph/mon/ceph-node1/store.db/LOCK: No such file or directory
ceph-mon: error opening mon data directory at '/var/lib/ceph/mon/ceph-node1':
(22) Invalid argument
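
For reference, the commands the guide gives for this step (using my hostname,
node1, and the monmap/keyring paths from the guide; adjust for your setup) are
roughly:

sudo mkdir /var/lib/ceph/mon/ceph-node1
sudo ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring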



Expecting your help, thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd: add failed: (34) Numerical result out of range

2014-06-09 Thread lists+ceph

I was building a small test cluster and noticed a difference with trying
to rbd map depending on whether the cluster was built using fedora or
CentOS.

When I used CentOS osds, and tried to rbd map from arch linux or fedora,
I would get rbd: add failed: (34) Numerical result out of range.  It
seemed to happen when the tool was writing to /sys/bus/rbd/add_single_major.

If I rebuild the osds using fedora (20 in this case), everything
works fine.

In each scenario, I used ceph-0.80.1 on all the boxes.

Is that expected?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd: add failed: (34) Numerical result out of range

2014-06-09 Thread Ilya Dryomov
On Mon, Jun 9, 2014 at 11:48 AM,  lists+c...@deksai.com wrote:
 I was building a small test cluster and noticed a difference with trying
 to rbd map depending on whether the cluster was built using fedora or
 CentOS.

 When I used CentOS osds, and tried to rbd map from arch linux or fedora,
 I would get rbd: add failed: (34) Numerical result out of range.  It
 seemed to happen when the tool was writing to /sys/bus/rbd/add_single_major.

 If I rebuild the osds using fedora (20 in this case), everything
 works fine.

 In each scenario, I used ceph-0.80.1 on all the boxes.

 Is that expected?

No, it's most certainly not expected.  If you are willing to help debug this,
let's start with the output of 'rbd info'.  Return to the failing setup, do
'rbd map image', make sure it fails, and, on the same box, do 'rbd info
image'.
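
i.e., roughly (with "image" standing in for your image name):

rbd map image       # reproduce the failure
rbd info image      # run on the same box and post the output
dmesg | tail        # any rbd/libceph lines from around the failure would help too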

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] add new data host

2014-06-09 Thread Ta Ba Tuan

Hi all,

I am adding a new ceph-data host, but:

#ceph -s -k /etc/ceph/ceph.client.admin.keyring

2014-06-09 17:39:51.686082 7fade4f14700  0 librados: client.admin 
authentication error (1) Operation not permitted

Error connecting to cluster: PermissionError

my ceph.conf:

[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
keyring = /etc/ceph/ceph.client.admin.keyring

Any suggestions?
Thanks all
--
TABA





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add new data host

2014-06-09 Thread Ta Ba Tuan

I solved this by exporting the key with 'ceph auth export ...' :D
In the question above, I was using a key in the old format.
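
Roughly this (run on a node where the admin key still authenticates; "newhost"
is a placeholder for the new data host):

ceph auth export client.admin -o ceph.client.admin.keyring
scp ceph.client.admin.keyring newhost:/etc/ceph/ceph.client.admin.keyring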


On 06/09/2014 05:44 PM, Ta Ba Tuan wrote:

Hi all,

I am adding a new ceph-data host, but:

#ceph -s -k /etc/ceph/ceph.client.admin.keyring

2014-06-09 17:39:51.686082 7fade4f14700  0 librados: client.admin 
authentication error (1) Operation not permitted

Error connecting to cluster: PermissionError

my ceph.conf:

[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
keyring = /etc/ceph/ceph.client.admin.keyring

Any suggestions?
Thanks all
--
TABA





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd snap protect error

2014-06-09 Thread Ignazio Cassano
Hi all,
I installed Ceph Firefly and now I am playing with rbd snapshots.
I created a pool (libvirt-pool) with two images:

libvirtimage1 (format 1)
image2 (format 2).

When I try to protect the first image:

rbd --pool libvirt-pool snap protect --image libvirtimage1 --snap
libvirt-snap

it gives me an error because the image is in format 1:

image must support layering.

This is correct because libvirtimage1 is in format 1.

But If I try with the second image:
rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap

it gives the following:

snap failed (2) No such file or directory


Image2 exists; in fact, I can see it:

rbd -p libvirt-pool ls

libvirtimage1
image2


Could someone help me, please ?

Regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snap protect error

2014-06-09 Thread Ilya Dryomov
On Mon, Jun 9, 2014 at 3:01 PM, Ignazio Cassano
ignaziocass...@gmail.com wrote:
 Hi all,
 I installed Ceph Firefly and now I am playing with rbd snapshots.
 I created a pool (libvirt-pool) with two images:

 libvirtimage1 (format 1)
 image2 (format 2).

 When I try to protect the first image:

 rbd --pool libvirt-pool snap protect --image libvirtimage1 --snap
 libvirt-snap

 it gives me an error because the image is in format 1:

 image must support layering.

 This is correct because libvirtimage1 is in format 1.

 But If I try with the second image:
 rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap

 it gives the following:

 snap failed (2) No such file or directory


 Image2 exists; in fact, I can see it:

 rbd -p libvirt-pool ls

 libvirtimage1
 image2


 Could someone help me, please ?

You have to create the snapshot first:

rbd --pool libvirt-pool snap create --image image2  --snap image2-snap
rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap
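
To confirm it worked, something like the following should list the snapshot
and show it as protected:

rbd --pool libvirt-pool snap ls --image image2
rbd info libvirt-pool/image2@image2-snap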

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snap protect error

2014-06-09 Thread Ignazio Cassano
Many thanks...
Can I create a format 2 image (with support for layered snapshots) using the
qemu-img command?


2014-06-09 13:05 GMT+02:00 Ilya Dryomov ilya.dryo...@inktank.com:

 On Mon, Jun 9, 2014 at 3:01 PM, Ignazio Cassano
 ignaziocass...@gmail.com wrote:
  Hi all,
  I installed cep firefly and now I am playing with rbd snapshot.
  I created a pool (libvirt-pool) with two images:
 
  libvirtimage1 (format 1)
  image2 (format 2).
 
  When I try to protect the first image:
 
  rbd --pool libvirt-pool snap protect --image libvirtimage1 --snap
  libvirt-snap
 
  it gives me an error because the image is in format 1:
 
  image must support layering.
 
  This is correct because libvirtimage1 is in format 1.
 
  But If I try with the second image:
  rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap
 
  it gives the following:
 
  snap failed (2) No such file or directory
 
 
  Image2 exists infact I can see it :
 
  rbd -p libvirt-pool ls
 
  libvirtimage1
  image2
 
 
  Could someone help me, please ?

 You have to create the snapshot first:

 rbd --pool libvirt-pool snap create --image image2  --snap image2-snap
 rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap

 Thanks,

 Ilya

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snap protect error

2014-06-09 Thread Wido den Hollander

On 06/09/2014 02:00 PM, Ignazio Cassano wrote:

Many thanks...
Can I create a format 2 image (with support for layered snapshots) using the
qemu-img command?


Yes:

qemu-img create -f raw rbd:rbd/image1:rbd_default_format=2 10G

'rbd_default_format' is a Ceph setting which is passed down to librbd 
directly.
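
For completeness, the rbd tool can also create a format 2 image directly; a
quick sketch (size is in MB):

rbd create --image-format 2 --size 10240 rbd/image1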


Wido




2014-06-09 13:05 GMT+02:00 Ilya Dryomov ilya.dryo...@inktank.com:

On Mon, Jun 9, 2014 at 3:01 PM, Ignazio Cassano
ignaziocass...@gmail.com wrote:
  Hi all,
  I installed cep firefly and now I am playing with rbd snapshot.
  I created a pool (libvirt-pool) with two images:
 
  libvirtimage1 (format 1)
  image2 (format 2).
 
  When I try to protect the first image:
 
  rbd --pool libvirt-pool snap protect --image libvirtimage1 --snap
  libvirt-snap
 
  it gives me an error because the image is in format 1:
 
  image must support layering.
 
  This is correct because libvirtimage1 is in format 1.
 
  But If I try with the second image:
  rbd --pool libvirt-pool snap protect --image image2  --snap
image2-snap
 
  it gives the following:
 
  snap failed (2) No such file or directory
 
 
  Image2 exists infact I can see it :
 
  rbd -p libvirt-pool ls
 
  libvirtimage1
  image2
 
 
  Could someone help me, please ?

You have to create the snapshot first:

rbd --pool libvirt-pool snap create --image image2  --snap image2-snap
rbd --pool libvirt-pool snap protect --image image2  --snap image2-snap

Thanks,

 Ilya




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd snap protect error

2014-06-09 Thread Ignazio Cassano
Many thanks


2014-06-09 14:04 GMT+02:00 Wido den Hollander w...@42on.com:

 On 06/09/2014 02:00 PM, Ignazio Cassano wrote:

 Many thanks...
 Can I create a format 2 image (with support for linear snapshot)  using
 qemu-img command ?


 Yes:

 qemu-img create -f raw rbd:rbd/image1:rbd_default_format=2 10G

 'rbd_default_format' is a Ceph setting which is passed down to librbd
 directly.

 Wido



 2014-06-09 13:05 GMT+02:00 Ilya Dryomov ilya.dryo...@inktank.com:


 On Mon, Jun 9, 2014 at 3:01 PM, Ignazio Cassano
 ignaziocass...@gmail.com wrote:
   Hi all,
   I installed cep firefly and now I am playing with rbd snapshot.
   I created a pool (libvirt-pool) with two images:
  
   libvirtimage1 (format 1)
   image2 (format 2).
  
   When I try to protect the first image:
  
   rbd --pool libvirt-pool snap protect --image libvirtimage1 --snap
   libvirt-snap
  
   it gives me an error because the image is in format 1:
  
   image must support layering.
  
   This is correct because libvirtimage1 is in format 1.
  
   But If I try with the second image:
   rbd --pool libvirt-pool snap protect --image image2  --snap
 image2-snap
  
   it gives the following:
  
   snap failed (2) No such file or directory
  
  
   Image2 exists infact I can see it :
  
   rbd -p libvirt-pool ls
  
   libvirtimage1
   image2
  
  
   Could someone help me, please ?

 You have to create the snapshot first:

 rbd --pool libvirt-pool snap create --image image2  --snap image2-snap
 rbd --pool libvirt-pool snap protect --image image2  --snap
 image2-snap

 Thanks,

  Ilya




 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Wido den Hollander
 42on B.V.
 Ceph trainer and consultant

 Phone: +31 (0)20 700 9902
 Skype: contact42on
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recommended way to use Ceph as storage for file server

2014-06-09 Thread John-Paul Robinson
We have an NFS-to-RBD gateway with a large number of smaller RBDs.  In our
use case we allow users to request their own RBD containers, which are then
served up via NFS to a mixed cluster of clients.  Our gateway is quite beefy,
probably more than it needs to be: 2x 8-core CPUs and 96GB RAM.  It was
pressed into this service from a pool of homogeneous servers rather than
being spec'd out for this role explicitly (it could likely be less beefy).
It has performed well.  Our RBD nodes are connected via 2x 10Gb NICs in a
transmit-load-balance config.

The server has performed well in this role, though that could just be the
specs.  An individual RBD in this NFS gateway won't see the parallel
performance advantages that CephFS promises.  However, one potential
advantage is that a multi-RBD backend can handle NFS client requests isolated
to different RBDs simultaneously.  One RBD may still get a heavy load, but at
least the server as a whole has the potential to spread requests across
different devices.

I haven't done load comparisons, so this is just a point of interest.  It's
probably moot if the kernel doesn't do a good job of spreading NFS load
across threads, or if there is some other kernel/RBD choke point.
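
For anyone curious, each per-user container on our gateway amounts to
something like the following (pool, image, paths, and export options here are
illustrative, not our exact setup):

rbd create userpool/alice --size 102400        # 100 GB image (size is in MB)
rbd map userpool/alice                         # appears as /dev/rbd/userpool/alice
mkfs.xfs /dev/rbd/userpool/alice
mkdir -p /exports/alice
mount /dev/rbd/userpool/alice /exports/alice
echo '/exports/alice 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra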

~jpr

On 06/02/2014 12:35 PM, Dimitri Maziuk wrote:
 A more or less obvious alternative for CephFS would be to simply create
  a huge RBD and have a separate file server (running NFS / Samba /
  whatever) use that block device as backend. Just put a regular FS on top
  of the RBD and use it that way.
  Clients wouldn't really have any of the real performance and resilience
  benefits that Ceph could offer though, because the (single machine?)
  file server is now the bottleneck.
 Performance: assuming all your nodes are fast storage on a quad-10Gb
 pipe. Resilience: your gateway can be an active-passive HA pair, that
 shouldn't be any different from NFS+DRBD setups.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy 1.5.4 (addressing packages coming from EPEL)

2014-06-09 Thread Alfredo Deza
Hi All,

We've experienced a lot of issues since EPEL started packaging a
0.80.1-2 version that YUM
will see as higher than 0.80.1 and therefore will choose to install
the EPEL one.

That package has some issues from what we have seen and in most cases
will break the installation
process.

There is a new version of ceph-deploy (1.5.4) that addresses this
problem by setting the priorities
so that the ceph.repo will be considered before the EPEL one.
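
If you need to apply the equivalent fix by hand, the underlying mechanism is
just yum repo priorities; a rough sketch (repo file abbreviated, URL assumes
the Firefly el6 repo):

sudo yum install -y yum-plugin-priorities
# then give each section of /etc/yum.repos.d/ceph.repo a higher priority
# (lower number) than EPEL, e.g.:
[ceph]
name=Ceph packages
baseurl=http://ceph.com/rpm-firefly/el6/$basearch
enabled=1
gpgcheck=1
priority=1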

Some improvements were also made to how ceph-deploy parses cephdeploy.conf
files, so that priorities can be correctly set (and honored) from there as
well.

The changelog with the details of this release can be found here:

http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you update!


-Alfredo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Teuthology: Need help on Lock server setup running schedule_suite.sh

2014-06-09 Thread Rajesh Raman
Hi,

I am trying to run schedule_suite.sh against our custom Ceph build to leverage
the Inktank suites in our testing.  Can someone help me use this shell script
so that I can provide my own targets instead of having the script pick them
from the Ceph lab?  Also, has anyone set up a lock server for this script to
run against?  If so, please share the details of how to set up the lock
server.

Thanks and Regards,
Rajesh Raman




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy 1.5.4 (addressing packages coming from EPEL)

2014-06-09 Thread Karan Singh
Thanks Alfredo, happy to see your email.

I was a victim of this problem; I hope 1.5.4 will take away my pain :-)


- Karan Singh -

On 09 Jun 2014, at 15:33, Alfredo Deza alfredo.d...@inktank.com wrote:

 http://ceph.com/ceph-deploy/docs/changelog.html#id1

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD keyrings shifted and down

2014-06-09 Thread Jimmy Lu
More detail on this: I recently upgraded my Ceph cluster from Emperor to
Firefly.  After the upgrade was done, I noticed one of the OSDs was not coming
back to life.  While troubleshooting, I rebooted the OSD server and the
keyrings shifted.

My environment:

4x OSD servers (each has 12 disks: 1 for root and 11 for OSDs)
1x mon + mds + admin node for ceph-deploy

Hopefully someone out there has experienced a similar situation; if you have,
please share your fixes.

Thanks,
Jimmy

From: J L j...@yahoo-inc.com
Date: Friday, June 6, 2014 at 1:13 PM
To: J L j...@yahoo-inc.com, ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD keyrings shifted and down

Has anyone run into this issue and would like to provide any troubleshooting 
tip?

Thanks,
Jimmy

From: J L j...@yahoo-inc.com
Date: Thursday, June 5, 2014 at 4:20 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] OSD keyrings shifted and down


Hello Ceph Guru,


I rebooted an OSD server to fix "osd.33".  When the server came back online, I
noticed all of its OSDs were down.  While troubleshooting and restarting the
OSDs, I got the authentication error below.  I also noticed that the keyring
for each OSD had shifted: for example, osd.33, which maps to
/var/lib/ceph/osd/ceph-33, should have a keyring under [osd.33], but in this
case it contains [osd.34].


Can I simply change the osd.# in each keyring to correct the mapping, or is
there a proper way to fix this?  Please help.


Thanks in advance!!


-Jimmy




[root@gfsnode1 ceph-34]# service ceph start osd.34

=== osd.34 ===

2014-06-05 15:08:54.053958 7f08f2b47700  0 librados: osd.34 authentication 
error (1) Operation not permitted

Error connecting to cluster: PermissionError

failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.34 
--keyring=/var/lib/ceph/osd/ceph-34/keyring osd crush create-or-move -- 34 2.73 
host=gfsnode1 root=default'

[root@gfsnode1 ceph-34]#



[root@gfsnode1 osd]# ls -l

total 0

lrwxrwxrwx 1 root root 12 May 15 15:21 ceph-33 -> /ceph/osd120

lrwxrwxrwx 1 root root 12 May 15 15:22 ceph-34 -> /ceph/osd121

lrwxrwxrwx 1 root root 12 May 15 15:23 ceph-35 -> /ceph/osd122

lrwxrwxrwx 1 root root 12 May 15 15:24 ceph-36 -> /ceph/osd123

lrwxrwxrwx 1 root root 12 May 15 15:24 ceph-37 -> /ceph/osd124

lrwxrwxrwx 1 root root 12 May 15 15:25 ceph-38 -> /ceph/osd125

lrwxrwxrwx 1 root root 12 May 15 15:25 ceph-39 -> /ceph/osd126

lrwxrwxrwx 1 root root 12 May 15 15:26 ceph-40 -> /ceph/osd127

lrwxrwxrwx 1 root root 12 May 15 15:27 ceph-41 -> /ceph/osd128

lrwxrwxrwx 1 root root 12 May 15 15:27 ceph-42 -> /ceph/osd129

lrwxrwxrwx 1 root root 12 May 15 15:28 ceph-43 -> /ceph/osd130

[root@gfsnode1 osd]# cat ceph-33/keyring

[osd.34]

key = AQAwPnVT6G7fBRAA86D4FuxN0U8uKXk0brPbCQ==

[root@gfsnode1 osd]# cat ceph-34/keyring

[osd.35]

key = AQBbPnVTmG4BLxAA6UV6XHbZepXUEXB6VJQzEA==

[root@gfsnode1 osd]# cat ceph-35/keyring

[osd.36]

key = AQCDPnVTuL97JRAA1soDHToJ1c6WhXX+mnnRPw==

[root@gfsnode1 osd]# cat ceph-36/keyring

[osd.37]

key = AQCwPnVTYAttNhAAomeRalOEHWlyO7C9tF+7SQ==

[root@gfsnode1 osd]# cat ceph-37/keyring

[osd.38]

key = AQDKPnVTQC1DLBAAl0959S0st+UcFw8uOppa7g==

[root@gfsnode1 osd]# cat ceph-38/keyring

[osd.39]

key = AQDjPnVTMFGwNxAABH5M1Y8uXoqecPesS09IGw==

[root@gfsnode1 osd]# cat ceph-39/keyring

[osd.40]

key = AQChQXVT6JHiBxAAohTnBGxb2ZAbgCjt5M0xBw==

[root@gfsnode1 osd]# cat ceph-40/keyring

[osd.41]

key = AQBGP3VTAHI0CRAAZkcUPLOFT1jx9v3DVNX4nQ==

[root@gfsnode1 osd]# cat ceph-41/keyring

[osd.42]

key = AQAEsIdTMBTjChAAfJrsqIEBcCGEXv0jcK2vtQ==

[root@gfsnode1 osd]# cat ceph-42/keyring

[osd.43]

key = AQB6P3VT2KW7ORAAU+1Ix/fUXIBU8jky0BQ9jw==

[root@gfsnode1 osd]# cat ceph-43/keyring

cat: ceph-43/keyring: No such file or directory

[root@gfsnode1 osd]#
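
For reference, a sketch of how to compare what is on disk with what the
monitors have registered (assuming the admin key still authenticates from the
mon/admin node; osd.33 is just the example):

# on the mon/admin node: the key the cluster has registered for osd.33
ceph auth get osd.33
# on the OSD host: the key the daemon will actually present
cat /var/lib/ceph/osd/ceph-33/keyring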
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] perplexed by unmapped groups on fresh firefly install

2014-06-09 Thread John Wilkins
Miki,

'osd crush chooseleaf type' is set to 1 (host) by default, which means CRUSH
tries to place each placement group's replicas on different nodes, not on the
same node.  You would need to set that to 0 (osd) for a 1-node cluster.
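
Concretely, that is the following in ceph.conf (under [global]) before the
cluster is created; for a cluster that already exists, you would edit the
CRUSH rule instead (file names below are placeholders):

# ceph.conf, before creating the cluster:
[global]
osd crush chooseleaf type = 0

# for an existing cluster, change the rule in the CRUSH map:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
#   edit crushmap.txt: change "step chooseleaf firstn 0 type host"
#                      to     "step chooseleaf firstn 0 type osd"
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new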

John


On Sun, Jun 8, 2014 at 10:40 PM, Miki Habryn dic...@rcpt.to wrote:

 I set up a single-node, dual-osd cluster following the Quick Start on
 ceph.com with Firefly packages, adding osd pool default size = 2.
 All of the pgs came up in active+remapped or active+degraded status. I
 read up on tunables and set them to optimal, to no result, so I added
 a third osd instead. About 39 pgs moved to active status, but the rest
 stayed in active+remapped or active+degraded. When I raised the
 replication level to 3 with ceph osd pool set ... size 3, all the
 pgs went back to degraded or remapped. Just for kicks, I tried to set
 the replication level to 1, and I still only got 39 pgs active. Is
 there something obvious I'm doing wrong?

 m.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] failed assertion on AuthMonitor

2014-06-09 Thread Gregory Farnum
Barring a newly-introduced bug (doubtful), that assert basically means
that your computer lied to the ceph monitor about the durability or
ordering of data going to disk, and the store is now inconsistent. If
you don't have data you care about on the cluster, by far your best
option is:
1) Figure out what part of the system is lying about data durability
(probably your filesystem or controller is ignoring barriers),
2) start the Ceph install over
It's possible that the ceph-monstore-tool will let you edit the store
back into a consistent state, but it looks like the system can't find
the *initial* commit, which means you'll need to manufacture a new one
wholesale with the right keys from the other system components.
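
A quick first check for point 1 (not exhaustive, and option names vary by
filesystem and kernel):

grep -E 'nobarrier|barrier=0' /proc/mounts
# also worth checking: a RAID controller running write-back cache without a
# working battery/flash backup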

(I am assuming that the system didn't crash right while you were
turning on the monitor for the first time; if it did that makes it
slightly more likely to be a bug on our end, but again it'll be
easiest to just start over since you don't have any data in it yet.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Jun 8, 2014 at 10:26 PM, Mohammad Salehe sal...@gmail.com wrote:
 Hi,

 I'm receiving failed assertion in AuthMonitor::update_from_paxos(bool*)
 after a system crash. I've saved a complete monitor log with 10/20 for 'mon'
 and 'paxos' here.
 There is only one monitor and two OSDs in the cluster as I was just at the
 beginning of deployment.

 I will be thankful if someone could help.

 --
 Mohammad Salehe
 sal...@gmail.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Craig Lewis
I've correlated a large deep scrubbing operation to cluster stability
problems.

My primary cluster does a small amount of deep scrubs all the time, spread
out over the whole week.  It has no stability problems.

My secondary cluster doesn't spread them out.  It saves them up, and tries
to do all of the deep scrubs over the weekend.  The secondary starts
losing OSDs about an hour after these deep scrubs start.

To avoid this, I'm thinking of writing a script that continuously deep-scrubs
the oldest outstanding PG.  In pseudo-bash:
# Sort by the deep-scrub timestamp, taking the single oldest PG
while read date time pg < <(ceph pg dump \
        | awk '$1 ~ /[0-9a-f]+\.[0-9a-f]+/ {print $20, $21, $1}' \
        | sort | head -1)
do
  ceph pg deep-scrub ${pg}
  while ceph status | grep -q scrubbing+deep
  do
    sleep 5
  done
  sleep 30
done


Does anybody think this will solve my problem?

I'm also considering disabling deep-scrubbing until the secondary finishes
replicating from the primary.  Once it's caught up, the write load should
drop enough that opportunistic deep scrubs should have a chance to run.  It
should only take another week or two to catch up.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Gregory Farnum
On Mon, Jun 9, 2014 at 3:22 PM, Craig Lewis cle...@centraldesktop.com wrote:
 I've correlated a large deep scrubbing operation to cluster stability
 problems.

 My primary cluster does a small amount of deep scrubs all the time, spread
 out over the whole week.  It has no stability problems.

 My secondary cluster doesn't spread them out.  It saves them up, and tries
 to do all of the deep scrubs over the weekend.  The secondary starts losing
 OSDs about an hour after these deep scrubs start.

 To avoid this, I'm thinking of writing a script that continuously deep-scrubs
 the oldest outstanding PG.  In pseudo-bash:
 # Sort by the deep-scrub timestamp, taking the single oldest PG
 while read date time pg < <(ceph pg dump \
         | awk '$1 ~ /[0-9a-f]+\.[0-9a-f]+/ {print $20, $21, $1}' \
         | sort | head -1)
 do
   ceph pg deep-scrub ${pg}
   while ceph status | grep -q scrubbing+deep
   do
     sleep 5
   done
   sleep 30
 done


 Does anybody think this will solve my problem?

 I'm also considering disabling deep-scrubbing until the secondary finishes
 replicating from the primary.  Once it's caught up, the write load should
 drop enough that opportunistic deep scrubs should have a chance to run.  It
 should only take another week or two to catch up.

If the problem is just that your secondary cluster is under a heavy
write load, and so the scrubbing won't run automatically until the PGs
hit their time limit, maybe it's appropriate to change the limits so
they can run earlier. You can bump up osd scrub load threshold.
Or maybe that would be a terrible thing to do, not sure. But it sounds
like the cluster is just skipping the voluntary scrubs, and then they
all come due at once (probably from some earlier event).
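
e.g. (the value is just an illustration; the default is 0.5):

ceph tell osd.* injectargs '--osd_scrub_load_threshold 5'
# and to persist it, in ceph.conf under [osd]:
#   osd scrub load threshold = 5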
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to avoid deep-scrubbing performance hit?

2014-06-09 Thread Mike Dawson

Craig,

I've struggled with the same issue for quite a while. If your i/o is 
similar to mine, I believe you are on the right track. For the past 
month or so, I have been running this cronjob:


* * * * *   for strPg in `ceph pg dump | egrep '^[0-9]\.[0-9a-f]{1,4}' | sort -k20 | awk '{ print $1 }' | head -2`; do ceph pg deep-scrub $strPg; done


That roughly handles my 20672 PGs that are set to be deep-scrubbed every 
7 days. Your script may be a bit better, but this quick and dirty method 
has helped my cluster maintain more consistency.


The real key for me is to avoid the clumpiness I observed without that hack:
concurrent deep-scrubs sit at zero for long periods (despite PGs being months
overdue for a deep-scrub), then suddenly spike up and stay in the teens for
hours, killing client writes/second.


The scrubbing behavior table[0] indicates that a periodic tick initiates
scrubs on a per-PG basis.  Perhaps the timing of ticks isn't sufficiently
randomized when you restart lots of OSDs concurrently (for instance via
pdsh).


On my cluster I suffer a significant drag on client writes/second when I 
exceed perhaps four or five concurrent PGs in deep-scrub. When 
concurrent deep-scrubs get into the teens, I get a massive drop in 
client writes/second.
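
For what it's worth, the per-OSD cap on simultaneous scrubs is osd_max_scrubs
(default 1), so those spikes look like many OSDs kicking off scrubs at once
rather than one OSD doing many.  A quick way to check the setting (osd.0 is
just an example):

ceph daemon osd.0 config get osd_max_scrubs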


Greg, is there locking involved when a PG enters deep-scrub? If so, is 
the entire PG locked for the duration or is each individual object 
inside the PG locked as it is processed? Some of my PGs will be in 
deep-scrub for minutes at a time.


0: http://ceph.com/docs/master/dev/osd_internals/scrub/

Thanks,
Mike Dawson


On 6/9/2014 6:22 PM, Craig Lewis wrote:

I've correlated a large deep scrubbing operation to cluster stability
problems.

My primary cluster does a small amount of deep scrubs all the time,
spread out over the whole week.  It has no stability problems.

My secondary cluster doesn't spread them out.  It saves them up, and
tries to do all of the deep scrubs over the weekend.  The secondary
starts losing OSDs about an hour after these deep scrubs start.

To avoid this, I'm thinking of writing a script that continuously deep-scrubs
the oldest outstanding PG.  In pseudo-bash:
# Sort by the deep-scrub timestamp, taking the single oldest PG
while read date time pg < <(ceph pg dump \
        | awk '$1 ~ /[0-9a-f]+\.[0-9a-f]+/ {print $20, $21, $1}' \
        | sort | head -1)
do
  ceph pg deep-scrub ${pg}
  while ceph status | grep -q scrubbing+deep
  do
    sleep 5
  done
  sleep 30
done


Does anybody think this will solve my problem?

I'm also considering disabling deep-scrubbing until the secondary
finishes replicating from the primary.  Once it's caught up, the write
load should drop enough that opportunistic deep scrubs should have a
chance to run.  It should only take another week or two to catch up.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fail to Block Devices and OpenStack

2014-06-09 Thread 山下 良民
Hi,

I am failing to get OpenStack and Ceph working together.
I set things up based on this URL:
http://ceph.com/docs/next/rbd/rbd-openstack/

I can see the state of the Ceph cluster from the OpenStack node (the Ceph
client), but the failure occurs at cinder create.

Ceph Cluster:
CentOS release 6.5
Ceph 0.80.1

OpenStack:
Ubuntu 12.04.4
OpenStack DevStack Icehouse

# glance image-create --name cirros --disk-format raw --container-format ovf \
    --file /usr/local/src/cirros-0.3.2-x86_64-disk.raw --is-public True
+--+--+
| Property | Value|
+--+--+
| checksum | cf2392db1f59d59ed69a8f8491b670e0 |
| container_format | ovf  |
| created_at   | 2014-06-09T05:04:48  |
| deleted  | False|
| deleted_at   | None |
| disk_format  | raw  |
| id   | f4a0f971-437b-4d3f-a0c4-1c82f31e9f1e |
| is_public| True |
| min_disk | 0|
| min_ram  | 0|
| name | cirros   |
| owner| 5a10a1fed82b45a7affaf57f814434bb |
| protected| False|
| size | 41126400 |
| status   | active   |
| updated_at   | 2014-06-09T05:04:50  |
| virtual_size | None |
+--+--+


# cinder create --image-id f4a0f971-437b-4d3f-a0c4-1c82f31e9f1e --display-name 
boot-from-rbd 1
++--+
|Property|Value |
++--+
|  attachments   |  []  |
|   availability_zone| nova |
|bootable|false |
|   created_at   |  2014-06-09T05:12:51.00  |
|  description   | None |
|   encrypted|False |
|   id   | 30d1eee7-54d6-4911-af06-b35d2f8ef0c4 |
|metadata|  {}  |
|  name  |boot-from-rbd |
| os-vol-host-attr:host  | None |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
|  os-vol-tenant-attr:tenant_id  |   5a10a1fed82b45a7affaf57f814434bb   |
|  size  |  1   |
|  snapshot_id   | None |
|  source_volid  | None |
| status |   creating   |
|user_id |   90ed966837e44f91a582b73960dd848c   |
|  volume_type   | None |
++--+

# cinder list
+--------------------------------------+--------+---------------+------+-------------+----------+-------------+
|                  ID                  | Status |      Name     | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+--------+---------------+------+-------------+----------+-------------+
| 30d1eee7-54d6-4911-af06-b35d2f8ef0c4 | error  | boot-from-rbd |  1   |     None    |  false   |             |
+--------------------------------------+--------+---------------+------+-------------+----------+-------------+

I've done all the settings from that URL (http://ceph.com/docs/next/rbd/rbd-openstack/).
Is there any setup required beyond what that page describes?
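
Is checking the following the right next step?  (The pool and user names are
the ones from that guide; the log path is a guess, since this is DevStack.)

rbd -p volumes --id cinder ls                    # can the cinder user reach its pool?
tail -n 100 /var/log/cinder/cinder-volume.log    # or the cinder-volume screen log under DevStack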



Best Regards.

Yamashita
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com