Re: librbd: error finding header

2012-07-10 Thread Vladimir Bashkirtsev
On 10/07/12 14:32, Dan Mick wrote: On 07/09/2012 08:27 PM, Vladimir Bashkirtsev wrote: On 10/07/12 03:17, Dan Mick wrote: Well, it's not so much those; those are the objects that hold data blocks. You're more interested in the objects whose names end in '.rbd'. These are the header
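
As a rough illustration of what Dan means (the pool name rbd and the image name img are assumptions for the example, not taken from the thread), the header objects can be picked out with the rados CLI:

    # list only the per-image header objects in the pool
    $ rados -p rbd ls | grep '\.rbd$'
    # inspect one header object (img.rbd would hold the header for image img)
    $ rados -p rbd stat img.rbd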

Re: domino-style OSD crash

2012-07-10 Thread Yann Dupont
On 09/07/2012 19:14, Samuel Just wrote: Can you restart the node that failed to complete the upgrade with Well, it's a little bit complicated; I now run those nodes with XFS, and I have long-running jobs on them right now, so I can't stop the ceph cluster at the moment. As I've kept the

core dumps

2012-07-10 Thread Székelyi Szabolcs
Hi, I usually find core dumps belonging to Ceph daemons in my root folders. Last night two of my three monitors dumped core at the exact same moment. Are you interested in them? And in general, if I find such core files, should I send them to you? Thanks, -- cc

Re: core dumps

2012-07-10 Thread Székelyi Szabolcs
It just turned out that this problem was likely caused by a network outage, that made the iSCSI-backed filesystem that holds Ceph's data directories go away. The third mon seems to have crashed, too, but that machine didn't have enough free space in the root filesystem for the core file. On
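
As a generic sketch of working around that kind of disk-space problem (paths are illustrative and this is standard Linux practice, not something configured in the thread), core files can be redirected to a filesystem with more room:

    # send core dumps to a dedicated directory instead of the daemon's working directory
    $ sudo mkdir -p /var/crash
    $ sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p.%t
    # persist the setting across reboots
    $ echo 'kernel.core_pattern=/var/crash/core.%e.%p.%t' | sudo tee -a /etc/sysctl.conf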

Re: rados mailbox? (was Re: Ceph for email storage)

2012-07-10 Thread Smart Weblications GmbH
On 10.07.2012 07:45, Kristofer wrote: Very short answer to this. It can work if you direct all email requests for a particular mailbox to a single machine. You need to avoid locking between servers as much as possible. Messages will need to be indexed, period. Or else your life will

Re: domino-style OSD crash

2012-07-10 Thread Tommi Virtanen
On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: As I've kept the original broken btrfs volumes, I tried this morning to run the old osd in parallel, using the $cluster variable. I only have partial success. The cluster mechanism was never intended for moving
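
For readers who have not used it, the $cluster variable Yann refers to is normally meant for running independently created clusters side by side; a minimal sketch, with the cluster name and paths purely illustrative (and it does not avoid the fsid clash discussed below):

    # /etc/ceph/backup.conf defines a second cluster named "backup", e.g.
    #   [osd]
    #       osd data = /var/lib/ceph/osd/$cluster-$id
    # daemons are then started against that cluster name
    $ ceph-osd --cluster backup -i 0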

ioping: tool to monitor I/O latency in real time

2012-07-10 Thread Tommi Virtanen
Hi. I stumbled on this over the weekend, and thought people here might be interested in seeing whether it's useful in figuring out things like btrfs health: http://code.google.com/p/ioping/
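
Basic usage is along the lines of ioping -c <count> <target>; for example (the OSD data path is just an illustrative guess):

    # ten latency-probe requests against an OSD's data directory
    $ ioping -c 10 /var/lib/ceph/osd/ceph-0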

Re: domino-style OSD crash

2012-07-10 Thread Yann Dupont
On 10/07/2012 17:56, Tommi Virtanen wrote: On Tue, Jul 10, 2012 at 2:46 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: As I've kept the original broken btrfs volumes, I tried this morning to run the old osd in parallel, using the $cluster variable. I only have partial success. The

Re: domino-style OSD crash

2012-07-10 Thread Tommi Virtanen
On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: The cluster mechanism was never intended for moving existing osds to other clusters. Trying that might not be a good idea. Ok, good to know. I saw that the remaining maps could lead to problems, but in two words, what

Re: domino-style OSD crash

2012-07-10 Thread Yann Dupont
On 10/07/2012 19:11, Tommi Virtanen wrote: On Tue, Jul 10, 2012 at 9:39 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: The cluster mechanism was never intended for moving existing osds to other clusters. Trying that might not be a good idea. Ok, good to know. I saw that the remaining

Re: oops in rbd module (con_work in libceph)

2012-07-10 Thread Gregory Farnum
On Mon, Jul 9, 2012 at 10:04 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: On 09/07/2012 18:54, Yann Dupont wrote: Ok. I've compiled the kernel this afternoon and tested it without much success: Jul 9 18:17:23 label5.u14.univ-nantes.prive kernel: [ 284.116236] libceph: osd0

Re: domino-style OSD crash

2012-07-10 Thread Tommi Virtanen
On Tue, Jul 10, 2012 at 10:36 AM, Yann Dupont yann.dup...@univ-nantes.fr wrote: Fundamentally, it comes down to this: the two clusters will still have the same fsid, and you won't be isolated from configuration errors or (CEPH-PROD is the old btrfs volume). /CEPH is a new xfs volume, completely

Re: librbd: error finding header

2012-07-10 Thread Josh Durgin
On 07/10/2012 02:25 AM, Vladimir Bashkirtsev wrote: On 10/07/12 14:32, Dan Mick wrote: On 07/09/2012 08:27 PM, Vladimir Bashkirtsev wrote: On 10/07/12 03:17, Dan Mick wrote: Well, it's not so much those; those are the objects that hold data blocks. You're more interested in the objects

Re: Keys caps

2012-07-10 Thread Gregory Farnum
On Mon, Jul 9, 2012 at 10:27 AM, Székelyi Szabolcs szeke...@niif.hu wrote: On 2012. July 9. 09:33:22 Sage Weil wrote: On Mon, 9 Jul 2012, Székelyi Szabolcs wrote: so far I've accessed my Ceph (0.48) FS with the client.admin key, but I'd like to change that since I don't want to allow clients
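
A rough sketch of creating a more restricted key instead of handing out client.admin (the entity name, pool, and caps strings are illustrative; the exact caps syntax accepted by 0.48 should be checked against the documentation):

    # generate a keyring for a limited client
    $ ceph-authtool --create-keyring /etc/ceph/client.foo.keyring --gen-key -n client.foo \
          --cap mon 'allow r' --cap osd 'allow rw pool=data' --cap mds 'allow'
    # register it with the cluster's auth database
    $ ceph auth add client.foo -i /etc/ceph/client.foo.keyring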

Re: Keys caps

2012-07-10 Thread Székelyi Szabolcs
On 2012. July 10. 13:09:10 Gregory Farnum wrote: On Mon, Jul 9, 2012 at 10:27 AM, Székelyi Szabolcs szeke...@niif.hu wrote: On 2012. July 9. 09:33:22 Sage Weil wrote: On Mon, 9 Jul 2012, Székelyi Szabolcs wrote: so far I've accessed my Ceph (0.48) FS with the client.admin key, but I'd

Re: Keys caps

2012-07-10 Thread Sage Weil
On Wed, 11 Jul 2012, Székelyi Szabolcs wrote: The problem is that the mount.ceph command doesn't understand keyrings; it only understands secret= and secretfile=. There is a longstanding feature bug open for this http://tracker.newdream.net/issues/266 but it

Re: Keys caps

2012-07-10 Thread Sage Weil
On Tue, 10 Jul 2012, Gregory Farnum wrote: On Tue, Jul 10, 2012 at 4:25 PM, Sage Weil s...@inktank.com wrote: On Wed, 11 Jul 2012, Székelyi Szabolcs wrote: The problem is that the mount.ceph command doesn't understand keyrings; it only understands secret= and secretfile=.

Re: Keys caps

2012-07-10 Thread Székelyi Szabolcs
On 2012. July 10. 16:25:47 Sage Weil wrote: On Wed, 11 Jul 2012, Székelyi Szabolcs wrote: The problem is that the mount.ceph command doesn't understand keyrings; it only understands secret= and secretfile=. There is a longstanding feature bug open for this
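
To make the secret=/secretfile= limitation concrete, a small sketch (the client name, paths, and monitor address are made up for the example):

    # extract just the base64 secret from the keyring, since mount.ceph cannot read keyrings
    $ ceph-authtool -p -n client.foo /etc/ceph/client.foo.keyring > /etc/ceph/foo.secret
    # mount using the bare secret
    $ mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=foo,secretfile=/etc/ceph/foo.secret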

Pg stuck stale...why?

2012-07-10 Thread Mark Kirkwood
I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale monmap e1: 3 mons at {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0}, election epoch 18, quorum 0,1,2 ved1,ved2,ved3 osdmap e62: 4 osds: 4 up, 4 in pgmap v47148:
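
For anyone chasing a similar state, the usual follow-up queries look roughly like this (nothing here is specific to Mark's cluster):

    # which PGs are stuck stale, and where they were last reported
    $ ceph pg dump_stuck stale
    # current OSD layout and up/down state
    $ ceph osd tree
    # per-problem detail behind the HEALTH_WARN summary
    $ ceph health detail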

Re: Pg stuck stale...why?

2012-07-10 Thread Josh Durgin
On 07/10/2012 06:11 PM, Mark Kirkwood wrote: I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale monmap e1: 3 mons at {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0}, election epoch 18, quorum 0,1,2 ved1,ved2,ved3 osdmap e62: 4

Re: Pg stuck stale...why?

2012-07-10 Thread Mark Kirkwood
On 11/07/12 13:22, Josh Durgin wrote: On 07/10/2012 06:11 PM, Mark Kirkwood wrote: I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale monmap e1: 3 mons at {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0}, election epoch 18,

Re: Pg stuck stale...why?

2012-07-10 Thread Mark Kirkwood
On 11/07/12 13:32, Mark Kirkwood wrote: I have attached the dump of stuck stale pgs, and the crushmap in use. ...of course I left off the crushmap, so here it is, plus my ceph.conf for good measure. Mark # begin crush map # devices device 0 osd0 device 1 osd1 device 2 osd2 device 3 osd3
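
For reference, the crush map can be pulled, decompiled, edited, and injected back with the standard tools; a minimal sketch with placeholder file names:

    # fetch and decompile the current crush map
    $ ceph osd getcrushmap -o crushmap.bin
    $ crushtool -d crushmap.bin -o crushmap.txt
    # after editing crushmap.txt, recompile and load it
    $ crushtool -c crushmap.txt -o crushmap.new
    $ ceph osd setcrushmap -i crushmap.new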

Re: Pg stuck stale...why?

2012-07-10 Thread Josh Durgin
On 07/10/2012 06:32 PM, Mark Kirkwood wrote: On 11/07/12 13:22, Josh Durgin wrote: On 07/10/2012 06:11 PM, Mark Kirkwood wrote: I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale monmap e1: 3 mons at

Re: Pg stuck stale...why?

2012-07-10 Thread Mark Kirkwood
On 11/07/12 13:55, Josh Durgin wrote: On 07/10/2012 06:32 PM, Mark Kirkwood wrote: On 11/07/12 13:22, Josh Durgin wrote: On 07/10/2012 06:11 PM, Mark Kirkwood wrote: I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale monmap e1: 3 mons at

Re: [PATCH 1/1] client: prevent the race of incoming work during teardown

2012-07-10 Thread Sage Weil
Hey, I've rebased this and applied it to the testing branch, along with a few other fixes. Thanks! sage On Sun, 8 Jul 2012, Guan Jun He wrote: Hi Sage, I have resubmitted the patch with the changes we discussed in the last e-mail. Please take a look at it and give a reply. Thank you very much

Re: Pg stuck stale...why?

2012-07-10 Thread Mark Kirkwood
On 11/07/12 14:09, Mark Kirkwood wrote: On 11/07/12 13:55, Josh Durgin wrote: On 07/10/2012 06:32 PM, Mark Kirkwood wrote: On 11/07/12 13:22, Josh Durgin wrote: On 07/10/2012 06:11 PM, Mark Kirkwood wrote: I am seeing this: # ceph -s health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale