Re: Empty directory size greater than zero and can't remove

2012-12-18 Thread Sage Weil
On Wed, 19 Dec 2012, Mark Kirkwood wrote:
> On 19/12/12 15:56, Drunkard Zhang wrote:
> > 2012/12/19 Mark Kirkwood :
> > > On 19/12/12 14:44, Drunkard Zhang wrote:
> > > > 2012/12/16 Drunkard Zhang :
> > > > > I couldn't rm files in ceph; they were backed-up files from one osd. It
> > > > > reports 'directory not empty', but there's nothing under that directory;
> > > > > just the directory itself holds some space. How can I track down the
> > > > > problem?
> > > > > 
> > > > > log30 /mnt/bc # ls -aR osd.28/
> > > > > osd.28/:
> > > > > .  ..  osd.28
> > > > > 
> > > > > osd.28/osd.28:
> > > > > .  ..  current
> > > > > 
> > > > > osd.28/osd.28/current:
> > > > > .  ..  0.537_head
> > > > > 
> > > > > osd.28/osd.28/current/0.537_head:
> > > > > .  ..
> > > > > log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
> > > > > drwxr-xr-x 1 root root 119M Dec 14 19:22
> > > > > osd.28/osd.28/current/0.537_head
> > > > > log30 /mnt/bc #
> > > > > log30 /mnt/bc # rm -rf osd.28/
> > > > > rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not
> > > > > empty
> > > > > log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
> > > > > rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not
> > > > > empty
> > > > > 
> > > > > The cluster seems healthy:
> > > > > log3 ~ # ceph -s
> > > > >  health HEALTH_OK
> > > > >  monmap e1: 3 mons at
> > > > > 
> > > > > {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0},
> > > > > election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
> > > > >  osdmap e1864: 45 osds: 45 up, 45 in
> > > > >   pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565
> > > > > GB
> > > > > used, 111 TB / 120 TB avail
> > > > >  mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
> > > > > 
> > > > After mds restart, I got this error message:
> > > > 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on
> > > > single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266
> > > > 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358
> > > > 2038=2038+0)
> > > > 
> > > > How can I fix this?
> > > > 
> > > Is it a btrfs filesystem? If so, it will have subvolumes hiding in there
> > > that you need to remove first.
> > Thanks for the reply; the osds all live on xfs filesystems.
> 
> Ah, right - it might be worth showing us the output of 'ls -la' in the dir
> concerned. In particular, the link counts might be wrong (indicating fs
> corruption, probably fixable with xfs_repair).

This is a problem in the MDS, not the fs underneath the OSDs.  There was 
at least one recently fixed bug that corrupted the 'rstats' recursive info 
and could lead to this.

The MDS is actually repairing this as it goes, unless you specify the 'mds 
verify scatter = true' option, in which case it will assert and kill 
itself.
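
For reference, a rough sketch of how that option would be set in ceph.conf 
(the [mds] section placement is my assumption; the option name itself is as 
above):

[mds]
    mds verify scatter = true

Left at its default (false), the MDS just quietly fixes up the mismatched 
fragstat as it encounters it.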

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Empty directory size greater than zero and can't remove

2012-12-18 Thread Mark Kirkwood

On 19/12/12 15:56, Drunkard Zhang wrote:

2012/12/19 Mark Kirkwood :

On 19/12/12 14:44, Drunkard Zhang wrote:

2012/12/16 Drunkard Zhang :

I couldn't rm files in ceph; they were backed-up files from one osd. It
reports 'directory not empty', but there's nothing under that directory;
just the directory itself holds some space. How can I track down the
problem?

log30 /mnt/bc # ls -aR osd.28/
osd.28/:
.  ..  osd.28

osd.28/osd.28:
.  ..  current

osd.28/osd.28/current:
.  ..  0.537_head

osd.28/osd.28/current/0.537_head:
.  ..
log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
log30 /mnt/bc #
log30 /mnt/bc # rm -rf osd.28/
rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty

The cluster seems healthy:
log3 ~ # ceph -s
 health HEALTH_OK
 monmap e1: 3 mons at

{log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0},
election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
 osdmap e1864: 45 osds: 45 up, 45 in
  pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB
used, 111 TB / 120 TB avail
 mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby


After mds restart, I got this error message:
2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on
single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266
1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358
2038=2038+0)

How can I fix this?


Is it a btrfs filesystem? If so, it will have subvolumes hiding in there
that you need to remove first.

Thanks for the reply; the osds all live on xfs filesystems.


Ah, right - it might be worth showing us the output of 'ls -la' in the dir 
concerned. In particular, the link counts might be wrong (indicating fs 
corruption, probably fixable with xfs_repair).
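
Something along these lines, with the device name just a placeholder and -n 
used first so xfs_repair only reports problems rather than modifying anything:

ls -la /mnt/bc/osd.28/osd.28/current/0.537_head
umount /mnt/bc
xfs_repair -n /dev/sdX1
xfs_repair /dev/sdX1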


Cheers

Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mon not marking dead osds down and slow streaming write performance

2012-12-18 Thread Michael Chapman
Hi all,

I apologise if this list is only for dev issues and not for operators;
I didn't see a more general list on the ceph website.

I have 5 OSD processes per host, and an FC uplink port failure caused
kernel panics in two hosts - 0404 and 0401. The mon log looks like
this:

2012-12-19 13:30:38.634865 7f9a0f167700 10 mon.3@0(leader).osd e2184
preprocess_query osd_failure(osd.404 172.22.4.4:6812/12835 for 8832
e2184 v2184) v3 from osd.602 172.22.4.6:6806/5152
2012-12-19 13:30:38.634875 7f9a0f167700  5 mon.3@0(leader).osd e2184
can_mark_down current up_ratio 0.298429 < min 0.3, will not mark
osd.404 down
2012-12-19 13:30:38.634880 7f9a0f167700  5 mon.3@0(leader).osd e2184 preprocess_
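
If I read the can_mark_down line correctly, the monitor refuses to mark these
OSDs down because fewer than 30% of the OSDs in the map would remain up. I
assume the threshold is the monitor's minimum-up-ratio option, so an override
in ceph.conf would presumably look something like this (option name and value
are my guess, not something confirmed by the log above):

[mon]
    mon osd min up ratio = 0.1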

The cluster appears healthy:

root@os-0405:~# ceph -s
   health HEALTH_OK
   monmap e3: 1 mons at {3=172.22.4.5:6789/0}, election epoch 1, quorum 0 3
   osdmap e2184: 191 osds: 57 up, 57 in
pgmap v205386: 121952 pgs: 121951 active+clean, 1
active+clean+scrubbing; 4437 MB data, 49497 MB used, 103 TB / 103 TB
avail
   mdsmap e1: 0/0/1 up

root@os-0405:~# ceph osd tree

# id    weight  type name       up/down reweight
-1  30  pool default
-3  30  rack unknownrack
-2  6   host os-0401
100 1   osd.100 up  1
101 1   osd.101 up  1
102 1   osd.102 up  1
103 1   osd.103 up  1
104 1   osd.104 up  1
112 1   osd.112 up  1
-4  6   host os-0402
200 1   osd.200 up  1
201 1   osd.201 up  1
202 1   osd.202 up  1
203 1   osd.203 up  1
204 1   osd.204 up  1
212 1   osd.212 up  1
-5  6   host os-0403
300 1   osd.300 up  1
301 1   osd.301 up  1
302 1   osd.302 up  1
303 1   osd.303 up  1
304 1   osd.304 up  1
312 1   osd.312 up  1
-6  6   host os-0404
400 1   osd.400 up  1
401 1   osd.401 up  1
402 1   osd.402 up  1
403 1   osd.403 up  1
404 1   osd.404 up  1
412 1   osd.412 up  1
-7  0   host os-0405
-8  6   host os-0406
600 1   osd.600 up  1
601 1   osd.601 up  1
602 1   osd.602 up  1
603 1   osd.603 up  1
604 1   osd.604 up  1
612 1   osd.612 up  1

but os-0404 has no osd processes running anymore.

root@os-0404:~# ps aux | grep ceph
root  4964  0.0  0.0   9628   920 pts/1S+   13:31   0:00 grep
--color=auto ceph

and even if it did, it couldn't access the LUNs needed to mount the xfs
filesystems with all the osd data.

What is preventing the mon from marking the osds on 0404 down?

A second issue I have been having is that my reads+writes are very
bursty, going from 8MB/s to 200MB/s when doing a dd from a physical
client over 10GbE. It seems to be waiting on the mon most of the time,
and iostat shows long io wait times for the disk the mon is using. I
can also see it writing ~40MB/s constantly to disk in iotop, though I
don't know if this is random or sequential. I see a lot of 'waiting for
sub ops' messages, which I thought might be a result of the io wait.

Is that a normal amount of activity for a mon process? Should I be
running the mon processes off more than just a single sata disk to
keep up with ~30 OSD processes?

Thanks for your time.

 - Michael Chapman
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Empty directory size greater than zero and can't remove

2012-12-18 Thread Drunkard Zhang
2012/12/19 Mark Kirkwood :
> On 19/12/12 14:44, Drunkard Zhang wrote:
>>
>> 2012/12/16 Drunkard Zhang :
>>>
>>> I couldn't rm files in ceph; they were backed-up files from one osd. It
>>> reports 'directory not empty', but there's nothing under that directory;
>>> just the directory itself holds some space. How can I track down the
>>> problem?
>>>
>>> log30 /mnt/bc # ls -aR osd.28/
>>> osd.28/:
>>> .  ..  osd.28
>>>
>>> osd.28/osd.28:
>>> .  ..  current
>>>
>>> osd.28/osd.28/current:
>>> .  ..  0.537_head
>>>
>>> osd.28/osd.28/current/0.537_head:
>>> .  ..
>>> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
>>> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
>>> log30 /mnt/bc #
>>> log30 /mnt/bc # rm -rf osd.28/
>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>>
>>> The cluster seems healthy:
>>> log3 ~ # ceph -s
>>> health HEALTH_OK
>>> monmap e1: 3 mons at
>>>
>>> {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0},
>>> election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>>> osdmap e1864: 45 osds: 45 up, 45 in
>>>  pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB
>>> used, 111 TB / 120 TB avail
>>> mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>>>
>> After mds restart, I got this error message:
>> 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on
>> single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266
>> 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358
>> 2038=2038+0)
>>
>> How can I fix this?
>>
>
> Is it a btrfs filesystem? If so, it will have subvolumes hiding in there
> that you need to remove first.

Thanks for the reply; the osds all live on xfs filesystems.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Empty directory size greater than zero and can't remove

2012-12-18 Thread Mark Kirkwood

On 19/12/12 14:44, Drunkard Zhang wrote:

2012/12/16 Drunkard Zhang :

I couldn't rm files in ceph; they were backed-up files from one osd. It
reports 'directory not empty', but there's nothing under that directory;
just the directory itself holds some space. How can I track down the
problem?

log30 /mnt/bc # ls -aR osd.28/
osd.28/:
.  ..  osd.28

osd.28/osd.28:
.  ..  current

osd.28/osd.28/current:
.  ..  0.537_head

osd.28/osd.28/current/0.537_head:
.  ..
log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
log30 /mnt/bc #
log30 /mnt/bc # rm -rf osd.28/
rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty

The cluster seems healthy:
log3 ~ # ceph -s
health HEALTH_OK
monmap e1: 3 mons at
{log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0},
election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
osdmap e1864: 45 osds: 45 up, 45 in
 pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB
used, 111 TB / 120 TB avail
mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby


After mds restart, I got this error message:
2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on
single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266
1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358
2038=2038+0)

How can I fix this?



Is it a btrfs filesystem? If so, it will have subvolumes hiding in there 
that you need to remove first.
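
If it were btrfs, something roughly like this would list and remove them
(the path is taken from your earlier listing; treat it as a sketch only):

btrfs subvolume list /mnt/bc
btrfs subvolume delete /mnt/bc/osd.28/osd.28/current/0.537_head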


Cheers

Mark
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Empty directory size greater than zero and can't remove

2012-12-18 Thread Drunkard Zhang
2012/12/16 Drunkard Zhang :
> I couldn't rm files in ceph; they were backed-up files from one osd. It
> reports 'directory not empty', but there's nothing under that directory;
> just the directory itself holds some space. How can I track down the
> problem?
>
> log30 /mnt/bc # ls -aR osd.28/
> osd.28/:
> .  ..  osd.28
>
> osd.28/osd.28:
> .  ..  current
>
> osd.28/osd.28/current:
> .  ..  0.537_head
>
> osd.28/osd.28/current/0.537_head:
> .  ..
> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
> log30 /mnt/bc #
> log30 /mnt/bc # rm -rf osd.28/
> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>
> The cluster seems healthy:
> log3 ~ # ceph -s
>health HEALTH_OK
>monmap e1: 3 mons at
> {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0},
> election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>osdmap e1864: 45 osds: 45 up, 45 in
> pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB
> used, 111 TB / 120 TB avail
>mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>
After mds restart, I got this error message:
2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on
single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266
1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358
2038=2038+0)

How can I fix this?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Rados consistency model

2012-12-18 Thread Samuel Just
You can expect read-after-write on any object.  That is, once the
write is complete, any subsequent reader will see the result.
-Sam
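
A quick illustration with the rados CLI (pool and object names here are just
examples):

# node A: the put returns only once the write has completed
rados -p testpool put myobject ./new-version
# node A then notifies node B through its own messaging layer
# node B: a get issued after receiving that notification sees the new contents
rados -p testpool get myobject ./copy-of-new-version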

On Tue, Dec 18, 2012 at 12:51 AM, Rutger ter Borg  wrote:
>
> Dear list,
>
> I have a question regarding concurrency guarantees of Rados. Suppose I have
> two nodes, say A and B, both running a process, and both using the same
> rados storage pool, maybe connected through different OSDs. Suppose node A
> updates an object in the storage pool, and after completion immediately
> notifies B (through its own messaging layer) that that specific object has
> been updated. Then, can I assume that, if B re-reads that object, it will
> always get the updated one? If not, what would be the recommended way of
> notifying B?
>
> IOW, what kind of consistency model should I assume?
>
> Thanks,
>
> Rutger
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd map command hangs for 15 minutes during system start up

2012-12-18 Thread Nick Bartos
I've added the output of "ps -ef" in addition to triggering a trace
when a hang is detected.  Not much is generally running at that point,
but you can have a look:

https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt

Is it possible that there is some sort of deadlock going on?  We are
doing the rbd maps (and subsequent filesystem mounts) on the same
systems which are running the ceph-osd and ceph-mon processes.  To get
around the 'sync' deadlock problem, we are using a patch from Sage
which ignores system-wide syncs on filesystems mounted with the
'mand' option (and we mount the underlying osd filesystems with
'mand').  However I am wondering if there is potential for other types
of deadlocks in this environment.
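
For context, the only part of our mount options that matters here is the
'mand' flag; schematically it looks like this (filesystem type, device and
mount point are placeholders):

mount -t xfs -o noatime,mand /dev/sdb1 /var/lib/ceph/osd/osd.0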

Also, we recently saw an rbd hang in a much older version, running
kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
It's possible that this issue has been around for some time, and the
recent patches just made it happen more often (and thus more
reproducible) for us.


On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder  wrote:
> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>> Here's a log with the rbd debugging enabled:
>>
>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>
>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder  wrote:
>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
 Yes I was only enabling debugging for libceph.  I'm adding debugging
 for rbd as well.  I'll do a repro later today when a test cluster
 opens up.
>>>
>>> Excellent, thank you.   -Alex
>
> I looked through these debugging messages.  Looking only at the
> rbd debugging, what I see seems to indicate that rbd is idle at
> the point the "hang" seems to start.  This suggests that the hang
> is not due to rbd itself, but rather whatever it is that might
> be responsible for using the rbd image once it has been mapped.
>
> Is that possible?  I don't know what process you have that is
> mapping the rbd image, and what is supposed to be the next thing
> it does.  (I realize this may not make a lot of sense, given
> a patch in rbd seems to have caused the hang to begin occurring.)
>
> Also note that the debugging information available (i.e., the
> lines in the code that can output debugging information) may
> well be incomplete.  So if you don't find anything it may be
> necessary to provide you with another update which might include
> more debugging.
>
> Anyway, could you provide a little more context about what
> is going on sort of *around* rbd when activity seems to stop?
>
> Thanks a lot.
>
> -Alex
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OSD memory leaks?

2012-12-18 Thread Sébastien Han
Nothing special...

Kernel logs from my clients are full of "libceph: osd4
172.20.11.32:6801 socket closed"

I saw this somewhere on the tracker.

Does this do any harm?

Thanks.

--
Regards,
Sébastien Han.



On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just  wrote:
>
> What is the workload like?
> -Sam
>
> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han  
> wrote:
> > Hi,
> >
> > No, I don't see anything abnormal in the network stats. I don't see
> > anything in the logs... :(
> > The weird thing is that one node out of 4 seems to take way more memory
> > than the others...
> >
> > --
> > Regards,
> > Sébastien Han.
> >
> >
> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han  
> > wrote:
> >>
> >> Hi,
> >>
> >> No, I don't see anything abnormal in the network stats. I don't see 
> >> anything in the logs... :(
> >> The weird thing is that one node out of 4 seems to take way more memory 
> >> than the others...
> >>
> >> --
> >> Regards,
> >> Sébastien Han.
> >>
> >>
> >>
> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just  wrote:
> >>>
> >>> Are you having network hiccups?  There was a bug noticed recently that
> >>> could cause a memory leak if nodes are being marked up and down.
> >>> -Sam
> >>>
> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han  
> >>> wrote:
> >>> > Hi guys,
> >>> >
> >>> > Today, looking at my graphs, I noticed that one of my 4 ceph nodes uses
> >>> > a lot of memory. It keeps growing and growing.
> >>> > See the graph attached to this mail.
> >>> > I run 0.48.2 on Ubuntu 12.04.
> >>> >
> >>> > The other nodes also grow, but more slowly than the first one.
> >>> >
> >>> > I'm not quite sure what information I have to provide, so
> >>> > let me know. The only thing I can say is that the load hasn't
> >>> > increased that much this week. It seems to be consuming memory and
> >>> > not giving it back.
> >>> >
> >>> > Thank you in advance.
> >>> >
> >>> > --
> >>> > Regards,
> >>> > Sébastien Han.
> >>
> >>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: rbd map command hangs for 15 minutes during system start up

2012-12-18 Thread Alex Elder
On 12/17/2012 11:12 AM, Nick Bartos wrote:
> Here's a log with the rbd debugging enabled:
> 
> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
> 
> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder  wrote:
>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>> for rbd as well.  I'll do a repro later today when a test cluster
>>> opens up.
>>
>> Excellent, thank you.   -Alex

I looked through these debugging messages.  Looking only at the
rbd debugging, what I see seems to indicate that rbd is idle at
the point the "hang" seems to start.  This suggests that the hang
is not due to rbd itself, but rather whatever it is that might
be responsible for using the rbd image once it has been mapped.

Is that possible?  I don't know what process you have that is
mapping the rbd image, and what is supposed to be the next thing
it does.  (I realize this may not make a lot of sense, given
a patch in rbd seems to have caused the hang to begin occurring.)

Also note that the debugging information available (i.e., the
lines in the code that can output debugging information) may
well be incomplete.  So if you don't find anything it may be
necessary to provide you with another update which might include
more debugging.

Anyway, could you provide a little more context about what
is going on sort of *around* rbd when activity seems to stop?

Thanks a lot.

-Alex
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Recovery stuck and radosgateway not initializing

2012-12-18 Thread Yann ROBIN
Our configuration: 6 OSDs, 3 mons.
The journal is on an INTEL SSDSA2CW120G3 disk and the data is on a Hitachi
HUS724040ALE640 disk.

When the OSDs do recovery, IO is high, and at some point the OSD is killed.
We set max active recovery to 1 and set filestore op thread suicide timeout to 
360.
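
In ceph.conf terms that corresponds to roughly the following (option names as
I understand them, so treat this as a sketch):

[osd]
    osd recovery max active = 1
    filestore op thread suicide timeout = 360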

What should I do in that case?

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Yann ROBIN
Sent: mardi 18 décembre 2012 11:51
To: ceph-devel@vger.kernel.org
Subject: Recovery stuck and radosgateway not initializing

Hi,

We're using ceph v0.55, and last night we lost one node of our cluster.
When it came back, ceph started recovering, but since then the radosgateway 
has not been able to connect to the cluster.
The rados gateway times out on initialization (somewhere in the radosclient 
connect).

The other problem (and I think it's related) is that the recovery isn't 
working. The OSDs get 'OSD op thread' timeouts, and sometimes some of the 
OSDs crash (see the attached stacktrace).
So it seems that our OSDs aren't up long enough for the recovery to proceed.

Any help would be appreciated.

Thanks,

-- 
Yann


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Recovery stuck and radosgateway not initializing

2012-12-18 Thread Yann ROBIN
Hi,

We're using ceph v0.55, and last night we lost one node of our cluster.
When it came back, ceph started recovering, but since then the radosgateway 
has not been able to connect to the cluster.
The rados gateway times out on initialization (somewhere in the radosclient 
connect).

The other problem (and I think it's related) is that the recovery isn't 
working. The OSDs get 'OSD op thread' timeouts, and sometimes some of the 
OSDs crash (see the attached stacktrace).
So it seems that our OSDs aren't up long enough for the recovery to proceed.

Any help would be appreciated.

Thanks,

-- 
Yann



ceph.log
Description: ceph.log


Re: Striped images and cluster misbehavior

2012-12-18 Thread Andrey Korolyov
On Mon, Dec 17, 2012 at 2:36 AM, Andrey Korolyov  wrote:
> Hi,
>
> After the recent switch to the default ``--stripe-count 1'' on image upload I
> have observed a strange thing - a single import or deletion of a striped
> image may temporarily take the entire cluster down, literally (see the
> log below).
> Of course the next issued osd map fixes the situation, but all in-flight
> operations experience a short freeze. This issue appears randomly on some
> import or delete operations; I have not seen any other operation types
> causing this. Even if the nature of this bug lies entirely in the client-osd
> interaction, maybe ceph should add some foolproof safeguards even when the
> offending client has admin privileges? Almost certainly this should be
> reproducible within teuthology with rwx rights on both osds and mons at the
> client. And as far as I can see there is no problem at either the physical
> or protocol layer on the client machine's dedicated cluster interface.
>
> 2012-12-17 02:17:03.691079 mon.0 [INF] pgmap v2403268: 15552 pgs:
> 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB
> avail
> 2012-12-17 02:17:04.693344 mon.0 [INF] pgmap v2403269: 15552 pgs:
> 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB
> avail
> 2012-12-17 02:17:05.695742 mon.0 [INF] pgmap v2403270: 15552 pgs:
> 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB
> avail
> 2012-12-17 02:17:05.991900 mon.0 [INF] osd.0 10.5.0.10:6800/4907
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.991859 >=
> grace 20.00)
> 2012-12-17 02:17:05.992017 mon.0 [INF] osd.1 10.5.0.11:6800/5011
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.991995 >=
> grace 20.00)
> 2012-12-17 02:17:05.992139 mon.0 [INF] osd.2 10.5.0.12:6803/5226
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992110 >=
> grace 20.00)
> 2012-12-17 02:17:05.992240 mon.0 [INF] osd.3 10.5.0.13:6803/6054
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992224 >=
> grace 20.00)
> 2012-12-17 02:17:05.992330 mon.0 [INF] osd.4 10.5.0.14:6803/5792
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992317 >=
> grace 20.00)
> 2012-12-17 02:17:05.992420 mon.0 [INF] osd.5 10.5.0.15:6803/5564
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992405 >=
> grace 20.00)
> 2012-12-17 02:17:05.992515 mon.0 [INF] osd.7 10.5.0.17:6803/5902
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992501 >=
> grace 20.00)
> 2012-12-17 02:17:05.992607 mon.0 [INF] osd.8 10.5.0.10:6803/5338
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992591 >=
> grace 20.00)
> 2012-12-17 02:17:05.992702 mon.0 [INF] osd.10 10.5.0.12:6800/5040
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992686 >=
> grace 20.00)
> 2012-12-17 02:17:05.992793 mon.0 [INF] osd.11 10.5.0.13:6800/5748
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992778 >=
> grace 20.00)
> 2012-12-17 02:17:05.992891 mon.0 [INF] osd.12 10.5.0.14:6800/5459
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992875 >=
> grace 20.00)
> 2012-12-17 02:17:05.992980 mon.0 [INF] osd.13 10.5.0.15:6800/5235
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.992966 >=
> grace 20.00)
> 2012-12-17 02:17:05.993081 mon.0 [INF] osd.16 10.5.0.30:6800/5585
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993065 >=
> grace 20.00)
> 2012-12-17 02:17:05.993184 mon.0 [INF] osd.17 10.5.0.31:6800/5578
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993169 >=
> grace 20.00)
> 2012-12-17 02:17:05.993274 mon.0 [INF] osd.18 10.5.0.32:6800/5097
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993260 >=
> grace 20.00)
> 2012-12-17 02:17:05.993367 mon.0 [INF] osd.19 10.5.0.33:6800/5109
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993352 >=
> grace 20.00)
> 2012-12-17 02:17:05.993464 mon.0 [INF] osd.20 10.5.0.34:6800/5125
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993448 >=
> grace 20.00)
> 2012-12-17 02:17:05.993554 mon.0 [INF] osd.21 10.5.0.35:6800/5183
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993538 >=
> grace 20.00)
> 2012-12-17 02:17:05.993644 mon.0 [INF] osd.22 10.5.0.36:6800/5202
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993628 >=
> grace 20.00)
> 2012-12-17 02:17:05.993740 mon.0 [INF] osd.23 10.5.0.37:6800/5252
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993725 >=
> grace 20.00)
> 2012-12-17 02:17:05.993831 mon.0 [INF] osd.24 10.5.0.30:6803/5758
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993816 >=
> grace 20.00)
> 2012-12-17 02:17:05.993924 mon.0 [INF] osd.25 10.5.0.31:6803/5748
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.993908 >=
> grace 20.00)
> 2012-12-17 02:17:05.994018 mon.0 [INF] osd.26 10.5.0.32:6803/5275
> failed (3 reports from 1 peers after 2012-12-17 02:17:29.994002 >=
> grace 20.00)
> 2012-12-17 02:17:06.105315 mon.0 [I

Rados consistency model

2012-12-18 Thread Rutger ter Borg


Dear list,

I have a question regarding concurrency guarantees of Rados. Suppose I 
have two nodes, say A and B, both running a process, and both using the 
same rados storage pool, maybe connected through different OSDs. Suppose 
node A updates an object in the storage pool, and after completion 
immediately notifies B (through its own messaging layer) that that 
specific object has been updated. Then, can I assume that, if B re-reads 
that object, it will always get the updated one? If not, what would be 
the recommended way of notifying B?


IOW, what kind of consistency model should I assume?

Thanks,

Rutger


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


rbd caching issue

2012-12-18 Thread Jens Rehpöhler
Hi folks,

I just received a mail from a customer. He complained that the "ping
latency" of his VM rises if he does a lot of IO
inside the VM. I have tried the same with a test VM.

I could reproduce this behavior. If I disable the rbd cache, the VM IO is
slower but the latency is OK. Even SSH and other
programs are affected, so it's not a problem of slow ICMP.

Device Setting:

virtio0:
rbd:9997/vm-1171-disk-1.rbd:rbd_cache=true:rbd_cache_size=16777216:rbd_cache_max_dirty=8388608:rbd_cache_target_dirty=4194304,cache=none
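
The same cache settings expressed as a ceph.conf sketch, which may be easier
to read (the [client] section placement is my assumption; the values are
copied from the device line above):

[client]
    rbd cache = true
    rbd cache size = 16777216
    rbd cache max dirty = 8388608
    rbd cache target dirty = 4194304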

normal ping:

64 bytes from 109.75.x.x: icmp_seq=38 ttl=56 time=29.2 ms
64 bytes from 109.75.x.x: icmp_seq=39 ttl=56 time=20.8 ms
64 bytes from 109.75.x.x: icmp_seq=40 ttl=56 time=22.4 ms

with lots of IO:

64 bytes from 109.75.x.x: icmp_seq=87 ttl=56 time=28.1 ms
64 bytes from 109.75.x.x: icmp_seq=88 ttl=56 time=665 ms
64 bytes from 109.75.x.x: icmp_seq=89 ttl=56 time=226 ms
64 bytes from 109.75.x.x: icmp_seq=90 ttl=56 time=179 ms
64 bytes from 109.75.x.x: icmp_seq=91 ttl=56 time=140 ms
64 bytes from 109.75.x.x: icmp_seq=92 ttl=56 time=25.6 ms
64 bytes from 109.75.x.x: icmp_seq=93 ttl=56 time=568 ms
64 bytes from 109.75.x.x: icmp_seq=94 ttl=56 time=405 ms
64 bytes from 109.75.x.x: icmp_seq=95 ttl=56 time=223 ms
64 bytes from 109.75.x.x: icmp_seq=96 ttl=56 time=24.5 ms
64 bytes from 109.75.x.x: icmp_seq=97 ttl=56 time=321 ms
64 bytes from 109.75.x.x: icmp_seq=98 ttl=56 time=391 ms
64 bytes from 109.75.x.x: icmp_seq=99 ttl=56 time=4200 ms
64 bytes from 109.75.x.x: icmp_seq=101 ttl=56 time=2194 ms

But if I disable caching:

virtio0: rbd:9997/vm-1171-disk-1.rbd:rbd_cache=false,cache=writeback

with lots of IO:

64 bytes from 109.75.x.x: icmp_seq=62 ttl=56 time=22.1 ms
64 bytes from 109.75.x.x: icmp_seq=63 ttl=56 time=26.5 ms
64 bytes from 109.75.x.x: icmp_seq=64 ttl=56 time=30.7 ms
64 bytes from 109.75.x.x: icmp_seq=65 ttl=56 time=24.8 ms
64 bytes from 109.75.x.x: icmp_seq=66 ttl=56 time=21.9 ms

Can someone please explain this behavior to me? Why is the latency of the
VM spiky if I enable rbd caching? I've played around with the caching
parameters, but with caching enabled it's always the same.

KVM Version: 1.2.1
Ceph Version: ceph version 0.48.2argonaut
(commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)

Thanks a lot !!

-- 
Kind regards

Jens Rehpöhler

--
filoo GmbH
Moltkestr. 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: S.Grewing | J.Rehpöhler | Dr. C.Kunz
Telefon: +49 5241 8673012 | Mobil: +49 151 54645798
Hotline: +49 5241 8673026| Fax: +49 5241 8673020

Folgen Sie uns auf:
Twitter: http://twitter.com/filoogmbh
Facebook: http://facebook.com/filoogmbh




signature.asc
Description: OpenPGP digital signature


Re: Slow requests

2012-12-18 Thread Jens Kristian Søgaard

Hi Dino,


Seems that the RPM packager likes to keep the latest and greatest
versions in http://ceph.com/rpm-testing/ but this path isn't defined
in the ceph yum repository.


Thanks for the link!

Perhaps the documentation should be updated with this URL?

The release notes link to:

http://ceph.com/docs/master/install/rpm/
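
For anyone else looking for it, a minimal yum repo stanza pointing at that
testing path would presumably look something like this (repo id, name and gpg
settings are only an illustration):

[ceph-testing]
name=Ceph testing packages
baseurl=http://ceph.com/rpm-testing/
enabled=1
gpgcheck=0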

--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@mermaidconsulting.dk,
http://.mermaidconsulting.com/


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html