Re: Empty directory size greater than zero and can't remove
On Wed, 19 Dec 2012, Mark Kirkwood wrote:
> On 19/12/12 15:56, Drunkard Zhang wrote:
> > 2012/12/19 Mark Kirkwood :
> > > On 19/12/12 14:44, Drunkard Zhang wrote:
> > > > 2012/12/16 Drunkard Zhang :
> > > > > I couldn't rm files in ceph, which were backed-up files of one osd. It
> > > > > reports directory not empty, but there's nothing under that directory;
> > > > > the directory itself just holds some space. How can I track down the
> > > > > problem?
> > > > >
> > > > > log30 /mnt/bc # ls -aR osd.28/
> > > > > osd.28/:
> > > > > .  ..  osd.28
> > > > >
> > > > > osd.28/osd.28:
> > > > > .  ..  current
> > > > >
> > > > > osd.28/osd.28/current:
> > > > > .  ..  0.537_head
> > > > >
> > > > > osd.28/osd.28/current/0.537_head:
> > > > > .  ..
> > > > > log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
> > > > > drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
> > > > > log30 /mnt/bc #
> > > > > log30 /mnt/bc # rm -rf osd.28/
> > > > > rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
> > > > > log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
> > > > > rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
> > > > >
> > > > > The cluster seems healthy:
> > > > > log3 ~ # ceph -s
> > > > >    health HEALTH_OK
> > > > >    monmap e1: 3 mons at {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0}, election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
> > > > >    osdmap e1864: 45 osds: 45 up, 45 in
> > > > >    pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB used, 111 TB / 120 TB avail
> > > > >    mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
> > > > >
> > > > After mds restart, I got this error message:
> > > > 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358 2038=2038+0)
> > > >
> > > > How can I fix this?
> > > >
> > > Is it a btrfs filesystem? If so it will have subvolumes hiding in there
> > > that you need to remove first.
> >
> > Thanks for the reply; the OSDs all live on XFS filesystems.
>
> Ah, right - might be worth showing us the output of 'ls -la' in the directory
> concerned. In particular the link counts might be wrong (indicating fs
> corruption, probably fixable with xfs_repair).

This is a problem in the MDS, not the fs underneath the OSDs. There was at
least one bug that corrupted the 'rstats' recursive info and could lead to
this; it has been fixed recently. The MDS actually repairs this as it goes,
unless you specify the 'mds verify scatter = true' option, in which case it
will assert and kill itself.

sage
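[For reference, the option Sage mentions is an ordinary ceph.conf setting. A
minimal sketch of how it might be set, assuming it goes in the [mds] section
of the MDS host's ceph.conf and that the default is off:

    [mds]
        # make the MDS assert on fragstat/rstat mismatches instead of
        # silently repairing them - useful for debugging, not for production
        mds verify scatter = true

Leave it unset (or false) if you simply want the MDS to keep repairing the
mismatch on its own.]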
Re: Empty directory size greater than zero and can't remove
On 19/12/12 15:56, Drunkard Zhang wrote:
> 2012/12/19 Mark Kirkwood :
>> On 19/12/12 14:44, Drunkard Zhang wrote:
>>> 2012/12/16 Drunkard Zhang :
>>>> I couldn't rm files in ceph, which were backed-up files of one osd. It
>>>> reports directory not empty, but there's nothing under that directory;
>>>> the directory itself just holds some space. How can I track down the
>>>> problem?
>>>>
>>>> log30 /mnt/bc # ls -aR osd.28/
>>>> osd.28/:
>>>> .  ..  osd.28
>>>>
>>>> osd.28/osd.28:
>>>> .  ..  current
>>>>
>>>> osd.28/osd.28/current:
>>>> .  ..  0.537_head
>>>>
>>>> osd.28/osd.28/current/0.537_head:
>>>> .  ..
>>>> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
>>>> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
>>>> log30 /mnt/bc #
>>>> log30 /mnt/bc # rm -rf osd.28/
>>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>>> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
>>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>>>
>>>> The cluster seems healthy:
>>>> log3 ~ # ceph -s
>>>>    health HEALTH_OK
>>>>    monmap e1: 3 mons at {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0}, election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>>>>    osdmap e1864: 45 osds: 45 up, 45 in
>>>>    pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB used, 111 TB / 120 TB avail
>>>>    mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>>>>
>>> After mds restart, I got this error message:
>>> 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358 2038=2038+0)
>>>
>>> How can I fix this?
>>>
>> Is it a btrfs filesystem? If so it will have subvolumes hiding in there
>> that you need to remove first.
>
> Thanks for the reply; the OSDs all live on XFS filesystems.

Ah, right - might be worth showing us the output of 'ls -la' in the directory
concerned. In particular the link counts might be wrong (indicating fs
corruption, probably fixable with xfs_repair).

Cheers

Mark
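[For anyone following up on the xfs_repair suggestion, a minimal sketch of
the kind of check Mark describes; the mount point is taken from the thread,
the block device name is a placeholder:

    # a directory's link count (second column of ls -la) should normally be
    # 2 plus the number of subdirectories; a mismatch hints at corruption
    ls -la /mnt/bc/osd.28/osd.28/current/0.537_head

    # xfs_repair must be run against an unmounted filesystem
    umount /mnt/bc
    xfs_repair /dev/sdX1   # replace with the actual device

As Sage notes earlier in this digest, in this particular case the
inconsistency turned out to be in the MDS, not in the XFS filesystem under
the OSDs.]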
mon not marking dead osds down and slow streaming write performance
Hi all,

I apologise if this list is only for dev issues and not for operators; I
didn't see a more general list on the ceph website.

I have 5 OSD processes per host, and an FC uplink port failure caused kernel
panics in two hosts - 0404 and 0401. The mon log looks like this:

2012-12-19 13:30:38.634865 7f9a0f167700 10 mon.3@0(leader).osd e2184 preprocess_query osd_failure(osd.404 172.22.4.4:6812/12835 for 8832 e2184 v2184) v3 from osd.602 172.22.4.6:6806/5152
2012-12-19 13:30:38.634875 7f9a0f167700  5 mon.3@0(leader).osd e2184 can_mark_down current up_ratio 0.298429 < min 0.3, will not mark osd.404 down
2012-12-19 13:30:38.634880 7f9a0f167700  5 mon.3@0(leader).osd e2184 preprocess_

The cluster appears healthy:

root@os-0405:~# ceph -s
   health HEALTH_OK
   monmap e3: 1 mons at {3=172.22.4.5:6789/0}, election epoch 1, quorum 0 3
   osdmap e2184: 191 osds: 57 up, 57 in
    pgmap v205386: 121952 pgs: 121951 active+clean, 1 active+clean+scrubbing; 4437 MB data, 49497 MB used, 103 TB / 103 TB avail
   mdsmap e1: 0/0/1 up

root@os-0405:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      30      pool default
-3      30              rack unknownrack
-2      6                       host os-0401
100     1                               osd.100 up      1
101     1                               osd.101 up      1
102     1                               osd.102 up      1
103     1                               osd.103 up      1
104     1                               osd.104 up      1
112     1                               osd.112 up      1
-4      6                       host os-0402
200     1                               osd.200 up      1
201     1                               osd.201 up      1
202     1                               osd.202 up      1
203     1                               osd.203 up      1
204     1                               osd.204 up      1
212     1                               osd.212 up      1
-5      6                       host os-0403
300     1                               osd.300 up      1
301     1                               osd.301 up      1
302     1                               osd.302 up      1
303     1                               osd.303 up      1
304     1                               osd.304 up      1
312     1                               osd.312 up      1
-6      6                       host os-0404
400     1                               osd.400 up      1
401     1                               osd.401 up      1
402     1                               osd.402 up      1
403     1                               osd.403 up      1
404     1                               osd.404 up      1
412     1                               osd.412 up      1
-7      0                       host os-0405
-8      6                       host os-0406
600     1                               osd.600 up      1
601     1                               osd.601 up      1
602     1                               osd.602 up      1
603     1                               osd.603 up      1
604     1                               osd.604 up      1
612     1                               osd.612 up      1

but os-0404 has no osd processes running anymore:

root@os-0404:~# ps aux | grep ceph
root      4964  0.0  0.0   9628   920 pts/1    S+   13:31   0:00 grep --color=auto ceph

and even if it did, it can't access the LUNs in order to mount the xfs
filesystems with all the osd data. What is preventing the mon from marking
the osds on 0404 down?

A second issue I have been having is that my reads and writes are very
bursty, going from 8MB/s to 200MB/s when doing a dd from a physical client
over 10GbE. It seems to be waiting on the mon most of the time, and iostat
shows long I/O wait times for the disk the mon is using. I can also see it
writing ~40MB/s constantly to disk in iotop, though I don't know if this is
random or sequential. I see a lot of "waiting for sub ops" messages, which I
thought might be a result of the I/O wait.

Is that a normal amount of activity for a mon process? Should I be running
the mon processes off more than just a single SATA disk to keep up with ~30
OSD processes?

Thanks for your time.

 - Michael Chapman
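[The 'can_mark_down current up_ratio 0.298429 < min 0.3' line in the mon log
above answers the first question: the osdmap still contains 191 OSDs but only
57 are up, so marking anything else down would push the up ratio below the
monitor's 30% safety floor and it refuses. A minimal sketch of how that floor
could be lowered, assuming the option name 'mon osd min up ratio' used by
this Ceph release:

    [mon]
        # assumed default is 0.3; the monitor will not mark OSDs down if
        # doing so would drop the fraction of "up" OSDs below this value
        mon osd min up ratio = 0.2

Removing OSD entries that no longer exist (ceph osd crush remove / ceph osd
rm) should also raise the computed ratio without touching this knob.]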
Re: Empty directory size greater than zero and can't remove
2012/12/19 Mark Kirkwood :
> On 19/12/12 14:44, Drunkard Zhang wrote:
>>
>> 2012/12/16 Drunkard Zhang :
>>>
>>> I couldn't rm files in ceph, which were backed-up files of one osd. It
>>> reports directory not empty, but there's nothing under that directory;
>>> the directory itself just holds some space. How can I track down the
>>> problem?
>>>
>>> log30 /mnt/bc # ls -aR osd.28/
>>> osd.28/:
>>> .  ..  osd.28
>>>
>>> osd.28/osd.28:
>>> .  ..  current
>>>
>>> osd.28/osd.28/current:
>>> .  ..  0.537_head
>>>
>>> osd.28/osd.28/current/0.537_head:
>>> .  ..
>>> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
>>> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
>>> log30 /mnt/bc #
>>> log30 /mnt/bc # rm -rf osd.28/
>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
>>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>>
>>> The cluster seems healthy:
>>> log3 ~ # ceph -s
>>>    health HEALTH_OK
>>>    monmap e1: 3 mons at {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0}, election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>>>    osdmap e1864: 45 osds: 45 up, 45 in
>>>    pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB used, 111 TB / 120 TB avail
>>>    mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>>>
>> After mds restart, I got this error message:
>> 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358 2038=2038+0)
>>
>> How can I fix this?
>>
> Is it a btrfs filesystem? If so it will have subvolumes hiding in there
> that you need to remove first.

Thanks for the reply; the OSDs all live on XFS filesystems.
Re: Empty directory size greater than zero and can't remove
On 19/12/12 14:44, Drunkard Zhang wrote:
> 2012/12/16 Drunkard Zhang :
>> I couldn't rm files in ceph, which were backed-up files of one osd. It
>> reports directory not empty, but there's nothing under that directory;
>> the directory itself just holds some space. How can I track down the
>> problem?
>>
>> log30 /mnt/bc # ls -aR osd.28/
>> osd.28/:
>> .  ..  osd.28
>>
>> osd.28/osd.28:
>> .  ..  current
>>
>> osd.28/osd.28/current:
>> .  ..  0.537_head
>>
>> osd.28/osd.28/current/0.537_head:
>> .  ..
>> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
>> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
>> log30 /mnt/bc #
>> log30 /mnt/bc # rm -rf osd.28/
>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
>> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>>
>> The cluster seems healthy:
>> log3 ~ # ceph -s
>>    health HEALTH_OK
>>    monmap e1: 3 mons at {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0}, election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>>    osdmap e1864: 45 osds: 45 up, 45 in
>>    pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB used, 111 TB / 120 TB avail
>>    mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>
> After mds restart, I got this error message:
> 2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358 2038=2038+0)
>
> How can I fix this?

Is it a btrfs filesystem? If so it will have subvolumes hiding in there that
you need to remove first.

Cheers

Mark
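[In case anyone hits the btrfs variant of this, a minimal sketch of checking
for and removing hidden subvolumes; the mount point is taken from the thread
and the subvolume path is a made-up example. Elsewhere in this thread it
turns out these OSDs are on XFS, so it does not apply to this case:

    # list any subvolumes or snapshots that live below the mount point
    btrfs subvolume list /mnt/bc

    # delete one of them using the path reported by the list command
    btrfs subvolume delete /mnt/bc/osd.28/snap_example]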
Re: Empty directory size greater than zero and can't remove
2012/12/16 Drunkard Zhang :
> I couldn't rm files in ceph, which were backed-up files of one osd. It
> reports directory not empty, but there's nothing under that directory;
> the directory itself just holds some space. How can I track down the
> problem?
>
> log30 /mnt/bc # ls -aR osd.28/
> osd.28/:
> .  ..  osd.28
>
> osd.28/osd.28:
> .  ..  current
>
> osd.28/osd.28/current:
> .  ..  0.537_head
>
> osd.28/osd.28/current/0.537_head:
> .  ..
> log30 /mnt/bc # ls -lhd osd.28/osd.28/current/0.537_head
> drwxr-xr-x 1 root root 119M Dec 14 19:22 osd.28/osd.28/current/0.537_head
> log30 /mnt/bc #
> log30 /mnt/bc # rm -rf osd.28/
> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
> log30 /mnt/bc # rm -rf osd.28/osd.28/current/0.537_head
> rm: cannot remove ‘osd.28/osd.28/current/0.537_head’: Directory not empty
>
> The cluster seems healthy:
> log3 ~ # ceph -s
>    health HEALTH_OK
>    monmap e1: 3 mons at {log21=10.205.118.21:6789/0,log3=10.205.119.2:6789/0,squid86-log12=150.164.100.218:6789/0}, election epoch 640, quorum 0,1,2 log21,log3,squid86-log12
>    osdmap e1864: 45 osds: 45 up, 45 in
>    pgmap v163907: 9224 pgs: 9224 active+clean; 3168 GB data, 9565 GB used, 111 TB / 120 TB avail
>    mdsmap e134: 1/1/1 up {0=log14=up:active}, 1 up:standby
>
After mds restart, I got this error message:

2012-12-19 09:16:24.837045 mds.0 [ERR] unmatched fragstat size on single dirfrag 10006c7, inode has f(v6 m2012-11-24 23:18:34.947266 1773=1773+0), dirfrag has f(v6 m2012-12-17 12:43:52.203358 2038=2038+0)

How can I fix this?
Re: Rados consistency model
You can expect read-after-write on any object. That is, once the write is
complete, any subsequent reader will see the result.
-Sam

On Tue, Dec 18, 2012 at 12:51 AM, Rutger ter Borg wrote:
>
> Dear list,
>
> I have a question regarding concurrency guarantees of Rados. Suppose I have
> two nodes, say A and B, both running a process, and both using the same
> rados storage pool, maybe connected through different OSDs. Suppose node A
> updates an object in the storage pool, and after completion immediately
> notifies B (through its own messaging layer) that that specific object has
> been updated. Then, can I assume that, if B re-reads that object, it will
> always get the updated one? If not, what would be the recommended way of
> notifying B?
>
> IOW, what kind of consistency model should I assume?
>
> Thanks,
>
> Rutger
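[A minimal sketch of the pattern being asked about, using the stock rados
CLI; the pool and object names are placeholders:

    # node A: when this returns, the write is complete
    rados -p mypool put config-obj ./new-config

    # ... node A now notifies node B through its own messaging layer ...

    # node B: a read issued after that notification returns the updated
    # contents, per the read-after-write guarantee described above
    rados -p mypool get config-obj ./config-copy

The same holds for librados calls in application code: the guarantee applies
once the write operation has completed (returned or called back), not merely
once it has been submitted.]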
Re: rbd map command hangs for 15 minutes during system start up
I've added the output of "ps -ef" in addition to triggering a trace when a
hang is detected. Not much is generally running at that point, but you can
have a look:

https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt

Is it possible that there is some sort of deadlock going on? We are doing the
rbd maps (and subsequent filesystem mounts) on the same systems which are
running the ceph-osd and ceph-mon processes. To get around the 'sync'
deadlock problem, we are using a patch from Sage which ignores system-wide
syncs on filesystems mounted with the 'mand' option (and we mount the
underlying osd filesystems with 'mand'). However, I am wondering if there is
potential for other types of deadlocks in this environment.

Also, we recently saw an rbd hang in a much older version, running kernel
3.5.3 with only the sync hack patch, alongside ceph 0.48.1. It's possible
that this issue has been around for some time, and the recent patches just
made it happen more often (and thus more reproducibly) for us.

On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder wrote:
> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>> Here's a log with the rbd debugging enabled:
>>
>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>
>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder wrote:
>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>> Yes I was only enabling debugging for libceph. I'm adding debugging
>>>> for rbd as well. I'll do a repro later today when a test cluster
>>>> opens up.
>>>
>>> Excellent, thank you. -Alex
>
> I looked through these debugging messages. Looking only at the rbd
> debugging, what I see seems to indicate that rbd is idle at the point the
> "hang" seems to start. This suggests that the hang is not due to rbd
> itself, but rather to whatever is responsible for using the rbd image once
> it has been mapped.
>
> Is that possible? I don't know what process you have that is mapping the
> rbd image, and what is supposed to be the next thing it does. (I realize
> this may not make a lot of sense, given that a patch in rbd seems to have
> caused the hang to begin occurring.)
>
> Also note that the debugging information available (i.e., the lines in the
> code that can output debugging information) may well be incomplete. So if
> you don't find anything it may be necessary to provide you with another
> update which includes more debugging.
>
> Anyway, could you provide a little more context about what is going on sort
> of *around* rbd when activity seems to stop?
>
> Thanks a lot.
>
> -Alex
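[For context, the 'mand' workaround referred to above is just a mount option;
a minimal sketch, with the device and OSD data path as placeholders. The
kernel patch that makes system-wide sync skip 'mand'-mounted filesystems is
specific to their build and is not shown:

    # mount the OSD data filesystem with mandatory locking enabled so the
    # patched kernel excludes it from system-wide sync calls
    mount -o mand /dev/sdb1 /var/lib/ceph/osd/ceph-0]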
Re: OSD memory leaks?
Nothing terrific... Kernel logs from my clients are full of:

libceph: osd4 172.20.11.32:6801 socket closed

I saw this somewhere on the tracker. Does it do any harm?

Thanks.
--
Regards,
Sébastien Han.

On Mon, Dec 17, 2012 at 11:55 PM, Samuel Just wrote:
>
> What is the workload like?
> -Sam
>
> On Mon, Dec 17, 2012 at 2:41 PM, Sébastien Han wrote:
> > Hi,
> >
> > No, I don't see anything abnormal in the network stats. I don't see
> > anything in the logs... :(
> > The weird thing is that one node out of 4 seems to take way more memory
> > than the others...
> >
> > --
> > Regards,
> > Sébastien Han.
> >
> > On Mon, Dec 17, 2012 at 11:31 PM, Sébastien Han wrote:
> >> Hi,
> >>
> >> No, I don't see anything abnormal in the network stats. I don't see
> >> anything in the logs... :(
> >> The weird thing is that one node out of 4 seems to take way more memory
> >> than the others...
> >>
> >> --
> >> Regards,
> >> Sébastien Han.
> >>
> >> On Mon, Dec 17, 2012 at 7:12 PM, Samuel Just wrote:
> >>>
> >>> Are you having network hiccups? There was a bug noticed recently that
> >>> could cause a memory leak if nodes are being marked up and down.
> >>> -Sam
> >>>
> >>> On Mon, Dec 17, 2012 at 12:28 AM, Sébastien Han wrote:
> >>> > Hi guys,
> >>> >
> >>> > Today, looking at my graphs, I noticed that one of my 4 ceph nodes
> >>> > was using a lot of memory. It keeps growing and growing. See the
> >>> > graph attached to this mail. I run 0.48.2 on Ubuntu 12.04.
> >>> >
> >>> > The other nodes also grow, but more slowly than the first one.
> >>> >
> >>> > I'm not quite sure what information I have to provide, so let me
> >>> > know. The only thing I can say is that the load hasn't increased
> >>> > that much this week. It seems to be consuming memory and not giving
> >>> > it back.
> >>> >
> >>> > Thank you in advance.
> >>> >
> >>> > --
> >>> > Regards,
> >>> > Sébastien Han.
Re: rbd map command hangs for 15 minutes during system start up
On 12/17/2012 11:12 AM, Nick Bartos wrote:
> Here's a log with the rbd debugging enabled:
>
> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>
> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder wrote:
>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>> Yes I was only enabling debugging for libceph. I'm adding debugging
>>> for rbd as well. I'll do a repro later today when a test cluster
>>> opens up.
>>
>> Excellent, thank you. -Alex

I looked through these debugging messages. Looking only at the rbd
debugging, what I see seems to indicate that rbd is idle at the point the
"hang" seems to start. This suggests that the hang is not due to rbd itself,
but rather to whatever is responsible for using the rbd image once it has
been mapped.

Is that possible? I don't know what process you have that is mapping the rbd
image, and what is supposed to be the next thing it does. (I realize this may
not make a lot of sense, given that a patch in rbd seems to have caused the
hang to begin occurring.)

Also note that the debugging information available (i.e., the lines in the
code that can output debugging information) may well be incomplete. So if you
don't find anything it may be necessary to provide you with another update
which includes more debugging.

Anyway, could you provide a little more context about what is going on sort
of *around* rbd when activity seems to stop?

Thanks a lot.

-Alex
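[For reference, a minimal sketch of how the libceph/rbd kernel-client debug
output discussed in this thread is typically switched on, via the kernel's
dynamic debug facility. This assumes a kernel built with CONFIG_DYNAMIC_DEBUG
and a mounted debugfs; it is the generic mechanism, not something specific to
this thread:

    # enable debug output from the libceph and rbd kernel modules
    mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
    # the messages then show up in dmesg / the kernel log]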
RE: Recovery stuck and radosgateway not initializing
Our configuration: 6 OSDs, 3 mons. The journal is on an INTEL SSDSA2CW120G3
disk and the data is on a Hitachi HUS724040ALE640 disk.

When an OSD does recovery, I/O is high, and at some point the OSD is killed.
We set max active recovery to 1 and set the filestore op thread suicide
timeout to 360. What should I do in that case?

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Yann ROBIN
Sent: Tuesday, 18 December 2012 11:51
To: ceph-devel@vger.kernel.org
Subject: Recovery stuck and radosgateway not initializing

Hi,

We're using ceph v0.55, and last night we lost one node of our cluster. When
it came back, Ceph started recovering, but since then the radosgateway has
not been able to connect to the cluster. The rados gateway times out on
initialization (somewhere in the RadosClient connect).

The other problem (and I think it's related) is that the recovery isn't
working. OSDs hit "OSD op thread" timeouts and sometimes some of the OSDs
crash (see stacktrace attached). So it seems that our OSDs aren't up long
enough for the recovery to proceed.

Any help would be appreciated.

Thanks,

--
Yann
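[A minimal sketch of how the two settings mentioned above are commonly
expressed in ceph.conf; the values are the ones from the message, and the
option names are assumed to be the standard recovery/filestore knobs of this
Ceph generation:

    [osd]
        # throttle recovery: at most one active recovery op per OSD
        osd recovery max active = 1
        # give the filestore op thread longer before the OSD kills itself
        filestore op thread suicide timeout = 360]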
Recovery stuck and radosgateway not initializing
Hi,

We're using ceph v0.55, and last night we lost one node of our cluster. When
it came back, Ceph started recovering, but since then the radosgateway has
not been able to connect to the cluster. The rados gateway times out on
initialization (somewhere in the RadosClient connect).

The other problem (and I think it's related) is that the recovery isn't
working. OSDs hit "OSD op thread" timeouts and sometimes some of the OSDs
crash (see stacktrace attached). So it seems that our OSDs aren't up long
enough for the recovery to proceed.

Any help would be appreciated.

Thanks,

--
Yann
Re: Striped images and cluster misbehavior
On Mon, Dec 17, 2012 at 2:36 AM, Andrey Korolyov wrote:
> Hi,
>
> After the recent switch to the default ``--stripe-count 1'' on image upload
> I have observed a strange thing - a single import or deletion of a striped
> image may temporarily take down the entire cluster, literally (see log
> below). Of course the next issued osdmap fixes the situation, but all
> in-flight operations experience a short freeze. This issue appears randomly
> on some import or delete operations; I have not seen any other operation
> types causing it. Even if the nature of this bug lies entirely in the
> client-OSD interaction, maybe Ceph should build in some foolproofing even
> when the complaining client has admin privileges? This should almost
> certainly be reproducible within teuthology with rwx rights on both osds
> and mons for the client. And as far as I can see there is no problem at
> either the physical or protocol layer on the client machine's dedicated
> cluster interface.
>
> 2012-12-17 02:17:03.691079 mon.0 [INF] pgmap v2403268: 15552 pgs: 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB avail
> 2012-12-17 02:17:04.693344 mon.0 [INF] pgmap v2403269: 15552 pgs: 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB avail
> 2012-12-17 02:17:05.695742 mon.0 [INF] pgmap v2403270: 15552 pgs: 15552 active+clean; 931 GB data, 2927 GB used, 26720 GB / 29647 GB avail
> 2012-12-17 02:17:05.991900 mon.0 [INF] osd.0 10.5.0.10:6800/4907 failed (3 reports from 1 peers after 2012-12-17 02:17:29.991859 >= grace 20.00)
> 2012-12-17 02:17:05.992017 mon.0 [INF] osd.1 10.5.0.11:6800/5011 failed (3 reports from 1 peers after 2012-12-17 02:17:29.991995 >= grace 20.00)
> 2012-12-17 02:17:05.992139 mon.0 [INF] osd.2 10.5.0.12:6803/5226 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992110 >= grace 20.00)
> 2012-12-17 02:17:05.992240 mon.0 [INF] osd.3 10.5.0.13:6803/6054 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992224 >= grace 20.00)
> 2012-12-17 02:17:05.992330 mon.0 [INF] osd.4 10.5.0.14:6803/5792 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992317 >= grace 20.00)
> 2012-12-17 02:17:05.992420 mon.0 [INF] osd.5 10.5.0.15:6803/5564 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992405 >= grace 20.00)
> 2012-12-17 02:17:05.992515 mon.0 [INF] osd.7 10.5.0.17:6803/5902 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992501 >= grace 20.00)
> 2012-12-17 02:17:05.992607 mon.0 [INF] osd.8 10.5.0.10:6803/5338 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992591 >= grace 20.00)
> 2012-12-17 02:17:05.992702 mon.0 [INF] osd.10 10.5.0.12:6800/5040 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992686 >= grace 20.00)
> 2012-12-17 02:17:05.992793 mon.0 [INF] osd.11 10.5.0.13:6800/5748 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992778 >= grace 20.00)
> 2012-12-17 02:17:05.992891 mon.0 [INF] osd.12 10.5.0.14:6800/5459 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992875 >= grace 20.00)
> 2012-12-17 02:17:05.992980 mon.0 [INF] osd.13 10.5.0.15:6800/5235 failed (3 reports from 1 peers after 2012-12-17 02:17:29.992966 >= grace 20.00)
> 2012-12-17 02:17:05.993081 mon.0 [INF] osd.16 10.5.0.30:6800/5585 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993065 >= grace 20.00)
> 2012-12-17 02:17:05.993184 mon.0 [INF] osd.17 10.5.0.31:6800/5578 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993169 >= grace 20.00)
> 2012-12-17 02:17:05.993274 mon.0 [INF] osd.18 10.5.0.32:6800/5097 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993260 >= grace 20.00)
> 2012-12-17 02:17:05.993367 mon.0 [INF] osd.19 10.5.0.33:6800/5109 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993352 >= grace 20.00)
> 2012-12-17 02:17:05.993464 mon.0 [INF] osd.20 10.5.0.34:6800/5125 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993448 >= grace 20.00)
> 2012-12-17 02:17:05.993554 mon.0 [INF] osd.21 10.5.0.35:6800/5183 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993538 >= grace 20.00)
> 2012-12-17 02:17:05.993644 mon.0 [INF] osd.22 10.5.0.36:6800/5202 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993628 >= grace 20.00)
> 2012-12-17 02:17:05.993740 mon.0 [INF] osd.23 10.5.0.37:6800/5252 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993725 >= grace 20.00)
> 2012-12-17 02:17:05.993831 mon.0 [INF] osd.24 10.5.0.30:6803/5758 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993816 >= grace 20.00)
> 2012-12-17 02:17:05.993924 mon.0 [INF] osd.25 10.5.0.31:6803/5748 failed (3 reports from 1 peers after 2012-12-17 02:17:29.993908 >= grace 20.00)
> 2012-12-17 02:17:05.994018 mon.0 [INF] osd.26 10.5.0.32:6803/5275 failed (3 reports from 1 peers after 2012-12-17 02:17:29.994002 >= grace 20.00)
> 2012-12-17 02:17:06.105315 mon.0 [I
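[Worth noting for readers of the log above: every "failed" entry is based on
"3 reports from 1 peers", i.e. a single reporting OSD was enough to get most
of the cluster marked down before the 20-second grace expired. A minimal
sketch of the monitor-side settings usually raised to require corroboration,
assuming the option names of this Ceph generation ('mon osd min down
reporters' / 'mon osd min down reports'):

    [mon]
        # assumed defaults are 1 reporter / 3 reports; requiring reports
        # from more distinct OSDs makes a single confused peer less disruptive
        mon osd min down reporters = 2
        mon osd min down reports = 3

This does not address the underlying client/OSD bug; it only limits the blast
radius of spurious failure reports.]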
Rados consistency model
Dear list,

I have a question regarding concurrency guarantees of Rados. Suppose I have
two nodes, say A and B, both running a process, and both using the same
rados storage pool, maybe connected through different OSDs. Suppose node A
updates an object in the storage pool, and after completion immediately
notifies B (through its own messaging layer) that that specific object has
been updated. Then, can I assume that, if B re-reads that object, it will
always get the updated one? If not, what would be the recommended way of
notifying B?

IOW, what kind of consistency model should I assume?

Thanks,

Rutger
rbd caching issue
Hi folks,

I just received a mail from a customer. He reported that the "ping latency"
of his VM rises if he does a lot of IO inside the VM. I did the same with a
test VM and could reproduce this behavior. If I disable the rbd cache the VM
IO is slower, but the latency is ok. Even SSH and other programs are
affected, so it's not a problem of slow ICMP.

Device setting:

virtio0: rbd:9997/vm-1171-disk-1.rbd:rbd_cache=true:rbd_cache_size=16777216:rbd_cache_max_dirty=8388608:rbd_cache_target_dirty=4194304,cache=none

normal ping:
64 bytes from 109.75.x.x: icmp_seq=38 ttl=56 time=29.2 ms
64 bytes from 109.75.x.x: icmp_seq=39 ttl=56 time=20.8 ms
64 bytes from 109.75.x.x: icmp_seq=40 ttl=56 time=22.4 ms

with lots of IO:
64 bytes from 109.75.x.x: icmp_seq=87 ttl=56 time=28.1 ms
64 bytes from 109.75.x.x: icmp_seq=88 ttl=56 time=665 ms
64 bytes from 109.75.x.x: icmp_seq=89 ttl=56 time=226 ms
64 bytes from 109.75.x.x: icmp_seq=90 ttl=56 time=179 ms
64 bytes from 109.75.x.x: icmp_seq=91 ttl=56 time=140 ms
64 bytes from 109.75.x.x: icmp_seq=92 ttl=56 time=25.6 ms
64 bytes from 109.75.x.x: icmp_seq=93 ttl=56 time=568 ms
64 bytes from 109.75.x.x: icmp_seq=94 ttl=56 time=405 ms
64 bytes from 109.75.x.x: icmp_seq=95 ttl=56 time=223 ms
64 bytes from 109.75.x.x: icmp_seq=96 ttl=56 time=24.5 ms
64 bytes from 109.75.x.x: icmp_seq=97 ttl=56 time=321 ms
64 bytes from 109.75.x.x: icmp_seq=98 ttl=56 time=391 ms
64 bytes from 109.75.x.x: icmp_seq=99 ttl=56 time=4200 ms
64 bytes from 109.75.x.x: icmp_seq=101 ttl=56 time=2194 ms

But if I disable caching:

virtio0: rbd:9997/vm-1171-disk-1.rbd:rbd_cache=false,cache=writeback

with lots of IO:
64 bytes from 109.75.x.x: icmp_seq=62 ttl=56 time=22.1 ms
64 bytes from 109.75.x.x: icmp_seq=63 ttl=56 time=26.5 ms
64 bytes from 109.75.x.x: icmp_seq=64 ttl=56 time=30.7 ms
64 bytes from 109.75.x.x: icmp_seq=65 ttl=56 time=24.8 ms
64 bytes from 109.75.x.x: icmp_seq=66 ttl=56 time=21.9 ms

Can someone please explain this behavior to me? Why is the latency of the VM
spiky if I enable rbd caching? I've played around with the caching
parameters, but with caching enabled it's always the same.

KVM version: 1.2.1
Ceph version: ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)

Thanks a lot!

--
Kind regards

Jens Rehpöhler

--
filoo GmbH
Moltkestr. 25a
0 Gütersloh
HRB4355 AG Gütersloh
Managing directors: S.Grewing | J.Rehpöhler | Dr. C.Kunz
Telefon: +49 5241 8673012 | Mobil: +49 151 54645798
Hotline: +49 5241 8673026 | Fax: +49 5241 8673020
Follow us on:
Twitter: http://twitter.com/filoogmbh
Facebook: http://facebook.com/filoogmbh
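[For readers who set these through ceph.conf rather than in the qemu drive
string, a sketch of the equivalent [client] section with the same values as
above; whether your qemu/librbd build reads them from ceph.conf instead of
the drive string is an assumption to verify for your setup:

    [client]
        rbd cache = true
        rbd cache size = 16777216         # 16 MB total cache
        rbd cache max dirty = 8388608     # dirty data limit before writes block on flushing
        rbd cache target dirty = 4194304  # dirty level at which background flushing starts]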
Re: Slow requests
Hi Dino,

It seems that the RPM packager likes to keep the latest and greatest versions
in http://ceph.com/rpm-testing/ but this path isn't defined in the ceph yum
repository.

Thanks for the link! Perhaps the documentation should be updated with this
URL? The release notes link to:

http://ceph.com/docs/master/install/rpm/

--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@mermaidconsulting.dk,
http://.mermaidconsulting.com/
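[If you want yum to pull from that location directly, a minimal sketch of a
repo file; the exact layout under rpm-testing/ (per-distro subdirectories,
GPG key location) is an assumption to check against what the server actually
serves:

    # /etc/yum.repos.d/ceph-testing.repo  (hypothetical example)
    [ceph-testing]
    name=Ceph testing packages
    baseurl=http://ceph.com/rpm-testing/
    enabled=1
    # for a real deployment, enable gpgcheck and point gpgkey= at the Ceph
    # release key instead of disabling verification
    gpgcheck=0]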