[ceph-users] Is lttng enabled by default in debian hammer-0.94.5?
Hi everyone,

After installing hammer-0.94.5 on Debian, I want to trace librbd with LTTng, but after the following steps I got nothing:

mkdir -p traces
lttng create -o traces librbd
lttng enable-event -u 'librbd:*'
lttng add-context -u -t pthread_id
lttng start
lttng stop

So, is LTTng enabled in this version on Debian? Thanks!

--
hzwulibin
2015-10-30
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS and page cache
On Thu, Oct 29, 2015 at 4:30 PM, Sage Weil wrote:
> On Thu, 29 Oct 2015, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
>> > On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>> >> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>> >>> I tried to dig into the ceph-fuse code, but I was unable to find the
>> >>> fragment that is responsible for flushing the data from the page cache.
>> >>
>> >> fuse kernel code invalidates page cache on opening file. you can
>> >> disable this behaviour by setting the "fuse use invalidate cb" config
>> >> option to true.
>> >
>> > Zheng, do you know any reason we shouldn't make that the default value
>> > now? There was a loopback deadlock (which is why it's disabled by
>> > default) but I don't remember the details offhand well enough to know
>> > if your recent work in those interfaces has fixed it. Or Sage?
>> > -Greg
>>
>> there is no loopback deadlock now, because we use a separate thread to
>> invalidate kernel page cache. I think we can enable this option
>> safely.
>
> ...as long as nobody blocks waiting for invalidate while holding a lock
> (client_lock?) that could prevent other fuse ops like write (pretty sure
> that was the deadlock we saw before). I worry this could still happen
> with a writer (or reader?) getting stuck in a check_caps() type situation
> while the invalidate cb is waiting on a page lock held by the calling
> kernel syscall...

the invalidate thread does not hold client_lock while invalidating kernel page cache.

Regards
Yan, Zheng
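Zheng's point about the separate invalidate thread can be sketched abstractly. The following is an illustrative toy model in plain Python, not ceph-fuse code (the `client_lock` and queue names are just stand-ins): the thread holding `client_lock` only *enqueues* the invalidation, and a dedicated thread that never takes `client_lock` performs it, so the invalidation can block (e.g. on a kernel page lock) without deadlocking against fuse ops that need `client_lock`.

```python
# Toy model of deadlock avoidance via a dedicated invalidate thread.
# If the holder of client_lock called the (potentially blocking)
# invalidation directly, and that blocked on work that itself needs
# client_lock, the two would deadlock. Handing off via a queue breaks
# the lock-ordering cycle.
import queue
import threading

client_lock = threading.Lock()
invalidate_queue = queue.Queue()
invalidated = []

def invalidate_thread():
    # Runs without client_lock held, so it may safely block here.
    while True:
        ino = invalidate_queue.get()
        if ino is None:  # shutdown sentinel
            break
        invalidated.append(ino)  # stand-in for the real kernel-cache invalidation

worker = threading.Thread(target=invalidate_thread)
worker.start()

with client_lock:
    # Under client_lock we only enqueue; the blocking work happens elsewhere.
    invalidate_queue.put(1234)

invalidate_queue.put(None)
worker.join()
print(invalidated)  # [1234]
```

The design choice mirrors what the thread describes: the invalidate callback's blocking behavior is isolated from the lock that ordinary fuse operations need.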
Re: [ceph-users] rbd hang
More info, output of dmesg:

[259956.804942] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[260752.788609] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[260757.908206] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[260763.181751] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[260852.224607] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[260852.510451] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[260856.868099] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[261652.890656] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[261657.972579] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[261663.283701] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[261752.325749] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[261752.611505] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[261756.969340] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[262552.961741] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[262558.074441] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[262563.385635] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[262652.427089] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[262652.712681] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[262657.070456] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)

I noticed that the OSDs are talking on 10.134.128.42, which is part of the public network, but I have defined the cluster network as 10.134.128.64/26. The machine has two NICs, 10.134.128.41 and 10.134.128.105. In the dmesg output I should be seeing the socket closed spam on 10.134.128.10{5,6}, right?
ceph.conf snippet (see full below):

[global]
public network = 10.134.128.0/26
cluster network = 10.134.128.64/26

----- Original Message -----
From: "Jason Dillaman"
To: "Joe Ryner"
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details.

--
Jason Dillaman

----- Original Message -----
> From: "Joe Ryner"
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
>
> Hi,
>
> I am having a strange problem with our development cluster. When I run rbd
> export it just hangs. I have been running ceph for a long time and haven't
> encountered this kind of issue. Any ideas as to what is going on?
>
> rbd -p locks export seco101ira -
>
> I am running
>
> Centos 6.6 x86 64
>
> ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
>
> I have enabled debugging and get the following when I run the command
>
> [root@durbium ~]# rbd -p locks export seco101ira -
> 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0
> 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter
> 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start
> 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys
> 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init
> 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 con 0x15ba540
> 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned my addr 10.134.128.41:0/1024687
> 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318000be0 con 0x15ba540
> 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fc31c001700 con 0x15ba540
> 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fc31c002220 con 0x15ba540
> 2015-10-29 11:17:08.188001 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540
> 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon
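A side note on the network question above: Ceph clients (including the kernel libceph/rbd client that produced the dmesg lines) always connect to OSDs on their public-network addresses; the cluster network carries only OSD-to-OSD replication and heartbeat traffic. So seeing 10.134.128.42/.43 rather than the cluster /26 in client-side dmesg is expected. A quick stdlib sketch to check which subnet each address from the messages above falls into:

```python
# Check which configured /26 each address belongs to, using the
# public/cluster networks from the ceph.conf snippet and addresses
# taken from the dmesg output above.
import ipaddress

public = ipaddress.ip_network("10.134.128.0/26")    # hosts .1 - .62
cluster = ipaddress.ip_network("10.134.128.64/26")  # hosts .65 - .126

def which(addr):
    ip = ipaddress.ip_address(addr)
    if ip in public:
        return "public"
    if ip in cluster:
        return "cluster"
    return "neither"

for addr in ("10.134.128.41", "10.134.128.42", "10.134.128.43", "10.134.128.105"):
    print(addr, "->", which(addr))
# 10.134.128.41 -> public
# 10.134.128.42 -> public
# 10.134.128.43 -> public
# 10.134.128.105 -> cluster
```

In other words, the OSD addresses in the dmesg spam are all on the public /26, which is exactly where a kernel rbd client is supposed to reach them.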
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
From all we analyzed, it looks like it's this issue: http://tracker.ceph.com/issues/13045
PR: https://github.com/ceph/ceph/pull/6097

Can anyone help us to confirm this? :)

2015-10-29 23:13 GMT+02:00 Voloshanenko Igor :
> Additional trace:
>
> #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3 0x7f30f87b1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4 0x7f30f87b1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5 0x7f30f87b1aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6 0x7f2fddb50778 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry=62, func=func@entry=0x7f2fdddedba0 <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool ceph::log::SubsystemMap::should_gather(unsigned int, int)") at common/assert.cc:77
> #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
> #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
> #9 0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527
> #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
> #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at pthread_create.c:312
> #12 0x7f30f995547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor :
>> Hi Wido and all community.
>>
>> We caught a very idiotic issue on our Cloudstack installation, which is
>> related to Ceph and possibly to the java-rados lib.
>>
>> So, we constantly have the agent crashing (which causes a very big
>> problem for us...).
>>
>> When the agent crashes, it crashes the JVM. And there is no event in the logs at all.
>> We enabled the crash dump, and after a crash we see the following picture:
>>
>> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
>> Problematic frame:
>> C [librbd.so.1.0.0+0x5d681]
>>
>> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
>> (gdb) bt
>> ...
>> #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
>> #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
>> #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
>> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
>>
>> From the ceph code, this part is executed when flushing a cache object... And we
>> don't understand why, because we have an absolutely different race condition
>> to reproduce it.
>>
>> As Cloudstack does not have a good implementation of the snapshot lifecycle yet,
>> it sometimes happens that some volumes are already marked as EXPUNGED in the DB,
>> and then Cloudstack tries to delete the base volume before it tries to unprotect it.
>>
>> Sure, unprotecting fails and a normal exception is returned (it fails because
>> the snap has children...)
>>
>> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
>> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful.
>> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of >> image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image >> 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at >> cephmon.anolim.net:6789 >> 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting snapshot >> cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap >> 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] >> (agentRequest-Handler-5:null) Failed to delete volume: >> com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: >> Failed to unprotect snapshot cloudstack-base-snap >> 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] >> (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: >> 161344838950, via: 4, Ver: v1, Flags: 10, >> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: >> com.ceph.rbd.RbdException:
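The assertion in frame #6 above ("sub < m_subsys.size()" at ./log/SubsystemMap.h:62) is a bounds check on the per-subsystem log-level table. Here is a hypothetical, heavily simplified Python model of that check (the real class is C++ in ceph's log/SubsystemMap.h; the names and level logic here are illustrative only): if a leftover flusher thread calls should_gather() with a subsystem id beyond the table's current size, for example against a context that is empty or already torn down, the assert aborts the whole process, which is consistent with the crash-in-flusher_entry backtraces in this thread.

```python
# Toy model (NOT ceph code) of the bounds assertion that fires in the
# backtrace: should_gather() indexes a per-subsystem vector and asserts
# the subsystem id is in range before reading it.
class SubsystemMap:
    def __init__(self):
        self.m_subsys = []  # per-subsystem (log_level, gather_level) entries

    def add(self, log_level, gather_level):
        self.m_subsys.append((log_level, gather_level))

    def should_gather(self, sub, level):
        # The assertion from the crash: "sub < m_subsys.size()"
        assert sub < len(self.m_subsys), "sub < m_subsys.size()"
        return level <= max(self.m_subsys[sub])

m = SubsystemMap()
m.add(0, 5)
print(m.should_gather(0, 1))   # True: level 1 is within the gather level 5
try:
    m.should_gather(7, 1)      # out-of-range subsystem id -> assert, like the crash
except AssertionError as e:
    print("assert:", e)
```

In the JVM-agent scenario, that out-of-range access would come not from a bad constant but from the map being used at the wrong point in the library's lifecycle, which is why the crash looks unrelated to the RBD operation that triggered it.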
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Additional trace:

#0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x7f30f98950d8 in __GI_abort () at abort.c:89
#2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x7f30f87b1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x7f30f87b1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x7f30f87b1aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x7f2fddb50778 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry=62, func=func@entry=0x7f2fdddedba0 <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool ceph::log::SubsystemMap::should_gather(unsigned int, int)") at common/assert.cc:77
#7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
#8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
#9 0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527
#10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
#11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at pthread_create.c:312
#12 0x7f30f995547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2015-10-29 17:38 GMT+02:00 Voloshanenko Igor :
> Hi Wido and all community.
>
> We caught a very idiotic issue on our Cloudstack installation, which is
> related to Ceph and possibly to the java-rados lib.
>
> So, we constantly have the agent crashing (which causes a very big problem
> for us...).
>
> When the agent crashes, it crashes the JVM. And there is no event in the logs at all.
> We enabled the crash dump, and after a crash we see the following picture:
>
> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
> Problematic frame:
> C [librbd.so.1.0.0+0x5d681]
>
> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> (gdb) bt
> ...
> #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
> #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
> #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
>
> From the ceph code, this part is executed when flushing a cache object... And we
> don't understand why, because we have an absolutely different race condition
> to reproduce it.
>
> As Cloudstack does not have a good implementation of the snapshot lifecycle yet,
> it sometimes happens that some volumes are already marked as EXPUNGED in the DB,
> and then Cloudstack tries to delete the base volume before it tries to unprotect it.
>
> Sure, unprotecting fails and a normal exception is returned (it fails because the
> snap has children...)
>
> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful.
> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at > cephmon.anolim.net:6789 > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting snapshot > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] > (agentRequest-Handler-5:null) Failed to delete volume: > com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: > Failed to unprotect snapshot cloudstack-base-snap > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: > 161344838950, via: 4, Ver: v1, Flags: 10, > [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: > com.ceph.rbd.RbdException: Failed to unprotect snapshot > cloudstack-base-snap","wait":0}}] } > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-2:null) Processing command: > com.cloud.agent.api.GetHostStatsCommand > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1| > a
Re: [ceph-users] rbd hang
Periodicly I am also getting these while waiting 2015-10-29 13:41:09.528674 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:14.528779 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:19.528907 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:22.515725 7f5c260d9700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 0x7f5c080073c0 con 0x2307540 2015-10-29 13:41:22.516453 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 21 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c10003170 con 0x2307540 2015-10-29 13:41:24.529012 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:29.529109 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:34.529209 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:39.529306 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:44.529402 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:49.529498 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:54.529597 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:59.529695 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:04.529800 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:09.529904 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:14.530004 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:19.530103 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:24.530200 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:29.530293 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:34.530385 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:39.530480 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:44.530594 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:49.530690 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:54.530787 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:59.530881 7f5c24fd6700 10 client.7368.objecter tick 
2015-10-29 13:43:04.530980 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:09.531087 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:14.531190 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:19.531308 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:24.531417 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:29.531524 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:34.531629 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:39.531733 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:44.531836 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:49.531938 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:49.692028 7f5c270db700 1 client.7368.objecter ms_handle_reset on osd.2 2015-10-29 13:43:49.692051 7f5c270db700 10 client.7368.objecter reopen_session osd.2 session, addr now osd.2 10.134.128.43:6803/2741 2015-10-29 13:43:49.692176 7f5c270db700 1 -- 10.134.128.41:0/1025119 mark_down 0x7f5c1c001c40 -- pipe dne 2015-10-29 13:43:49.692287 7f5c270db700 10 client.7368.objecter kick_requests for osd.2 2015-10-29 13:43:49.692300 7f5c270db700 10 client.7368.objecter maybe_request_map subscribing (onetime) to next osd map 2015-10-29 13:43:49.693706 7f5c270db700 10 client.7368.objecter ms_handle_connect 0x7f5c1c003810 2015-10-29 13:43:52.517670 7f5c260d9700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 0x7f5c080096c0 con 0x2307540 2015-10-29 13:43:52.518032 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 22 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100056d0 con 0x2307540 2015-10-29 13:43:54.532041 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:59.532150 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:04.532252 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:09.532359 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:14.532467 7f5c24fd6700 10 
client.7368.objecter tick 2015-10-29 13:44:19.532587 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:24.532692 7f5c24fd6700 10 client.7368.objecter tick - Original Message - From: "Jason Dillaman" To: "Joe Ryner" Cc: ceph-us...@ceph.com Sent: Thursday, October 29, 2015 12:05:38 PM Subject: Re: [ceph-users] rbd hang I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details. -- Jason Dillaman - Original Message - > From: "Joe Ryner" > To: ceph-us...@ceph.com > Sent: Thursday, October 29, 2015 12:22:01 PM > Subject: [ceph-users] rbd hang > > i, > > I am having a strange problem with our development cluster. When I run rbd > export it just hangs. I have been running ceph for a long time and haven't > encountered this kind of issue. Any ideas
Re: [ceph-users] radosgw get quota
On Thu, Oct 29, 2015 at 11:29 AM, Derek Yarnell wrote:
> Sorry, the information is in the headers. So I think the valid question
> to follow up is why this information is in the headers and not the body
> of the response. I think this is a bug, but maybe I am not aware of a
> subtlety. It would seem this json comes from this line[0].
>
> [0] - https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697
>
> For example the information is returned in what seems to be the
> Content-type header as follows. Maybe the missing : in the json
> encoding would explain something?

It's definitely a bug. It looks like we fail to call end_header() before it, so everything is dumped before we close the http header. Can you open a ceph tracker issue with the info you provided here?

Thanks,
Yehuda

> INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): ceph.umiacs.umd.edu
> DEBUG:requests.packages.urllib3.connectionpool:"GET /admin/user?quota&format=json&uid=foo1209&quota-type=user HTTP/1.1" 200 0
> INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')]
>
> On 10/28/15 11:15 PM, Derek Yarnell wrote:
>> I have had this issue before, and I don't think I have resolved it. I
>> have been using the RGW admin api to set quota based on the docs[0].
>> But I can't seem to be able to get it to cough up and show me the quota
>> now. Any ideas? I get a 200 back but no body. I have tested this on a
>> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what
>> the logs are for.
>> >> [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas >> >> DEBUG:rgwadmin.rgw:URL: >> http://ceph.umiacs.umd.edu/admin/user?quota&uid=derek"a-type=user >> DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD >> DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None >> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP >> connection (1): ceph.umiacs.umd.edu >> DEBUG:requests.packages.urllib3.connectionpool:"GET >> /admin/user?quota&uid=derek"a-type=user HTTP/1.1" 200 0 >> INFO:rgwadmin.rgw:No JSON object could be decoded >> >> >> 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: >> 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 >> 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:02.063755 7f447ace2700 2 >> RGWDataChangesLog::ChangesRenewThread: start >> 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: >> localhost:7480 >> 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): >> HTTP_ACCEPT_ENCODING: gzip, deflate >> 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* >> 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): >> HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: >> Thu, 29 Oct 2015 03:03:17 GMT >> 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): >> HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_FOR: 128.8.132.4 >> 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu >> 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu >> 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): >> HTTP_CONNECTION: Keep-Alive >> 2015-10-28 23:03:17.139392 7f443cfd1700 20 
RGWEnv::set(): >> REQUEST_METHOD: GET >> 2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: >> /admin/user >> 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: >> quota&uid=derek"a-type=user >> 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: >> 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: >> /admin/user >> 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 >> 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* >> 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, >> deflate >> 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS >> RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 >> 03:03:17 GMT >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 >> HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 >> 2015-10-2
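Yehuda's diagnosis above (the JSON is written out before end_header() closes the HTTP header block) can be reproduced with a tiny parser sketch. This is an illustrative model, not RGW or rgwadmin code: if the payload is emitted before the blank line that terminates the headers, a client parsing header lines sees the JSON fragment as a bogus header name/value pair, exactly like the ('{"enabled"', 'true,...') tuple in the log above.

```python
# Minimal header parser demonstrating the symptom: a body written
# before the header-terminating blank line is parsed as a header.
def parse_headers(raw):
    headers = {}
    for line in raw.split("\r\n"):
        if not line:  # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers

# The payload is emitted too early, before Content-type and before the
# blank line -- modeling the missing end_header() call.
buggy_response = (
    'Date: Thu, 29 Oct 2015 18:28:45 GMT\r\n'
    '{"enabled":true,"max_size_kb":12345,"max_objects":-1}'
    'Content-type: application/json\r\n'
    'Content-length: 0\r\n'
    '\r\n'
)

hdrs = parse_headers(buggy_response)
# The JSON fragment splits at its first colon and becomes a fake header,
# with Content-type fused into its value -- matching the log output.
print(hdrs['{"enabled"'])
```

The fix on the server side is simply to terminate the header block before flushing the payload, at which point the same client would see an ordinary 200 with a JSON body.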
Re: [ceph-users] radosgw get quota
Sorry, the information is in the headers. So I think the valid question to follow up is why this information is in the headers and not the body of the response. I think this is a bug, but maybe I am not aware of a subtlety. It would seem this json comes from this line[0].

[0] - https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697

For example the information is returned in what seems to be the Content-type header as follows. Maybe the missing : in the json encoding would explain something?

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): ceph.umiacs.umd.edu
DEBUG:requests.packages.urllib3.connectionpool:"GET /admin/user?quota&format=json&uid=foo1209&quota-type=user HTTP/1.1" 200 0
INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')]

On 10/28/15 11:15 PM, Derek Yarnell wrote:
> I have had this issue before, and I don't think I have resolved it. I
> have been using the RGW admin api to set quota based on the docs[0].
> But I can't seem to be able to get it to cough up and show me the quota
> now. Any ideas? I get a 200 back but no body. I have tested this on a
> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what
> the logs are for.
> > [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas > > DEBUG:rgwadmin.rgw:URL: > http://ceph.umiacs.umd.edu/admin/user?quota&uid=derek"a-type=user > DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD > DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None > INFO:requests.packages.urllib3.connectionpool:Starting new HTTP > connection (1): ceph.umiacs.umd.edu > DEBUG:requests.packages.urllib3.connectionpool:"GET > /admin/user?quota&uid=derek"a-type=user HTTP/1.1" 200 0 > INFO:rgwadmin.rgw:No JSON object could be decoded > > > 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: > 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 > 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:02.063755 7f447ace2700 2 > RGWDataChangesLog::ChangesRenewThread: start > 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: > localhost:7480 > 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): > HTTP_ACCEPT_ENCODING: gzip, deflate > 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* > 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): > HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 > Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: > Thu, 29 Oct 2015 03:03:17 GMT > 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): > HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= > 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_FOR: 128.8.132.4 > 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu > 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu > 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): > HTTP_CONNECTION: Keep-Alive > 2015-10-28 23:03:17.139392 7f443cfd1700 20 RGWEnv::set(): > REQUEST_METHOD: GET > 
2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: > /admin/user > 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: > quota&uid=derek"a-type=user > 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: > 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: > /admin/user > 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 > 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* > 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, > deflate > 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS > RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= > 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive > 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 > 03:03:17 GMT > 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 > 2015-10-28 23:03:17.139413 7f443cfd1700 20 > HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 > Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 > 2015-10-28 23:03:17.139415 7f443cfd1700 20 > HTTP_X_FORWARDED_HOST=ceph.umiacs.umd.edu > 2015-10-28 23:03:17.139416 7f443cfd1700 20 > HTTP_X_FORWARDED_SERVER=cephproxy00.umiacs.umd.edu > 2015-10-28 23:03:17.139416 7f443cfd1700 20 > QUERY_STRING=quota&uid=derek"a-type=user > 2015-10-28 23:03:17.139417 7f443cfd1700 20 REMOTE_USER= > 2015-10-28 23:03:17.139417 7f443cfd1700 20 REQUEST_METHO
Re: [ceph-users] rbd hang
rbd -p locks export seco101ira - 2015-10-29 13:13:49.487822 7f5c2cb3b7c0 1 librados: starting msgr at :/0 2015-10-29 13:13:49.487838 7f5c2cb3b7c0 1 librados: starting objecter 2015-10-29 13:13:49.487971 7f5c2cb3b7c0 1 -- :/0 messenger.start 2015-10-29 13:13:49.488027 7f5c2cb3b7c0 1 librados: setting wanted keys 2015-10-29 13:13:49.488031 7f5c2cb3b7c0 1 librados: calling monclient init 2015-10-29 13:13:49.488708 7f5c2cb3b7c0 1 -- :/1025119 --> 10.134.128.41:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2307900 con 0x2307540 2015-10-29 13:13:49.489236 7f5c2cb33700 1 -- 10.134.128.41:0/1025119 learned my addr 10.134.128.41:0/1025119 2015-10-29 13:13:49.489498 7f5c270db700 10 client.?.objecter ms_handle_connect 0x2307540 2015-10-29 13:13:49.489646 7f5c270db700 10 client.?.objecter resend_mon_ops 2015-10-29 13:13:49.490171 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7f5c1be0 con 0x2307540 2015-10-29 13:13:49.490316 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (3748436714 0 0) 0x7f5c10001090 con 0x2307540 2015-10-29 13:13:49.490656 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f5c1c0018a0 con 0x2307540 2015-10-29 13:13:49.491183 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (1658299125 0 0) 0x7f5c10001090 con 0x2307540 2015-10-29 13:13:49.491329 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f5c1c002250 con 0x2307540 2015-10-29 13:13:49.491871 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (1503133956 0 0) 0x7f5c18c0 con 0x2307540 2015-10-29 13:13:49.491981 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- 
mon_subscribe({monmap=0+}) v2 -- ?+0 0x2303c10 con 0x2307540 2015-10-29 13:13:49.492197 7f5c2cb3b7c0 10 client.7368.objecter maybe_request_map subscribing (onetime) to next osd map 2015-10-29 13:13:49.492234 7f5c2cb3b7c0 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x23048a0 con 0x2307540 2015-10-29 13:13:49.492263 7f5c2cb3b7c0 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x2304e40 con 0x2307540 2015-10-29 13:13:49.492595 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) 0x7f5c10001300 con 0x2307540 2015-10-29 13:13:49.492758 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100015a0 con 0x2307540 2015-10-29 13:13:49.493171 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7f5c18c0 con 0x2307540 2015-10-29 13:13:49.493390 7f5c2cb3b7c0 1 librados: init done 2015-10-29 13:13:49.493431 7f5c2cb3b7c0 10 librados: wait_for_osdmap waiting 2015-10-29 13:13:49.493557 7f5c270db700 3 client.7368.objecter handle_osd_map got epochs [4350,4350] > 0 2015-10-29 13:13:49.493572 7f5c270db700 3 client.7368.objecter handle_osd_map decoding full epoch 4350 2015-10-29 13:13:49.493831 7f5c270db700 20 client.7368.objecter dump_active .. 
0 homeless 2015-10-29 13:13:49.493861 7f5c2cb3b7c0 10 librados: wait_for_osdmap done waiting 2015-10-29 13:13:49.493863 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c10003170 con 0x2307540 2015-10-29 13:13:49.493880 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7f5c10005230 con 0x2307540 2015-10-29 13:13:49.493889 7f5c270db700 3 client.7368.objecter handle_osd_map ignoring epochs [4350,4350] <= 4350 2015-10-29 13:13:49.493891 7f5c270db700 20 client.7368.objecter dump_active .. 0 homeless 2015-10-29 13:13:49.493898 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100056d0 con 0x2307540 2015-10-29 13:13:49.493950 7f5c2cb3b7c0 20 librbd::ImageCtx: enabling caching... 2015-10-29 13:13:49.493971 7f5c2cb3b7c0 20 librbd::ImageCtx: Initial cache settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 max_dirty_age=5 2015-10-29 13:13:49.494155 7f5c2cb3b7c0 20 librbd: open_image: ictx = 0x2305530 name = 'seco101ira' id = '' snap_name = '' 2015-10-29 13:13:49.494209 7f5c2cb3b7c0 10 librados: stat oid=seco101ira.rbd nspace= 2015-10-29 13:13:49.494290 7f5c2cb3b7c0 10 client.7368.
Re: [ceph-users] rbd hang
I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details. -- Jason Dillaman - Original Message - > From: "Joe Ryner" > To: ceph-us...@ceph.com > Sent: Thursday, October 29, 2015 12:22:01 PM > Subject: [ceph-users] rbd hang > > i, > > I am having a strange problem with our development cluster. When I run rbd > export it just hangs. I have been running ceph for a long time and haven't > encountered this kind of issue. Any ideas as to what is going on? > > rbd -p locks export seco101ira - > > > I am running > > Centos 6.6 x86 64 > > ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70) > > I have enabled debugging and get the following when I run the command > > [root@durbium ~]# rbd -p locks export seco101ira - > 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0 > 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter > 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start > 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys > 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init > 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> > 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 > con 0x15ba540 > 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned > my addr 10.134.128.41:0/1024687 > 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) > 0x7fc318000be0 con 0x15ba540 > 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 > 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540 > 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- 
auth(proto 2 32 bytes epoch 0) v1 -- ?+0 > 0x7fc31c001700 con 0x15ba540 > 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 > 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540 > 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 > 0x7fc31c002220 con 0x15ba540 > 2015-10-29 11:17:08.188001 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 > 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540 > 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x15b6b80 con > 0x15ba540 > 2015-10-29 11:17:08.188334 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 > 0x15b7700 con 0x15ba540 > 2015-10-29 11:17:08.188355 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 > 0x15b7ca0 con 0x15ba540 > 2015-10-29 11:17:08.188445 7fc3334fa7c0 1 librados: init done > 2015-10-29 11:17:08.188463 7fc3334fa7c0 10 librados: wait_for_osdmap waiting > 2015-10-29 11:17:08.188625 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) > 0x7fc318001300 con 0x15ba540 > 2015-10-29 11:17:08.188795 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180015a0 con 0x15ba540 > 2015-10-29 11:17:08.189129 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 > 7562+0+0 (1787729222 0 0) 0x7fc3180013b0 con 0x15ba540 > 2015-10-29 11:17:08.189452 7fc3334fa7c0 10 librados: wait_for_osdmap done > waiting > 2015-10-29 11:17:08.189454 7fc32da9a700 1 -- 
10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180008c0 con 0x15ba540 > 2015-10-29 11:17:08.189470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 > 7562+0+0 (1787729222 0 0) 0x7fc318005290 con 0x15ba540 > 2015-10-29 11:17:08.189485 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180056d0 con 0x15ba540 > 2015-10-29 11:17:08.189522 7fc3334fa7c0 20 librbd::ImageCtx: enabling > caching... > 2015-10-29 11:17:08.189540 7fc3334fa7c0 20 librbd::ImageCtx: Initial cache > settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 > max_dirty_age=5 > 2015-10-29 11:17:08.189686 7fc3334fa7c0 20 librbd: open_image: ictx =
Re: [ceph-users] Benchmark individual OSD's
You can also extend that command line to specify specific block and total sizes. Check the help text. :) -Greg On Thursday, October 29, 2015, Lindsay Mathieson < lindsay.mathie...@gmail.com> wrote: > > On 29 October 2015 at 19:24, Burkhard Linke < > burkhard.li...@computational.bio.uni-giessen.de > > > wrote: > >> # ceph tell osd.1 bench >> { >> "bytes_written": 1073741824, >> "blocksize": 4194304, >> "bytes_per_sec": 117403227.00 >> } >> >> It might help you to figure out whether individual OSDs do not perform as >> expected. The amount of data written is limited (but there's a config >> setting for it). With 1 GB as in the example above, the write operation >> will probably be limited to the journal. >> > > > > Thats perfect, thanks Burkhard, it lets me compare osd's and > configurations. > > > -- > Lindsay > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
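For comparing several OSDs, the JSON that `ceph tell osd.N bench` prints is easy to post-process. A small sketch using the sample output quoted above (the helper is mine, not part of Ceph; in practice you would feed it the output of each `ceph tell osd.N bench`):

```python
import json

def bench_mb_per_sec(raw):
    """Parse 'ceph tell osd.N bench' JSON output and return throughput in MB/s."""
    data = json.loads(raw)
    return data["bytes_per_sec"] / (1024 * 1024)

# Sample output from the message above (osd.1):
sample = '{"bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 117403227.00}'
print(round(bench_mb_per_sec(sample), 1))  # prints 112.0
```

Running this over every OSD and sorting the results makes underperforming disks stand out quickly.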
[ceph-users] rbd hang
i, I am having a strange problem with our development cluster. When I run rbd export it just hangs. I have been running ceph for a long time and haven't encountered this kind of issue. Any ideas as to what is going on? rbd -p locks export seco101ira - I am running Centos 6.6 x86 64 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70) I have enabled debugging and get the following when I run the command [root@durbium ~]# rbd -p locks export seco101ira - 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 con 0x15ba540 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned my addr 10.134.128.41:0/1024687 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318000be0 con 0x15ba540 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fc31c001700 con 0x15ba540 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fc31c002220 con 0x15ba540 2015-10-29 11:17:08.188001 
7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x15b6b80 con 0x15ba540 2015-10-29 11:17:08.188334 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x15b7700 con 0x15ba540 2015-10-29 11:17:08.188355 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x15b7ca0 con 0x15ba540 2015-10-29 11:17:08.188445 7fc3334fa7c0 1 librados: init done 2015-10-29 11:17:08.188463 7fc3334fa7c0 10 librados: wait_for_osdmap waiting 2015-10-29 11:17:08.188625 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318001300 con 0x15ba540 2015-10-29 11:17:08.188795 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180015a0 con 0x15ba540 2015-10-29 11:17:08.189129 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7fc3180013b0 con 0x15ba540 2015-10-29 11:17:08.189452 7fc3334fa7c0 10 librados: wait_for_osdmap done waiting 2015-10-29 11:17:08.189454 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180008c0 con 0x15ba540 2015-10-29 11:17:08.189470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7fc318005290 con 0x15ba540 2015-10-29 11:17:08.189485 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180056d0 con 0x15ba540 2015-10-29 
11:17:08.189522 7fc3334fa7c0 20 librbd::ImageCtx: enabling caching... 2015-10-29 11:17:08.189540 7fc3334fa7c0 20 librbd::ImageCtx: Initial cache settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 max_dirty_age=5 2015-10-29 11:17:08.189686 7fc3334fa7c0 20 librbd: open_image: ictx = 0x15b8390 name = 'seco101ira' id = '' snap_name = '' 2015-10-29 11:17:08.189730 7fc3334fa7c0 10 librados: stat oid=seco101ira.rbd nspace= 2015-10-29 11:17:08.189882 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.43:6803/2741 -- osd_op(client.7543.0:1 seco101ira.rbd [stat] 4.a982c550 ack+read e4350) v4 -- ?+0 0x15baf60 con 0x15b9e70 2015-10-29 11:17:08.192470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== osd.2 10.134.128.43:6803/2741 1 osd_op_reply(1 seco101ira.rbd [stat] v0'0 uv1 ondisk = 0) v6 181+0+16 (1355327
[ceph-users] Cloudstack agent crashed JVM with exception in librbd
Hi Wido and all of the community. We caught a very odd issue in our CloudStack installation, related to Ceph and possibly to the java-rados lib. The agent crashes constantly (which causes a very big problem for us). When the agent crashes, it takes the JVM down with it, and there is no event in the logs at all. We enabled crash dumps, and after a crash we see the following picture: #grep -A1 "Problematic frame" < /hs_err_pid30260.log Problematic frame: C [librbd.so.1.0.0+0x5d681] # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core (gdb) bt ... #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=<optimized out>, sub=<optimized out>, this=<optimized out>) at ./log/SubsystemMap.h:62 #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=<optimized out>, sub=<optimized out>, level=<optimized out>) at ./log/SubsystemMap.h:61 #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:374 From the Ceph code, this part executes while flushing a cached object, and we don't understand why, because the race condition we can reproduce is completely different. As CloudStack does not yet have a good implementation of the snapshot lifecycle, it sometimes happens that some volumes are already marked as EXPUNGED in the DB, and CloudStack then tries to delete the base volume before trying to unprotect it. Unprotecting of course fails, and a normal exception is returned (it fails because the snapshot has children): 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful. 
2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at cephmon.anolim.net:6789 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting snapshot cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-5:null) Failed to delete volume: com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: 161344838950, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap","wait":0}}] } 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: com.cloud.agent.api.GetHostStatsCommand 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1| awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo $idle 2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. 2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c freeMem=$(free|grep cache:|awk '{print $4}');echo $freeMem 2015-10-29 09:02:26,254 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. BUT, after 20 minutes - agent crashed... 
If we first remove all the children and so create the conditions for CloudStack to delete the volume cleanly, everything is OK: no agent crash within 20 minutes. We cannot see how the volume deletion is connected to the agent crash, and we also don't understand why roughly 20 minutes have to pass before the agent crashes. In the logs before the crash there is only GetVMStats, and then the agent restarts: 2015-10-29 09:21:55,143 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 4-1343: { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingCommand":{"newStates":{},"_hostVmStateReport":{"i-881-1117-VM":{"state":"PowerOn","host":" cs2.anolim.net"},"i-7-106-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-1683-1984-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-11-504-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-325-616-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-10-52-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-941-1237-VM":{"state":"PowerOn","host":"cs2.anolim.net"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":4,"wait":0}}] } 2015-10-29 09:21:55,149 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response
[ceph-users] ceph-mon segmentation faults after upgrade from 0.94.3 to 0.94.5
Hi, we have multiple Ceph clusters. One is used as the backend for an OpenStack installation for developers; it is there that we test Ceph upgrades before we upgrade the prod Ceph clusters. That cluster is 4 nodes with 12 OSDs each, running Ubuntu Trusty with the latest 3.13 kernel. This time, when upgrading from 0.94.3 to 0.94.5, ceph-mons died a couple of times during the upgrade: once when we restarted the first of the three monitors during the upgrade procedure, and a second time when we ran 'ceph osd unset noout' at the end. I thought this was a fluke during the upgrade, but the ceph-mons now seem to segfault fairly regularly, the day after the upgrade. No corefile gets dumped, so I have only the log for this strange behaviour. The cluster has followed the upgrades from firefly to the current Hammer release and worked flawlessly until now. It performs more or less as normal from the users' viewpoint, except that we get segmentation faults in the logfile. Downgrading is a last resort that I would rather not take. My question is: what can cause these errors, and how can I fix them? 
-Arnulf The segfault looks like this: Oct 29 14:29:46 95z3zz1 ceph-mon: 0> 2015-10-29 14:29:46.297786 7f908e5af700 -1 *** Caught signal (Segmentation fault) **#012 in thread 7f908e5af700#012#012 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)#012 1: /usr/bin/ceph-mon() [0x9adefa]#012 2: (()+0x10340) [0x7f90936b6340]#012 3: (std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::find(std::string const&) const+0x25) [0x6518e5]#012 4: (get_str_map_key(std::map, std::allocator > > const&, std::string const&, std::string const*)+0x1e) [0x8a002e]#012 5: (LogMonitor::update_from_paxos(bool*)+0x87a) [0x6b0a5a]#012 6: (PaxosService::refresh(bool*)+0x19a) [0x60432a]#012 7: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b03db]#012 8: (Paxos::do_refresh()+0x2e) [0x5eea5e]#012 9: (Paxos::commit_finish()+0x569) [0x5fbf39]#012 10: (C_Committed::finish(int)+0x2b) [0x60038b]#012 11: (Context::complete(int)+0x9) [0x5d4d89]#012 12: (MonitorDBStore::C_DoTransaction::finish(int)+0x8c) [0x5ff4bc]#012 13: (Context::complete(int)+0x9) [0x5d4d89]#012 14: (Finisher::finisher_thread_entry()+0x158) [0x717e88]#012 15: (()+0x8182) [0x7f90936ae182]#012 16: (clone()+0x6d) [0x7f9091c1947d]#012 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. 
Full log of event: Oct 29 14:29:45 95z3zz1 ceph-mon: 2015-10-29 14:29:45.697177 7f3154801700 -1 *** Caught signal (Segmentation fault) **#012 in thread 7f3154801700#012#012 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)#012 1: /usr/bin/ceph-mon() [0x9adefa]#012 2: (()+0x10340) [0x7f3159b63340]#012 3: (std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::find(std::string const&) const+0x25) [0x6518e5]#012 4: (get_str_map_key(std::map, std::allocator > > const&, std::string const&, std::string const*)+0x1e) [0x8a002e]#012 5: (LogMonitor::update_from_paxos(bool*)+0x87a) [0x6b0a5a]#012 6: (PaxosService::refresh(bool*)+0x19a) [0x60432a]#012 7: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b03db]#012 8: (Paxos::do_refresh()+0x2e) [0x 5eea5e]#012 9: (Paxos::commit_finish()+0x569) [0x5fbf39]#012 10: (C_Committed::finish(int)+0x2b) [0x60038b]#012 11: (Context::complete(int)+0x9) [0x5d4d89]#012 12: (MonitorDBStore::C_DoTransaction::finish(int)+0x8c) [0x5ff4bc]#012 13: (Context::complete(int)+0x9) [0x5d4d89]#012 14: (Finisher::finisher_thread_entry()+0x158) [0x717e88]#012 15: (()+0x8182) [0x7f3159b5b182]#012 16: (clone()+0x6d) [0x7f31580c647d]#012 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. 
Oct 29 14:29:45 95z3zz1 ceph-mon: --- begin dump of recent events --- Oct 29 14:29:45 95z3zz1 ceph-mon: -450> 2015-10-29 14:29:44.484656 7f315aa5d8c0 5 asok(0x4daa000) register_command perfcounters_dump hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -449> 2015-10-29 14:29:44.484677 7f315aa5d8c0 5 asok(0x4daa000) register_command 1 hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -448> 2015-10-29 14:29:44.484681 7f315aa5d8c0 5 asok(0x4daa000) register_command perf dump hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -447> 2015-10-29 14:29:44.484686 7f315aa5d8c0 5 asok(0x4daa000) register_command perfcounters_schema hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -446> 2015-10-29 14:29:44.484688 7f315aa5d8c0 5 asok(0x4daa000) register_command 2 hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -445> 2015-10-29 14:29:44.484690 7f315aa5d8c0 5 asok(0x4daa000) register_command perf schema hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -444> 2015-10-29 14:29:44.484692 7f315aa5d8c0 5 asok(0x4daa000) register_command perf reset hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -443> 2015-10-29 14:29:44.484694 7f315aa5d8c0 5 asok(0x4daa000) register_command config show hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -442> 2015-10-29 14:29:44.4
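The dumps above are hard to read because syslog escapes every newline as `#012`. A tiny convenience sketch (my own helper, not a Ceph tool) to unescape such a line into individual stack frames:

```python
def frames(syslog_line):
    """Split a syslog-escaped ceph-mon crash dump ('#012' = newline) into lines."""
    return syslog_line.replace("#012", "\n").splitlines()

# Abbreviated dump from the log above:
dump = ("*** Caught signal (Segmentation fault) **#012 in thread 7f908e5af700"
        "#012#012 ceph version 0.94.5#012 1: /usr/bin/ceph-mon() [0x9adefa]")
for line in frames(dump):
    print(line)
```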
Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server
On Wed, Oct 28, 2015 at 7:54 PM, Matt Taylor wrote: > I still see rsync errors due to permissions on the remote side: > Thanks for the heads-up; I bet another upload rsync process got interrupted there. I've run the following to remove all the oddly-named RPM files: for f in $(locate *.rpm.* ) ; do rm -i $f; done Please let us know if there are other problems like this. - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
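Note that the unquoted glob in that loop is expanded by the shell before `locate` ever sees it, so its behaviour depends on the current directory's contents. If the goal is to keep real `.rpm` files and drop partial-transfer leftovers, an explicit filter along these lines may be safer (the exact leftover naming, `something.rpm.<random suffix>`, is my assumption about what "oddly-named" means here):

```python
import re

# Matches rsync partial-transfer leftovers like 'foo.rpm.Xy12Qz',
# but not complete packages ending in '.rpm'.
PARTIAL_RPM = re.compile(r"\.rpm\..+$")

def is_partial_rpm(name):
    """True for names with a suffix after '.rpm', False for real RPM files."""
    return bool(PARTIAL_RPM.search(name))

names = ["ceph-0.94.5-0.el7.x86_64.rpm", "ceph-0.94.5-0.el7.x86_64.rpm.Xy12Qz"]
print([n for n in names if is_partial_rpm(n)])  # only the .rpm.Xy12Qz leftover
```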
Re: [ceph-users] Benchmark individual OSD's
On 29 October 2015 at 19:24, Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > # ceph tell osd.1 bench > { > "bytes_written": 1073741824, > "blocksize": 4194304, > "bytes_per_sec": 117403227.00 > } > > It might help you to figure out whether individual OSDs do not perform as > expected. The amount of data written is limited (but there's a config > setting for it). With 1 GB as in the example above, the write operation > will probably be limited to the journal. > That's perfect, thanks Burkhard; it lets me compare OSDs and configurations. -- Lindsay ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Input/output error
Hi, please search Google; the answer is out there. As I remember: 1. rbd on low kernel versions does not support some CRUSH features; check /var/log/messages. 2. sudo rbd map foo --name client.admin -p {pool_name} 3. Also specify -p {pool_name} when you create the image. Thanks! -- hzwulibin 2015-10-29 - From: Wah Peng Date: 2015-10-29 15:13 To: ceph-users@lists.ceph.com Cc: Subject: [ceph-users] Input/output error hello, do you know why this happens when I did it following the official documentation. $ sudo rbd map foo --name client.admin rbd: add failed: (5) Input/output error the OS kernel, $ uname -a Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux I tried this way, ceph osd getcrushmap -o /tmp/crush crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new ceph osd setcrushmap -i /tmp/crush.new but got no luck. my cluster status seems OK, $ ceph health HEALTH_OK $ ceph osd tree
# id    weight  type name       up/down reweight
-1      0.24    root default
-2      0.24            host ceph2
0       0.07999                 osd.0   up      1
1       0.07999                 osd.1   up      1
2       0.07999                 osd.2   up      1
Thanks in advance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
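The usual cause of this symptom is an old kernel rbd client meeting CRUSH tunables it does not understand; `chooseleaf_vary_r` in particular appeared in kernels much newer than the 3.2 shown above. A quick version-check sketch; the 3.15 minimum used here is from memory, so verify it against the Ceph tunables documentation before relying on it:

```python
def kernel_tuple(uname_r):
    """Turn 'uname -r' output like '3.2.0-23-generic' into a comparable tuple."""
    return tuple(int(x) for x in uname_r.split("-")[0].split("."))

# Assumption: 3.15 as the first kernel understanding chooseleaf_vary_r;
# check the Ceph docs for the authoritative mapping of tunables to kernels.
NEEDS = (3, 15)
print(kernel_tuple("3.2.0-23-generic") >= NEEDS)  # prints False: too old
```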
[ceph-users] Issue with ceph-deploy
I'm following the tutorial at http://docs.ceph.com/docs/v0.79/start/quick-ceph-deploy/ to deploy a monitor using

% ceph-deploy mon create-initial

But I got the following errors:

...
[ceph-node1][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status
[ceph-node1][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
...

I checked on the monitor ceph-node1; it turned out that there was an asok file, but it was named after the machine's real `hostname` rather than the /etc/hosts alias (e.g. ceph-node1):

[root@GZH-ZB-SA2-Kiev-116 ~]# ls /var/run/ceph/
ceph-mon.GZH-ZB-SA2-Kiev-116.asok  mon.GZH-ZB-SA2-Kiev-116.pid

When creating the socket file, would it be better to use a hostname-neutral name, or a name passed in from the admin node? I think we should at least modify ceph-deploy to run ceph --admin-daemon with the monitor's real hostname. Most users tend to address nodes by IP or alias rather than by their real hostnames.
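A quick workaround on the monitor itself is to build the socket path from the machine's real short hostname, which is what ceph-mon used when creating the socket. A small sketch (the final `ceph` call is commented out since it needs a running monitor):

```shell
# Derive the admin socket path from the node's real short hostname,
# the name ceph-mon used when it created the socket.
HOST=$(hostname -s)
ASOK="/var/run/ceph/ceph-mon.${HOST}.asok"
echo "querying ${ASOK}"
# ceph --cluster=ceph --admin-daemon "${ASOK}" mon_status
```

If this path exists while the one using the /etc/hosts alias does not, the hostname mismatch above is confirmed.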
Re: [ceph-users] Core dump while getting a volume real size with a python script
It sounds like you ran into this issue [1]. It's been fixed in the upstream master and infernalis branches, but the backport is still awaiting release on hammer.

[1] http://tracker.ceph.com/issues/12885

--
Jason Dillaman

----- Original Message -----
> From: "Giuseppe Civitella"
> To: "ceph-users"
> Sent: Thursday, October 29, 2015 4:44:02 AM
> Subject: Re: [ceph-users] Core dump while getting a volume real size with a python script
>
> ... and this is the core dump output while executing the "rbd diff" command:
> http://paste.openstack.org/show/477604/
> Regards,
> Giuseppe
>
> 2015-10-28 16:46 GMT+01:00 Giuseppe Civitella <giuseppe.civite...@gmail.com>:
> > Hi all,
> >
> > I'm trying to get the real disk usage of a Cinder volume converting this
> > bash commands to python:
> > http://cephnotes.ksperis.com/blog/2013/08/28/rbd-image-real-size
> >
> > I wrote a small test function which has already worked in many cases but it
> > stops with a core dump while trying to calculate the real size of a
> > particular volume.
> >
> > This is the function:
> > http://paste.openstack.org/show/477563/
> >
> > this is the error I get:
> > http://paste.openstack.org/show/477567/
> >
> > and these are the related rbd info:
> > http://paste.openstack.org/show/477568/
> >
> > Can anyone help me to debug the problem?
> >
> > Thanks
> > Giuseppe
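For anyone hitting this before the hammer backport lands, the size calculation itself can be expressed with the Python rbd bindings' `Image.diff_iterate()`. A minimal sketch -- the `diff_iterate(offset, length, from_snapshot, callback)` signature is assumed from the python-rbd bindings, and the call site is commented out since it needs a live cluster; the extent-summing callback itself is plain Python, demonstrated here with synthetic extents:

```python
# Sketch: summing allocated extents the way "rbd diff" does.
class UsageAccumulator(object):
    def __init__(self):
        self.used = 0

    def __call__(self, offset, length, exists):
        # librbd reports one (offset, length, exists) tuple per extent;
        # only extents that exist consume space.
        if exists:
            self.used += length

def real_size(image, image_size):
    """Hypothetical wrapper; `image` would be an rbd.Image instance."""
    acc = UsageAccumulator()
    # image.diff_iterate(0, image_size, None, acc)  # needs a live cluster
    return acc.used

# Demo with synthetic extents, as diff_iterate would deliver them:
acc = UsageAccumulator()
for off, length, exists in [(0, 4194304, True),
                            (8388608, 4194304, False),
                            (16777216, 2097152, True)]:
    acc(off, length, exists)
print(acc.used)  # 6291456 bytes allocated
```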
Re: [ceph-users] Input/output error
On Thu, Oct 29, 2015 at 11:22 AM, Wah Peng wrote:
> Thanks Gurjar.
> Have loaded the rbd module, but got no luck.
> what dmesg shows,
>
> [119192.384770] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119192.388744] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119202.400782] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119202.404756] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119212.416758] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119212.420732] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119222.432783] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119222.436756] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119232.448780] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119232.452754] libceph: mon0 172.17.6.176:6789 missing required protocol features

Our messages raced - you are missing CRUSH_TUNABLES, CRUSH_TUNABLES2 and, more importantly, OSDHASHPSPOOL: starting with ceph 0.64, pools are created with the hashpspool flag set. If you *really* want to try to run the 3.2 kernel client, you'll need to clear it with "ceph osd pool set $poolname hashpspool false" and then reset all crush tunables to legacy values [1]. Note that we recommend at least >=3.10 for the kernel client [2].

[1] http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables
[2] http://docs.ceph.com/docs/master/start/os-recommendations/

Thanks,

Ilya
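The arithmetic behind that dmesg line can be checked by hand: the missing features are server & ~client. A sketch in Python -- the bit positions are assumed from the hammer-era src/include/ceph_features.h, so treat them as illustrative rather than authoritative:

```python
# Decode "feature set mismatch, my 2 < server's 42040002" from dmesg.
# Bit positions assumed from hammer-era ceph_features.h.
CRUSH_TUNABLES  = 1 << 18
CRUSH_TUNABLES2 = 1 << 25
OSDHASHPSPOOL   = 1 << 30

server = 0x42040002   # features the monitor advertises
client = 0x2          # features the 3.2 kernel client supports
missing = server & ~client
print(hex(missing))   # 0x42040000

names = {CRUSH_TUNABLES: "CRUSH_TUNABLES",
         CRUSH_TUNABLES2: "CRUSH_TUNABLES2",
         OSDHASHPSPOOL: "OSDHASHPSPOOL"}
for bit, name in sorted(names.items()):
    if missing & bit:
        print(name)
```

All three named bits are set in the missing mask, matching Ilya's reading of the log.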
Re: [ceph-users] Input/output error
$ ceph -v
ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)

thanks.

On 2015/10/29 (Thursday) 18:23, Ilya Dryomov wrote:
> What's your ceph version and what does dmesg say? 3.2 is *way* too old,
> you are probably missing more than one required feature bit. See
> http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables.
>
> Thanks,
Re: [ceph-users] Input/output error
On Thu, Oct 29, 2015 at 8:13 AM, Wah Peng wrote:
> hello,
>
> do you know why this happens when I did it following the official
> documentation.
>
> $ sudo rbd map foo --name client.admin
> rbd: add failed: (5) Input/output error
>
> the OS kernel,
>
> $ uname -a
> Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> I tried this way,
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new
>
> but got no luck.
>
> my cluster status seems OK,
>
> $ ceph health
> HEALTH_OK
>
> $ ceph osd tree
> # id    weight   type name       up/down reweight
> -1      0.24     root default
> -2      0.24             host ceph2
> 0       0.07999                  osd.0  up      1
> 1       0.07999                  osd.1  up      1
> 2       0.07999                  osd.2  up      1

What's your ceph version and what does dmesg say? 3.2 is *way* too old, you are probably missing more than one required feature bit. See http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables.

Thanks,

Ilya
Re: [ceph-users] Input/output error
Thanks Gurjar. Have loaded the rbd module, but got no luck. What dmesg shows:

[119192.384770] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119192.388744] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119202.400782] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119202.404756] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119212.416758] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119212.420732] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119222.432783] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119222.436756] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119232.448780] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119232.452754] libceph: mon0 172.17.6.176:6789 missing required protocol features

Thx.

On 2015/10/29 (Thursday) 18:11, Gurjar, Unmesh wrote:
> Hi,
>
> You might want to confirm if the rbd module is loaded (sudo modprobe rbd) on
> the ceph-client node and give it a retry. If you still encounter the issue,
> post back the snippet of error logs in syslog or dmesg to take it forward.
>
> Regards,
> Unmesh G.
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wah Peng
> Sent: Thursday, October 29, 2015 12:44 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Input/output error
>
> hello,
>
> do you know why this happens when I did it following the official
> documentation.
>
> $ sudo rbd map foo --name client.admin
> rbd: add failed: (5) Input/output error
>
> the OS kernel,
>
> $ uname -a
> Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> I tried this way,
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new
>
> but got no luck.
>
> my cluster status seems OK,
>
> $ ceph health
> HEALTH_OK
>
> $ ceph osd tree
> # id    weight   type name       up/down reweight
> -1      0.24     root default
> -2      0.24             host ceph2
> 0       0.07999                  osd.0  up      1
> 1       0.07999                  osd.1  up      1
> 2       0.07999                  osd.2  up      1
>
> Thanks in advance.
Re: [ceph-users] CephFS and page cache
Hi,

On 10/29/2015 09:30 AM, Sage Weil wrote:
> On Thu, 29 Oct 2015, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
>>> On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>>>> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>>>>> I tried to dig into the ceph-fuse code, but I was unable to find the
>>>>> fragment that is responsible for flushing the data from the page cache.
>>>>
>>>> fuse kernel code invalidates page cache on opening file. you can
>>>> disable this behaviour by setting the "fuse use invalidate cb" config
>>>> option to true.

With that option ceph-fuse finally works with the page cache:

$ time cat /ceph/volumes/biodb/asn1/nr.3*.psq > /dev/null
real    2m0.979s
user    0m0.020s
sys     0m3.164s

$ time cat /ceph/volumes/biodb/asn1/nr.3*.psq > /dev/null
real    0m2.106s
user    0m0.000s
sys     0m1.996s

>>> Zheng, do you know any reason we shouldn't make that the default value
>>> now? There was a loopback deadlock (which is why it's disabled by
>>> default) but I don't remember the details offhand well enough to know
>>> if your recent work in those interfaces has fixed it. Or Sage?
>>> -Greg
>>
>> there is no loopback deadlock now, because we use a separate thread to
>> invalidate kernel page cache. I think we can enable this option
>> safely.
>
> ...as long as nobody blocks waiting for invalidate while holding a lock
> (client_lock?) that could prevent other fuse ops like write (pretty sure
> that was the deadlock we saw before). I worry this could still happen
> with a writer (or reader?) getting stuck in a check_caps() type situation
> while the invalidate cb is waiting on a page lock held by the calling
> kernel syscall...

I have created an issue to track this: http://tracker.ceph.com/issues/13640

It would be great if the patch were ported to one of the next hammer releases after the potential deadlock situation is analysed.

Best regards,
Burkhard
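For completeness, the option Zheng mentions goes in the client section of ceph.conf; a sketch, using the option name exactly as quoted in the thread (section placement assumed):

```
[client]
    fuse use invalidate cb = true
```

ceph-fuse has to be restarted (and the filesystem remounted) for the change to take effect.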
Re: [ceph-users] Benchmark individual OSD's
Hi,

On 10/29/2015 09:54 AM, Luis Periquito wrote:
> Only way I can think of that is creating a new crush rule that selects
> that specific OSD with min_size = max_size = 1, then creating a pool
> with size = 1 and using that crush rule. Then you can use that pool as
> you'd use any other pool. I haven't tested however it should work.

There's also the osd bench command that writes a certain amount of data to a given OSD:

# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "bytes_per_sec": 117403227.00
}

It might help you to figure out whether individual OSDs do not perform as expected. The amount of data written is limited (but there's a config setting for it). With 1 GB as in the example above, the write operation will probably be limited to the journal.

Regards,
Burkhard
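The raw bytes_per_sec value is easier to compare across OSDs once converted to MB/s; a quick sketch using the sample result above (values hard-coded):

```shell
# Convert the bench output above (117403227 B/s) into MiB/s.
BYTES_PER_SEC=117403227
python3 -c "print('%.2f MB/s' % ($BYTES_PER_SEC / 1048576.0))"
```

which prints 111.96 MB/s for the run shown above.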
Re: [ceph-users] Benchmark individual OSD's
The only way I can think of is creating a new crush rule that selects that specific OSD with min_size = max_size = 1, then creating a pool with size = 1 that uses that crush rule. You can then use that pool as you'd use any other pool. I haven't tested it, however it should work.

On Thu, Oct 29, 2015 at 1:44 AM, Lindsay Mathieson wrote:
>
> On 29 October 2015 at 11:39, Lindsay Mathieson wrote:
>>
>> Is there a way to benchmark individual OSD's?
>
> nb - Non-destructive :)
>
> --
> Lindsay
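As noted, this is untested; roughly, the decompile/edit/recompile cycle and rule could look like the sketch below. The rule name, ruleset id, pool name, and the `step take` on a bare device are all assumptions -- if `crushtool` rejects taking osd.1 directly, put osd.1 alone in a dedicated bucket and take that instead:

```
# ceph osd getcrushmap -o crush.bin
# crushtool -d crush.bin -o crush.txt
# ... append the rule below to crush.txt ...

rule single-osd-1 {
        ruleset 10
        type replicated
        min_size 1
        max_size 1
        step take osd.1
        step emit
}

# crushtool -c crush.txt -o crush.new
# ceph osd setcrushmap -i crush.new
# ceph osd pool create bench-osd1 64 64
# ceph osd pool set bench-osd1 size 1
# ceph osd pool set bench-osd1 crush_ruleset 10
```

Deleting the pool and the rule afterwards restores the original layout.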
Re: [ceph-users] Core dump while getting a volume real size with a python script
... and this is the core dump output while executing the "rbd diff" command:
http://paste.openstack.org/show/477604/

Regards,
Giuseppe

2015-10-28 16:46 GMT+01:00 Giuseppe Civitella:
> Hi all,
>
> I'm trying to get the real disk usage of a Cinder volume converting this
> bash commands to python:
> http://cephnotes.ksperis.com/blog/2013/08/28/rbd-image-real-size
>
> I wrote a small test function which has already worked in many cases but
> it stops with a core dump while trying to calculate the real size of a
> particular volume.
>
> This is the function:
> http://paste.openstack.org/show/477563/
>
> this is the error I get:
> http://paste.openstack.org/show/477567/
>
> and these are the related rbd info:
> http://paste.openstack.org/show/477568/
>
> Can anyone help me to debug the problem?
>
> Thanks
> Giuseppe
Re: [ceph-users] CephFS and page cache
On Thu, 29 Oct 2015, Yan, Zheng wrote:
> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
> > On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
> >> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
> >>> I tried to dig into the ceph-fuse code, but I was unable to find the
> >>> fragment that is responsible for flushing the data from the page cache.
> >>
> >> fuse kernel code invalidates page cache on opening file. you can
> >> disable this behaviour by setting the "fuse use invalidate cb" config
> >> option to true.
> >
> > Zheng, do you know any reason we shouldn't make that the default value
> > now? There was a loopback deadlock (which is why it's disabled by
> > default) but I don't remember the details offhand well enough to know
> > if your recent work in those interfaces has fixed it. Or Sage?
> > -Greg
>
> there is no loopback deadlock now, because we use a separate thread to
> invalidate kernel page cache. I think we can enable this option
> safely.

...as long as nobody blocks waiting for invalidate while holding a lock (client_lock?) that could prevent other fuse ops like write (pretty sure that was the deadlock we saw before). I worry this could still happen with a writer (or reader?) getting stuck in a check_caps() type situation while the invalidate cb is waiting on a page lock held by the calling kernel syscall...

sage
Re: [ceph-users] CephFS and page cache
On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
> On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>>> I tried to dig into the ceph-fuse code, but I was unable to find the
>>> fragment that is responsible for flushing the data from the page cache.
>>
>> fuse kernel code invalidates page cache on opening file. you can
>> disable this behaviour by setting the "fuse use invalidate cb" config
>> option to true.
>
> Zheng, do you know any reason we shouldn't make that the default value
> now? There was a loopback deadlock (which is why it's disabled by
> default) but I don't remember the details offhand well enough to know
> if your recent work in those interfaces has fixed it. Or Sage?
> -Greg

there is no loopback deadlock now, because we use a separate thread to invalidate kernel page cache. I think we can enable this option safely.

Regards

Yan, Zheng
[ceph-users] Input/output error
hello,

do you know why this happens when I did it following the official documentation.

$ sudo rbd map foo --name client.admin
rbd: add failed: (5) Input/output error

the OS kernel,

$ uname -a
Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I tried this way,

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

but got no luck.

my cluster status seems OK,

$ ceph health
HEALTH_OK

$ ceph osd tree
# id    weight   type name       up/down reweight
-1      0.24     root default
-2      0.24             host ceph2
0       0.07999                  osd.0  up      1
1       0.07999                  osd.1  up      1
2       0.07999                  osd.2  up      1

Thanks in advance.