[ceph-users] Is lttng enabled by default in debian hammer-0.94.5?
Hi everyone,

After installing hammer-0.94.5 on Debian, I want to trace librbd with LTTng, but after the following steps I got nothing:

mkdir -p traces
lttng create -o traces librbd
lttng enable-event -u 'librbd:*'
lttng add-context -u -t pthread_id
lttng start
lttng stop

So, is LTTng enabled in this version on Debian? Thanks!

--
hzwulibin
2015-10-30
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS and page cache
On Thu, Oct 29, 2015 at 4:30 PM, Sage Weil wrote:
> On Thu, 29 Oct 2015, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
>> > On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>> >> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>> >>> I tried to dig into the ceph-fuse code, but I was unable to find the
>> >>> fragment that is responsible for flushing the data from the page cache.
>> >>
>> >> fuse kernel code invalidates page cache on opening file. you can
>> >> disable this behaviour by setting the "fuse use invalidate cb" config
>> >> option to true.
>> >
>> > Zheng, do you know any reason we shouldn't make that the default value
>> > now? There was a loopback deadlock (which is why it's disabled by
>> > default) but I don't remember the details offhand well enough to know
>> > if your recent work in those interfaces has fixed it. Or Sage?
>> > -Greg
>>
>> there is no loopback deadlock now, because we use a separate thread to
>> invalidate kernel page cache. I think we can enable this option
>> safely.
>
> ...as long as nobody blocks waiting for invalidate while holding a lock
> (client_lock?) that could prevent other fuse ops like write (pretty sure
> that was the deadlock we saw before). I worry this could still happen
> with a writer (or reader?) getting stuck in a check_caps() type situation
> while the invalidate cb is waiting on a page lock held by the calling
> kernel syscall...

the invalidate thread does not hold client_lock while invalidating kernel page cache.

Regards
Yan, Zheng
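Zheng's point about the separate invalidate thread can be sketched abstractly. The following is an illustrative toy model in plain Python, not ceph-fuse code (the `client_lock` and queue names are just stand-ins): the thread holding `client_lock` only *enqueues* the invalidation, and a dedicated thread that never takes `client_lock` performs it, so the invalidation can block (e.g. on a kernel page lock) without deadlocking against fuse ops that need `client_lock`.

```python
# Toy model of deadlock avoidance via a dedicated invalidate thread.
# If the holder of client_lock called the (potentially blocking)
# invalidation directly, and that blocked on work that itself needs
# client_lock, the two would deadlock. Handing off via a queue breaks
# the lock-ordering cycle.
import queue
import threading

client_lock = threading.Lock()
invalidate_queue = queue.Queue()
invalidated = []

def invalidate_thread():
    # Runs without client_lock held, so it may safely block here.
    while True:
        ino = invalidate_queue.get()
        if ino is None:  # shutdown sentinel
            break
        invalidated.append(ino)  # stand-in for the real kernel-cache invalidation

worker = threading.Thread(target=invalidate_thread)
worker.start()

with client_lock:
    # Under client_lock we only enqueue; the blocking work happens elsewhere.
    invalidate_queue.put(1234)

invalidate_queue.put(None)
worker.join()
print(invalidated)  # [1234]
```

The design choice mirrors what the thread describes: the invalidate callback's blocking behavior is isolated from the lock that ordinary fuse operations need.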
Re: [ceph-users] rbd hang
More info, output of dmesg:

[259956.804942] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[260752.788609] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[260757.908206] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[260763.181751] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[260852.224607] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[260852.510451] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[260856.868099] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[261652.890656] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[261657.972579] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[261663.283701] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[261752.325749] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[261752.611505] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[261756.969340] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)
[262552.961741] libceph: osd1 10.134.128.43:6800 socket closed (con state OPEN)
[262558.074441] libceph: osd2 10.134.128.43:6803 socket closed (con state OPEN)
[262563.385635] libceph: osd3 10.134.128.43:6806 socket closed (con state OPEN)
[262652.427089] libceph: osd6 10.134.128.42:6803 socket closed (con state OPEN)
[262652.712681] libceph: osd5 10.134.128.42:6800 socket closed (con state OPEN)
[262657.070456] libceph: osd7 10.134.128.42:6806 socket closed (con state OPEN)

I noticed that the OSDs are talking on 10.134.128.42, which is part of the public network, but I have defined the cluster network as 10.134.128.64/26. The machine has two NICs, 10.134.128.41 and 10.134.128.105. In the dmesg output I should be seeing the socket closed spam on 10.134.128.10{5,6}, right?
ceph.conf snippet (see full below):

[global]
public network = 10.134.128.0/26
cluster network = 10.134.128.64/26

----- Original Message -----
From: "Jason Dillaman"
To: "Joe Ryner"
Cc: ceph-us...@ceph.com
Sent: Thursday, October 29, 2015 12:05:38 PM
Subject: Re: [ceph-users] rbd hang

I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details.

--
Jason Dillaman

----- Original Message -----
> From: "Joe Ryner"
> To: ceph-us...@ceph.com
> Sent: Thursday, October 29, 2015 12:22:01 PM
> Subject: [ceph-users] rbd hang
>
> Hi,
>
> I am having a strange problem with our development cluster. When I run rbd
> export it just hangs. I have been running ceph for a long time and haven't
> encountered this kind of issue. Any ideas as to what is going on?
>
> rbd -p locks export seco101ira -
>
> I am running
>
> Centos 6.6 x86 64
>
> ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
>
> I have enabled debugging and get the following when I run the command
>
> [root@durbium ~]# rbd -p locks export seco101ira -
> 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0
> 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter
> 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start
> 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys
> 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init
> 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 con 0x15ba540
> 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned my addr 10.134.128.41:0/1024687
> 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318000be0 con 0x15ba540
> 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fc31c001700 con 0x15ba540
> 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540
> 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fc31c002220 con 0x15ba540
> 2015-10-29 11:17:08.188001 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540
> 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon
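A side note on the network question above: Ceph clients (including the kernel libceph/rbd client that produced the dmesg lines) always connect to OSDs on their public-network addresses; the cluster network carries only OSD-to-OSD replication and heartbeat traffic. So seeing 10.134.128.42/.43 rather than the cluster /26 in client-side dmesg is expected. A quick stdlib sketch to check which subnet each address from the messages above falls into:

```python
# Check which configured /26 each address belongs to, using the
# public/cluster networks from the ceph.conf snippet and addresses
# taken from the dmesg output above.
import ipaddress

public = ipaddress.ip_network("10.134.128.0/26")    # hosts .1 - .62
cluster = ipaddress.ip_network("10.134.128.64/26")  # hosts .65 - .126

def which(addr):
    ip = ipaddress.ip_address(addr)
    if ip in public:
        return "public"
    if ip in cluster:
        return "cluster"
    return "neither"

for addr in ("10.134.128.41", "10.134.128.42", "10.134.128.43", "10.134.128.105"):
    print(addr, "->", which(addr))
# 10.134.128.41 -> public
# 10.134.128.42 -> public
# 10.134.128.43 -> public
# 10.134.128.105 -> cluster
```

In other words, the OSD addresses in the dmesg spam are all on the public /26, which is exactly where a kernel rbd client is supposed to reach them.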
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
From all we analyzed, it looks like it's this issue: http://tracker.ceph.com/issues/13045
PR: https://github.com/ceph/ceph/pull/6097

Can anyone help us to confirm this? :)

2015-10-29 23:13 GMT+02:00 Voloshanenko Igor :
> Additional trace:
>
> #0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1 0x7f30f98950d8 in __GI_abort () at abort.c:89
> #2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3 0x7f30f87b1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4 0x7f30f87b1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5 0x7f30f87b1aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6 0x7f2fddb50778 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry=62, func=func@entry=0x7f2fdddedba0 <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool ceph::log::SubsystemMap::should_gather(unsigned int, int)") at common/assert.cc:77
> #7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
> #8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
> #9 0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527
> #10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
> #11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at pthread_create.c:312
> #12 0x7f30f995547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> 2015-10-29 17:38 GMT+02:00 Voloshanenko Igor :
>> Hi Wido and all community.
>>
>> We caught a very idiotic issue on our Cloudstack installation, which is
>> related to Ceph and possibly to the java-rados lib.
>>
>> So, we constantly have the agent crashing (which causes a very big
>> problem for us...).
>>
>> When the agent crashes, it crashes the JVM. And there is no event in the logs at all.
>> We enabled the crash dump, and after a crash we see the following picture:
>>
>> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
>> Problematic frame:
>> C [librbd.so.1.0.0+0x5d681]
>>
>> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
>> (gdb) bt
>> ...
>> #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
>> #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
>> #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
>> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
>>
>> From the ceph code, this part is executed when flushing a cache object... And we
>> don't understand why, because we have an absolutely different race condition
>> to reproduce it.
>>
>> As Cloudstack does not have a good implementation of the snapshot lifecycle yet,
>> it sometimes happens that some volumes are already marked as EXPUNGED in the DB,
>> and then Cloudstack tries to delete the base volume before it tries to unprotect it.
>>
>> Sure, unprotecting fails and a normal exception is returned (it fails because
>> the snap has children...)
>>
>> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
>> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful.
>> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of >> image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image >> 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at >> cephmon.anolim.net:6789 >> 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] >> (agentRequest-Handler-5:null) Unprotecting snapshot >> cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap >> 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] >> (agentRequest-Handler-5:null) Failed to delete volume: >> com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: >> Failed to unprotect snapshot cloudstack-base-snap >> 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] >> (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: >> 161344838950, via: 4, Ver: v1, Flags: 10, >> [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: >> com.ceph.rbd.RbdException:
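The assertion in frame #6 above ("sub < m_subsys.size()" at ./log/SubsystemMap.h:62) is a bounds check on the per-subsystem log-level table. Here is a hypothetical, heavily simplified Python model of that check (the real class is C++ in ceph's log/SubsystemMap.h; the names and level logic here are illustrative only): if a leftover flusher thread calls should_gather() with a subsystem id beyond the table's current size, for example against a context that is empty or already torn down, the assert aborts the whole process, which is consistent with the crash-in-flusher_entry backtraces in this thread.

```python
# Toy model (NOT ceph code) of the bounds assertion that fires in the
# backtrace: should_gather() indexes a per-subsystem vector and asserts
# the subsystem id is in range before reading it.
class SubsystemMap:
    def __init__(self):
        self.m_subsys = []  # per-subsystem (log_level, gather_level) entries

    def add(self, log_level, gather_level):
        self.m_subsys.append((log_level, gather_level))

    def should_gather(self, sub, level):
        # The assertion from the crash: "sub < m_subsys.size()"
        assert sub < len(self.m_subsys), "sub < m_subsys.size()"
        return level <= max(self.m_subsys[sub])

m = SubsystemMap()
m.add(0, 5)
print(m.should_gather(0, 1))   # True: level 1 is within the gather level 5
try:
    m.should_gather(7, 1)      # out-of-range subsystem id -> assert, like the crash
except AssertionError as e:
    print("assert:", e)
```

In the JVM-agent scenario, that out-of-range access would come not from a bad constant but from the map being used at the wrong point in the library's lifecycle, which is why the crash looks unrelated to the RBD operation that triggered it.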
Re: [ceph-users] Cloudstack agent crashed JVM with exception in librbd
Additional trace:

#0 0x7f30f9891cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x7f30f98950d8 in __GI_abort () at abort.c:89
#2 0x7f30f87b36b5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x7f30f87b1836 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x7f30f87b1863 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x7f30f87b1aa2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x7f2fddb50778 in ceph::__ceph_assert_fail (assertion=assertion@entry=0x7f2fdddeca05 "sub < m_subsys.size()", file=file@entry=0x7f2fdddec9f0 "./log/SubsystemMap.h", line=line@entry=62, func=func@entry=0x7f2fdddedba0 <_ZZN4ceph3log12SubsystemMap13should_gatherEjiE19__PRETTY_FUNCTION__> "bool ceph::log::SubsystemMap::should_gather(unsigned int, int)") at common/assert.cc:77
#7 0x7f2fdda1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
#8 0x7f2fdda3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
#9 0x7f2fddd879be in ObjectCacher::flusher_entry (this=0x7f2ff80b27a0) at osdc/ObjectCacher.cc:1527
#10 0x7f2fddd9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
#11 0x7f30f9c28182 in start_thread (arg=0x7f2e1a7fc700) at pthread_create.c:312
#12 0x7f30f995547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

2015-10-29 17:38 GMT+02:00 Voloshanenko Igor :
> Hi Wido and all community.
>
> We caught a very idiotic issue on our Cloudstack installation, which is
> related to Ceph and possibly to the java-rados lib.
>
> So, we constantly have the agent crashing (which causes a very big problem
> for us...).
>
> When the agent crashes, it crashes the JVM. And there is no event in the logs at all.
> We enabled the crash dump, and after a crash we see the following picture:
>
> #grep -A1 "Problematic frame" < /hs_err_pid30260.log
> Problematic frame:
> C [librbd.so.1.0.0+0x5d681]
>
> # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core
> (gdb) bt
> ...
> #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=, sub=, this=) at ./log/SubsystemMap.h:62
> #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=, sub=, level=) at ./log/SubsystemMap.h:61
> #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527
> #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=) at osdc/ObjectCacher.h:374
>
> From the ceph code, this part is executed when flushing a cache object... And we
> don't understand why, because we have an absolutely different race condition
> to reproduce it.
>
> As Cloudstack does not have a good implementation of the snapshot lifecycle yet,
> it sometimes happens that some volumes are already marked as EXPUNGED in the DB,
> and then Cloudstack tries to delete the base volume before it tries to unprotect it.
>
> Sure, unprotecting fails and a normal exception is returned (it fails because the
> snap has children...)
>
> 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11
> 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful.
> 2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of > image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image > 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at > cephmon.anolim.net:6789 > 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] > (agentRequest-Handler-5:null) Unprotecting snapshot > cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap > 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] > (agentRequest-Handler-5:null) Failed to delete volume: > com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: > Failed to unprotect snapshot cloudstack-base-snap > 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: > 161344838950, via: 4, Ver: v1, Flags: 10, > [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: > com.ceph.rbd.RbdException: Failed to unprotect snapshot > cloudstack-base-snap","wait":0}}] } > 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-2:null) Processing command: > com.cloud.agent.api.GetHostStatsCommand > 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] > (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1| > a
Re: [ceph-users] rbd hang
Periodicly I am also getting these while waiting 2015-10-29 13:41:09.528674 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:14.528779 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:19.528907 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:22.515725 7f5c260d9700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 0x7f5c080073c0 con 0x2307540 2015-10-29 13:41:22.516453 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 21 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c10003170 con 0x2307540 2015-10-29 13:41:24.529012 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:29.529109 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:34.529209 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:39.529306 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:44.529402 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:49.529498 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:54.529597 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:41:59.529695 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:04.529800 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:09.529904 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:14.530004 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:19.530103 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:24.530200 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:29.530293 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:34.530385 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:39.530480 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:44.530594 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:49.530690 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:54.530787 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:42:59.530881 7f5c24fd6700 10 client.7368.objecter tick 
2015-10-29 13:43:04.530980 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:09.531087 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:14.531190 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:19.531308 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:24.531417 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:29.531524 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:34.531629 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:39.531733 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:44.531836 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:49.531938 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:49.692028 7f5c270db700 1 client.7368.objecter ms_handle_reset on osd.2 2015-10-29 13:43:49.692051 7f5c270db700 10 client.7368.objecter reopen_session osd.2 session, addr now osd.2 10.134.128.43:6803/2741 2015-10-29 13:43:49.692176 7f5c270db700 1 -- 10.134.128.41:0/1025119 mark_down 0x7f5c1c001c40 -- pipe dne 2015-10-29 13:43:49.692287 7f5c270db700 10 client.7368.objecter kick_requests for osd.2 2015-10-29 13:43:49.692300 7f5c270db700 10 client.7368.objecter maybe_request_map subscribing (onetime) to next osd map 2015-10-29 13:43:49.693706 7f5c270db700 10 client.7368.objecter ms_handle_connect 0x7f5c1c003810 2015-10-29 13:43:52.517670 7f5c260d9700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=4351}) v2 -- ?+0 0x7f5c080096c0 con 0x2307540 2015-10-29 13:43:52.518032 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 22 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100056d0 con 0x2307540 2015-10-29 13:43:54.532041 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:43:59.532150 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:04.532252 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:09.532359 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:14.532467 7f5c24fd6700 10 
client.7368.objecter tick 2015-10-29 13:44:19.532587 7f5c24fd6700 10 client.7368.objecter tick 2015-10-29 13:44:24.532692 7f5c24fd6700 10 client.7368.objecter tick - Original Message - From: "Jason Dillaman" To: "Joe Ryner" Cc: ceph-us...@ceph.com Sent: Thursday, October 29, 2015 12:05:38 PM Subject: Re: [ceph-users] rbd hang I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details. -- Jason Dillaman - Original Message - > From: "Joe Ryner" > To: ceph-us...@ceph.com > Sent: Thursday, October 29, 2015 12:22:01 PM > Subject: [ceph-users] rbd hang > > i, > > I am having a strange problem with our development cluster. When I run rbd > export it just hangs. I have been running ceph for a long time and haven't > encountered this kind of issue. Any ideas
Re: [ceph-users] radosgw get quota
On Thu, Oct 29, 2015 at 11:29 AM, Derek Yarnell wrote:
> Sorry, the information is in the headers. So I think the valid question
> to follow up is why this information is in the headers and not the body
> of the response. I think this is a bug, but maybe I am not aware of a
> subtlety. It would seem this json comes from this line[0].
>
> [0] - https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697
>
> For example the information is returned in what seems to be the
> Content-type header as follows. Maybe the missing : in the json
> encoding would explain something?

It's definitely a bug. It looks like we fail to call end_header() before it, so everything is dumped before we close the http header. Can you open a ceph tracker issue with the info you provided here?

Thanks,
Yehuda

> INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): ceph.umiacs.umd.edu
> DEBUG:requests.packages.urllib3.connectionpool:"GET /admin/user?quota&format=json&uid=foo1209&quota-type=user HTTP/1.1" 200 0
> INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')]
>
> On 10/28/15 11:15 PM, Derek Yarnell wrote:
>> I have had this issue before, and I don't think I have resolved it. I
>> have been using the RGW admin api to set quota based on the docs[0].
>> But I can't seem to be able to get it to cough up and show me the quota
>> now. Any ideas? I get a 200 back but no body. I have tested this on a
>> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what
>> the logs are for.
>> >> [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas >> >> DEBUG:rgwadmin.rgw:URL: >> http://ceph.umiacs.umd.edu/admin/user?quota&uid=derek"a-type=user >> DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD >> DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None >> INFO:requests.packages.urllib3.connectionpool:Starting new HTTP >> connection (1): ceph.umiacs.umd.edu >> DEBUG:requests.packages.urllib3.connectionpool:"GET >> /admin/user?quota&uid=derek"a-type=user HTTP/1.1" 200 0 >> INFO:rgwadmin.rgw:No JSON object could be decoded >> >> >> 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: >> 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 >> 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:02.063755 7f447ace2700 2 >> RGWDataChangesLog::ChangesRenewThread: start >> 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: >> localhost:7480 >> 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): >> HTTP_ACCEPT_ENCODING: gzip, deflate >> 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* >> 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): >> HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: >> Thu, 29 Oct 2015 03:03:17 GMT >> 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): >> HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_FOR: 128.8.132.4 >> 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu >> 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): >> HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu >> 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): >> HTTP_CONNECTION: Keep-Alive >> 2015-10-28 23:03:17.139392 7f443cfd1700 20 
RGWEnv::set(): >> REQUEST_METHOD: GET >> 2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: >> /admin/user >> 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: >> quota&uid=derek"a-type=user >> 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: >> 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: >> /admin/user >> 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 >> 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* >> 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, >> deflate >> 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS >> RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive >> 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 >> 03:03:17 GMT >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 >> 2015-10-28 23:03:17.139413 7f443cfd1700 20 >> HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 >> Linux/3.10.0-229.14.1.el7.x86_64 >> 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 >> 2015-10-2
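Yehuda's diagnosis above (the JSON is written out before end_header() closes the HTTP header block) can be reproduced with a tiny parser sketch. This is an illustrative model, not RGW or rgwadmin code: if the payload is emitted before the blank line that terminates the headers, a client parsing header lines sees the JSON fragment as a bogus header name/value pair, exactly like the ('{"enabled"', 'true,...') tuple in the log above.

```python
# Minimal header parser demonstrating the symptom: a body written
# before the header-terminating blank line is parsed as a header.
def parse_headers(raw):
    headers = {}
    for line in raw.split("\r\n"):
        if not line:  # blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers

# The payload is emitted too early, before Content-type and before the
# blank line -- modeling the missing end_header() call.
buggy_response = (
    'Date: Thu, 29 Oct 2015 18:28:45 GMT\r\n'
    '{"enabled":true,"max_size_kb":12345,"max_objects":-1}'
    'Content-type: application/json\r\n'
    'Content-length: 0\r\n'
    '\r\n'
)

hdrs = parse_headers(buggy_response)
# The JSON fragment splits at its first colon and becomes a fake header,
# with Content-type fused into its value -- matching the log output.
print(hdrs['{"enabled"'])
```

The fix on the server side is simply to terminate the header block before flushing the payload, at which point the same client would see an ordinary 200 with a JSON body.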
Re: [ceph-users] radosgw get quota
Sorry, the information is in the headers. So I think the valid question to follow up is why this information is in the headers and not the body of the response. I think this is a bug, but maybe I am not aware of a subtlety. It would seem this json comes from this line[0].

[0] - https://github.com/ceph/ceph/blob/83e10f7e2df0a71bd59e6ef2aa06b52b186fddaa/src/rgw/rgw_rest_user.cc#L697

For example the information is returned in what seems to be the Content-type header as follows. Maybe the missing : in the json encoding would explain something?

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): ceph.umiacs.umd.edu
DEBUG:requests.packages.urllib3.connectionpool:"GET /admin/user?quota&format=json&uid=foo1209&quota-type=user HTTP/1.1" 200 0
INFO:rgwadmin.rgw:[('date', 'Thu, 29 Oct 2015 18:28:45 GMT'), ('{"enabled"', 'true,"max_size_kb":12345,"max_objects":-1}Content-type: application/json'), ('content-length', '0'), ('server', 'Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5')]

On 10/28/15 11:15 PM, Derek Yarnell wrote:
> I have had this issue before, and I don't think I have resolved it. I
> have been using the RGW admin api to set quota based on the docs[0].
> But I can't seem to be able to get it to cough up and show me the quota
> now. Any ideas? I get a 200 back but no body. I have tested this on a
> Firefly (0.80.5-9) and Hammer (0.87.2-0) cluster. The latter is what
> the logs are for.
> > [0] - http://docs.ceph.com/docs/master/radosgw/adminops/#quotas > > DEBUG:rgwadmin.rgw:URL: > http://ceph.umiacs.umd.edu/admin/user?quota&uid=derek"a-type=user > DEBUG:rgwadmin.rgw:Access Key: RTJ1TL13CH613JRU2PJD > DEBUG:rgwadmin.rgw:Verify: True CA Bundle: None > INFO:requests.packages.urllib3.connectionpool:Starting new HTTP > connection (1): ceph.umiacs.umd.edu > DEBUG:requests.packages.urllib3.connectionpool:"GET > /admin/user?quota&uid=derek"a-type=user HTTP/1.1" 200 0 > INFO:rgwadmin.rgw:No JSON object could be decoded > > > 2015-10-28 23:02:46.445367 7f444cff1700 1 civetweb: 0x7f445c026d00: > 127.0.0.1 - - [28/Oct/2015:23:02:46 -0400] "GET /admin/user HTTP/1.1" -1 > 0 - python-requests/2.7.0 CPython/2.7.5 Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:02.063755 7f447ace2700 2 > RGWDataChangesLog::ChangesRenewThread: start > 2015-10-28 23:03:17.139339 7f443cfd1700 20 RGWEnv::set(): HTTP_HOST: > localhost:7480 > 2015-10-28 23:03:17.139357 7f443cfd1700 20 RGWEnv::set(): > HTTP_ACCEPT_ENCODING: gzip, deflate > 2015-10-28 23:03:17.139358 7f443cfd1700 20 RGWEnv::set(): HTTP_ACCEPT: */* > 2015-10-28 23:03:17.139364 7f443cfd1700 20 RGWEnv::set(): > HTTP_USER_AGENT: python-requests/2.7.0 CPython/2.7.5 > Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:17.139375 7f443cfd1700 20 RGWEnv::set(): HTTP_DATE: > Thu, 29 Oct 2015 03:03:17 GMT > 2015-10-28 23:03:17.139377 7f443cfd1700 20 RGWEnv::set(): > HTTP_AUTHORIZATION: AWS RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= > 2015-10-28 23:03:17.139381 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_FOR: 128.8.132.4 > 2015-10-28 23:03:17.139383 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_HOST: ceph.umiacs.umd.edu > 2015-10-28 23:03:17.139385 7f443cfd1700 20 RGWEnv::set(): > HTTP_X_FORWARDED_SERVER: cephproxy00.umiacs.umd.edu > 2015-10-28 23:03:17.139387 7f443cfd1700 20 RGWEnv::set(): > HTTP_CONNECTION: Keep-Alive > 2015-10-28 23:03:17.139392 7f443cfd1700 20 RGWEnv::set(): > REQUEST_METHOD: GET > 
2015-10-28 23:03:17.139394 7f443cfd1700 20 RGWEnv::set(): REQUEST_URI: > /admin/user > 2015-10-28 23:03:17.139397 7f443cfd1700 20 RGWEnv::set(): QUERY_STRING: > quota&uid=derek"a-type=user > 2015-10-28 23:03:17.139401 7f443cfd1700 20 RGWEnv::set(): REMOTE_USER: > 2015-10-28 23:03:17.139403 7f443cfd1700 20 RGWEnv::set(): SCRIPT_URI: > /admin/user > 2015-10-28 23:03:17.139408 7f443cfd1700 20 RGWEnv::set(): SERVER_PORT: 7480 > 2015-10-28 23:03:17.139409 7f443cfd1700 20 HTTP_ACCEPT=*/* > 2015-10-28 23:03:17.139410 7f443cfd1700 20 HTTP_ACCEPT_ENCODING=gzip, > deflate > 2015-10-28 23:03:17.139411 7f443cfd1700 20 HTTP_AUTHORIZATION=AWS > RTJ1TL13CH613JRU2PJD:ZtDQkxc+Nqo04zVsNND0yx32lds= > 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_CONNECTION=Keep-Alive > 2015-10-28 23:03:17.139412 7f443cfd1700 20 HTTP_DATE=Thu, 29 Oct 2015 > 03:03:17 GMT > 2015-10-28 23:03:17.139413 7f443cfd1700 20 HTTP_HOST=localhost:7480 > 2015-10-28 23:03:17.139413 7f443cfd1700 20 > HTTP_USER_AGENT=python-requests/2.7.0 CPython/2.7.5 > Linux/3.10.0-229.14.1.el7.x86_64 > 2015-10-28 23:03:17.139414 7f443cfd1700 20 HTTP_X_FORWARDED_FOR=128.8.132.4 > 2015-10-28 23:03:17.139415 7f443cfd1700 20 > HTTP_X_FORWARDED_HOST=ceph.umiacs.umd.edu > 2015-10-28 23:03:17.139416 7f443cfd1700 20 > HTTP_X_FORWARDED_SERVER=cephproxy00.umiacs.umd.edu > 2015-10-28 23:03:17.139416 7f443cfd1700 20 > QUERY_STRING=quota&uid=derek"a-type=user > 2015-10-28 23:03:17.139417 7f443cfd1700 20 REMOTE_USER= > 2015-10-28 23:03:17.139417 7f443cfd1700 20 REQUEST_METHO
Re: [ceph-users] rbd hang
rbd -p locks export seco101ira - 2015-10-29 13:13:49.487822 7f5c2cb3b7c0 1 librados: starting msgr at :/0 2015-10-29 13:13:49.487838 7f5c2cb3b7c0 1 librados: starting objecter 2015-10-29 13:13:49.487971 7f5c2cb3b7c0 1 -- :/0 messenger.start 2015-10-29 13:13:49.488027 7f5c2cb3b7c0 1 librados: setting wanted keys 2015-10-29 13:13:49.488031 7f5c2cb3b7c0 1 librados: calling monclient init 2015-10-29 13:13:49.488708 7f5c2cb3b7c0 1 -- :/1025119 --> 10.134.128.41:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x2307900 con 0x2307540 2015-10-29 13:13:49.489236 7f5c2cb33700 1 -- 10.134.128.41:0/1025119 learned my addr 10.134.128.41:0/1025119 2015-10-29 13:13:49.489498 7f5c270db700 10 client.?.objecter ms_handle_connect 0x2307540 2015-10-29 13:13:49.489646 7f5c270db700 10 client.?.objecter resend_mon_ops 2015-10-29 13:13:49.490171 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7f5c1be0 con 0x2307540 2015-10-29 13:13:49.490316 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (3748436714 0 0) 0x7f5c10001090 con 0x2307540 2015-10-29 13:13:49.490656 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7f5c1c0018a0 con 0x2307540 2015-10-29 13:13:49.491183 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (1658299125 0 0) 0x7f5c10001090 con 0x2307540 2015-10-29 13:13:49.491329 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7f5c1c002250 con 0x2307540 2015-10-29 13:13:49.491871 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (1503133956 0 0) 0x7f5c18c0 con 0x2307540 2015-10-29 13:13:49.491981 7f5c270db700 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- 
mon_subscribe({monmap=0+}) v2 -- ?+0 0x2303c10 con 0x2307540 2015-10-29 13:13:49.492197 7f5c2cb3b7c0 10 client.7368.objecter maybe_request_map subscribing (onetime) to next osd map 2015-10-29 13:13:49.492234 7f5c2cb3b7c0 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x23048a0 con 0x2307540 2015-10-29 13:13:49.492263 7f5c2cb3b7c0 1 -- 10.134.128.41:0/1025119 --> 10.134.128.41:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x2304e40 con 0x2307540 2015-10-29 13:13:49.492595 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) 0x7f5c10001300 con 0x2307540 2015-10-29 13:13:49.492758 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100015a0 con 0x2307540 2015-10-29 13:13:49.493171 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7f5c18c0 con 0x2307540 2015-10-29 13:13:49.493390 7f5c2cb3b7c0 1 librados: init done 2015-10-29 13:13:49.493431 7f5c2cb3b7c0 10 librados: wait_for_osdmap waiting 2015-10-29 13:13:49.493557 7f5c270db700 3 client.7368.objecter handle_osd_map got epochs [4350,4350] > 0 2015-10-29 13:13:49.493572 7f5c270db700 3 client.7368.objecter handle_osd_map decoding full epoch 4350 2015-10-29 13:13:49.493831 7f5c270db700 20 client.7368.objecter dump_active .. 
0 homeless 2015-10-29 13:13:49.493861 7f5c2cb3b7c0 10 librados: wait_for_osdmap done waiting 2015-10-29 13:13:49.493863 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c10003170 con 0x2307540 2015-10-29 13:13:49.493880 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7f5c10005230 con 0x2307540 2015-10-29 13:13:49.493889 7f5c270db700 3 client.7368.objecter handle_osd_map ignoring epochs [4350,4350] <= 4350 2015-10-29 13:13:49.493891 7f5c270db700 20 client.7368.objecter dump_active .. 0 homeless 2015-10-29 13:13:49.493898 7f5c270db700 1 -- 10.134.128.41:0/1025119 <== mon.0 10.134.128.41:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7f5c100056d0 con 0x2307540 2015-10-29 13:13:49.493950 7f5c2cb3b7c0 20 librbd::ImageCtx: enabling caching... 2015-10-29 13:13:49.493971 7f5c2cb3b7c0 20 librbd::ImageCtx: Initial cache settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 max_dirty_age=5 2015-10-29 13:13:49.494155 7f5c2cb3b7c0 20 librbd: open_image: ictx = 0x2305530 name = 'seco101ira' id = '' snap_name = '' 2015-10-29 13:13:49.494209 7f5c2cb3b7c0 10 librados: stat oid=seco101ira.rbd nspace= 2015-10-29 13:13:49.494290 7f5c2cb3b7c0 10 client.7368.
Re: [ceph-users] rbd hang
I don't see the read request hitting the wire, so I am thinking your client cannot talk to the primary PG for the 'rb.0.16cf.238e1f29.' object. Try adding "debug objecter = 20" to your configuration to get more details. -- Jason Dillaman - Original Message - > From: "Joe Ryner" > To: ceph-us...@ceph.com > Sent: Thursday, October 29, 2015 12:22:01 PM > Subject: [ceph-users] rbd hang > > i, > > I am having a strange problem with our development cluster. When I run rbd > export it just hangs. I have been running ceph for a long time and haven't > encountered this kind of issue. Any ideas as to what is going on? > > rbd -p locks export seco101ira - > > > I am running > > Centos 6.6 x86 64 > > ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70) > > I have enabled debugging and get the following when I run the command > > [root@durbium ~]# rbd -p locks export seco101ira - > 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0 > 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter > 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start > 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys > 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init > 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> > 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 > con 0x15ba540 > 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned > my addr 10.134.128.41:0/1024687 > 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) > 0x7fc318000be0 con 0x15ba540 > 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 > 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540 > 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- 
auth(proto 2 32 bytes epoch 0) v1 -- ?+0 > 0x7fc31c001700 con 0x15ba540 > 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 > 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540 > 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 > 0x7fc31c002220 con 0x15ba540 > 2015-10-29 11:17:08.188001 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 > 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540 > 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x15b6b80 con > 0x15ba540 > 2015-10-29 11:17:08.188334 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 > 0x15b7700 con 0x15ba540 > 2015-10-29 11:17:08.188355 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> > 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 > 0x15b7ca0 con 0x15ba540 > 2015-10-29 11:17:08.188445 7fc3334fa7c0 1 librados: init done > 2015-10-29 11:17:08.188463 7fc3334fa7c0 10 librados: wait_for_osdmap waiting > 2015-10-29 11:17:08.188625 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) > 0x7fc318001300 con 0x15ba540 > 2015-10-29 11:17:08.188795 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180015a0 con 0x15ba540 > 2015-10-29 11:17:08.189129 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 > 7562+0+0 (1787729222 0 0) 0x7fc3180013b0 con 0x15ba540 > 2015-10-29 11:17:08.189452 7fc3334fa7c0 10 librados: wait_for_osdmap done > waiting > 2015-10-29 11:17:08.189454 7fc32da9a700 1 -- 
10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180008c0 con 0x15ba540 > 2015-10-29 11:17:08.189470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 > 7562+0+0 (1787729222 0 0) 0x7fc318005290 con 0x15ba540 > 2015-10-29 11:17:08.189485 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== > mon.1 10.134.128.42:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 > (646930372 0 0) 0x7fc3180056d0 con 0x15ba540 > 2015-10-29 11:17:08.189522 7fc3334fa7c0 20 librbd::ImageCtx: enabling > caching... > 2015-10-29 11:17:08.189540 7fc3334fa7c0 20 librbd::ImageCtx: Initial cache > settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 > max_dirty_age=5 > 2015-10-29 11:17:08.189686 7fc3334fa7c0 20 librbd: open_image: ictx =
Re: [ceph-users] Benchmark individual OSD's
You can also extend that command line to specify specific block and total sizes. Check the help text. :) -Greg On Thursday, October 29, 2015, Lindsay Mathieson < lindsay.mathie...@gmail.com> wrote: > > On 29 October 2015 at 19:24, Burkhard Linke < > burkhard.li...@computational.bio.uni-giessen.de > > > wrote: > >> # ceph tell osd.1 bench >> { >> "bytes_written": 1073741824, >> "blocksize": 4194304, >> "bytes_per_sec": 117403227.00 >> } >> >> It might help you to figure out whether individual OSDs do not perform as >> expected. The amount of data written is limited (but there's a config >> setting for it). With 1 GB as in the example above, the write operation >> will probably be limited to the journal. >> > > > > Thats perfect, thanks Burkhard, it lets me compare osd's and > configurations. > > > -- > Lindsay > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
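For comparing several OSDs, the JSON that `ceph tell osd.N bench` prints is easy to post-process. A small sketch using the sample output quoted above (the helper is mine, not part of Ceph; in practice you would feed it the output of each `ceph tell osd.N bench`):

```python
import json

def bench_mb_per_sec(raw):
    """Parse 'ceph tell osd.N bench' JSON output and return throughput in MB/s."""
    data = json.loads(raw)
    return data["bytes_per_sec"] / (1024 * 1024)

# Sample output from the message above (osd.1):
sample = '{"bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 117403227.00}'
print(round(bench_mb_per_sec(sample), 1))  # prints 112.0
```

Running this over every OSD and sorting the results makes underperforming disks stand out quickly.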
[ceph-users] rbd hang
i, I am having a strange problem with our development cluster. When I run rbd export it just hangs. I have been running ceph for a long time and haven't encountered this kind of issue. Any ideas as to what is going on? rbd -p locks export seco101ira - I am running Centos 6.6 x86 64 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70) I have enabled debugging and get the following when I run the command [root@durbium ~]# rbd -p locks export seco101ira - 2015-10-29 11:17:08.183597 7fc3334fa7c0 1 librados: starting msgr at :/0 2015-10-29 11:17:08.183613 7fc3334fa7c0 1 librados: starting objecter 2015-10-29 11:17:08.183739 7fc3334fa7c0 1 -- :/0 messenger.start 2015-10-29 11:17:08.183779 7fc3334fa7c0 1 librados: setting wanted keys 2015-10-29 11:17:08.183782 7fc3334fa7c0 1 librados: calling monclient init 2015-10-29 11:17:08.184365 7fc3334fa7c0 1 -- :/1024687 --> 10.134.128.42:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x15ba900 con 0x15ba540 2015-10-29 11:17:08.185006 7fc3334f2700 1 -- 10.134.128.41:0/1024687 learned my addr 10.134.128.41:0/1024687 2015-10-29 11:17:08.185995 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 1 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318000be0 con 0x15ba540 2015-10-29 11:17:08.186213 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 2 auth_reply(proto 2 0 (0) Success) v1 33+0+0 (4093383511 0 0) 0x7fc318001090 con 0x15ba540 2015-10-29 11:17:08.186544 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fc31c001700 con 0x15ba540 2015-10-29 11:17:08.187160 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 3 auth_reply(proto 2 0 (0) Success) v1 206+0+0 (2382192463 0 0) 0x7fc318001090 con 0x15ba540 2015-10-29 11:17:08.187354 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fc31c002220 con 0x15ba540 2015-10-29 11:17:08.188001 
7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 4 auth_reply(proto 2 0 (0) Success) v1 393+0+0 (34117402 0 0) 0x7fc3180008c0 con 0x15ba540 2015-10-29 11:17:08.188148 7fc32da9a700 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x15b6b80 con 0x15ba540 2015-10-29 11:17:08.188334 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x15b7700 con 0x15ba540 2015-10-29 11:17:08.188355 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.42:6789/0 -- mon_subscribe({monmap=6+,osdmap=0}) v2 -- ?+0 0x15b7ca0 con 0x15ba540 2015-10-29 11:17:08.188445 7fc3334fa7c0 1 librados: init done 2015-10-29 11:17:08.188463 7fc3334fa7c0 10 librados: wait_for_osdmap waiting 2015-10-29 11:17:08.188625 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 5 mon_map v1 491+0+0 (318324477 0 0) 0x7fc318001300 con 0x15ba540 2015-10-29 11:17:08.188795 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 6 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180015a0 con 0x15ba540 2015-10-29 11:17:08.189129 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 7 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7fc3180013b0 con 0x15ba540 2015-10-29 11:17:08.189452 7fc3334fa7c0 10 librados: wait_for_osdmap done waiting 2015-10-29 11:17:08.189454 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 8 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180008c0 con 0x15ba540 2015-10-29 11:17:08.189470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 9 osd_map(4350..4350 src has 3829..4350) v3 7562+0+0 (1787729222 0 0) 0x7fc318005290 con 0x15ba540 2015-10-29 11:17:08.189485 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== mon.1 10.134.128.42:6789/0 10 mon_subscribe_ack(300s) v1 20+0+0 (646930372 0 0) 0x7fc3180056d0 con 0x15ba540 2015-10-29 
11:17:08.189522 7fc3334fa7c0 20 librbd::ImageCtx: enabling caching... 2015-10-29 11:17:08.189540 7fc3334fa7c0 20 librbd::ImageCtx: Initial cache settings: size=64 num_objects=10 max_dirty=32 target_dirty=16 max_dirty_age=5 2015-10-29 11:17:08.189686 7fc3334fa7c0 20 librbd: open_image: ictx = 0x15b8390 name = 'seco101ira' id = '' snap_name = '' 2015-10-29 11:17:08.189730 7fc3334fa7c0 10 librados: stat oid=seco101ira.rbd nspace= 2015-10-29 11:17:08.189882 7fc3334fa7c0 1 -- 10.134.128.41:0/1024687 --> 10.134.128.43:6803/2741 -- osd_op(client.7543.0:1 seco101ira.rbd [stat] 4.a982c550 ack+read e4350) v4 -- ?+0 0x15baf60 con 0x15b9e70 2015-10-29 11:17:08.192470 7fc32da9a700 1 -- 10.134.128.41:0/1024687 <== osd.2 10.134.128.43:6803/2741 1 osd_op_reply(1 seco101ira.rbd [stat] v0'0 uv1 ondisk = 0) v6 181+0+16 (1355327
[ceph-users] Cloudstack agent crashed JVM with exception in librbd
Hi Wido and all of the community. We caught a very odd issue in our CloudStack installation, related to Ceph and possibly to the java-rados lib. The agent crashes constantly (which causes a very big problem for us). When the agent crashes, it takes the JVM down with it, and there is no event in the logs at all. We enabled crash dumps, and after a crash we see the following picture: #grep -A1 "Problematic frame" < /hs_err_pid30260.log Problematic frame: C [librbd.so.1.0.0+0x5d681] # gdb /usr/lib/librbd.so.1.0.0 /var/tmp/cores/jsvc.25526.0.core (gdb) bt ... #7 0x7f30b9a1fed2 in ceph::log::SubsystemMap::should_gather (level=<optimized out>, sub=<optimized out>, this=<optimized out>) at ./log/SubsystemMap.h:62 #8 0x7f30b9a3b693 in ceph::log::SubsystemMap::should_gather (this=<optimized out>, sub=<optimized out>, level=<optimized out>) at ./log/SubsystemMap.h:61 #9 0x7f30b9d879be in ObjectCacher::flusher_entry (this=0x7f2fb4017910) at osdc/ObjectCacher.cc:1527 #10 0x7f30b9d9851d in ObjectCacher::FlusherThread::entry (this=<optimized out>) at osdc/ObjectCacher.h:374 From the Ceph code, this part executes while flushing a cached object, and we don't understand why, because the race condition we can reproduce is completely different. As CloudStack does not yet have a good implementation of the snapshot lifecycle, it sometimes happens that some volumes are already marked as EXPUNGED in the DB, and CloudStack then tries to delete the base volume before trying to unprotect it. Unprotecting of course fails, and a normal exception is returned (it fails because the snapshot has children): 2015-10-29 09:02:19,401 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/kvmheartbeat.sh -i 10.44.253.13 -p /var/lib/libvirt/PRIMARY -m /mnt/93655746-a9ef-394d-95e9-6e62471dd39f -h 10.44.253.11 2015-10-29 09:02:19,412 DEBUG [kvm.resource.KVMHAMonitor] (Thread-1304:null) Execution is successful. 
2015-10-29 09:02:19,554 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting and Removing RBD snapshots of image 6789/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c prior to removing the image 2015-10-29 09:02:19,571 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Succesfully connected to Ceph cluster at cephmon.anolim.net:6789 2015-10-29 09:02:19,608 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-5:null) Unprotecting snapshot cloudstack/71b1e2e9-1985-45ca-9ab6-9e5016b86b7c@cloudstack-base-snap 2015-10-29 09:02:19,627 DEBUG [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-5:null) Failed to delete volume: com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap 2015-10-29 09:02:19,628 DEBUG [cloud.agent.Agent] (agentRequest-Handler-5:null) Seq 4-1921583831: { Ans: , MgmtId: 161344838950, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException: com.ceph.rbd.RbdException: Failed to unprotect snapshot cloudstack-base-snap","wait":0}}] } 2015-10-29 09:02:25,722 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: com.cloud.agent.api.GetHostStatsCommand 2015-10-29 09:02:25,722 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1| awk -F, '/^[%]*[Cc]pu/{$0=$4; gsub(/[^0-9.,]+/,""); print }'); echo $idle 2015-10-29 09:02:26,249 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. 2015-10-29 09:02:26,250 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c freeMem=$(free|grep cache:|awk '{print $4}');echo $freeMem 2015-10-29 09:02:26,254 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Execution is successful. BUT, after 20 minutes - agent crashed... 
If we first remove all the children and so create the conditions for CloudStack to delete the volume cleanly, everything is OK: no agent crash within 20 minutes. We cannot see how the volume deletion is connected to the agent crash, and we also don't understand why roughly 20 minutes have to pass before the agent crashes. In the logs before the crash there is only GetVMStats, and then the agent restarts: 2015-10-29 09:21:55,143 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 4-1343: { Cmd , MgmtId: -1, via: 4, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingCommand":{"newStates":{},"_hostVmStateReport":{"i-881-1117-VM":{"state":"PowerOn","host":" cs2.anolim.net"},"i-7-106-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-1683-1984-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-11-504-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-325-616-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-10-52-VM":{"state":"PowerOn","host":"cs2.anolim.net "},"i-941-1237-VM":{"state":"PowerOn","host":"cs2.anolim.net"}},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":4,"wait":0}}] } 2015-10-29 09:21:55,149 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response
[ceph-users] ceph-mon segmentation faults after upgrade from 0.94.3 to 0.94.5
Hi, we have multiple Ceph clusters. One is used as the backend for an OpenStack installation for developers; it is there that we test Ceph upgrades before we upgrade the prod Ceph clusters. That cluster is 4 nodes with 12 OSDs each, running Ubuntu Trusty with the latest 3.13 kernel. This time, when upgrading from 0.94.3 to 0.94.5, ceph-mons died a couple of times during the upgrade: once when we restarted the first of the three monitors during the upgrade procedure, and a second time when we ran 'ceph osd unset noout' at the end. I thought this was a fluke during the upgrade, but the ceph-mons now seem to segfault fairly regularly, the day after the upgrade. No corefile gets dumped, so I have only the log for this strange behaviour. The cluster has followed the upgrades from firefly to the current Hammer release and worked flawlessly until now. It performs more or less as normal from the users' viewpoint, except that we get segmentation faults in the logfile. Downgrading is a last resort that I would rather not take. My question is: what can cause these errors, and how can I fix them? 
-Arnulf The segfault looks like this: Oct 29 14:29:46 95z3zz1 ceph-mon: 0> 2015-10-29 14:29:46.297786 7f908e5af700 -1 *** Caught signal (Segmentation fault) **#012 in thread 7f908e5af700#012#012 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)#012 1: /usr/bin/ceph-mon() [0x9adefa]#012 2: (()+0x10340) [0x7f90936b6340]#012 3: (std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::find(std::string const&) const+0x25) [0x6518e5]#012 4: (get_str_map_key(std::map, std::allocator > > const&, std::string const&, std::string const*)+0x1e) [0x8a002e]#012 5: (LogMonitor::update_from_paxos(bool*)+0x87a) [0x6b0a5a]#012 6: (PaxosService::refresh(bool*)+0x19a) [0x60432a]#012 7: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b03db]#012 8: (Paxos::do_refresh()+0x2e) [0x5eea5e]#012 9: (Paxos::commit_finish()+0x569) [0x5fbf39]#012 10: (C_Committed::finish(int)+0x2b) [0x60038b]#012 11: (Context::complete(int)+0x9) [0x5d4d89]#012 12: (MonitorDBStore::C_DoTransaction::finish(int)+0x8c) [0x5ff4bc]#012 13: (Context::complete(int)+0x9) [0x5d4d89]#012 14: (Finisher::finisher_thread_entry()+0x158) [0x717e88]#012 15: (()+0x8182) [0x7f90936ae182]#012 16: (clone()+0x6d) [0x7f9091c1947d]#012 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. 
Full log of event: Oct 29 14:29:45 95z3zz1 ceph-mon: 2015-10-29 14:29:45.697177 7f3154801700 -1 *** Caught signal (Segmentation fault) **#012 in thread 7f3154801700#012#012 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)#012 1: /usr/bin/ceph-mon() [0x9adefa]#012 2: (()+0x10340) [0x7f3159b63340]#012 3: (std::_Rb_tree, std::_Select1st >, std::less, std::allocator > >::find(std::string const&) const+0x25) [0x6518e5]#012 4: (get_str_map_key(std::map, std::allocator > > const&, std::string const&, std::string const*)+0x1e) [0x8a002e]#012 5: (LogMonitor::update_from_paxos(bool*)+0x87a) [0x6b0a5a]#012 6: (PaxosService::refresh(bool*)+0x19a) [0x60432a]#012 7: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b03db]#012 8: (Paxos::do_refresh()+0x2e) [0x 5eea5e]#012 9: (Paxos::commit_finish()+0x569) [0x5fbf39]#012 10: (C_Committed::finish(int)+0x2b) [0x60038b]#012 11: (Context::complete(int)+0x9) [0x5d4d89]#012 12: (MonitorDBStore::C_DoTransaction::finish(int)+0x8c) [0x5ff4bc]#012 13: (Context::complete(int)+0x9) [0x5d4d89]#012 14: (Finisher::finisher_thread_entry()+0x158) [0x717e88]#012 15: (()+0x8182) [0x7f3159b5b182]#012 16: (clone()+0x6d) [0x7f31580c647d]#012 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. 
Oct 29 14:29:45 95z3zz1 ceph-mon: --- begin dump of recent events --- Oct 29 14:29:45 95z3zz1 ceph-mon: -450> 2015-10-29 14:29:44.484656 7f315aa5d8c0 5 asok(0x4daa000) register_command perfcounters_dump hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -449> 2015-10-29 14:29:44.484677 7f315aa5d8c0 5 asok(0x4daa000) register_command 1 hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -448> 2015-10-29 14:29:44.484681 7f315aa5d8c0 5 asok(0x4daa000) register_command perf dump hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -447> 2015-10-29 14:29:44.484686 7f315aa5d8c0 5 asok(0x4daa000) register_command perfcounters_schema hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -446> 2015-10-29 14:29:44.484688 7f315aa5d8c0 5 asok(0x4daa000) register_command 2 hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -445> 2015-10-29 14:29:44.484690 7f315aa5d8c0 5 asok(0x4daa000) register_command perf schema hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -444> 2015-10-29 14:29:44.484692 7f315aa5d8c0 5 asok(0x4daa000) register_command perf reset hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -443> 2015-10-29 14:29:44.484694 7f315aa5d8c0 5 asok(0x4daa000) register_command config show hook 0x4d32050 Oct 29 14:29:45 95z3zz1 ceph-mon: -442> 2015-10-29 14:29:44.4
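The dumps above are hard to read because syslog escapes every newline as `#012`. A tiny convenience sketch (my own helper, not a Ceph tool) to unescape such a line into individual stack frames:

```python
def frames(syslog_line):
    """Split a syslog-escaped ceph-mon crash dump ('#012' = newline) into lines."""
    return syslog_line.replace("#012", "\n").splitlines()

# Abbreviated dump from the log above:
dump = ("*** Caught signal (Segmentation fault) **#012 in thread 7f908e5af700"
        "#012#012 ceph version 0.94.5#012 1: /usr/bin/ceph-mon() [0x9adefa]")
for line in frames(dump):
    print(line)
```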
Re: [ceph-users] rsync mirror download.ceph.com - broken file on rsync server
On Wed, Oct 28, 2015 at 7:54 PM, Matt Taylor wrote: > I still see rsync errors due to permissions on the remote side: > Thanks for the heads-up; I bet another upload rsync process got interrupted there. I've run the following to remove all the oddly-named RPM files: for f in $(locate *.rpm.* ) ; do rm -i $f; done Please let us know if there are other problems like this. - Ken ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
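Note that the unquoted glob in that loop is expanded by the shell before `locate` ever sees it, so its behaviour depends on the current directory's contents. If the goal is to keep real `.rpm` files and drop partial-transfer leftovers, an explicit filter along these lines may be safer (the exact leftover naming, `something.rpm.<random suffix>`, is my assumption about what "oddly-named" means here):

```python
import re

# Matches rsync partial-transfer leftovers like 'foo.rpm.Xy12Qz',
# but not complete packages ending in '.rpm'.
PARTIAL_RPM = re.compile(r"\.rpm\..+$")

def is_partial_rpm(name):
    """True for names with a suffix after '.rpm', False for real RPM files."""
    return bool(PARTIAL_RPM.search(name))

names = ["ceph-0.94.5-0.el7.x86_64.rpm", "ceph-0.94.5-0.el7.x86_64.rpm.Xy12Qz"]
print([n for n in names if is_partial_rpm(n)])  # only the .rpm.Xy12Qz leftover
```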
Re: [ceph-users] Benchmark individual OSD's
On 29 October 2015 at 19:24, Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > # ceph tell osd.1 bench > { > "bytes_written": 1073741824, > "blocksize": 4194304, > "bytes_per_sec": 117403227.00 > } > > It might help you to figure out whether individual OSDs do not perform as > expected. The amount of data written is limited (but there's a config > setting for it). With 1 GB as in the example above, the write operation > will probably be limited to the journal. > That's perfect, thanks Burkhard; it lets me compare OSDs and configurations. -- Lindsay ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Input/output error
Hi, please search Google; the answer is out there. As I remember: 1. rbd on low kernel versions does not support some CRUSH features; check /var/log/messages. 2. sudo rbd map foo --name client.admin -p {pool_name} 3. Also specify -p {pool_name} when you create the image. Thanks! -- hzwulibin 2015-10-29 - From: Wah Peng Date: 2015-10-29 15:13 To: ceph-users@lists.ceph.com Cc: Subject: [ceph-users] Input/output error hello, do you know why this happens when I did it following the official documentation. $ sudo rbd map foo --name client.admin rbd: add failed: (5) Input/output error the OS kernel, $ uname -a Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux I tried this way, ceph osd getcrushmap -o /tmp/crush crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new ceph osd setcrushmap -i /tmp/crush.new but got no luck. my cluster status seems OK, $ ceph health HEALTH_OK $ ceph osd tree
# id    weight  type name       up/down reweight
-1      0.24    root default
-2      0.24            host ceph2
0       0.07999                 osd.0   up      1
1       0.07999                 osd.1   up      1
2       0.07999                 osd.2   up      1
Thanks in advance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
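The usual cause of this symptom is an old kernel rbd client meeting CRUSH tunables it does not understand; `chooseleaf_vary_r` in particular appeared in kernels much newer than the 3.2 shown above. A quick version-check sketch; the 3.15 minimum used here is from memory, so verify it against the Ceph tunables documentation before relying on it:

```python
def kernel_tuple(uname_r):
    """Turn 'uname -r' output like '3.2.0-23-generic' into a comparable tuple."""
    return tuple(int(x) for x in uname_r.split("-")[0].split("."))

# Assumption: 3.15 as the first kernel understanding chooseleaf_vary_r;
# check the Ceph docs for the authoritative mapping of tunables to kernels.
NEEDS = (3, 15)
print(kernel_tuple("3.2.0-23-generic") >= NEEDS)  # prints False: too old
```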
[ceph-users] Issue with ceph-deploy
I'm following the tutorial at http://docs.ceph.com/docs/v0.79/start/quick-ceph-deploy/ to deploy a monitor using

% ceph-deploy mon create-initial

But I got the following errors:

...
[ceph-node1][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status
[ceph-node1][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
...

I checked on the monitor ceph-node1; it turned out that there was an asok file, but it was named after the machine's real `hostname` rather than the /etc/hosts alias (e.g. ceph-node1):

[root@GZH-ZB-SA2-Kiev-116 ~]# ls /var/run/ceph/
ceph-mon.GZH-ZB-SA2-Kiev-116.asok  mon.GZH-ZB-SA2-Kiev-116.pid

When creating the socket file, would it be better to use a hostname-neutral name, or a name passed in from the admin node? I think we should at least modify ceph-deploy to run ceph --admin-daemon with the monitor's real hostname. Most users tend to address nodes by IP or alias rather than by their real hostnames.
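A quick workaround on the monitor itself is to build the socket path from the machine's real short hostname, which is what ceph-mon used when creating the socket. A small sketch (the final `ceph` call is commented out since it needs a running monitor):

```shell
# Derive the admin socket path from the node's real short hostname,
# the name ceph-mon used when it created the socket.
HOST=$(hostname -s)
ASOK="/var/run/ceph/ceph-mon.${HOST}.asok"
echo "querying ${ASOK}"
# ceph --cluster=ceph --admin-daemon "${ASOK}" mon_status
```

If this path exists while the one using the /etc/hosts alias does not, the hostname mismatch above is confirmed.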
Re: [ceph-users] Core dump while getting a volume real size with a python script
It sounds like you ran into this issue [1]. It's been fixed in the upstream master and infernalis branches, but the backport is still awaiting release on hammer.

[1] http://tracker.ceph.com/issues/12885

--
Jason Dillaman

----- Original Message -----
> From: "Giuseppe Civitella"
> To: "ceph-users"
> Sent: Thursday, October 29, 2015 4:44:02 AM
> Subject: Re: [ceph-users] Core dump while getting a volume real size with a python script
>
> ... and this is the core dump output while executing the "rbd diff" command:
> http://paste.openstack.org/show/477604/
> Regards,
> Giuseppe
>
> 2015-10-28 16:46 GMT+01:00 Giuseppe Civitella <giuseppe.civite...@gmail.com>:
> > Hi all,
> >
> > I'm trying to get the real disk usage of a Cinder volume converting this
> > bash commands to python:
> > http://cephnotes.ksperis.com/blog/2013/08/28/rbd-image-real-size
> >
> > I wrote a small test function which has already worked in many cases but it
> > stops with a core dump while trying to calculate the real size of a
> > particular volume.
> >
> > This is the function:
> > http://paste.openstack.org/show/477563/
> >
> > this is the error I get:
> > http://paste.openstack.org/show/477567/
> >
> > and these are the related rbd info:
> > http://paste.openstack.org/show/477568/
> >
> > Can anyone help me to debug the problem?
> >
> > Thanks
> > Giuseppe
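For anyone hitting this before the hammer backport lands, the size calculation itself can be expressed with the Python rbd bindings' `Image.diff_iterate()`. A minimal sketch -- the `diff_iterate(offset, length, from_snapshot, callback)` signature is assumed from the python-rbd bindings, and the call site is commented out since it needs a live cluster; the extent-summing callback itself is plain Python, demonstrated here with synthetic extents:

```python
# Sketch: summing allocated extents the way "rbd diff" does.
class UsageAccumulator(object):
    def __init__(self):
        self.used = 0

    def __call__(self, offset, length, exists):
        # librbd reports one (offset, length, exists) tuple per extent;
        # only extents that exist consume space.
        if exists:
            self.used += length

def real_size(image, image_size):
    """Hypothetical wrapper; `image` would be an rbd.Image instance."""
    acc = UsageAccumulator()
    # image.diff_iterate(0, image_size, None, acc)  # needs a live cluster
    return acc.used

# Demo with synthetic extents, as diff_iterate would deliver them:
acc = UsageAccumulator()
for off, length, exists in [(0, 4194304, True),
                            (8388608, 4194304, False),
                            (16777216, 2097152, True)]:
    acc(off, length, exists)
print(acc.used)  # 6291456 bytes allocated
```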
Re: [ceph-users] Input/output error
On Thu, Oct 29, 2015 at 11:22 AM, Wah Peng wrote:
> Thanks Gurjar.
> Have loaded the rbd module, but got no luck.
> what dmesg shows,
>
> [119192.384770] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119192.388744] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119202.400782] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119202.404756] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119212.416758] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119212.420732] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119222.432783] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119222.436756] libceph: mon0 172.17.6.176:6789 missing required protocol features
> [119232.448780] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
> [119232.452754] libceph: mon0 172.17.6.176:6789 missing required protocol features

Our messages raced - you are missing CRUSH_TUNABLES, CRUSH_TUNABLES2 and, more importantly, OSDHASHPSPOOL: starting with ceph 0.64, pools are created with the hashpspool flag set. If you *really* want to try to run the 3.2 kernel client, you'll need to clear it with "ceph osd pool set $poolname hashpspool false" and then reset all crush tunables to legacy values [1]. Note that we recommend at least >=3.10 for the kernel client [2].

[1] http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables
[2] http://docs.ceph.com/docs/master/start/os-recommendations/

Thanks,

Ilya
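The arithmetic behind that dmesg line can be checked by hand: the missing features are server & ~client. A sketch in Python -- the bit positions are assumed from the hammer-era src/include/ceph_features.h, so treat them as illustrative rather than authoritative:

```python
# Decode "feature set mismatch, my 2 < server's 42040002" from dmesg.
# Bit positions assumed from hammer-era ceph_features.h.
CRUSH_TUNABLES  = 1 << 18
CRUSH_TUNABLES2 = 1 << 25
OSDHASHPSPOOL   = 1 << 30

server = 0x42040002   # features the monitor advertises
client = 0x2          # features the 3.2 kernel client supports
missing = server & ~client
print(hex(missing))   # 0x42040000

names = {CRUSH_TUNABLES: "CRUSH_TUNABLES",
         CRUSH_TUNABLES2: "CRUSH_TUNABLES2",
         OSDHASHPSPOOL: "OSDHASHPSPOOL"}
for bit, name in sorted(names.items()):
    if missing & bit:
        print(name)
```

All three named bits are set in the missing mask, matching Ilya's reading of the log.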
Re: [ceph-users] Input/output error
$ ceph -v
ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)

thanks.

On 2015/10/29 (Thursday) 18:23, Ilya Dryomov wrote:
> What's your ceph version and what does dmesg say? 3.2 is *way* too old,
> you are probably missing more than one required feature bit. See
> http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables.
>
> Thanks,
Re: [ceph-users] Input/output error
On Thu, Oct 29, 2015 at 8:13 AM, Wah Peng wrote:
> hello,
>
> do you know why this happens when I did it following the official
> documentation.
>
> $ sudo rbd map foo --name client.admin
> rbd: add failed: (5) Input/output error
>
> the OS kernel,
>
> $ uname -a
> Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> I tried this way,
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new
>
> but got no luck.
>
> my cluster status seems OK,
>
> $ ceph health
> HEALTH_OK
>
> $ ceph osd tree
> # id    weight   type name       up/down reweight
> -1      0.24     root default
> -2      0.24             host ceph2
> 0       0.07999                  osd.0  up      1
> 1       0.07999                  osd.1  up      1
> 2       0.07999                  osd.2  up      1

What's your ceph version and what does dmesg say? 3.2 is *way* too old, you are probably missing more than one required feature bit. See http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables.

Thanks,

Ilya
Re: [ceph-users] Input/output error
Thanks Gurjar. Have loaded the rbd module, but got no luck. What dmesg shows:

[119192.384770] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119192.388744] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119202.400782] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119202.404756] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119212.416758] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119212.420732] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119222.432783] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119222.436756] libceph: mon0 172.17.6.176:6789 missing required protocol features
[119232.448780] libceph: mon0 172.17.6.176:6789 feature set mismatch, my 2 < server's 42040002, missing 4204
[119232.452754] libceph: mon0 172.17.6.176:6789 missing required protocol features

Thx.

On 2015/10/29 (Thursday) 18:11, Gurjar, Unmesh wrote:
> Hi,
>
> You might want to confirm if the rbd module is loaded (sudo modprobe rbd) on
> the ceph-client node and give it a retry. If you still encounter the issue,
> post back the snippet of error logs in syslog or dmesg to take it forward.
>
> Regards,
> Unmesh G.
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wah Peng
> Sent: Thursday, October 29, 2015 12:44 PM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Input/output error
>
> hello,
>
> do you know why this happens when I did it following the official
> documentation.
>
> $ sudo rbd map foo --name client.admin
> rbd: add failed: (5) Input/output error
>
> the OS kernel,
>
> $ uname -a
> Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> I tried this way,
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new
>
> but got no luck.
>
> my cluster status seems OK,
>
> $ ceph health
> HEALTH_OK
>
> $ ceph osd tree
> # id    weight   type name       up/down reweight
> -1      0.24     root default
> -2      0.24             host ceph2
> 0       0.07999                  osd.0  up      1
> 1       0.07999                  osd.1  up      1
> 2       0.07999                  osd.2  up      1
>
> Thanks in advance.
Re: [ceph-users] CephFS and page cache
Hi,

On 10/29/2015 09:30 AM, Sage Weil wrote:
> On Thu, 29 Oct 2015, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
>>> On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>>>> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>>>>> I tried to dig into the ceph-fuse code, but I was unable to find the
>>>>> fragment that is responsible for flushing the data from the page cache.
>>>>
>>>> fuse kernel code invalidates page cache on opening file. you can
>>>> disable this behaviour by setting the "fuse use invalidate cb" config
>>>> option to true.

With that option ceph-fuse finally works with the page cache:

$ time cat /ceph/volumes/biodb/asn1/nr.3*.psq > /dev/null
real    2m0.979s
user    0m0.020s
sys     0m3.164s

$ time cat /ceph/volumes/biodb/asn1/nr.3*.psq > /dev/null
real    0m2.106s
user    0m0.000s
sys     0m1.996s

>>> Zheng, do you know any reason we shouldn't make that the default value
>>> now? There was a loopback deadlock (which is why it's disabled by
>>> default) but I don't remember the details offhand well enough to know
>>> if your recent work in those interfaces has fixed it. Or Sage?
>>> -Greg
>>
>> there is no loopback deadlock now, because we use a separate thread to
>> invalidate kernel page cache. I think we can enable this option
>> safely.
>
> ...as long as nobody blocks waiting for invalidate while holding a lock
> (client_lock?) that could prevent other fuse ops like write (pretty sure
> that was the deadlock we saw before). I worry this could still happen
> with a writer (or reader?) getting stuck in a check_caps() type situation
> while the invalidate cb is waiting on a page lock held by the calling
> kernel syscall...

I have created an issue to track this: http://tracker.ceph.com/issues/13640

It would be great if the patch were ported to one of the next hammer releases after the potential deadlock situation is analysed.

Best regards,
Burkhard
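For completeness, the option Zheng mentions goes in the client section of ceph.conf; a sketch, using the option name exactly as quoted in the thread (section placement assumed):

```
[client]
    fuse use invalidate cb = true
```

ceph-fuse has to be restarted (and the filesystem remounted) for the change to take effect.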
Re: [ceph-users] Benchmark individual OSD's
Hi,

On 10/29/2015 09:54 AM, Luis Periquito wrote:
> Only way I can think of that is creating a new crush rule that selects
> that specific OSD with min_size = max_size = 1, then creating a pool
> with size = 1 and using that crush rule. Then you can use that pool as
> you'd use any other pool. I haven't tested however it should work.

There's also the osd bench command that writes a certain amount of data to a given OSD:

# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "bytes_per_sec": 117403227.00
}

It might help you to figure out whether individual OSDs do not perform as expected. The amount of data written is limited (but there's a config setting for it). With 1 GB as in the example above, the write operation will probably be limited to the journal.

Regards,
Burkhard
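The raw bytes_per_sec value is easier to compare across OSDs once converted to MB/s; a quick sketch using the sample result above (values hard-coded):

```shell
# Convert the bench output above (117403227 B/s) into MiB/s.
BYTES_PER_SEC=117403227
python3 -c "print('%.2f MB/s' % ($BYTES_PER_SEC / 1048576.0))"
```

which prints 111.96 MB/s for the run shown above.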
Re: [ceph-users] Benchmark individual OSD's
The only way I can think of is creating a new crush rule that selects that specific OSD with min_size = max_size = 1, then creating a pool with size = 1 that uses that crush rule. You can then use that pool as you'd use any other pool. I haven't tested it, however it should work.

On Thu, Oct 29, 2015 at 1:44 AM, Lindsay Mathieson wrote:
>
> On 29 October 2015 at 11:39, Lindsay Mathieson wrote:
>>
>> Is there a way to benchmark individual OSD's?
>
> nb - Non-destructive :)
>
> --
> Lindsay
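As noted, this is untested; roughly, the decompile/edit/recompile cycle and rule could look like the sketch below. The rule name, ruleset id, pool name, and the `step take` on a bare device are all assumptions -- if `crushtool` rejects taking osd.1 directly, put osd.1 alone in a dedicated bucket and take that instead:

```
# ceph osd getcrushmap -o crush.bin
# crushtool -d crush.bin -o crush.txt
# ... append the rule below to crush.txt ...

rule single-osd-1 {
        ruleset 10
        type replicated
        min_size 1
        max_size 1
        step take osd.1
        step emit
}

# crushtool -c crush.txt -o crush.new
# ceph osd setcrushmap -i crush.new
# ceph osd pool create bench-osd1 64 64
# ceph osd pool set bench-osd1 size 1
# ceph osd pool set bench-osd1 crush_ruleset 10
```

Deleting the pool and the rule afterwards restores the original layout.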
Re: [ceph-users] Core dump while getting a volume real size with a python script
... and this is the core dump output while executing the "rbd diff" command:
http://paste.openstack.org/show/477604/

Regards,
Giuseppe

2015-10-28 16:46 GMT+01:00 Giuseppe Civitella:
> Hi all,
>
> I'm trying to get the real disk usage of a Cinder volume converting this
> bash commands to python:
> http://cephnotes.ksperis.com/blog/2013/08/28/rbd-image-real-size
>
> I wrote a small test function which has already worked in many cases but
> it stops with a core dump while trying to calculate the real size of a
> particular volume.
>
> This is the function:
> http://paste.openstack.org/show/477563/
>
> this is the error I get:
> http://paste.openstack.org/show/477567/
>
> and these are the related rbd info:
> http://paste.openstack.org/show/477568/
>
> Can anyone help me to debug the problem?
>
> Thanks
> Giuseppe
Re: [ceph-users] CephFS and page cache
On Thu, 29 Oct 2015, Yan, Zheng wrote:
> On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
> > On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
> >> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
> >>> I tried to dig into the ceph-fuse code, but I was unable to find the
> >>> fragment that is responsible for flushing the data from the page cache.
> >>
> >> fuse kernel code invalidates page cache on opening file. you can
> >> disable this behaviour by setting the "fuse use invalidate cb" config
> >> option to true.
> >
> > Zheng, do you know any reason we shouldn't make that the default value
> > now? There was a loopback deadlock (which is why it's disabled by
> > default) but I don't remember the details offhand well enough to know
> > if your recent work in those interfaces has fixed it. Or Sage?
> > -Greg
>
> there is no loopback deadlock now, because we use a separate thread to
> invalidate kernel page cache. I think we can enable this option
> safely.

...as long as nobody blocks waiting for invalidate while holding a lock (client_lock?) that could prevent other fuse ops like write (pretty sure that was the deadlock we saw before). I worry this could still happen with a writer (or reader?) getting stuck in a check_caps() type situation while the invalidate cb is waiting on a page lock held by the calling kernel syscall...

sage
Re: [ceph-users] CephFS and page cache
On Thu, Oct 29, 2015 at 2:21 PM, Gregory Farnum wrote:
> On Wed, Oct 28, 2015 at 8:38 PM, Yan, Zheng wrote:
>> On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
>>> I tried to dig into the ceph-fuse code, but I was unable to find the
>>> fragment that is responsible for flushing the data from the page cache.
>>
>> fuse kernel code invalidates page cache on opening file. you can
>> disable this behaviour by setting the "fuse use invalidate cb" config
>> option to true.
>
> Zheng, do you know any reason we shouldn't make that the default value
> now? There was a loopback deadlock (which is why it's disabled by
> default) but I don't remember the details offhand well enough to know
> if your recent work in those interfaces has fixed it. Or Sage?
> -Greg

there is no loopback deadlock now, because we use a separate thread to invalidate kernel page cache. I think we can enable this option safely.

Regards

Yan, Zheng
[ceph-users] Input/output error
hello,

do you know why this happens when I did it following the official documentation.

$ sudo rbd map foo --name client.admin
rbd: add failed: (5) Input/output error

the OS kernel,

$ uname -a
Linux ceph.yygamedev.com 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I tried this way,

ceph osd getcrushmap -o /tmp/crush
crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
ceph osd setcrushmap -i /tmp/crush.new

but got no luck.

my cluster status seems OK,

$ ceph health
HEALTH_OK

$ ceph osd tree
# id    weight   type name       up/down reweight
-1      0.24     root default
-2      0.24             host ceph2
0       0.07999                  osd.0  up      1
1       0.07999                  osd.1  up      1
2       0.07999                  osd.2  up      1

Thanks in advance.