Re: [ceph-users] Calamari build in vagrants
Steffen Winther ceph.user@... writes: Trying to build calamari rpm+deb packages following this guide: http://karan-mj.blogspot.fi/2014/09/ceph-calamari-survival-guide.html The server packages build fine, but the clients (dashboard, manage, admin, login) fail because yo <1.1.0 seems to be needed to build them and npm can't find such a version -- what to do about this, anyone? 1.1.0 seems to be the oldest version npm will install, and the latest is 1.4.5 :(

Build error:
npm ERR! notarget No compatible version found: yo at '>=1.0.0-0 <1.1.0-0'
npm ERR! notarget Valid install targets:
npm ERR! notarget [1.1.0,1.1.1,1.1.2,1.2.0,1.2.1,1.3.0,1.3.2,1.3.3]

Found a tarball of yo@1.0.6, which can be installed with either npm install -g <tarball> or npm install -g <package directory> :)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
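For reference, a rough sketch of the tarball install described above; the download URL follows the standard npm registry layout and is an assumption -- use whatever location you actually obtained yo@1.0.6 from:

    wget https://registry.npmjs.org/yo/-/yo-1.0.6.tgz   # assumed registry path for the yo 1.0.6 tarball
    npm install -g ./yo-1.0.6.tgz
    # or, from an unpacked package directory:
    # npm install -g ./yo-1.0.6/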
Re: [ceph-users] wider rados namespace support?
On 02/10/2015 07:54 PM, Blair Bethwaite wrote: Just came across this in the docs: Currently (i.e., firefly), namespaces are only useful for applications written on top of librados. Ceph clients such as block device, object storage and file system do not currently support this feature. Then found: https://wiki.ceph.com/Planning/Sideboard/rbd%3A_namespace_support Is there any progress or plans to address this (particularly for rbd clients but also cephfs)? No immediate plans for rbd. That blueprint still seems like a reasonable way to implement it to me. The one part I'm less sure about is the OpenStack or other higher level integration, which would need to start adding secret keys to libvirt dynamically. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
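For readers wondering what librados-level namespace use looks like in practice, a minimal hedged sketch follows; pool, namespace and client names are made up, and both the rados --namespace option and namespace-restricted cephx caps may not be available on older releases such as firefly:

    rados -p mypool --namespace tenant-a put obj1 ./somefile
    rados -p mypool --namespace tenant-a ls
    # a key restricted to that namespace might be created along these lines:
    ceph auth get-or-create client.tenant-a mon 'allow r' osd 'allow rw pool=mypool namespace=tenant-a'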
Re: [ceph-users] mongodb on top of rbd volumes (through krbd) ?
Hi Alexandre,

What is the behavior of mongo when a shard is unavailable for some reason (crash or network partition)? If shard3 is on the wrong side of a network partition and uses RBD, it will hang. Is that something mongo will handle gracefully? I have no experience with this but I'm curious about the use case :-)

Cheers

On 12/02/2015 05:55, Alexandre DERUMIER wrote:
Hi, I'm currently running a big mongodb cluster, around 2TB (sharding + replication), and I have a lot of problems with mongo replication (out-of-syncs and the need to fully re-replicate data between my mongo replica sets again and again). So I thought of using rbd to replicate the storage and keeping only sharding in mongo (maybe with some kind of shard failover between nodes with corosync).

    NODE1        NODE2        NODE3
    -----        -----        -----
    [shard1]     [shard2]     [shard3]
        |            |            |
    /dev/rbd0    /dev/rbd1    /dev/rbd2

Has somebody already tested this kind of setup with mongo?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- Loïc Dachary, Artisan Logiciel Libre
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mongodb on top of rbd volumes (through krbd) ?
> What is the behavior of mongo when a shard is unavailable for some reason (crash or network partition)? If shard3 is on the wrong side of a network partition and uses RBD, it will hang. Is it something that mongo will gracefully handle?

If one shard is down, I think the cluster is locked. That's why I thought of adding corosync/pacemaker to restart a mongod daemon on another host and migrate a VIP, keeping the same /dev/rbd3 (as it can be shared on all nodes), for example. A little bit complex, but this mongodb replication is really buggy under high load. (Need to implement librados inside mongo ;)

- Original message - From: Loic Dachary l...@dachary.org To: aderumier aderum...@odiso.com, ceph-users ceph-us...@ceph.com Sent: Thursday 12 February 2015 11:12:02 Subject: Re: [ceph-users] mongodb on top of rbd volumes (through krbd) ?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
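For illustration, a very rough manual version of the failover Alexandre describes; the pool/image, mount point and mongod config names are hypothetical, and a real deployment would wrap this in pacemaker resource agents with proper fencing:

    rbd map mongodb/shard3                              # on the surviving node
    mount /dev/rbd/mongodb/shard3 /var/lib/mongo-shard3
    mongod --config /etc/mongod-shard3.conf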
[ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
I have this problem too, help!

-- Original message -- From: 杨万元 yangwanyuan8...@gmail.com; Sent: Thursday, 12 February 2015, 11:14 AM; To: ceph-users@lists.ceph.com; Subject: [ceph-users] Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow

Hello! We use Ceph+Openstack in our private cloud. Recently we upgraded our centos6.5 based cluster from Ceph Emperor to Ceph Firefly. At first we used the redhat epel yum repo to upgrade; that Ceph version is 0.80.5. We first upgraded the monitors, then the osds, and last the clients. When we completed this upgrade, we booted a VM on the cluster and used fio to test the io performance. The io performance was as good as before. Everything was ok! Then we upgraded the cluster from 0.80.5 to 0.80.8; when we completed that, we rebooted the VM to load the newest librbd. After that we again used fio to test the io performance. We found that randwrite and write are as good as before, but randread and read have become worse: randread's iops dropped from 4000-5000 to 300-400 and the latency is worse, and the read bandwidth dropped from about 430MB/s to 115MB/s. Then I downgraded the ceph client version from 0.80.8 to 0.80.5 and the result became normal again. So I think maybe something in librbd causes this. I compared the 0.80.8 release notes with 0.80.5 (http://ceph.com/docs/master/release-notes/#v0-80-8-firefly), and the only change in 0.80.8 related to read requests I found is: librbd: cap memory utilization for read requests (Jason Dillaman). Who can explain this?

My ceph cluster is 400 osds, 5 mons:
ceph -s
health HEALTH_OK
monmap e11: 5 mons at {BJ-M1-Cloud71=172.28.2.71:6789/0,BJ-M1-Cloud73=172.28.2.73:6789/0,BJ-M2-Cloud80=172.28.2.80:6789/0,BJ-M2-Cloud81=172.28.2.81:6789/0,BJ-M3-Cloud85=172.28.2.85:6789/0}, election epoch 198, quorum 0,1,2,3,4 BJ-M1-Cloud71,BJ-M1-Cloud73,BJ-M2-Cloud80,BJ-M2-Cloud81,BJ-M3-Cloud85
osdmap e120157: 400 osds: 400 up, 400 in
pgmap v26161895: 29288 pgs, 6 pools, 20862 GB data, 3014 kobjects
41084 GB used, 323 TB / 363 TB avail
29288 active+clean
client io 52640 kB/s rd, 32419 kB/s wr, 5193 op/s

The following is my ceph client conf:
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 172.29.204.24,172.29.204.48,172.29.204.55,172.29.204.58,172.29.204.73
mon_initial_members = ZR-F5-Cloud24, ZR-F6-Cloud48, ZR-F7-Cloud55, ZR-F8-Cloud58, ZR-F9-Cloud73
fsid = c01c8e28-304e-47a4-b876-cb93acc2e980
mon osd full ratio = .85
mon osd nearfull ratio = .75
public network = 172.29.204.0/24
mon warn on legacy crush tunables = false

[osd]
osd op threads = 12
filestore journal writeahead = true
filestore merge threshold = 40
filestore split multiple = 8

[client]
rbd cache = true
rbd cache writethrough until flush = false
rbd cache size = 67108864
rbd cache max dirty = 50331648
rbd cache target dirty = 33554432

[client.cinder]
admin socket = /var/run/ceph/rbd-$pid.asok

My VM is 8 cores / 16GB; the fio scripts we use are:
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=read -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=write -size=60G -filename=/dev/vdb -name=EBS -iodepth=32 -runtime=200

The following is the io test result:
ceph client version 0.80.5: read: bw=430MB write: bw=420MB randread: iops=4875 latency=65ms randwrite: iops=6844 latency=46ms
ceph client version 0.80.8: read: bw=115MB write: bw=480MB randread: iops=381 latency=83ms randwrite: iops=4843 latency=68ms
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
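For anyone wanting to reproduce the downgrade test, a hedged sketch of pinning just the client libraries back; the package names are the usual Firefly client packages, and this assumes the 0.80.5 builds are still available in your configured yum repo:

    yum downgrade librbd1-0.80.5 librados2-0.80.5
    # then restart (or live-migrate) the guests so QEMU reloads the old librbd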
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
Hi, can you test with rbd_cache disabled? I remember a bug detected in giant; I'm not sure whether it's also the case for firefly. This was the tracker: http://tracker.ceph.com/issues/9513 But it has been solved and backported to firefly. Also, can you test 0.80.6 and 0.80.7?

- Original message - From: killingwolf killingw...@qq.com To: ceph-users ceph-users@lists.ceph.com Sent: Thursday 12 February 2015 12:16:32 Subject: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
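A minimal sketch of the test Alexandre suggests, assuming the cache is controlled from ceph.conf on the compute host: set it in the existing [client] section there, then stop and start the guest so QEMU picks up the new librbd settings.

    [client]
        rbd cache = false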
Re: [ceph-users] mongodb on top of rbd volumes (through krbd) ?
On 12/02/15 23:18, Alexandre DERUMIER wrote:
> A little bit complex, but this mongodb replication is really buggy under high load. (Need to implement librados inside mongo ;)

I wonder if it might be better to let Mongo do the replication (since that is what it understands) - so you'd use rbd volumes in pool(s) with replica size 1 (i.e. no Ceph-level replication) for its storage, and create n-member Mongo replica sets for each shard. That way a shard going down will just be a degradation alert rather than fatal.

Cheers
Mark
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
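A hedged sketch of what that layout could look like on the Ceph side; pool/image names, PG counts and the image size are made up for illustration:

    ceph osd pool create mongo-shard1 128 128
    ceph osd pool set mongo-shard1 size 1        # no Ceph-level replication; Mongo replica sets provide redundancy
    rbd create mongo-shard1/shard1-data --size 512000
    rbd map mongo-shard1/shard1-data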
Re: [ceph-users] wider rados namespace support?
My particular interest is for a less dynamic environment, so manual key distribution is not a problem. Re. OpenStack, it's probably good enough to have the Cinder host creating them as needed (presumably stored in its DB) and just send the secret keys over the message bus to compute hosts as needed - if your infrastructure network is not trusted then you've got bigger problems to worry about. It's true that a lot of clouds would end up logging the secrets in various places, but then they are only useful on particular hosts. I guess there is nothing special about the default namespace compared to any other as far as cephx is concerned. It would be nice to have something of a nested auth, so that the client requires explicit permission to read the default namespace (configured out-of-band when setting up compute hosts) and further permission for particular non-default namespaces (managed by the cinder rbd driver). That way, leaking secrets from cinder gives less exposure - but I guess that would be a bit of a change from the current namespace functionality.

On 13 February 2015 at 05:57, Josh Durgin josh.dur...@inktank.com wrote:
> No immediate plans for rbd. That blueprint still seems like a reasonable way to implement it to me. The one part I'm less sure about is the OpenStack or other higher level integration, which would need to start adding secret keys to libvirt dynamically.

-- Cheers, ~Blairo
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] URGENT: add mon failed and ceph monitors refresh log crazily
Hi, all developers and users,

When I add a new mon to the current mon cluster, it fails with 2 mons out of quorum. There are 5 mons in our ceph cluster:

epoch 7
fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
last_changed 2015-02-13 09:11:45.758839
created 0.00
0: 10.117.16.17:6789/0 mon.b
1: 10.118.32.7:6789/0 mon.c
2: 10.119.16.11:6789/0 mon.d
3: 10.122.0.9:6789/0 mon.e
4: 10.122.48.11:6789/0 mon.f

mon.f is newly added to the monitor cluster, but when starting mon.f it caused both mon.e and mon.f to fall out of quorum:

HEALTH_WARN 2 mons down, quorum 0,1,2 b,c,d
mon.e (rank 3) addr 10.122.0.9:6789/0 is down (out of quorum)
mon.f (rank 4) addr 10.122.48.11:6789/0 is down (out of quorum)

mon.b, mon.c and mon.d refresh their logs crazily, as follows:

Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.063628 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.063629 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.090647 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.090648 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
Feb 13 09:37:34 root ceph-mon: 2015-02-13 09:37:34.090661 7f7b64e14700 1 mon.b@0(leader).paxos(paxos active c 11818589..11819234) is_readable now=2015-02-13 09:37:34.090662 lease_expire=2015-02-13 09:37:38.205219 has v0 lc 11819234
..

and the mon.f log:

Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.526676 7f3931dfd7c0 0 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f), process ceph-mon, pid 30639
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.607412 7f3931dfd7c0 0 mon.f does not exist in monmap, will attempt to join an existing cluster
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.609838 7f3931dfd7c0 0 starting mon.f rank -1 at 10.122.48.11:6789/0 mon_data /osd/ceph/mon fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.610076 7f3931dfd7c0 1 mon.f@-1(probing) e0 preinit fsid 0dfd2bd5-1896-4712-916b-ec02dcc7b049
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636499 7f392a504700 0 -- 10.122.48.11:6789/0 10.119.16.11:6789/0 pipe(0x7f3934ebfb80 sd=26 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934ea9ce0).accept connect_seq 0 vs existing 0 state wait
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636797 7f392a201700 0 -- 10.122.48.11:6789/0 10.122.0.9:6789/0 pipe(0x7f3934ec0800 sd=29 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa940).accept connect_seq 0 vs existing 0 state wait
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.636968 7f392a403700 0 -- 10.122.48.11:6789/0 10.118.32.7:6789/0 pipe(0x7f3934ec0080 sd=27 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934ea9e40).accept connect_seq 0 vs existing 0 state wait
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.637037 7f392a302700 0 -- 10.122.48.11:6789/0 10.117.16.17:6789/0 pipe(0x7f3934ebfe00 sd=28 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa260).accept connect_seq 0 vs existing 0 state wait
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.638854 7f392c00a700 0 mon.f@-1(probing) e7 my rank is now 4 (was -1)
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639365 7f392c00a700 1 mon.f@4(synchronizing) e7 sync_obtain_latest_monmap
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639494 7f392b008700 0 -- 10.122.48.11:6789/0 10.122.0.9:6789/0 pipe(0x7f3934ec0580
sd=17 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa680).accept connect_seq 2 vs existing 0 state connecting
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.639513 7f392b008700 0 -- 10.122.48.11:6789/0 10.122.0.9:6789/0 pipe(0x7f3934ec0580 sd=17 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eaa680).accept we reset (peer sent cseq 2, 0x7f3934ebf400.cseq = 0), sending RESETSESSION ..
Feb 13 09:16:26 root ceph-mon: 2015-02-13 09:16:26.643159 7f392af07700 0 -- 10.122.48.11:6789/0 10.119.16.11:6789/0 pipe(0x7f3934ec1700 sd=28 :6789 s=0 pgs=0 cs=0 l=0 c=0x7f3934eab2e0).accept connect_seq 0 vs existing 0 state wait
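Not a definitive fix, but a hedged sketch of one common way out of this state (check the monitor troubleshooting docs before acting on a live cluster; the init invocation and keyring path are assumptions to adapt):

    service ceph stop mon.f               # stop the newly added monitor
    ceph mon remove f                     # drop it from the monmap so mon.e can rejoin quorum
    # once the cluster is healthy again, rebuild and re-add mon.f from a fresh monmap:
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i f --mkfs --monmap /tmp/monmap --keyring /path/to/mon-keyring
    service ceph start mon.f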
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
Thanks very much for your advice. Yes, as you said, disabling rbd_cache improves the read requests, but if I disable rbd_cache the randwrite requests get worse, so maybe this method cannot solve my problem, can it? In addition, I also tested the 0.80.6 and 0.80.7 librbd; they perform as well as 0.80.5, so it seems fairly sure this problem is caused by 0.80.8.

2015-02-12 19:33 GMT+08:00 Alexandre DERUMIER aderum...@odiso.com:
> Hi, can you test with rbd_cache disabled? I remember a bug detected in giant; I'm not sure whether it's also the case for firefly. This was the tracker: http://tracker.ceph.com/issues/9513 But it has been solved and backported to firefly. Also, can you test 0.80.6 and 0.80.7?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Introducing Learning Ceph : The First ever Book on Ceph
Wow, Cong BTW, I found the link of sample copy is 404. 2015-02-06 6:53 GMT+08:00 Karan Singh karan.si...@csc.fi: Hello Community Members I am happy to introduce the first book on Ceph with the title “*Learning Ceph*”. Me and many folks from the publishing house together with technical reviewers spent several months to get this book compiled and published. Finally the book is up for sale on , i hope you would like it and surely will learn a lot from it. Amazon : http://www.amazon.com/Learning-Ceph-Karan-Singh/dp/1783985623/ref=sr_1_1?s=booksie=UTF8qid=1423174441sr=1-1keywords=ceph Packtpub : https://www.packtpub.com/application-development/learning-ceph You can grab the sample copy from here : https://www.dropbox.com/s/ek76r01r9prs6pb/Learning_Ceph_Packt.pdf?dl=0 *Finally , I would like to express my sincere thanks to * *Sage Weil* - For developing Ceph and everything around it as well as writing foreword for “Learning Ceph”. *Patrick McGarry *- For his usual off the track support that too always. Last but not the least , to our great community members , who are also reviewers of the book *Don Talton , Julien Recurt , Sebastien Han *and *Zihong Chen *, Thank you guys for your efforts. Karan Singh Systems Specialist , Storage Platforms CSC - IT Center for Science, Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland mobile: +358 503 812758 tel. +358 9 4572001 fax +358 9 4572302 http://www.csc.fi/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Regards Frank Yu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph mds zombie
On Tue, Feb 10, 2015 at 9:26 PM, kenmasida 981163...@qq.com wrote:
> hi, everybody. Thank you for reading my question. My ceph cluster is 5 mons, 1 mds, 3 osds. When the ceph cluster has run for a day or a few days, I can't cp some files from ceph. I use mount.ceph for the client. The cp command hangs for a very long time! When I restart the mds and cp again, it works well, but after some days I again can't cp files from the ceph cluster.

kernel version? when the hang happens again, find the PID of cp and send the content of /proc/PID/stack to us.

Regards
Yan, Zheng

> The mon log, mds log and osd log look good, and ceph -w reports the cluster healthy. What can I do? Any advice is important to me! Thank you very much! ceph version is 0.80.5, centos 6.4 x86 64bit, rpm install. Best Regards, Kenmasida

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
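A small sketch of gathering what Zheng asks for on the client, assuming the hung process really is cp (run as root; make sure pgrep picked the right PID if several cp processes exist):

    uname -r                          # kernel version
    pid=$(pgrep -x cp | head -n1)     # PID of the hung cp
    cat /proc/$pid/stack              # kernel stack of the stuck task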
Re: [ceph-users] ceph Performance with SSD journal
Hi Chris, please find my answers inline below (they were marked in blue in the original mail).

On Thu, Feb 12, 2015 at 12:42 PM, Chris Hoy Poy ch...@gopc.net wrote:

> Hi Sumit, A couple questions:
> What brand/model SSD?
Samsung 480G SSD (PM853T), rated at 90K random-write IOPS (4K, 368MBps).

> What brand/model HDD?
300GB SAS HDD (Seagate); the nodes have 64GB memory and 10Gb NICs.

> Also how are they connected to controller/motherboard? Are they sharing a bus (ie SATA expander)?
No, they are connected to a local bus, not a SATA expander.

> RAM?
64GB

> Also look at the output of iostat -x or similar, are the SSDs hitting 100% utilisation?
No, the SSD was only hitting about 2000 iops.

> I suspect that the 5:1 ratio of HDDs to SSDs is not ideal, you now have 5x the write IO trying to fit into a single SSD.
I have not seen any documented reference for calculating the ratio - could you suggest one? Here I want to mention that the results for 1024K write improve a lot; the problem is with 1024K read and 4K write:
SSD journal: 810 IOPS and 810MBps
HDD journal: 620 IOPS and 620MBps

> I'll take a punt on it being a SATA connected SSD (most common), 5x ~130 megabytes/second gets very close to most SATA bus limits. If it's a shared bus, you possibly hit that limit even earlier (since all that data is now being written twice out over the bus).
> cheers; \Chris

-- From: Sumit Gaur sumitkg...@gmail.com To: ceph-users@lists.ceph.com Sent: Thursday, 12 February, 2015 9:23:35 AM Subject: [ceph-users] ceph Performance with SSD journal

Hi Ceph-Experts, I have a small ceph architecture related question. As blogs and documents suggest, ceph performs much better if we use the journal on SSD. I have made a ceph cluster with 30 HDDs + 6 SSDs across 6 OSD nodes: 5 HDDs + 1 SSD on each node, and each SSD has 5 partitions journaling the 5 OSDs on the node. Then I ran a similar test as I ran for the all-HDD setup. What I saw is that two readings go in the wrong direction from what I expected: 1) 4K write IOPS are lower for the SSD setup, though the difference is not major. 2) 1024K read IOPS are lower for the SSD setup than for the HDD setup. On the other hand, 4K read and 1024K write both have much better numbers for the SSD setup. Let me know if I am missing some obvious concept. Thanks sumit
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
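For the iostat check Chris mentions, a hedged example of watching the journal SSD while a fio run is in progress (the device name is just an example):

    iostat -x 2 /dev/sdf
    # watch the %util, await and w/s columns for the journal device during the test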
[ceph-users] certificate of `ceph.com' is not trusted!
I get the following error on standard Debian Wheezy # wget https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc --2015-02-13 07:19:04-- https://ceph.com/git/?p=ceph.git Resolving ceph.com (ceph.com)... 208.113.241.137, 2607:f298:4:147::b05:fe2a Connecting to ceph.com (ceph.com)|208.113.241.137|:443... connected. ERROR: The certificate of `ceph.com' is not trusted. ERROR: The certificate of `ceph.com' hasn't got a known issuer. Previously, this worked without problem. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] certificate of `ceph.com' is not trusted!
Hi, I think the root-CA (COMODO RSA Certification Authority) is not available on your Linux host? Using Google chrome connecting to https://ceph.com/ works fine. regards Danny

-----Original Message----- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dietmar Maurer Sent: Friday, February 13, 2015 8:10 AM To: ceph-users Subject: [ceph-users] certificate of `ceph.com' is not trusted!
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
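A hedged sketch of the usual fix on Debian Wheezy; the package name is the standard one, and quoting the URL also keeps the shell from splitting it at the semicolons:

    apt-get update && apt-get install --reinstall ca-certificates
    wget "https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc" -O release.asc
    # if the bundle still lacks the issuer, wget can be pointed at a specific CA file:
    # wget --ca-certificate=/etc/ssl/certs/ca-certificates.crt "https://ceph.com/git/..."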
Re: [ceph-users] OSD slow requests causing disk aborts in KVM
On Thu Feb 12 2015 at 16:23:38, Andrey Korolyov and...@xdel.ru wrote:
> On Fri, Feb 6, 2015 at 12:16 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote:
> Hi, this is an inevitable payoff for using the scsi backend on storage that is capable of slow enough operations. There were some argonaut/bobtail-era discussions on the ceph ML; maybe those readings can be interesting for you. AFAIR the scsi disk would abort after 70s of not receiving an ack for a pending operation.

Can this timeout be increased in some way? I've searched around and found the /sys/block/sdx/device/timeout knob, which in my case is set to 30s.

As for the versions, I'm running all Ceph nodes on Gentoo with Ceph version 0.80.5. The VM guest in question is running Ubuntu 12.04 LTS with kernel 3.13. The guest filesystem is BTRFS. I'm thinking that the corruption may be some BTRFS bug.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
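A hedged sketch of raising that timeout inside the guest; the device name is an example, and the udev rule form is a common way to make it persistent but should be checked against your distro's udev syntax:

    echo 120 > /sys/block/sda/device/timeout
    # persistent variant via a udev rule, e.g. /etc/udev/rules.d/99-scsi-timeout.rules:
    # ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", ATTR{device/timeout}="120"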
Re: [ceph-users] Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
Hi, have you also tested the 0.80.6 and 0.80.7 librbd? It could be useful to search the commits in git. (I'm not sure that all changes are in the release notes.)

- Original message - From: 杨万元 yangwanyuan8...@gmail.com To: ceph-users ceph-users@lists.ceph.com Sent: Thursday 12 February 2015 04:14:15 Subject: [ceph-users] Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
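For anyone following Alexandre's suggestion, a hedged sketch of diffing the librbd history between the two release tags; the repository URL and source paths are the usual ones in ceph.git:

    git clone https://github.com/ceph/ceph.git && cd ceph
    git log --oneline v0.80.5..v0.80.8 -- src/librbd src/osdc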
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
Hi. Hmm... I was wondering why I have such low read speed on another cluster. P.S. ceph 0.80.8

2015-02-12 14:33 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com:
> Hi, can you test with rbd_cache disabled? I remember a bug detected in giant; I'm not sure whether it's also the case for firefly. This was the tracker: http://tracker.ceph.com/issues/9513 But it has been solved and backported to firefly. Also, can you test 0.80.6 and 0.80.7?

-- Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович) Mob.: +79229045757
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Cache Tier 1 vs. Journal
Hello! If I use a cache tier pool in writeback mode, is it a good idea to turn off the journal on the OSDs? I think in this situation the journal only helps if you hit a rebalance on the cold storage; in other situations the journal is useless, I think. Any comments?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
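For context, a hedged sketch of the writeback cache-tier arrangement being asked about; the pool names are made up, and this does not answer the journal question itself:

    ceph osd tier add cold-pool hot-pool
    ceph osd tier cache-mode hot-pool writeback
    ceph osd tier set-overlay cold-pool hot-pool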
Re: [ceph-users] CephFS removal.
What version of Ceph are you running? The procedure has varied a bit between versions. But I think you want to just turn off the MDS and run the fail command — deactivate is actually the command for removing a logical MDS from the cluster, and you can't do that for a lone MDS because there's nobody to pass off the data to. I'll make a ticket to clarify this. When you've done that you should be able to delete it. -Greg

On Mon, Feb 2, 2015 at 1:40 AM, warren.je...@stfc.ac.uk wrote:
> Hi All, Having a few problems removing cephfs file systems. [...]

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
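A hedged sketch of the order of operations Greg describes, for 0.87-era tooling; the MDS service name and the cephfs_data/cephfs_metadata pool names are assumptions to adapt, the fs removal command is the one from the original post, and deleting the pools destroys their data:

    service ceph stop mds.node2      # stop the lone MDS daemon
    ceph mds fail 0                  # mark rank 0 failed rather than deactivating it
    ceph fs delete data              # the removal command from the original post should now succeed
    # then the backing pools can be dropped and recreated:
    ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
    ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it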
Re: [ceph-users] CephFS removal.
I am running 0.87. In the end I just wiped the cluster and started again - it was quicker.

Warren

-----Original Message----- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: 12 February 2015 16:25 To: Jeffs, Warren (STFC,RAL,ISIS) Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] CephFS removal.
> What version of Ceph are you running? The procedure has varied a bit between versions. But I think you want to just turn off the MDS and run the fail command — deactivate is actually the command for removing a logical MDS from the cluster, and you can't do that for a lone MDS because there's nobody to pass off the data to. I'll make a ticket to clarify this. When you've done that you should be able to delete it. -Greg [...]

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] OSD slow requests causing disk aborts in KVM
Hi all, I'm running a small Ceph cluster with 4 OSD nodes, which serves as a storage backend for a set of KVM virtual machines. The VMs use RBD for disk storage. On the VM side I'm using virtio-scsi instead of virtio-blk in order to gain DISCARD support. Each OSD node is running on a separate machine, using a 3TB WD Black drive + Samsung SSD for the journal. The machines used for OSD nodes are not equal in spec. Three of them are small servers, while one is a desktop PC. The last node is the one causing trouble. During high loads caused by remapping due to one of the other nodes going down, I've experienced some slow requests. To my surprise, however, these slow requests caused aborts from the block device on the VM side, which ended up corrupting files. What I wonder is whether such behaviour (aborts) is normal when slow requests pile up. I always thought that these requests would be delayed but eventually handled. Are there any tunables that would help me avoid such situations? I would really like to avoid VM outages caused by such corruption issues. I can attach some logs if needed. Best regards Chris
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
OK, I'll test it tomorrow, thank you.

-- Original message -- From: Irek Fasikhov malm...@gmail.com Date: Thu, Feb 12, 2015 09:29 PM To: Alexandre DERUMIER aderum...@odiso.com Cc: killingwolf killingw...@qq.com, ceph-users ceph-users@lists.ceph.com Subject: Re: [ceph-users] re: Upgrade 0.80.5 to 0.80.8 --the VM's read request become too slow
[ceph-users] Random OSDs respawning continuously
Hi all,

Cluster: 540 OSDs, cache tier and EC pool, ceph version 0.87

cluster c2a97a2f-fdc7-4eb5-82ef-70c52f2eceb1
health HEALTH_WARN 10 pgs peering; 21 pgs stale; 2 pgs stuck inactive; 2 pgs stuck unclean; 287 requests are blocked > 32 sec; recovery 24/6707031 objects degraded (0.000%); too few pgs per osd (13 < min 20); 1/552 in osds are down; clock skew detected on mon.master02, mon.master03
monmap e3: 3 mons at {master01=10.1.2.231:6789/0,master02=10.1.2.232:6789/0,master03=10.1.2.233:6789/0}, election epoch 4, quorum 0,1,2 master01,master02,master03
mdsmap e17: 1/1/1 up {0=master01=up:active}
osdmap e57805: 552 osds: 551 up, 552 in
pgmap v278604: 7264 pgs, 3 pools, 2027 GB data, 547 kobjects
3811 GB used, 1958 TB / 1962 TB avail
24/6707031 objects degraded (0.000%)
7 stale+peering
3 peering
7240 active+clean
13 stale
1 stale+active

We have mounted CephFS using the ceph-fuse client. Suddenly some of the OSDs are respawning continuously and the cluster health is still unstable. How can we stop the respawning OSDs?

2015-02-12 18:41:51.562337 7f8371373900 0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-osd, pid 3911
2015-02-12 18:41:51.564781 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:51.564792 7f8371373900 1 filestore(/var/lib/ceph/osd/ceph-538) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-02-12 18:41:51.655623 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:51.655639 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:51.663864 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:51.663910 7f8371373900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:51.994021 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-02-12 18:41:52.788178 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.848430 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:52.922806 7f8371373900 1 journal close /var/lib/ceph/osd/ceph-538/journal
2015-02-12 18:41:52.948320 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) backend xfs (magic 0x58465342)
2015-02-12 18:41:52.981122 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is supported and appears to work
2015-02-12 18:41:52.981137 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-02-12 18:41:52.989395 7f8371373900 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-02-12 18:41:52.989440 7f8371373900 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-538) detect_feature: extsize is disabled by conf
2015-02-12 18:41:53.149095 7f8371373900 0 filestore(/var/lib/ceph/osd/ceph-538) mount: WRITEAHEAD journal mode explicitly enabled in conf
2015-02-12 18:41:53.154258 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.217404 7f8371373900 1 journal _open /var/lib/ceph/osd/ceph-538/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-02-12 18:41:53.467512 7f8371373900 0 cls cls/hello/cls_hello.cc:271: loading cls_hello
2015-02-12 18:41:53.563846 7f8371373900 0 osd.538 54486 crush map has features 104186773504, adjusting msgr requires for clients
2015-02-12 18:41:53.563865 7f8371373900 0 osd.538 54486 crush map has features 379064680448 was 8705, adjusting msgr requires for mons
2015-02-12 18:41:53.563869 7f8371373900 0 osd.538 54486 crush map has features 379064680448, adjusting msgr requires for osds
2015-02-12 18:41:53.563888 7f8371373900 0 osd.538 54486 load_pgs
2015-02-12 18:41:55.430730 7f8371373900 0 osd.538 54486 load_pgs opened 137 pgs
2015-02-12 18:41:55.432854 7f8371373900 -1 osd.538 54486 set_disk_tp_priority(22) Invalid argument: osd_disk_thread_ioprio_class is but only the following values are allowed: idle, be or rt
2015-02-12 18:41:55.442748 7f835dfc8700 0 osd.538 54486 ignoring osdmap until we have initialized
2015-02-12 18:41:55.456802 7f835dfc8700 0 osd.538 54486 ignoring osdmap until we have initialized
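The set_disk_tp_priority error in the log suggests osd_disk_thread_ioprio_class is set to an empty or invalid value in the config. A minimal ceph.conf sketch, assuming you actually want the disk thread deprioritized (the option names exist in 0.87; the values shown are only an example, and the setting only has an effect with the CFQ I/O scheduler):

[osd]
# ioprio class for the OSD disk thread: must be one of idle, be or rt
osd disk thread ioprio class = idle
# priority within the class, 0-7 (only meaningful for be/rt)
osd disk thread ioprio priority = 7

Alternatively, removing both options restores the default behaviour and should make the error go away.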
[ceph-users] CephFS removal.
Hi All,

Having a few problems removing CephFS file systems. I want to remove my current pools (they were used for test data), wiping all current data, and start a fresh file system on my current cluster. I have looked over the documentation but I can't find anything on this. I have an object store pool which I don't want to remove, but I'd like to remove the CephFS file system pools and remake them. My CephFS is called 'data'.

Running: ceph fs delete data
returns: Error EINVAL: all MDS daemons must be inactive before removing filesystem

To make an MDS inactive I believe the command is: ceph mds deactivate 0
which returns: telling mds.0 135.248.53.134:6809/16692 to deactivate

Checking the status of the MDS with: ceph mds stat
returns: e105: 1/1/0 up {0=node2=up:stopping}

It has been sitting at this status for the whole weekend with no change, and I don't have any clients connected currently. When I try to manually remove the pools it's not allowed because there is a CephFS file system on them, so I'm happy that all of the failsafes to stop someone removing a pool are working correctly.

If this is currently not possible, is there a way to quickly wipe a CephFS filesystem? Using rm from a kernel client is really slow.

Many thanks
Warren Jeffs

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] 400 Errors uploading files
Hello,

it's about the RADOS Gateway. S3 clients get a 400 error uploading files larger than or equal to 2 GB. For example, tcpdump extracts:

Uploading a file smaller than 2 GB:

Client:
PUT /tzzzvb/file.txt HTTP/1.1
User-Agent: CloudBerryLab.Base.HttpUtil.Client 4.0.6 (http://www.cloudberrylab.com/)
x-amz-meta-cb-modifiedtime: Thu, 29 Jan 2015 16:34:26 GMT
Content-Type: text/plain
x-amz-date: Thu, 29 Jan 2015 16:39:33 GMT
Authorization: AWS RMYHKQCATF6DYZIRKM6V:8h/D1VJhKKb3NIdofjC39DIg3Fo=
Host: s3-domain
Content-Length: 6
Expect: 100-continue

Server:
HTTP/1.1 100 Continue

Client: file content

Server:
HTTP/1.1 200 OK
Date: Thu, 29 Jan 2015 16:39:33 GMT
Server: Apache/2.4.7 (Ubuntu)
ETag: b53abb22d914d3c15953fae66d375b60
Accept-Ranges: bytes
Content-Length: 0
Vary: Accept-Encoding
Content-Type: application/xml

Server expects the file.

Uploading a file >= 2 GB:

Client:
PUT /tzzzvb/KNOPPIX%5FV7.2.0DVD%2D2013%2D06%2D16%2DEN.iso HTTP/1.1
User-Agent: CloudBerryLab.Base.HttpUtil.Client 4.0.6 (http://www.cloudberrylab.com/)
x-amz-meta-cb-modifiedtime: Fri, 25 Apr 2014 16:02:47 GMT
Content-Type: application/octet-stream
x-amz-date: Thu, 29 Jan 2015 16:24:52 GMT
Authorization: AWS RMYHKQCATF6DYZIRKM6V:idbm5DarFmKyazGcVo29Kd7pVCI=
Host: s3-domain
Content-Length: 4111474688
Expect: 100-continue

Server:
HTTP/1.1 400 Bad Request
Date: Thu, 29 Jan 2015 16:24:52 GMT
Server: Apache/2.4.7 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 26
Connection: close
Content-Type: application/json

{"Code":"InvalidArgument"}

100-continue is turned on (rgw print continue = true), but the request is rejected as soon as the content-length is greater than 2 GB. According to the specification it should be possible to upload files of up to 5 GB without using multipart upload, but there seems to be a magic border... Tested with s3cmd and CloudBerry. Is it a configuration issue, or maybe a bug?

We are using ceph giant 0.87 on Ubuntu 14.04.1 LTS:
kernel 3.13.0-44-generic
ceph 0.87-1trusty
apache2 2.4.7-1ubuntu4.1ceph1
libapache2-mod-fastcgi 2.4.7~0910052141-ceph1

[client.radosgw.gateway]
host = rgw01
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
admin socket = /var/run/ceph/radosgw.asok
rgw enable ops log = true
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/radosgw/radosgw.log
rgw print continue = true
rgw dns name = s3-domain
rgw resolve cname = true

BR
Eduard Kormann

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
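The sharp 2 GB boundary looks like a signed 32-bit content length being overflowed somewhere in the Apache/mod_fastcgi path rather than a radosgw limit, though that is only a guess from the symptoms. A client-side workaround sketch, assuming a reasonably recent s3cmd (the 512 MB chunk size is illustrative): forcing multipart upload keeps every individual request body well under the boundary.

s3cmd put --multipart-chunk-size-mb=512 KNOPPIX_V7.2.0DVD-2013-06-16-EN.iso s3://tzzzvb/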
Re: [ceph-users] OSD capacity variance ?
Hi Howard,

By default each OSD is weighted based on its capacity automatically, so the smaller OSDs will receive less data than the bigger ones. Be careful in this case to properly monitor the utilization of all OSDs in your cluster so that none of them reaches the osd full ratio. Read this link, which will help you get a better view of Ceph's data placement mechanisms.

Cheers
JC

On Jan 31, 2015, at 14:39, Howard Thomson h...@thomsons.co.uk wrote:

Hi All,

I am developing a custom disk storage backend for the Bacula backup system, and am in the process of setting up a trial Ceph system, intending to use a direct interface to RADOS. I have a variety of 1Tb, 250Mb and 160Mb disk drives that I would like to use, but it is not [as yet] obvious whether having differences in capacity at different OSDs matters.

Can anyone comment, or point me in the right direction on docs.ceph.com?

Thanks,
Howard

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
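As a concrete illustration of the automatic weighting: CRUSH weights are conventionally set to the drive size in TB, so mixed capacities simply end up with proportionally different weights. The commands exist in all recent releases; the OSD ids and weight values below are examples only.

ceph osd tree                      # shows the current weight of every OSD
ceph osd crush reweight osd.3 1.0  # e.g. a 1 TB drive
ceph osd crush reweight osd.7 0.25 # e.g. a 250 GB drive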
[ceph-users] Internal performance counters in Ceph
Hello!

I'm trying to collect some information about Ceph performance on a cluster. The question is: can I collect all metrics for the whole cluster in one place, or is the only way to query every node with the ceph perf dump command? Or maybe there are better ways to understand which operations Ceph spends its time on?

Another question: are the perf counters documented anywhere, or can I only work out their meaning from the names?

Thanks for any help!

---
Best regards,
Kiseleva Alyona

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
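For what it's worth, the counters are exposed per daemon through the admin socket, so collecting them cluster-wide does mean asking every daemon (typically wrapped in collectd, graphite or a similar collector). A sketch assuming the default socket paths; osd.0 is just an example:

ceph daemon osd.0 perf dump
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump   # same thing via the socket file
ceph daemon osd.0 perf schema   # describes each counter's type, which helps when the names alone are unclear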
Re: [ceph-users] OSD slow requests causing disk aborts in KVM
"To my surprise however these slow requests caused aborts from the block device on the VM side, which ended up corrupting files"

This is very strange, you shouldn't get corruption. Do you use writeback? If yes, have you disabled barriers on your filesystem? (What is the qemu version? Guest OS? Guest OS kernel?)

----- Original message -----
From: Krzysztof Nowicki krzysztof.a.nowi...@gmail.com
To: ceph-users ceph-users@lists.ceph.com
Sent: Friday, 6 February 2015 10:16:30
Subject: [ceph-users] OSD slow requests causing disk aborts in KVM

Hi all,

I'm running a small Ceph cluster with 4 OSD nodes, which serves as a storage backend for a set of KVM virtual machines. The VMs use RBD for disk storage. On the VM side I'm using virtio-scsi instead of virtio-blk in order to gain DISCARD support. Each OSD node runs on a separate machine, using a 3TB WD Black drive plus a Samsung SSD for the journal. The machines used for OSD nodes are not equal in spec. Three of them are small servers, while one is a desktop PC. The last node is the one causing trouble.

During high load caused by remapping due to one of the other nodes going down I've experienced some slow requests. To my surprise, however, these slow requests caused aborts from the block device on the VM side, which ended up corrupting files. What I wonder is whether such behaviour (aborts) is normal when slow requests pile up. I always thought that these requests would be delayed but eventually handled. Are there any tunables that would help me avoid such situations? I would really like to avoid VM outages caused by such corruption issues. I can attach some logs if needed.

Best regards
Chris

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
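For reference, a sketch of the client-side cache settings usually involved in the writeback question above. The option names are current for firefly/giant; whether these values are right for this particular setup is an assumption, not something stated in the thread.

[client]
rbd cache = true
# stay in write-through mode until the guest issues its first flush,
# so a guest without working barriers is not silently exposed to data loss
rbd cache writethrough until flush = true

The libvirt/qemu disk would then use cache='writeback' so that guest flushes are passed down to librbd.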
Re: [ceph-users] OSD slow requests causing disk aborts in KVM
On Fri, Feb 6, 2015 at 12:16 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote:

Hi all,

I'm running a small Ceph cluster with 4 OSD nodes, which serves as a storage backend for a set of KVM virtual machines. The VMs use RBD for disk storage. On the VM side I'm using virtio-scsi instead of virtio-blk in order to gain DISCARD support. Each OSD node runs on a separate machine, using a 3TB WD Black drive plus a Samsung SSD for the journal. The machines used for OSD nodes are not equal in spec. Three of them are small servers, while one is a desktop PC. The last node is the one causing trouble.

During high load caused by remapping due to one of the other nodes going down I've experienced some slow requests. To my surprise, however, these slow requests caused aborts from the block device on the VM side, which ended up corrupting files. What I wonder is whether such behaviour (aborts) is normal when slow requests pile up. I always thought that these requests would be delayed but eventually handled. Are there any tunables that would help me avoid such situations? I would really like to avoid VM outages caused by such corruption issues. I can attach some logs if needed.

Best regards
Chris

Hi,

this is an inevitable payoff of using the SCSI backend on storage that can be slow enough. There were some argonaut/bobtail-era discussions on the ceph ML; those may be interesting reading for you. AFAIR the SCSI disk will abort after about 70 s of not receiving an ack for a pending operation.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
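If the roughly 70 s abort figure is right, one guest-side mitigation is simply to raise the SCSI command timeout so that recovery or peering pauses do not surface as aborts. A sketch, assuming a Linux guest whose virtio-scsi disk shows up as sda (the 300 s value is an arbitrary example):

cat /sys/block/sda/device/timeout          # current timeout in seconds, often 30 or 90
echo 300 > /sys/block/sda/device/timeout   # not persistent across reboots; use a udev rule or rc.local to keep it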
[ceph-users] ceph mds zombie
hi, everybody

Thank you for reading my question. My ceph cluster is 5 mons, 1 MDS, 3 OSDs. After the cluster has run for one day or a few days, I can't cp some files from ceph. I use mount.ceph for the client. The cp command hangs for a very long time! When I restart the MDS and cp again, it works well, but after some days I again can't cp files from the ceph cluster. The mon log, mds log and osd log look good, and ceph -w reports the cluster as healthy. What can I do? Any advice is important to me! Thank you very much!

ceph version is 0.80.5, CentOS 6.4 x86 64bit, rpm install.

Best Regards
Kenmasida

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
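One way to see where the cp is actually stuck, assuming the kernel client and debugfs mounted under /sys/kernel/debug (paths are the usual defaults, not taken from the original mail):

cat /sys/kernel/debug/ceph/*/mdsc   # metadata requests still waiting on the MDS
cat /sys/kernel/debug/ceph/*/osdc   # data requests still waiting on OSDs

If requests pile up in mdsc, the MDS admin socket (ceph --admin-daemon /var/run/ceph/ceph-mds.*.asok dump_ops_in_flight, on versions that support it) shows what the MDS thinks it is doing.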
Re: [ceph-users] Ceph Supermicro hardware recommendation
On 02/08/2015 10:41 PM, Scott Laird wrote:

Does anyone have a good recommendation for per-OSD memory for EC? My EC test blew up in my face when my OSDs suddenly spiked to 10+ GB per OSD process as soon as any reconstruction was needed. Which (of course) caused OSDs to OOM, which meant more reconstruction, which fairly immediately led to a dead cluster. This was with Giant. Is this typical?

Doh, that shouldn't happen. Can you reproduce it? It would be especially nice if we could get a core dump, or if you could make it happen under valgrind. If the CPUs are spinning, even a perf report might prove useful.

On Fri Feb 06 2015 at 2:41:50 AM Mohamed Pakkeer mdfakk...@gmail.com wrote:

Hi all,

We are building an EC cluster with a cache tier for CephFS. We are planning to use the following 1U chassis along with Intel SSD DC S3700 drives for the cache tier. It has 10 x 2.5" slots. Could you recommend a suitable Intel processor and amount of RAM to cater for 10 SSDs?

http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm

Regards
K.Mohamed Pakkeer

On Fri, Feb 6, 2015 at 2:57 PM, Stephan Seitz s.se...@heinlein-support.de wrote:

Hi,

On Tuesday, 03.02.2015, 15:16 +, Colombo Marco wrote:

Hi all, I have to build a new Ceph storage cluster. After I've read the hardware recommendations and some mail from this mailing list, I would like to buy these servers:

Just FYI: SuperMicro already focuses on Ceph with a product line:
http://www.supermicro.com/solutions/datasheet_Ceph.pdf
http://www.supermicro.com/solutions/storage_ceph.cfm

regards,
Stephan Seitz

--
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-44
Fax: 030 / 405051-19
Zwangsangaben lt. §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Geschäftsführer: Peer Heinlein -- Sitz: Berlin

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Thanks & Regards
K.Mohamed Pakkeer
Mobile: 0091-8754410114

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
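To capture the memory data being asked for, the OSD heap can also be profiled in place with the built-in tcmalloc hooks. A sketch, assuming the OSDs are linked against tcmalloc (the stock packages are) and using osd.12 purely as an example id:

ceph tell osd.12 heap start_profiler
# ...reproduce the EC reconstruction load...
ceph tell osd.12 heap dump        # writes a heap profile next to the OSD log
ceph tell osd.12 heap stats
ceph tell osd.12 heap stop_profiler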
Re: [ceph-users] CephFS removal.
Oh, hah, your initial email had a very delayed message delivery... probably got stuck in the moderation queue. :)

On Thu, Feb 12, 2015 at 8:26 AM, warren.je...@stfc.ac.uk wrote:

I am running 0.87. In the end I just wiped the cluster and started again - it was quicker.

Warren

-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: 12 February 2015 16:25
To: Jeffs, Warren (STFC,RAL,ISIS)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS removal.

What version of Ceph are you running? It's varied a bit. But I think you want to just turn off the MDS and run the fail command — deactivate is actually the command for removing a logical MDS from the cluster, and you can't do that for a lone MDS because there's nobody to pass the data off to. I'll make a ticket to clarify this. Once you've done that you should be able to delete it.
-Greg

On Mon, Feb 2, 2015 at 1:40 AM, warren.je...@stfc.ac.uk wrote:

Hi All,

Having a few problems removing cephfs file systems. I want to remove my current pools (was used for test data) – wiping all current data, and start a fresh file system on my current cluster. I have looked over the documentation but I can’t find anything on this. I have an object store pool, which I don’t want to remove – but I’d like to remove the cephfs file system pools and remake them. My cephfs is called ‘data’.

Running ceph fs delete data returns:
Error EINVAL: all MDS daemons must be inactive before removing filesystem

To make an MDS inactive I believe the command is: ceph mds deactivate 0
Which returns: telling mds.0 135.248.53.134:6809/16692 to deactivate

Checking the status of the mds using: ceph mds stat
returns: e105: 1/1/0 up {0=node2=up:stopping}

This has been sitting at this status for the whole weekend with no change. I don’t have any clients connected currently. When trying to manually just remove the pools, it’s not allowed as there is a cephfs file system on them. I’m happy that all of the failsafes to stop someone removing a pool are working correctly.

If this is currently undoable, is there a way to quickly wipe a cephfs filesystem – using rm from a kernel client is really slow.

Many thanks
Warren Jeffs

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
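Putting the advice above together, a sketch of the full sequence for a 0.87-era cluster. The exact subcommand spellings vary between releases (the original post used "ceph fs delete", newer releases use "ceph fs rm" and ask for a confirmation flag), and the "data"/"metadata" pool names below are only the common defaults, so treat all of this as illustrative:

# stop the MDS daemon, then mark the rank failed instead of deactivating it
ceph mds fail 0
# remove the filesystem
ceph fs rm data --yes-i-really-mean-it
# now the backing pools can be deleted and recreated
ceph osd pool delete data data --yes-i-really-really-mean-it
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it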
[ceph-users] Can't add RadosGW keyring to the cluster
Hi all,

Trying to do this:
ceph -k ceph.client.admin.keyring auth add client.radosgw.gateway -i ceph.client.radosgw.keyring

Getting this error:
Error EINVAL: entity client.radosgw.gateway exists but key does not match

What can this be?? Thanks!

Beanos

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
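That error usually means an entity with that name already exists in the cluster with a different key than the one in the local keyring file. A sketch of the two usual ways out, reusing the file name from the post (which path the gateway actually reads is an assumption from its config):

# see what the cluster already has for that entity
ceph auth get client.radosgw.gateway
# either export the cluster's copy and use it as the gateway keyring...
ceph auth get client.radosgw.gateway -o ceph.client.radosgw.keyring
# ...or drop the old entry and re-add the one from your file
ceph auth del client.radosgw.gateway
ceph auth add client.radosgw.gateway -i ceph.client.radosgw.keyring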