Re: [ceph-users] Newbie Ceph Design Questions
Hi Christian,

On 21.09.2014 07:18, Christian Balzer wrote:
> ...
> Personally I found ext4 to be faster than XFS in nearly all use cases,
> and the lack of full, real kernel integration of ZFS is something that
> doesn't appeal to me either.

A little bit OT... what kind of ext4 mount options do you use?
I have a 5-node cluster with XFS (60 OSDs); perhaps the performance with
ext4 would be better?!

For XFS I use

    osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

regards
Udo
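For comparison, a minimal sketch of what the ext4 counterpart could look like in ceph.conf. The mount options here are illustrative assumptions rather than a tested recommendation, and Firefly-era filestore on ext4 also needs xattrs spilled to omap because of ext4's small inline xattr limit:

    [osd]
    # illustrative, untested values
    osd mount options ext4 = rw,noatime,user_xattr
    # needed on ext4: its xattr size limit is too small for some Ceph xattrs
    filestore xattr use omap = true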
Re: [ceph-users] Merging two active ceph clusters: suggestions needed
On Sun, Sep 21, 2014 at 02:33:09PM +0900, Christian Balzer wrote:
> > For a variety of reasons, none good anymore, we have two separate Ceph
> > clusters. I would like to merge them onto the newer hardware, with as
> > little downtime and data loss as possible, then discard the old hardware.
> >
> > Cluster A (2 hosts):
> > - 3TB of S3 content, 100k files, file mtimes important
> > - 500GB of RBD volumes, exported via iSCSI
> >
> > Cluster B (4 hosts):
> > - 50GiB of S3 content
> > - 7TB of RBD volumes, exported via iSCSI
> >
> > Short of finding somewhere to dump all of the data from one side, and
> > re-importing it after merging with that cluster as empty, are there any
> > other alternatives available to me?
>
> Having recently seen a similar question and the answer from the Ceph
> developers: no. As in, there is no way (and no plans) for merging
> clusters. There are export functions for RBD volumes; I'm not sure about
> S3 and the mtimes, as I don't use that functionality.

Can somebody else comment on migrating S3 buckets with preserved mtime
data (and all of the ACLs/CORS), then?

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
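For the RBD side, a hedged sketch of the export/import path mentioned above, assuming one host can reach both clusters via separate conf files; the pool, image, and conf file names are placeholders. Snapshotting first keeps the source consistent if the volume is still in use:

    # freeze a consistent source, then stream it between clusters in one pipe
    rbd -c /etc/ceph/clusterA.conf snap create rbd/vol0@migrate
    rbd -c /etc/ceph/clusterA.conf export rbd/vol0@migrate - \
      | rbd -c /etc/ceph/clusterB.conf import - rbd/vol0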
Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS
Thanks. The results look close to our results now.

Thanks
Jian

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Friday, September 19, 2014 8:54 PM
To: Zhang, Jian
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

> Thanks for this great information. We are using Firefly. We will also
> try this later.

Oh, sorry, I made a mistake when benching with fio (I forgot to fill the
OSDs with data before the read benchmark).

True results with 6 OSDs: bw=118129KB/s, iops=29532

----- Mail original -----
De: Jian Zhang jian.zh...@intel.com
À: Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users@lists.ceph.com
Envoyé: Vendredi 19 Septembre 2014 10:21:38
Objet: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

Thanks for this great information. We are using Firefly. We will also
try this later.

Thanks
Jian

-----Original Message-----
From: Alexandre DERUMIER [mailto:aderum...@odiso.com]
Sent: Friday, September 19, 2014 3:00 PM
To: Zhang, Jian
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

>> I'll do benchs with 6 osd dc3500 tomorrow to compare firefly and giant.

Here are the results (big Giant improvements). 3 nodes with 2 OSDs each,
replication x1; the network is a 2-gigabit LACP link for nodes and client.

firefly, no tuning
------------------
bw=45880KB/s, iops=11469

firefly with tuning
-------------------
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
osd_op_threads = 5
filestore_op_threads = 4

bw=62094KB/s, iops=15523

giant with same tuning
----------------------
bw=247073KB/s, iops=61768 !

I think I could reach more, but my 2-gigabit links are saturated.

----- Mail original -----
De: Alexandre DERUMIER aderum...@odiso.com
À: Jian Zhang jian.zh...@intel.com
Cc: ceph-users@lists.ceph.com
Envoyé: Jeudi 18 Septembre 2014 15:36:48
Objet: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

>> Has anyone ever tested multi-volume performance on a *FULL* SSD setup?

I know that Stefan Priebe runs full-SSD clusters in production and has done
benchmarks. (As far as I remember, he benched around 20K IOPS peak with
Dumpling.)

>> We are able to get ~18K IOPS for 4K random read on a single volume with
>> fio (with the rbd engine) on a 12x DC3700 setup, but only able to get
>> ~23K (peak) IOPS even with multiple volumes.

Firefly or Giant? I'll do benchs with 6 OSDs on DC3500 tomorrow to compare
Firefly and Giant.

----- Mail original -----
De: Jian Zhang jian.zh...@intel.com
À: Sebastien Han sebastien@enovance.com, Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users@lists.ceph.com
Envoyé: Jeudi 18 Septembre 2014 08:12:32
Objet: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

Has anyone ever tested multi-volume performance on a *FULL* SSD setup?

We are able to get ~18K IOPS for 4K random read on a single volume with fio
(with the rbd engine) on a 12x DC3700 setup, but only able to get ~23K
(peak) IOPS even with multiple volumes. It seems the maximum random write
performance we can get on the entire cluster is quite close to the
single-volume performance.
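For reference, a minimal fio job file for the rbd engine of the kind used in the benchmarks above might look like the following sketch; the cephx user, pool, and image names are assumptions (the image must be created beforehand), and fio needs to be built with rbd support:

    [global]
    ioengine=rbd        ; drives the image through librbd, no kernel mapping needed
    clientname=admin    ; cephx user (client.admin); assumed
    pool=rbd            ; assumed pool name
    rbdname=fiotest     ; assumed pre-created test image
    rw=randread
    bs=4k
    iodepth=32
    direct=1
    runtime=60
    time_based

    [rbd-4k-randread]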
Thanks
Jian

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sebastien Han
Sent: Tuesday, September 16, 2014 9:33 PM
To: Alexandre DERUMIER
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

Hi,

Thanks for keeping us updated on this subject. dsync is definitely killing
the SSD. I don't have much to add; I'm just surprised that you're only
getting 5299 with 0.85, since I've been able to get 6,4K. Well, I was using
the 200GB model; that might explain it.

On 12 Sep 2014, at 16:32, Alexandre DERUMIER aderum...@odiso.com wrote:

> Here are the results for the Intel S3500. Max performance is with
> ceph 0.85 + optracker disabled. The Intel S3500 doesn't have the D_SYNC
> problem like the Crucial. %util shows almost 100% for read and write, so
> maybe the SSD disk performance is the limit.
>
> I have some STEC ZeusRAM 8GB in stock (I used them for ZFS ZIL); I'll try
> to bench them next week.
>
> INTEL s3500
> -----------
> raw disk randread:
> fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k --iodepth=32
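For context, the D_SYNC behaviour called out above is typically probed with a synchronous direct write against the raw device, along the lines of the sketch below; the device name and count are placeholders, and the command is destructive, so only point it at a disk you can wipe:

    # destructive: overwrites /dev/sdX; measures sustained O_DIRECT+O_DSYNC 4K writes
    dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync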
Re: [ceph-users] Newbie Ceph Design Questions
Hello,

On Sun, 21 Sep 2014 21:00:48 +0200 Udo Lembke wrote:

> Hi Christian,
> On 21.09.2014 07:18, Christian Balzer wrote:
> > ...
> > Personally I found ext4 to be faster than XFS in nearly all use cases,
> > and the lack of full, real kernel integration of ZFS is something that
> > doesn't appeal to me either.
> A little bit OT... what kind of ext4 mount options do you use?
> I have a 5-node cluster with XFS (60 OSDs); perhaps the performance with
> ext4 would be better?!

Hard to tell without testing your particular load and I/O patterns.

When benchmarking directly with single disks or RAIDs it is fairly
straightforward to see:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028540.html

Also note that the actual question has never been answered by the Ceph
team, which is a shame, as I venture that it would make things faster.

Now, testing with Ceph on top makes things a lot more murky, as Ceph
currently is so inefficient (see the various OSD-on-SSD performance
threads) that the underlying FS only makes a difference in certain
scenarios. In the next few weeks I will probably have the chance to play
with a cluster before it goes into production and refine those findings.

I found that ext4 definitely fragments less than XFS (using RBD for VM
images only here) over long-term use. Nothing particular for mount
options, but the biggest performance boost for ext4 is to give it the
largest journal possible (-J size=1024) at mkfs time.

> For XFS I use
> osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

That looks pretty much like the optimum settings for XFS indeed.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
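Following on from the journal advice above, a minimal sketch of preparing an ext4 OSD filesystem by hand; the device, mount options, and mount point are placeholder assumptions:

    # -J size=1024 gives ext4 a 1024 MiB journal, per the advice above
    mkfs.ext4 -J size=1024 /dev/sdX1
    mount -o rw,noatime,user_xattr /dev/sdX1 /var/lib/ceph/osd/ceph-NN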
Re: [ceph-users] Troubleshooting down OSDs: Invalid command: ceph osd start osd.1
Hi Loïc,

It seems there is another error in the documentation at
http://ceph.com/docs/argonaut/init/stop-cluster/ -- I believe

    sudo service -a ceph stop

should probably read

    sudo service ceph -a stop

Cheers

On 19 Sep 2014, at 6:33 pm, Loic Dachary l...@dachary.org wrote:

> Hi,
>
> The documentation indeed contains an example that does not work. This
> should fix it:
> https://github.com/dachary/ceph/commit/be97b7d5b89d7021f71695b4c1b78830bad4dab6
>
> Cheers
>
> On 19/09/2014 08:06, Piers Dawson-Damer wrote:
>> Has the command for manually starting and stopping OSDs changed?
>>
>> The documentation for troubleshooting OSDs
>> (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/)
>> mentions restarting OSDs with the command:
>>
>>     ceph osd start osd.{num}
>>
>> Yet I find, using Firefly 0.80.5:
>>
>> piers@sol:/etc/ceph$ ceph osd start osd.1
>> no valid command found; 10 closest matches:
>> osd tier remove <poolname> <poolname>
>> osd tier cache-mode <poolname> none|writeback|forward|readonly
>> osd thrash <int[0-]>
>> osd tier add <poolname> <poolname> {--force-nonempty}
>> osd pool stats {<name>}
>> osd reweight-by-utilization {<int[100-]>}
>> osd pool set <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid <val> {--yes-i-really-mean-it}
>> osd pool set-quota <poolname> max_objects|max_bytes <val>
>> osd pool rename <poolname> <poolname>
>> osd pool get <poolname> size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid
>> Error EINVAL: invalid command
>
> -- 
> Loïc Dachary, Artisan Logiciel Libre
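For anyone hitting the same docs gap: on a Firefly-era sysvinit install, the working equivalents of the documented command are the service-script forms sketched below. "osd.1" is just an example id, and upstart/systemd installs use different invocations:

    # start/stop a single OSD on the local host
    sudo service ceph start osd.1
    sudo service ceph stop osd.1

    # with -a, the action spans all hosts defined in ceph.conf
    sudo service ceph -a stop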