Re: [ceph-users] pgs unfound
Hi,

I managed to remove the warning by reweighting the crashed OSD:

    ceph osd crush reweight osd.33 0.8

After the recovery, the cluster is no longer showing the warning.

Xabier

On 29/11/16 11:18, Xabier Elkano wrote:
> Hi all,
>
> my cluster is in WARN state because apparently there are some pgs
> unfound. I think I reached this situation because of the metadata pool:
> this pool was in the default root but unused, because I don't use
> CephFS, I only use RBD for VMs. I don't have OSDs in the default root;
> they are assigned to different roots depending on their disk type.
>
> My pools have specific crush rules to use the different roots, all of
> them except the metadata pool, which was assigned to the default root.
>
> In this situation I had problems with one OSD (I had to rebuild it from
> scratch), and when I restored it I got some pgs unfound, because they
> were on the faulty OSD and belonged to the metadata pool, which had
> size 1. Since I didn't care about any data in the metadata pool, I
> recreated the unfound pgs with "ceph pg force_create_pg 1.25". Finally
> I set a crush rule on the metadata pool to change its location, and the
> pgs were created.
>
> But now the cluster is showing 29 unfound objects, without saying which
> pgs. How can I recover from this situation? Can I remove the metadata
> pool and recreate it?
>
> # ceph status
>     cluster 72a4a18b-ec5c-454d-9135-04362c97c307
>      health HEALTH_WARN
>             recovery 29/2748828 unfound (0.001%)
>      monmap e13: 5 mons at {mon1=172.16.64.12:6789/0,mon2=172.16.64.13:6789/0,mon3=172.16.64.16:6789/0,mon4=172.16.64.30:6789/0,mon4=172.16.64.31:6789/0}
>             election epoch 99672, quorum 0,1,2,3,4 mon1,mon2,mon2,mon3,mon4
>      mdsmap e35323: 0/0/1 up
>      osdmap e49648: 38 osds: 38 up, 38 in
>       pgmap v76150847: 3065 pgs, 21 pools, 10654 GB data, 2684 kobjects
>             3 GB used, 25423 GB / 56534 GB avail
>             29/2748828 unfound (0.001%)
>                 3063 active+clean
>                    2 active+clean+scrubbing
>   client io 4431 kB/s rd, 15897 kB/s wr, 2385 op/s
>
> # ceph health detail
> HEALTH_WARN recovery 29/2748829 unfound (0.001%)
> recovery 29/2748829 unfound (0.001%)
>
> My cluster runs hammer 0.94.9:
> 5 servers with 7 OSDs each on Ubuntu 14.04
> 5 monitor servers.
>
> Thanks and best regards,
> Xabier

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
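For anyone landing in the same state, hammer also has commands to enumerate and discard unfound objects directly; a hedged sketch (pg 1.25 is the example id from this thread, and "delete" irreversibly forgets the objects):

```shell
# Which PGs are unhealthy, and which objects in a PG are unfound:
ceph health detail
ceph pg dump_stuck unclean
ceph pg 1.25 list_missing
ceph pg 1.25 query          # shows which OSDs were probed for the objects

# If the data is known to be expendable (e.g. an unused metadata pool),
# give up on the unfound objects for that PG:
ceph pg 1.25 mark_unfound_lost revert   # roll back to prior versions
ceph pg 1.25 mark_unfound_lost delete   # or forget them entirely
```

These need a live cluster with the right admin keyring, so treat them as an operational sketch rather than something to paste blindly.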
[ceph-users] pgs unfound
Hi all,

my cluster is in WARN state because apparently there are some pgs unfound.
I think I reached this situation because of the metadata pool: this pool
was in the default root but unused, because I don't use CephFS, I only use
RBD for VMs. I don't have OSDs in the default root; they are assigned to
different roots depending on their disk type.

My pools have specific crush rules to use the different roots, all of them
except the metadata pool, which was assigned to the default root.

In this situation I had problems with one OSD (I had to rebuild it from
scratch), and when I restored it I got some pgs unfound, because they were
on the faulty OSD and belonged to the metadata pool, which had size 1.
Since I didn't care about any data in the metadata pool, I recreated the
unfound pgs with "ceph pg force_create_pg 1.25". Finally I set a crush
rule on the metadata pool to change its location, and the pgs were created.

But now the cluster is showing 29 unfound objects, without saying which
pgs. How can I recover from this situation? Can I remove the metadata pool
and recreate it?

# ceph status
    cluster 72a4a18b-ec5c-454d-9135-04362c97c307
     health HEALTH_WARN
            recovery 29/2748828 unfound (0.001%)
     monmap e13: 5 mons at {mon1=172.16.64.12:6789/0,mon2=172.16.64.13:6789/0,mon3=172.16.64.16:6789/0,mon4=172.16.64.30:6789/0,mon4=172.16.64.31:6789/0}
            election epoch 99672, quorum 0,1,2,3,4 mon1,mon2,mon2,mon3,mon4
     mdsmap e35323: 0/0/1 up
     osdmap e49648: 38 osds: 38 up, 38 in
      pgmap v76150847: 3065 pgs, 21 pools, 10654 GB data, 2684 kobjects
            3 GB used, 25423 GB / 56534 GB avail
            29/2748828 unfound (0.001%)
                3063 active+clean
                   2 active+clean+scrubbing
  client io 4431 kB/s rd, 15897 kB/s wr, 2385 op/s

# ceph health detail
HEALTH_WARN recovery 29/2748829 unfound (0.001%)
recovery 29/2748829 unfound (0.001%)

My cluster runs hammer 0.94.9:
5 servers with 7 OSDs each on Ubuntu 14.04
5 monitor servers.
Thanks and best regards,
Xabier
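As for deleting and recreating the pool: if CephFS is genuinely unused, that is usually the cleanest way out. A hedged sketch with hammer-era commands (the pool name comes from the thread; the PG count and ruleset id are examples):

```shell
# WARNING: only safe when CephFS is not in use; this destroys the pool's data.
ceph osd pool delete metadata metadata --yes-i-really-really-mean-it

# Recreate it with a modest PG count...
ceph osd pool create metadata 64 64

# ...and pin it to a non-default root via an existing crush ruleset,
# so it no longer lands on the empty default root:
ceph osd pool set metadata crush_ruleset 2
```

These also require a live cluster, so they are an operational sketch rather than something verifiable offline.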
Re: [ceph-users] Correct method to deploy on jessie
Hi,

you can add monitors and OSDs manually:

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

I've deployed my whole cluster without ceph-deploy.

Xabier

On jue, 2015-10-01 at 16:54 +1000, Dmitry Ogorodnikov wrote:
> Dear all,
>
> I'm trying to install Ceph on Debian Jessie using a pile of various
> manuals, and have had no luck so far.
> The problem is that ceph.com doesn't have jessie repositories. Jessie
> has its own Ceph packages, but no ceph-deploy. Since ceph-deploy is
> part of most documented procedures, I can't operate Ceph without it.
>
> If I use a wheezy machine with ceph-deploy installed to deploy on
> jessie, it tries to use the ceph.com repositories for jessie (which
> don't exist, as we remember). The deploy fails.
>
> If I use ceph-deploy with the Debian repositories (--no-adjust-repos),
> following the quickstart manual causes multiple errors at several
> steps. I can work around some of those errors with various crutches,
> but I don't think this way is ok.
>
> So, please provide a manual for installing on jessie. Or (better)
> provide a repository. Please.
>
> Have a nice day.
> Dmitry.

--
Xabier Elkano
Departamento Técnico
Hostinet S.L.U.
http://www.hostinet.com
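The two linked docs boil down to a short sequence; a hedged sketch of manual deployment (hostnames, ids, addresses, and devices below are illustrative examples, not from the thread):

```shell
# --- Add a monitor by hand ---
mkdir -p /var/lib/ceph/mon/ceph-mon6
ceph auth get mon. -o /tmp/mon-keyring        # fetch the mon keyring
ceph mon getmap -o /tmp/monmap                # fetch the current monmap
ceph-mon -i mon6 --mkfs --monmap /tmp/monmap --keyring /tmp/mon-keyring
ceph-mon -i mon6 --public-addr 192.0.2.10:6789

# --- Add an OSD by hand ---
osd_id=$(ceph osd create)                     # allocates the next free OSD id
mkdir -p /var/lib/ceph/osd/ceph-${osd_id}
mkfs.xfs /dev/sdb1
mount /dev/sdb1 /var/lib/ceph/osd/ceph-${osd_id}
ceph-osd -i ${osd_id} --mkfs --mkkey          # initialize data dir and key
ceph auth add osd.${osd_id} osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-${osd_id}/keyring
ceph osd crush add osd.${osd_id} 1.0 host=$(hostname -s)
```

After that, start the daemons with the distribution's init system; the details vary between sysvinit, upstart, and systemd, so check the docs for the target release.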
[ceph-users] radosgw and keystone users
Hi,

I've just deployed the Ceph object gateway as the object storage service
in OpenStack. I followed this doc to set up the integration with Keystone:

http://docs.ceph.com/docs/master/radosgw/keystone/

"It is possible to integrate the Ceph Object Gateway with Keystone, the
OpenStack identity service. This sets up the gateway to accept Keystone as
the users authority. A user that Keystone authorizes to access the gateway
will also be automatically created on the Ceph Object Gateway (if didn't
exist beforehand). A token that Keystone validates will be considered as
valid by the gateway."

According to this, I was expecting the Keystone user to be created in
radosgw when it was authorized by a Keystone token, but instead what gets
created is the tenant id of the project the user manages his objects
through:

# radosgw-admin user stats --uid=db4d25b13eaa4645a180f564b3817e1c
{ "stats": { "total_entries": 1,
      "total_bytes": 24546,
      "total_bytes_rounded": 24576},
  "last_stats_sync": "2015-09-25 12:09:12.795775Z",
  "last_stats_update": "2015-09-28 11:58:43.422859Z"}

where "db4d25b13eaa4645a180f564b3817e1c" is the project id I'm using.

Is this the expected behavior and the doc pointed me in the wrong
direction, or did I misconfigure something? Actually, I prefer this
behavior, because this way I can set quotas on a per-project basis without
worrying about the users, but I would like to know if the integration is
OK.
My radosgw setup:

[client.radosgw.gateway]
host = hostname
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = ""
log file = /var/log/radosgw/client.radosgw.gateway.log
rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0
rgw print continue = false
rgw keystone url = http://keystone_host:5000
rgw keystone admin token = _
rgw keystone accepted roles = _member_, Member, admin
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss

Ceph Firefly 0.80.10
OpenStack Juno
OS: Ubuntu 14.04

Best regards,
Xabier
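Since the gateway keys Keystone users by project/tenant id, per-project quotas can indeed be set by using that id as the uid. A hedged sketch, assuming a release with radosgw quota support (the uid is the example project id from this mail; limits are arbitrary):

```shell
# Cap the project at 10 GiB / 100k objects via its tenant-id-keyed rgw user:
radosgw-admin quota set --uid=db4d25b13eaa4645a180f564b3817e1c \
    --quota-scope=user --max-size=10737418240 --max-objects=100000
radosgw-admin quota enable --uid=db4d25b13eaa4645a180f564b3817e1c \
    --quota-scope=user

# Force a stats refresh to see current usage against the quota:
radosgw-admin user stats --uid=db4d25b13eaa4645a180f564b3817e1c --sync-stats
```

These run against a live gateway, so treat them as an operational sketch.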
Re: [ceph-users] performance tests
On 09/07/14 16:53, Christian Balzer wrote:
On Wed, 09 Jul 2014 07:07:50 -0500, Mark Nelson wrote:
On 07/09/2014 06:52 AM, Xabier Elkano wrote:
On 09/07/14 13:10, Mark Nelson wrote:
On 07/09/2014 05:57 AM, Xabier Elkano wrote:

Hi,

I was doing some tests in my cluster with the fio tool: one fio instance
with 70 jobs, each job writing 1GB of random data with a 4K block size. I
did this test with 3 variations:

1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
module, format and mount each image as ext4. Each fio job writes to a
separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
IOPS: 6542  AVG LAT: 41ms

2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
module, format and mount the image as ext4. Each fio job writes to a
separate file in the same directory. (ioengine=libaio, queue_depth=4,
direct=1)
IOPS: 5899  AVG LAT: 47ms

3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in fio
to access the image through librados. (ioengine=rbd, queue_depth=4,
direct=1)
IOPS: 2638  AVG LAT: 96ms

Do these results make sense? From a Ceph perspective, is it better to
have many small images than one large one? What is the best approach to
simulate the workload of 70 VMs?

I'm not sure the difference between the first two cases is enough to say
much yet. You may need to repeat the test a couple of times to ensure
that the difference is more than noise. Having said that, if we are
seeing an effect, it would be interesting to know what the latency
distribution is like. Is it consistently worse in the 2nd case, or do we
see higher spikes at specific times?

I've repeated the tests with similar results. Each test is done with a
clean new rbd image, first removing any existing images in the pool and
then creating the new image. Between tests I am running:

echo 3 > /proc/sys/vm/drop_caches

- In the first test I've created 70 images (60G) and mounted them:

/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul 8 14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
    slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
    clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
     lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
     | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
     | 99.99th=[  553]
    bw (KB/s): min=0, max=694, per=1.46%, avg=383.29, stdev=148.01
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
    lat (msec): 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu: usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
     latency: target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s, mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432, in_queue=39459720, util=99.68%

- In the second test I only created one large image (4.2T):

/dev/rbd1 on /mnt/fiotest/vtest0 type ext4 (rw,noatime,nodiratime,data=ordered)

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul 9 13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
    slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
    clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
     lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
     | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
     | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
     | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
     | 99.99th=[ 3130]
    bw (KB/s): min=0, max=680, per=1.54%, avg=355.39, stdev=156.10
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
    lat (msec): 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec): >=2000=0.04%
  cpu: usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
  IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0%, 16=0.0%, 32=0.0
Re: [ceph-users] performance tests
On 10/07/14 09:18, Christian Balzer wrote:
On Thu, 10 Jul 2014 08:57:56 +0200, Xabier Elkano wrote:
On 09/07/14 16:53, Christian Balzer wrote:
On Wed, 09 Jul 2014 07:07:50 -0500, Mark Nelson wrote:
On 07/09/2014 06:52 AM, Xabier Elkano wrote:
On 09/07/14 13:10, Mark Nelson wrote:
On 07/09/2014 05:57 AM, Xabier Elkano wrote:

Hi,

I was doing some tests in my cluster with the fio tool: one fio instance
with 70 jobs, each job writing 1GB of random data with a 4K block size. I
did this test with 3 variations:

1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
module, format and mount each image as ext4. Each fio job writes to a
separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
IOPS: 6542  AVG LAT: 41ms

2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
module, format and mount the image as ext4. Each fio job writes to a
separate file in the same directory. (ioengine=libaio, queue_depth=4,
direct=1)
IOPS: 5899  AVG LAT: 47ms

3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in fio
to access the image through librados. (ioengine=rbd, queue_depth=4,
direct=1)
IOPS: 2638  AVG LAT: 96ms

Do these results make sense? From a Ceph perspective, is it better to
have many small images than one large one? What is the best approach to
simulate the workload of 70 VMs?

I'm not sure the difference between the first two cases is enough to say
much yet. You may need to repeat the test a couple of times to ensure
that the difference is more than noise. Having said that, if we are
seeing an effect, it would be interesting to know what the latency
distribution is like. Is it consistently worse in the 2nd case, or do we
see higher spikes at specific times?

I've repeated the tests with similar results. Each test is done with a
clean new rbd image, first removing any existing images in the pool and
then creating the new image. Between tests I am running:

echo 3 > /proc/sys/vm/drop_caches

- In the first test I've created 70 images (60G) and mounted them:

/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul 8 14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
    slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
    clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
     lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
     | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
     | 99.99th=[  553]
    bw (KB/s): min=0, max=694, per=1.46%, avg=383.29, stdev=148.01
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
    lat (msec): 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu: usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
     latency: target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s, mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432, in_queue=39459720, util=99.68%

- In the second test I only created one large image (4.2T):

/dev/rbd1 on /mnt/fiotest/vtest0 type ext4 (rw,noatime,nodiratime,data=ordered)

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul 9 13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
    slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
    clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
     lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
     | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
     | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
     | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
     | 99.99th=[ 3130]
    bw (KB/s): min=0, max=680, per=1.54%, avg=355.39, stdev=156.10
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
    lat (msec): 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec): >=2000=0.04%
  cpu: usr=0.65%, sys=2.45
[ceph-users] performance tests
Hi,

I was doing some tests in my cluster with the fio tool: one fio instance
with 70 jobs, each job writing 1GB of random data with a 4K block size. I
did this test with 3 variations:

1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
module, format and mount each image as ext4. Each fio job writes to a
separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
IOPS: 6542  AVG LAT: 41ms

2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
module, format and mount the image as ext4. Each fio job writes to a
separate file in the same directory. (ioengine=libaio, queue_depth=4,
direct=1)
IOPS: 5899  AVG LAT: 47ms

3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in fio
to access the image through librados. (ioengine=rbd, queue_depth=4,
direct=1)
IOPS: 2638  AVG LAT: 96ms

Do these results make sense? From a Ceph perspective, is it better to
have many small images than one large one? What is the best approach to
simulate the workload of 70 VMs?

Thanks in advance for any help,
Xabier
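For anyone reproducing the runs: a sketch of a fio job file approximating test 2 above (one mounted image, 70 jobs writing separate files in the same directory). The mount point and size come from the thread; treat the rest as assumptions:

```shell
# Generate a fio job file matching the described workload:
cat > rand-write-4k.fio <<'EOF'
[global]
ioengine=libaio
iodepth=4
direct=1
rw=randwrite
bs=4k
size=1g
group_reporting=1

[rand-write-4k]
directory=/mnt/fiotest/vtest0
numjobs=70
EOF

# Then run it against the mounted RBD image (needs fio installed and
# ~70GB of free space, so commented out here):
# fio rand-write-4k.fio
```

For test 1, point each job at its own directory instead of sharing one; for test 3, switch to ioengine=rbd and name the pool/image in the job file.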
Re: [ceph-users] performance tests
On 09/07/14 13:10, Mark Nelson wrote:
> On 07/09/2014 05:57 AM, Xabier Elkano wrote:
>> Hi,
>>
>> I was doing some tests in my cluster with the fio tool: one fio
>> instance with 70 jobs, each job writing 1GB of random data with a 4K
>> block size. I did this test with 3 variations:
>>
>> 1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
>> module, format and mount each image as ext4. Each fio job writes to a
>> separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
>> IOPS: 6542  AVG LAT: 41ms
>>
>> 2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
>> module, format and mount the image as ext4. Each fio job writes to a
>> separate file in the same directory. (ioengine=libaio, queue_depth=4,
>> direct=1)
>> IOPS: 5899  AVG LAT: 47ms
>>
>> 3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in
>> fio to access the image through librados. (ioengine=rbd,
>> queue_depth=4, direct=1)
>> IOPS: 2638  AVG LAT: 96ms
>>
>> Do these results make sense? From a Ceph perspective, is it better to
>> have many small images than one large one? What is the best approach
>> to simulate the workload of 70 VMs?
>
> I'm not sure the difference between the first two cases is enough to
> say much yet. You may need to repeat the test a couple of times to
> ensure that the difference is more than noise. Having said that, if we
> are seeing an effect, it would be interesting to know what the latency
> distribution is like. Is it consistently worse in the 2nd case, or do
> we see higher spikes at specific times?

I've repeated the tests with similar results. Each test is done with a
clean new rbd image, first removing any existing images in the pool and
then creating the new image. Between tests I am running:

echo 3 > /proc/sys/vm/drop_caches

- In the first test I've created 70 images (60G) and mounted them:

/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul 8 14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
    slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
    clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
     lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
     | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
     | 99.99th=[  553]
    bw (KB/s): min=0, max=694, per=1.46%, avg=383.29, stdev=148.01
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
    lat (msec): 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu: usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
     latency: target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s, mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432, in_queue=39459720, util=99.68%

- In the second test I only created one large image (4.2T):

/dev/rbd1 on /mnt/fiotest/vtest0 type ext4 (rw,noatime,nodiratime,data=ordered)

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul 9 13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
    slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
    clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
     lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
     | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
     | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
     | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
     | 99.99th=[ 3130]
    bw (KB/s): min=0, max=680, per=1.54%, avg=355.39, stdev=156.10
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
    lat (msec): 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec): >=2000=0.04%
  cpu: usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
  IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued: total=r=0/w
Re: [ceph-users] performance tests
On 09/07/14 14:07, Mark Nelson wrote:
On 07/09/2014 06:52 AM, Xabier Elkano wrote:
On 09/07/14 13:10, Mark Nelson wrote:
On 07/09/2014 05:57 AM, Xabier Elkano wrote:

Hi,

I was doing some tests in my cluster with the fio tool: one fio instance
with 70 jobs, each job writing 1GB of random data with a 4K block size. I
did this test with 3 variations:

1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
module, format and mount each image as ext4. Each fio job writes to a
separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
IOPS: 6542  AVG LAT: 41ms

2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
module, format and mount the image as ext4. Each fio job writes to a
separate file in the same directory. (ioengine=libaio, queue_depth=4,
direct=1)
IOPS: 5899  AVG LAT: 47ms

3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in fio
to access the image through librados. (ioengine=rbd, queue_depth=4,
direct=1)
IOPS: 2638  AVG LAT: 96ms

Do these results make sense? From a Ceph perspective, is it better to
have many small images than one large one? What is the best approach to
simulate the workload of 70 VMs?

I'm not sure the difference between the first two cases is enough to say
much yet. You may need to repeat the test a couple of times to ensure
that the difference is more than noise. Having said that, if we are
seeing an effect, it would be interesting to know what the latency
distribution is like. Is it consistently worse in the 2nd case, or do we
see higher spikes at specific times?

I've repeated the tests with similar results. Each test is done with a
clean new rbd image, first removing any existing images in the pool and
then creating the new image. Between tests I am running:

echo 3 > /proc/sys/vm/drop_caches

- In the first test I've created 70 images (60G) and mounted them:

/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul 8 14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
    slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
    clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
     lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
     | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
     | 99.99th=[  553]
    bw (KB/s): min=0, max=694, per=1.46%, avg=383.29, stdev=148.01
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
    lat (msec): 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu: usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
     latency: target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s, mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432, in_queue=39459720, util=99.68%

- In the second test I only created one large image (4.2T):

/dev/rbd1 on /mnt/fiotest/vtest0 type ext4 (rw,noatime,nodiratime,data=ordered)

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul 9 13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
    slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
    clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
     lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
     | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
     | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
     | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
     | 99.99th=[ 3130]
    bw (KB/s): min=0, max=680, per=1.54%, avg=355.39, stdev=156.10
    lat (usec): 1000=0.01%
    lat (msec): 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
    lat (msec): 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec): >=2000=0.04%
  cpu: usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
  IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0
Re: [ceph-users] performance tests
On 09/07/14 13:14, hua peng wrote:
> What's the IO throughput (MB/s) for the test cases?
>
> Thanks.

Hi Hua,

the throughput in each test is IOPS x the 4K block size; all tests are
random write.

Xabier

> On 14-7-9 18:57, Xabier Elkano wrote:
>> Hi,
>>
>> I was doing some tests in my cluster with the fio tool: one fio
>> instance with 70 jobs, each job writing 1GB of random data with a 4K
>> block size. I did this test with 3 variations:
>>
>> 1- Creating 70 images, 60GB each, in the pool. Using the rbd kernel
>> module, format and mount each image as ext4. Each fio job writes to a
>> separate image/directory. (ioengine=libaio, queue_depth=4, direct=1)
>> IOPS: 6542  AVG LAT: 41ms
>>
>> 2- Creating 1 large image, 4.2TB, in the pool. Using the rbd kernel
>> module, format and mount the image as ext4. Each fio job writes to a
>> separate file in the same directory. (ioengine=libaio, queue_depth=4,
>> direct=1)
>> IOPS: 5899  AVG LAT: 47ms
>>
>> 3- Creating 1 large image, 4.2TB, in the pool. Using ioengine=rbd in
>> fio to access the image through librados. (ioengine=rbd,
>> queue_depth=4, direct=1)
>> IOPS: 2638  AVG LAT: 96ms
>>
>> Do these results make sense? From a Ceph perspective, is it better to
>> have many small images than one large one? What is the best approach
>> to simulate the workload of 70 VMs?
>>
>> Thanks in advance for any help,
>> Xabier
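Spelling the arithmetic out: a quick sketch converting the reported IOPS to bandwidth (pure shell arithmetic, nothing Ceph-specific):

```shell
# Throughput = IOPS x block size; for the three tests (4K blocks):
for iops in 6542 5899 2638; do
    kbs=$((iops * 4))                # KB/s with a 4K block size
    echo "${iops} IOPS -> ${kbs} KB/s (~$((kbs / 1024)) MB/s)"
done
```

The first figure, 26168 KB/s (~25 MB/s), lines up with the aggrb=26178KB/s fio reported for test 1, which is a good sanity check on the numbers.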
Re: [ceph-users] error mapping device in firefly
On 04/07/14 17:58, Ilya Dryomov wrote:
> On Fri, Jul 4, 2014 at 11:48 AM, Xabier Elkano <xelk...@hostinet.com> wrote:
>> Hi,
>>
>> I am trying to map an rbd device on Ubuntu 14.04 (kernel 3.13.0-30-generic):
>>
>> # rbd -p mypool create test1 --size 500
>> # rbd -p mypool ls
>> test1
>> # rbd -p mypool map test1
>> rbd: add failed: (5) Input/output error
>>
>> and in the syslog:
>>
>> Jul 4 09:31:48 testceph kernel: [70503.356842] libceph: mon2 172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's 2004a042a42, missing 200
>> Jul 4 09:31:48 testceph kernel: [70503.356938] libceph: mon2 172.16.64.18:6789 socket error on read
>>
>> My environment: the cluster version on all MONs and OSDs is 0.80.1.
>> On the client machine:
>>
>> ii ceph-common 0.80.1-1trusty amd64  common utilities to mount and interact with a ceph storage cluster
>> ii python-ceph 0.80.1-1trusty amd64  Python libraries for the Ceph distributed filesystem
>> ii librados2   0.80.1-1trusty amd64  RADOS distributed object store client library
>>
>> I think I started getting this error when I switched the tunables from
>> legacy to optimal after upgrading from 0.72 to 0.80.
>
> Hi Xabier,
>
> You need to do
>
>     ceph osd getcrushmap -o /tmp/crush
>     crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
>     ceph osd setcrushmap -i /tmp/crush.new
>
> or upgrade your kernel to 3.15.
>
> Thanks,
>
>                 Ilya

Thank you, Ilya. I changed the crushmap as you said and it solved the
problem.
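Before hand-editing the crush map as above, the tunables in effect can be inspected, and there is also a coarser one-step fallback; a hedged sketch:

```shell
# Show the crush tunables the cluster currently uses; a pre-3.15 kernel
# client must support every feature bit these imply, or rbd map fails
# with "feature set mismatch":
ceph osd crush show-tunables

# Coarser alternative to editing a single tunable: drop back to the
# legacy tunables profile (trades placement quality for old-client
# compatibility, and triggers data movement):
ceph osd crush tunables legacy
```

Both need a live cluster and admin privileges, so this is an operational sketch only.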
[ceph-users] error mapping device in firefly
Hi,

I am trying to map an rbd device on Ubuntu 14.04 (kernel 3.13.0-30-generic):

# rbd -p mypool create test1 --size 500
# rbd -p mypool ls
test1
# rbd -p mypool map test1
rbd: add failed: (5) Input/output error

and in the syslog:

Jul 4 09:31:48 testceph kernel: [70503.356842] libceph: mon2 172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's 2004a042a42, missing 200
Jul 4 09:31:48 testceph kernel: [70503.356938] libceph: mon2 172.16.64.18:6789 socket error on read

My environment: the cluster version on all MONs and OSDs is 0.80.1.
On the client machine:

ii ceph-common 0.80.1-1trusty amd64  common utilities to mount and interact with a ceph storage cluster
ii python-ceph 0.80.1-1trusty amd64  Python libraries for the Ceph distributed filesystem
ii librados2   0.80.1-1trusty amd64  RADOS distributed object store client library

I think I started getting this error when I switched the tunables from
legacy to optimal after upgrading from 0.72 to 0.80.

Thanks in advance!
Re: [ceph-users] Journal SSD durability
On 13/05/14 11:31, Christian Balzer wrote:

Hello, no actual question, just some food for thought and something that later generations can scour from the ML archive. I'm planning another Ceph storage cluster, this time a classic Ceph design: 3 storage nodes with 8 HDDs for OSDs and 4 SSDs for OS and journal.

Christian, do you have many clusters in production? Are there any advantages to running many clusters vs. separate pools within one cluster? What is the right way to go: maintain one big cluster, or several clusters?

When juggling the budget for it, the 12 DC3700 200GB SSDs of my first draft stood out like the proverbial sore thumb, at nearly 1/6th of the total budget. I really like those SSDs with their smooth performance and durability of 1TB/day of writes (over 5 years, same for all the other numbers below), but wondered if that was really needed. This cluster is supposed to provide the storage for VMs (Vservers really) that are currently on 3 DRBD cluster pairs. Not particularly write intensive; all of them total about 20GB/day. With 2 journals per SSD that's 5GB/day of writes, well within the Intel specification of 20GB/day for their 530 drives (180GB version). However, the uneven IOPS of the 530 and potential future changes in write patterns make this 300% safety margin still too slim for my liking. Alas, a DC3500 240GB SSD will perform well enough at half the price of the DC3700 and give me enough breathing room at about 80GB/day of writes, so this is what I will order in the end.

Did you consider the DC3700 100G, at a similar price?

Christian
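Christian's 5GB/day-per-SSD figure can be reproduced with a quick back-of-envelope calculation. This is my reconstruction of his arithmetic (assuming replica 3 triples the journal writes and spreads them evenly over the 12 journal SSDs); adjust the inputs to your own workload:

```python
# Rough journal write load per SSD, from the figures in the mail:
# ~20 GB/day of client writes, replica 3, 4 journal SSDs per node x 3 nodes.
client_writes_gb_day = 20
replicas = 3
journal_ssds = 4 * 3

per_ssd = client_writes_gb_day * replicas / journal_ssds
print(per_ssd)  # 5.0 GB/day per SSD
```

Against the DC3500 240GB's roughly 80GB/day endurance budget, that leaves the comfortable margin he mentions.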
Re: [ceph-users] advice with hardware configuration
On 06/05/14 19:31, Cedric Lemarchand wrote:
On 06/05/2014 17:07, Xabier Elkano wrote: the goal is performance over capacity.
I am sure you already considered the full SSD option, did you?

Yes, I considered the full SSD option, but it is very expensive: using the Intel 520 series, each disk costs twice as much as a SAS equivalent.
Re: [ceph-users] advice with hardware configuration
On 06/05/14 19:38, Sergey Malinin wrote: If you plan to scale up in the future you could consider the following config to start with: pool size=2, 3 x servers with OS+journal on 1 SSD, 3 journal SSDs, 4 x 900 GB data disks. It will get you 5+ TB capacity and you will be able to increase pool size to 3 at some point in time.

Thanks for your response. Do you mean 1 SSD for OS and 3 journals + 4 SAS 900G + 5 free slots? I had in mind the OS in RAID 1, but with 2 cheap Intel 3500 SSDs. The OS disks are SSDs, but not for gaining performance; they are only 100G and they are cheap. I thought that an OS failure could be worse than a journal or a single-OSD failure, because the recovery time to restore the OS could be higher.
Re: [ceph-users] advice with hardware configuration
On 06/05/14 18:40, Christian Balzer wrote:

Hello, On Tue, 06 May 2014 17:07:33 +0200 Xabier Elkano wrote: Hi, I'm designing a new ceph pool with new hardware and I would like to receive some suggestions. I want to use a replica count of 3 in the pool, and the idea is to buy 3 new servers with a 10-drive 2,5" chassis each and 2 10Gbps NICs. I have in mind two configurations:

As Wido said, more nodes are usually better, unless you're quite aware of what you're doing and why.

Yes, I know that, but what is the minimum number of nodes to start with? Is starting with three nodes not a feasible option?

1- With journal on SSDs. OS: 2x SSD Intel SC3500 100G in RAID 1. Journal: 2x SSD Intel SC3700 100G, 3 journals per SSD.

As I wrote just a moment ago, use at least the 200GB ones if performance is such an issue for you. If you can afford it, use 4 3700s and share OS and journal; the OS IOPS will not be that significant, especially if you're using a writeback cache controller.

The journal can be shared with the OS, but I like the RAID 1 for the OS. I think the only drawback is that I am using two dedicated disk slots for the OS.

OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size for OSDs: 5,4TB. 2- With journal on a partition on the spinners. OS: 2x SSD Intel SC3500 100G in RAID 1. OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process and its journal. Total size for OSDs: 3,6TB.

I have no idea why anybody would spend money on 12Gb/s HDDs when even most SSDs have trouble saturating a 6Gb/s link. Given the double write penalty in IOPS, I think you're going to find this more expensive (per byte) and slower than a well-rounded option 1.

But these disks are 2,5" 15K drives, not chosen only for the link speed. The other 2,5" SAS (SAS2) disks I found are only 10K; the 15K disks should be better for random IOPS.

The budget in both configurations is similar, but the total capacity is not. What would be the best configuration from the point of view of performance?
In the second configuration I know the controller's writeback cache could be very critical; the servers have an LSI 3108 controller with 2GB of cache. I have to plan this storage as a KVM image backend, and the goal is performance over capacity.

Writeback cache can be very helpful, however it is not a miracle cure. Not knowing your actual load and I/O patterns, it might very well be enough, though.

The IO patterns are a bit unknown; I would assume 40% reads and 60% writes, but the IO size is unknown, because the storage is for KVM images and the VMs belong to many customers with different purposes.

Regards, Christian
[ceph-users] advice with hardware configuration
Hi, I'm designing a new ceph pool with new hardware and I would like to receive some suggestions. I want to use a replica count of 3 in the pool, and the idea is to buy 3 new servers with a 10-drive 2,5" chassis each and 2 10Gbps NICs. I have in mind two configurations:

1- With journal on SSDs. OS: 2x SSD Intel SC3500 100G in RAID 1. Journal: 2x SSD Intel SC3700 100G, 3 journals per SSD. OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size for OSDs: 5,4TB.

2- With journal on a partition on the spinners. OS: 2x SSD Intel SC3500 100G in RAID 1. OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process and its journal. Total size for OSDs: 3,6TB.

The budget in both configurations is similar, but the total capacity is not. What would be the best configuration from the point of view of performance? In the second configuration I know the controller's writeback cache could be very critical; the servers have an LSI 3108 controller with 2GB of cache. I have to plan this storage as a KVM image backend, and the goal is performance over capacity.

On the other hand, with this new hardware, what would be the best choice: create a new pool in an existing cluster, or create a completely new cluster? Are there any advantages in creating and maintaining an isolated new cluster?

thanks in advance, Xabier
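The capacity side of the comparison can be sketched in a few lines. This helper is my own back-of-envelope (it ignores journal partitions and filesystem overhead, so real usable figures will be somewhat lower):

```python
# Usable capacity at a given replica count; ignores journal partitions
# and filesystem overhead, so real figures come out lower.
def usable_tb(nodes: int, drives_per_node: int, drive_tb: float, replicas: int = 3) -> float:
    return round(nodes * drives_per_node * drive_tb / replicas, 2)

print(usable_tb(3, 6, 0.9))  # option 1: 5.4
print(usable_tb(3, 8, 0.6))  # option 2: 4.8
```

At replica 3 over 3 identical nodes, usable capacity equals one node's raw capacity, which is why option 1's 5,4TB per node carries straight through.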
Re: [ceph-users] advice with hardware configuration
On 06/05/14 17:57, Sergey Malinin wrote: My vision of a well-built node is one where the number of journal disks equals the number of data disks. You definitely don't want to lose 3 journals at once in case of a single drive failure.

Thanks for your response. This is true: a single SSD failure also means 3 OSD failures (50% of each node's capacity and 16% of total capacity), but the journal SSDs are Intel SC3700 and they should be very reliable.

On 6 May 2014, at 18:07, Xabier Elkano xelk...@hostinet.com wrote: Hi, I'm designing a new ceph pool with new hardware and I would like to receive some suggestions. I want to use a replica count of 3 in the pool, and the idea is to buy 3 new servers with a 10-drive 2,5" chassis each and 2 10Gbps NICs. I have in mind two configurations: 1- With journal on SSDs. OS: 2x SSD Intel SC3500 100G in RAID 1. Journal: 2x SSD Intel SC3700 100G, 3 journals per SSD. OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size for OSDs: 5,4TB. 2- With journal on a partition on the spinners. OS: 2x SSD Intel SC3500 100G in RAID 1. OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process and its journal. Total size for OSDs: 3,6TB. The budget in both configurations is similar, but the total capacity is not. What would be the best configuration from the point of view of performance? In the second configuration I know the controller's writeback cache could be very critical; the servers have an LSI 3108 controller with 2GB of cache. I have to plan this storage as a KVM image backend, and the goal is performance over capacity. On the other hand, with this new hardware, what would be the best choice: create a new pool in an existing cluster, or create a completely new cluster? Are there any advantages in creating and maintaining an isolated new cluster?
thanks in advance, Xabier
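Sergey's blast-radius concern is easy to quantify. A small sketch of my own, using option 1's figures (3 journals per SSD, 6 OSDs per node, 3 nodes):

```python
# One journal SSD failure takes down every OSD journaling on it.
journals_per_ssd = 3
osds_per_node = 6
nodes = 3

node_loss = journals_per_ssd / osds_per_node              # fraction of one node
cluster_loss = journals_per_ssd / (osds_per_node * nodes)  # fraction of cluster
print(f"{node_loss:.0%} of the node, {cluster_loss:.1%} of the cluster")
```

That matches the ~50%-of-node / ~16%-of-cluster figures quoted in the thread.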
Re: [ceph-users] advice with hardware configuration
On 06/05/14 17:51, Wido den Hollander wrote: On 05/06/2014 05:07 PM, Xabier Elkano wrote: Hi, I'm designing a new ceph pool with new hardware and I would like to receive some suggestions. I want to use a replica count of 3 in the pool, and the idea is to buy 3 new servers with a 10-drive 2,5" chassis each and 2 10Gbps NICs. I have in mind two configurations:

Why 3 machines? That's something I would not recommend. If you want 30 drives I'd say go for 8 machines with 4 drives each. If a single machine fails, it's 12.5% of the cluster size instead of 33%! I always advise that a failure of a single machine should be 10% or less of the total cluster size. Wido

The idea is to start with 3 nodes and scale them in the future. I am aware that a server failure can mean 33% less performance, but if the whole pool's performance is good enough with 3 replicas spread over 3 nodes, maybe it could cope with that. The biggest cost here is the racks and servers rather than the disks, and I prefer to start with 3 high-density servers and scale them up progressively. Do you think this cannot be good enough for production?

1- With journal on SSDs. OS: 2x SSD Intel SC3500 100G in RAID 1. Journal: 2x SSD Intel SC3700 100G, 3 journals per SSD. OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size for OSDs: 5,4TB. 2- With journal on a partition on the spinners. OS: 2x SSD Intel SC3500 100G in RAID 1. OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process and its journal. Total size for OSDs: 3,6TB. The budget in both configurations is similar, but the total capacity is not. What would be the best configuration from the point of view of performance? In the second configuration I know the controller's writeback cache could be very critical; the servers have an LSI 3108 controller with 2GB of cache. I have to plan this storage as a KVM image backend, and the goal is performance over capacity.
On the other hand, with this new hardware, what would be the best choice: create a new pool in an existing cluster, or create a completely new cluster? Are there any advantages in creating and maintaining an isolated new cluster? thanks in advance, Xabier
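Wido's rule of thumb (a single machine should be 10% or less of the cluster) reduces to a one-liner; a trivial sketch of my own:

```python
# Fraction of cluster capacity lost when one of N identical nodes fails.
def node_failure_fraction(nodes: int) -> float:
    return 1 / nodes

print(f"{node_failure_fraction(3):.1%}")  # 33.3%
print(f"{node_failure_fraction(8):.1%}")  # 12.5%
```

By that rule, 10 or more identical nodes would be needed before a single-node failure drops below the 10% threshold.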
Re: [ceph-users] advice with hardware configuration
On 06/05/14 18:17, Christian Balzer wrote: On Tue, 6 May 2014 18:57:04 +0300 Sergey Malinin wrote: My vision of a well-built node is one where the number of journal disks equals the number of data disks. You definitely don't want to lose 3 journals at once in case of a single drive failure.

While that certainly is true, not everybody has an unlimited budget. I'd expect the DC3700 to outlast the spinning rust, especially if the implementor is SMART enough to replace things before something unforeseen happens. However, using a 100GB DC3700 with those drives isn't particularly wise performance-wise; I'd at least use the 200GB ones.

Hi Christian, you are right, I should use the 200GB ones at least. Thanks!

Regards, Christian

On 6 May 2014, at 18:07, Xabier Elkano xelk...@hostinet.com wrote: Hi, I'm designing a new ceph pool with new hardware and I would like to receive some suggestions. I want to use a replica count of 3 in the pool, and the idea is to buy 3 new servers with a 10-drive 2,5" chassis each and 2 10Gbps NICs. I have in mind two configurations: 1- With journal on SSDs. OS: 2x SSD Intel SC3500 100G in RAID 1. Journal: 2x SSD Intel SC3700 100G, 3 journals per SSD. OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size for OSDs: 5,4TB. 2- With journal on a partition on the spinners. OS: 2x SSD Intel SC3500 100G in RAID 1. OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process and its journal. Total size for OSDs: 3,6TB. The budget in both configurations is similar, but the total capacity is not. What would be the best configuration from the point of view of performance? In the second configuration I know the controller's writeback cache could be very critical; the servers have an LSI 3108 controller with 2GB of cache. I have to plan this storage as a KVM image backend, and the goal is performance over capacity.
On the other hand, with this new hardware, what would be the best choice: create a new pool in an existing cluster, or create a completely new cluster? Are there any advantages in creating and maintaining an isolated new cluster? thanks in advance, Xabier