Re: [ceph-users] pgs unfound

2016-12-01 Thread Xabier Elkano
Hi,

I managed to remove the warning by reweighting the crashed OSD:

ceph osd crush reweight osd.33 0.8

After the recovery, the cluster is no longer showing the warning.
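
For the record, these are roughly the commands I would use to track
down and clear unfound objects (the pg id below is just an example):

ceph health detail                        # normally lists the pgs with unfound objects
ceph pg dump_stuck unclean                # pgs that are not active+clean
ceph pg 1.25 list_missing                 # per-pg list of unfound objects
ceph pg 1.25 mark_unfound_lost revert     # or 'delete' if the data is expendable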

Xabier


On 29/11/16 11:18, Xabier Elkano wrote:
> Hi all,
>
> my cluster is in WARN state because apparently there are some unfound
> pgs. I think I reached this situation because of the metadata pool: it
> was in the default root but unused, since I don't use cephfs, only rbd
> for VMs. I don't have any OSDs in the default root; they are assigned
> to different roots depending on their disk type.
>
> My pools have specific crush rules to use the different roots - all of
> them except the metadata pool, which was assigned to the default root.
> In this situation I had problems with one OSD (I had to rebuild it from
> scratch), and when I restored it some pgs were unfound, because they
> were on the faulty OSD and belonged to the metadata pool, which had
> size 1. Since I didn't care about any data in that pool, I recreated
> the unfound pgs with "ceph pg force_create_pg 1.25". Finally I set a
> crush rule on the metadata pool to change its location and the pgs were
> created.
>
> But now the cluster is showing 29 unfound objects, without saying which
> pgs they belong to. How can I recover from this situation? Can I remove
> the metadata pool and recreate it?
>
>
> # ceph status
> cluster 72a4a18b-ec5c-454d-9135-04362c97c307
>  health HEALTH_WARN
> recovery 29/2748828 unfound (0.001%)
>  monmap e13: 5 mons at
> {mon1=172.16.64.12:6789/0,mon2=172.16.64.13:6789/0,mon3=172.16.64.16:6789/0,mon4=172.16.64.30:6789/0,mon5=172.16.64.31:6789/0}
> election epoch 99672, quorum 0,1,2,3,4 mon1,mon2,mon3,mon4,mon5
>  mdsmap e35323: 0/0/1 up
>  osdmap e49648: 38 osds: 38 up, 38 in
>   pgmap v76150847: 3065 pgs, 21 pools, 10654 GB data, 2684 kobjects
> 3 GB used, 25423 GB / 56534 GB avail
> 29/2748828 unfound (0.001%)
> 3063 active+clean
>2 active+clean+scrubbing
>   client io 4431 kB/s rd, 15897 kB/s wr, 2385 op/s
>
>
> # ceph health detail
> HEALTH_WARN recovery 29/2748829 unfound (0.001%)
> recovery 29/2748829 unfound (0.001%)
>
>
> My cluster runs hammer 0.94.9
> 5 Servers with 7 OSDs each on Ubuntu 14.04
> 5 monitor servers.
>
> Thanks and Best regards,
> Xabier
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pgs unfound

2016-11-29 Thread Xabier Elkano

Hi all,

my cluster is in WARN state because apparently there are some unfound
pgs. I think I reached this situation because of the metadata pool: it
was in the default root but unused, since I don't use cephfs, only rbd
for VMs. I don't have any OSDs in the default root; they are assigned
to different roots depending on their disk type.

My pools have specific crush rules to use the different roots - all of
them except the metadata pool, which was assigned to the default root.
In this situation I had problems with one OSD (I had to rebuild it from
scratch), and when I restored it some pgs were unfound, because they
were on the faulty OSD and belonged to the metadata pool, which had
size 1. Since I didn't care about any data in that pool, I recreated
the unfound pgs with "ceph pg force_create_pg 1.25". Finally I set a
crush rule on the metadata pool to change its location and the pgs were
created.
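
The rule change itself was something along these lines (the rule id is
whatever "ceph osd crush rule dump" reports for the rule tied to the
target root):

ceph osd crush rule dump
ceph osd pool set metadata crush_ruleset <rule_id>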

But now the cluster is showing 29 unfound objects, without saying which
pgs they belong to. How can I recover from this situation? Can I remove
the metadata pool and recreate it?


# ceph status
cluster 72a4a18b-ec5c-454d-9135-04362c97c307
 health HEALTH_WARN
recovery 29/2748828 unfound (0.001%)
 monmap e13: 5 mons at
{mon1=172.16.64.12:6789/0,mon2=172.16.64.13:6789/0,mon3=172.16.64.16:6789/0,mon4=172.16.64.30:6789/0,mon5=172.16.64.31:6789/0}
election epoch 99672, quorum 0,1,2,3,4 mon1,mon2,mon3,mon4,mon5
 mdsmap e35323: 0/0/1 up
 osdmap e49648: 38 osds: 38 up, 38 in
  pgmap v76150847: 3065 pgs, 21 pools, 10654 GB data, 2684 kobjects
3 GB used, 25423 GB / 56534 GB avail
29/2748828 unfound (0.001%)
3063 active+clean
   2 active+clean+scrubbing
  client io 4431 kB/s rd, 15897 kB/s wr, 2385 op/s


# ceph health detail
HEALTH_WARN recovery 29/2748829 unfound (0.001%)
recovery 29/2748829 unfound (0.001%)


My cluster runs hammer 0.94.9
5 Servers with 7 OSDs each on Ubuntu 14.04
5 monitor servers.

Thanks and Best regards,
Xabier

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Correct method to deploy on jessie

2015-10-01 Thread Xabier Elkano
Hi,

you can add monitors and OSDs manually:

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
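
For an OSD, the procedure in the second link boils down to roughly the
following (ids, paths and weights are placeholders, and the data
partition must already be mounted on the osd directory):

ceph osd create                      # prints the new osd id, e.g. 0
mkdir -p /var/lib/ceph/osd/ceph-0
ceph-osd -i 0 --mkfs --mkkey
ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-0/keyring
ceph osd crush add osd.0 1.0 host=node1
/etc/init.d/ceph start osd.0         # or the equivalent for your init system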

I've deployed my whole cluster without ceph-deploy.

Xabier

On Thu, 2015-10-01 at 16:54 +1000, Dmitry Ogorodnikov wrote:

> Dear all,
> 
> I'm trying to install ceph on Debian Jessie using a pile of various
> manuals, and have had no luck so far.
> The problem is that ceph.com doesn't provide Jessie repositories.
> Jessie has its own ceph packages, but no ceph-deploy. Since ceph-deploy
> is part of most documented procedures, I can't operate ceph without it.
> 
> If I use a wheezy machine with ceph-deploy installed to deploy on
> jessie, it tries to use the ceph.com repositories for jessie (which, as
> mentioned, don't exist). The deploy fails.
> 
> If I use ceph-deploy with the Debian repositories (--no-adjust-repos),
> then following the quickstart manual produces multiple errors at
> several steps. I can work around some of them with various hacks, but I
> don't think this is the right way.
> 
> So please provide a manual for installing on jessie, or (better)
> provide a repository. Please.
> 
> Have a nice day.
> Dmitry.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Xabier Elkano
Departamento Técnico
Hostinet S.L.U.
http://www.hostinet.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw and keystone users

2015-09-28 Thread Xabier Elkano

Hi,

I've just deployed the Ceph object gateway as object storage in
OpenStack. I've followed this doc to integrate it with Keystone:

http://docs.ceph.com/docs/master/radosgw/keystone/

"It is possible to integrate the Ceph Object Gateway with Keystone, the
OpenStack identity service. This sets up the gateway to accept Keystone
as the users authority. A user that Keystone authorizes to access the
gateway will also be automatically created on the Ceph Object Gateway
(if didn’t exist beforehand). A token that Keystone validates will be
considered as valid by the gateway."

According to that, I was expecting the keystone user to be created in
radosgw when it was authorized by a keystone token, but instead what
gets created is the tenant id of the project the user uses to manage
his objects.

# radosgw-admin user stats --uid=db4d25b13eaa4645a180f564b3817e1c
{ "stats": { "total_entries": 1,
  "total_bytes": 24546,
  "total_bytes_rounded": 24576},
  "last_stats_sync": "2015-09-25 12:09:12.795775Z",
  "last_stats_update": "2015-09-28 11:58:43.422859Z"}

Here "db4d25b13eaa4645a180f564b3817e1c" is the project id I'm using.

Is this the expected behavior (and the doc pointed me in the wrong
direction), or did I misconfigure something? Actually, I prefer this
behavior, because this way I can set quotas on a per-project basis
without worrying about individual users, but I would like to know if
the integration is OK.
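
If it is, per-project quotas should then just be the usual user quota
commands run against that id, something like (the limits below are just
an example):

radosgw-admin quota set --quota-scope=user --uid=db4d25b13eaa4645a180f564b3817e1c \
    --max-size-kb=10485760 --max-objects=100000
radosgw-admin quota enable --quota-scope=user --uid=db4d25b13eaa4645a180f564b3817e1c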

My radosgw setup:

[client.radosgw.gateway]
host = hostname
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = ""
log file = /var/log/radosgw/client.radosgw.gateway.log
rgw frontends = fastcgi socket_port=9000 socket_host=0.0.0.0
rgw print continue = false
rgw keystone url = http://keystone_host:5000
rgw keystone admin token = _
rgw keystone accepted roles = _member_, Member, admin
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss


Ceph Firefly 0.80.10
OpenStack Juno
OS: Ubuntu 14.04

Best regards,
Xabier


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance tests

2014-07-10 Thread Xabier Elkano
On 09/07/14 16:53, Christian Balzer wrote:
 On Wed, 09 Jul 2014 07:07:50 -0500 Mark Nelson wrote:

 On 07/09/2014 06:52 AM, Xabier Elkano wrote:
 On 09/07/14 13:10, Mark Nelson wrote:
 On 07/09/2014 05:57 AM, Xabier Elkano wrote:

 Hi,

 I was doing some tests in my cluster with fio tool, one fio instance
 with 70 jobs, each job writing 1GB random with 4K block size. I did
 this test with 3 variations:

 1- Creating 70 images, 60GB each, in the pool. Using rbd kernel
 module, format and mount each image as ext4. Each fio job writing in
 a separate image/directory. (ioengine=libaio, queue_depth=4,
 direct=1)

  IOPS: 6542
  AVG LAT: 41ms

 2- Creating 1 large image 4,2TB in the pool. Using rbd kernel module,
 format and mount the image as ext4. Each fio job writing in a
 separate file in the same directory. (ioengine=libaio,
 queue_depth=4,direct=1)

 IOPS: 5899
 AVG LAT:  47ms

 3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in fio
 to access the image through librados. (ioengine=rbd,
 queue_depth=4,direct=1)

 IOPS: 2638
 AVG LAT: 96ms

 Do these results make sense? From Ceph perspective, It is better to
 have many small images than a larger one? What is the best approach
 to simulate the workload of 70 VMs?
 I'm not sure the difference between the first two cases is enough to
 say much yet.  You may need to repeat the test a couple of times to
 ensure that the difference is more than noise.  having said that, if
 we are seeing an effect, it would be interesting to know what the
 latency distribution is like.  is it consistently worse in the 2nd
 case or do we see higher spikes at specific times?

 I've repeated the tests with similar results. Each test is done with a
 clean new rbd image, first removing any existing images in the pool and
 then creating the new image. Between tests I am running:

   echo 3 > /proc/sys/vm/drop_caches

 - In the first test I've created 70 images (60G) and mounted them:

 /dev/rbd1 on /mnt/fiotest/vtest0
 /dev/rbd2 on /mnt/fiotest/vtest1
 ..
 /dev/rbd70 on /mnt/fiotest/vtest69

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul  8
 14:52:56 2014
write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
  slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
  clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
   lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   10], 10.00th=[   13],
 20.00th=[   18], | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31],
 60.00th=[   34], | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48],
 95.00th=[   61], | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494],
 99.95th=[  515], | 99.99th=[  553]
  bw (KB  /s): min=0, max=  694, per=1.46%, avg=383.29,
 stdev=148.01 lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
  lat (msec) : 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
cpu  : usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
   submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
   latency   : target=0, window=0, percentile=100.00%, depth=4

 Run status group 0 (all jobs):
WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s,
 mint=100116msec, maxt=100116msec

 Disk stats (read/write):
rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432,
 in_queue=39459720, util=99.68%

 - In the second test I only created one large image (4,2T)

 /dev/rbd1 on /mnt/fiotest/vtest0 type ext4
 (rw,noatime,nodiratime,data=ordered)

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul  9
 13:38:14 2014
write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
  slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
  clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
   lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   11], 10.00th=[   14],
 20.00th=[   19], | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33],
 60.00th=[   36], | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51],
 95.00th=[   68], | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717],
 99.95th=[  783], | 99.99th=[ 3130]
  bw (KB  /s): min=0, max=  680, per=1.54%, avg=355.39,
 stdev=156.10 lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
  lat (msec) : 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
  lat (msec) : >=2000=0.04%
cpu  : usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0%, 16=0.0%, 32=0.0

Re: [ceph-users] performance tests

2014-07-10 Thread Xabier Elkano
On 10/07/14 09:18, Christian Balzer wrote:
 On Thu, 10 Jul 2014 08:57:56 +0200 Xabier Elkano wrote:

 On 09/07/14 16:53, Christian Balzer wrote:
 On Wed, 09 Jul 2014 07:07:50 -0500 Mark Nelson wrote:

 On 07/09/2014 06:52 AM, Xabier Elkano wrote:
 On 09/07/14 13:10, Mark Nelson wrote:
 On 07/09/2014 05:57 AM, Xabier Elkano wrote:
 Hi,

 I was doing some tests in my cluster with fio tool, one fio
 instance with 70 jobs, each job writing 1GB random with 4K block
 size. I did this test with 3 variations:

 1- Creating 70 images, 60GB each, in the pool. Using rbd kernel
 module, format and mount each image as ext4. Each fio job writing
 in a separate image/directory. (ioengine=libaio, queue_depth=4,
 direct=1)

  IOPS: 6542
  AVG LAT: 41ms

 2- Creating 1 large image 4,2TB in the pool. Using rbd kernel
 module, format and mount the image as ext4. Each fio job writing
 in a separate file in the same directory. (ioengine=libaio,
 queue_depth=4,direct=1)

 IOPS: 5899
 AVG LAT:  47ms

 3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in
 fio to access the image through librados. (ioengine=rbd,
 queue_depth=4,direct=1)

 IOPS: 2638
 AVG LAT: 96ms

 Do these results make sense? From Ceph perspective, It is better to
 have many small images than a larger one? What is the best approach
 to simulate the workload of 70 VMs?
 I'm not sure the difference between the first two cases is enough to
 say much yet.  You may need to repeat the test a couple of times to
 ensure that the difference is more than noise.  having said that, if
 we are seeing an effect, it would be interesting to know what the
 latency distribution is like.  is it consistently worse in the 2nd
 case or do we see higher spikes at specific times?

 I've repeated the tests with similar results. Each test is done with
 a clean new rbd image, first removing any existing images in the
 pool and then creating the new image. Between tests I am running:

   echo 3 > /proc/sys/vm/drop_caches

 - In the first test I've created 70 images (60G) and mounted them:

 /dev/rbd1 on /mnt/fiotest/vtest0
 /dev/rbd2 on /mnt/fiotest/vtest1
 ..
 /dev/rbd70 on /mnt/fiotest/vtest69

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul  8
 14:52:56 2014
write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
  slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
  clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
   lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   10], 10.00th=[   13],
 20.00th=[   18], | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31],
 60.00th=[   34], | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48],
 95.00th=[   61], | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494],
 99.95th=[  515], | 99.99th=[  553]
  bw (KB  /s): min=0, max=  694, per=1.46%, avg=383.29,
 stdev=148.01 lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
  lat (msec) : 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
cpu  : usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
 IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
   submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
   latency   : target=0, window=0, percentile=100.00%, depth=4

 Run status group 0 (all jobs):
WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s,
 maxb=26178KB/s, mint=100116msec, maxt=100116msec

 Disk stats (read/write):
rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432,
 in_queue=39459720, util=99.68%

 - In the second test I only created one large image (4,2T)

 /dev/rbd1 on /mnt/fiotest/vtest0 type ext4
 (rw,noatime,nodiratime,data=ordered)

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul  9
 13:38:14 2014
write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
  slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
  clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
   lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   11], 10.00th=[   14],
 20.00th=[   19], | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33],
 60.00th=[   36], | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51],
 95.00th=[   68], | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717],
 99.95th=[  783], | 99.99th=[ 3130]
  bw (KB  /s): min=0, max=  680, per=1.54%, avg=355.39,
 stdev=156.10 lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
  lat (msec) : 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
  lat (msec) : >=2000=0.04%
cpu  : usr=0.65%, sys=2.45

[ceph-users] performance tests

2014-07-09 Thread Xabier Elkano


Hi,

I was doing some tests on my cluster with the fio tool: one fio
instance with 70 jobs, each job writing 1GB of random data with a 4K
block size. I did this test with 3 variations:

1- Creating 70 images, 60GB each, in the pool. Using rbd kernel module,
format and mount each image as ext4. Each fio job writing in a separate
image/directory. (ioengine=libaio, queue_depth=4, direct=1)
 
   IOPS: 6542
   AVG LAT: 41ms

2- Creating 1 large image 4,2TB in the pool. Using rbd kernel module,
format and mount the image as ext4. Each fio job writing in a separate
file in the same directory. (ioengine=libaio, queue_depth=4,direct=1)
 
  IOPS: 5899
  AVG LAT:  47ms

3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in fio to
access the image through librados. (ioengine=rbd, queue_depth=4,direct=1)

  IOPS: 2638
  AVG LAT: 96ms
 
Do these results make sense? From a Ceph perspective, is it better to
have many small images than one large one? What is the best approach to
simulating the workload of 70 VMs?
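
For reference, the rbd-engine run (variation 3) was launched roughly
like this (pool and image names are placeholders for the ones I
actually used); the libaio runs used the same options with
--ioengine=libaio and --directory pointing at each mounted image:

fio --name=rand-write-4k --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=bigimage --rw=randwrite --bs=4k --iodepth=4 --direct=1 \
    --numjobs=70 --size=1G --group_reporting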


thanks in advance for any help,
Xabier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance tests

2014-07-09 Thread Xabier Elkano
On 09/07/14 13:10, Mark Nelson wrote:
 On 07/09/2014 05:57 AM, Xabier Elkano wrote:


 Hi,

 I was doing some tests in my cluster with fio tool, one fio instance
 with 70 jobs, each job writing 1GB random with 4K block size. I did this
 test with 3 variations:

 1- Creating 70 images, 60GB each, in the pool. Using rbd kernel module,
 format and mount each image as ext4. Each fio job writing in a separate
 image/directory. (ioengine=libaio, queue_depth=4, direct=1)

 IOPS: 6542
 AVG LAT: 41ms

 2- Creating 1 large image 4,2TB in the pool. Using rbd kernel module,
 format and mount the image as ext4. Each fio job writing in a separate
 file in the same directory. (ioengine=libaio, queue_depth=4,direct=1)

IOPS: 5899
AVG LAT:  47ms

 3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in fio to
 access the image through librados. (ioengine=rbd,
 queue_depth=4,direct=1)

IOPS: 2638
AVG LAT: 96ms

 Do these results make sense? From Ceph perspective, It is better to have
 many small images than a larger one? What is the best approach to
 simulate the workload of 70 VMs?

 I'm not sure the difference between the first two cases is enough to
 say much yet.  You may need to repeat the test a couple of times to
 ensure that the difference is more than noise.  having said that, if
 we are seeing an effect, it would be interesting to know what the
 latency distribution is like.  is it consistently worse in the 2nd
 case or do we see higher spikes at specific times?

I've repeated the tests with similar results. Each test is done with a
clean new rbd image, first removing any existing images in the pool and
then creating the new image. Between tests I am running:

 echo 3 > /proc/sys/vm/drop_caches

- In the first test I've created 70 images (60G) and mounted them:

/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul  8
14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
 lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
clat percentiles (msec):
 |  1.00th=[5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
 | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
 | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
 | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
 | 99.99th=[  553]
bw (KB  /s): min=0, max=  694, per=1.46%, avg=383.29, stdev=148.01
lat (usec) : 1000=0.01%
lat (msec) : 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
lat (msec) : 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu  : usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
 latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s,
mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432,
in_queue=39459720, util=99.68%

- In the second test I only created one large image (4,2T)

/dev/rbd1 on /mnt/fiotest/vtest0 type ext4
(rw,noatime,nodiratime,data=ordered)

fio output:

rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul  9
13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
 lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
clat percentiles (msec):
 |  1.00th=[5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
 | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
 | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
 | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
 | 99.99th=[ 3130]
bw (KB  /s): min=0, max=  680, per=1.54%, avg=355.39, stdev=156.10
lat (usec) : 1000=0.01%
lat (msec) : 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
lat (msec) : 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec) : >=2000=0.04%
  cpu  : usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
  IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued: total=r=0/w

Re: [ceph-users] performance tests

2014-07-09 Thread Xabier Elkano
On 09/07/14 14:07, Mark Nelson wrote:
 On 07/09/2014 06:52 AM, Xabier Elkano wrote:
 On 09/07/14 13:10, Mark Nelson wrote:
 On 07/09/2014 05:57 AM, Xabier Elkano wrote:


 Hi,

 I was doing some tests in my cluster with fio tool, one fio instance
 with 70 jobs, each job writing 1GB random with 4K block size. I did
 this
 test with 3 variations:

 1- Creating 70 images, 60GB each, in the pool. Using rbd kernel
 module,
 format and mount each image as ext4. Each fio job writing in a
 separate
 image/directory. (ioengine=libaio, queue_depth=4, direct=1)

  IOPS: 6542
  AVG LAT: 41ms

 2- Creating 1 large image 4,2TB in the pool. Using rbd kernel module,
 format and mount the image as ext4. Each fio job writing in a separate
 file in the same directory. (ioengine=libaio, queue_depth=4,direct=1)

 IOPS: 5899
 AVG LAT:  47ms

 3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in
 fio to
 access the image through librados. (ioengine=rbd,
 queue_depth=4,direct=1)

 IOPS: 2638
 AVG LAT: 96ms

 Do these results make sense? From Ceph perspective, It is better to
 have
 many small images than a larger one? What is the best approach to
 simulate the workload of 70 VMs?

 I'm not sure the difference between the first two cases is enough to
 say much yet.  You may need to repeat the test a couple of times to
 ensure that the difference is more than noise.  having said that, if
 we are seeing an effect, it would be interesting to know what the
 latency distribution is like.  is it consistently worse in the 2nd
 case or do we see higher spikes at specific times?

 I've repeated the tests with similar results. Each test is done with a
 clean new rbd image, first removing any existing images in the pool and
 then creating the new image. Between tests I am running:

   echo 3 > /proc/sys/vm/drop_caches

 - In the first test I've created 70 images (60G) and mounted them:

 /dev/rbd1 on /mnt/fiotest/vtest0
 /dev/rbd2 on /mnt/fiotest/vtest1
 ..
 /dev/rbd70 on /mnt/fiotest/vtest69

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul  8
 14:52:56 2014
write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
  slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
  clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
   lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   10], 10.00th=[   13],
 20.00th=[   18],
   | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31],
 60.00th=[   34],
   | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48],
 95.00th=[   61],
   | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[ 
 515],
   | 99.99th=[  553]
  bw (KB  /s): min=0, max=  694, per=1.46%, avg=383.29,
 stdev=148.01
  lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
  lat (msec) : 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
cpu  : usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
 IO depths: 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
   submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
   issued: total=r=0/w=655015/d=0, short=r=0/w=0/d=0
   latency   : target=0, window=0, percentile=100.00%, depth=4

 Run status group 0 (all jobs):
WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s,
 mint=100116msec, maxt=100116msec

 Disk stats (read/write):
rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432,
 in_queue=39459720, util=99.68%

 - In the second test I only created one large image (4,2T)

 /dev/rbd1 on /mnt/fiotest/vtest0 type ext4
 (rw,noatime,nodiratime,data=ordered)

 fio output:

 rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul  9
 13:38:14 2014
write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
  slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
  clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
   lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
  clat percentiles (msec):
   |  1.00th=[5],  5.00th=[   11], 10.00th=[   14],
 20.00th=[   19],
   | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33],
 60.00th=[   36],
   | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51],
 95.00th=[   68],
   | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[ 
 783],
   | 99.99th=[ 3130]
  bw (KB  /s): min=0, max=  680, per=1.54%, avg=355.39,
 stdev=156.10
  lat (usec) : 1000=0.01%
  lat (msec) : 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
  lat (msec) : 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
  lat (msec) : >=2000=0.04%
cpu  : usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
IO depths: 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0

Re: [ceph-users] performance tests

2014-07-09 Thread Xabier Elkano
On 09/07/14 13:14, hua peng wrote:
 what is the IO throughput (MB/s) for the test cases?

 Thanks.
Hi Hua,

the throughput in each test is IOPS x 4K block size; all tests are
random writes.
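
For example, test 1 works out to 6542 IOPS x 4 KB = about 26168 KB/s,
which matches the aggrb=26178KB/s line in the fio summary; tests 2 and
3 come out to roughly 23 MB/s and 10.5 MB/s respectively.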

Xabier

 On 14-7-9 6:57 PM, Xabier Elkano wrote:


 Hi,

 I was doing some tests in my cluster with fio tool, one fio instance
 with 70 jobs, each job writing 1GB random with 4K block size. I did this
 test with 3 variations:

 1- Creating 70 images, 60GB each, in the pool. Using rbd kernel module,
 format and mount each image as ext4. Each fio job writing in a separate
 image/directory. (ioengine=libaio, queue_depth=4, direct=1)

 IOPS: 6542
 AVG LAT: 41ms

 2- Creating 1 large image 4,2TB in the pool. Using rbd kernel module,
 format and mount the image as ext4. Each fio job writing in a separate
 file in the same directory. (ioengine=libaio, queue_depth=4,direct=1)

IOPS: 5899
AVG LAT:  47ms

 3- Creating 1 large image 4,2TB in the pool. Use ioengine rbd in fio to
 access the image through librados. (ioengine=rbd,
 queue_depth=4,direct=1)

IOPS: 2638
AVG LAT: 96ms

 Do these results make sense? From Ceph perspective, It is better to have
 many small images than a larger one? What is the best approach to
 simulate the workload of 70 VMs?


 thanks in advance or any help,
 Xabier
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] error mapping device in firefly

2014-07-07 Thread Xabier Elkano
On 04/07/14 17:58, Ilya Dryomov wrote:
 On Fri, Jul 4, 2014 at 11:48 AM, Xabier Elkano xelk...@hostinet.com wrote:
 Hi,

 I am trying to map a rbd device in  Ubuntu 14.04 (kernel 3.13.0-30-generic):

 # rbd -p mypool create test1 --size 500

 # rbd -p mypool ls
 test1

 # rbd -p mypool map test1
 rbd: add failed: (5) Input/output error

 and in the syslog:
 Jul  4 09:31:48 testceph kernel: [70503.356842] libceph: mon2
 172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's
 2004a042a42, missing 200
 Jul  4 09:31:48 testceph kernel: [70503.356938] libceph: mon2
 172.16.64.18:6789 socket error on read


 my environment:

 cluster version on all MONs and OSDs is 0.80.1
 In the client machine:

 ii  ceph-common   0.80.1-1trusty   amd64   common utilities to mount and interact with a ceph storage cluster
 ii  python-ceph   0.80.1-1trusty   amd64   Python libraries for the Ceph distributed filesystem
 ii  librados2     0.80.1-1trusty   amd64   RADOS distributed object store client library

 I think I started getting this error when I switched from tunables
 legacy to optimal after upgrading from 0.72 to 0.80.
 Hi Xabier,

 You need to do

 ceph osd getcrushmap -o /tmp/crush
 crushtool -i /tmp/crush --set-chooseleaf_vary_r 0 -o /tmp/crush.new
 ceph osd setcrushmap -i /tmp/crush.new

 or upgrade your kernel to 3.15.

 Thanks,

 Ilya
Thank you Ilya, I changed the crushmap as you said and it solved the
problem.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] error mapping device in firefly

2014-07-04 Thread Xabier Elkano
Hi,

I am trying to map an rbd device on Ubuntu 14.04 (kernel 3.13.0-30-generic):

# rbd -p mypool create test1 --size 500

# rbd -p mypool ls
test1

# rbd -p mypool map test1
rbd: add failed: (5) Input/output error

and in the syslog:
Jul  4 09:31:48 testceph kernel: [70503.356842] libceph: mon2
172.16.64.18:6789 feature set mismatch, my 4a042a42 < server's
2004a042a42, missing 200
Jul  4 09:31:48 testceph kernel: [70503.356938] libceph: mon2
172.16.64.18:6789 socket error on read


my environment:

cluster version on all MONs and OSDs is 0.80.1
In the client machine:

ii  ceph-common   0.80.1-1trusty   amd64   common utilities to mount and interact with a ceph storage cluster
ii  python-ceph   0.80.1-1trusty   amd64   Python libraries for the Ceph distributed filesystem
ii  librados2     0.80.1-1trusty   amd64   RADOS distributed object store client library

I think I started getting this error when I switched from tunables
legacy to optimal after upgrading from 0.72 to 0.80.
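
For what it's worth, the current tunables can be inspected, and
switched back if needed, with something like:

ceph osd crush show-tunables
ceph osd crush tunables legacy     # or 'optimal' to move forward again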

Thanks in advance!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Journal SSD durability

2014-05-13 Thread Xabier Elkano
On 13/05/14 11:31, Christian Balzer wrote:
 Hello,

 No actual question, just some food for thought and something that later
 generations can scour from the ML archive.

 I'm planning another Ceph storage cluster, this time a classic Ceph
 design, 3 storage nodes with 8 HDDs for OSDs and 4 SSDs for OS and journal.
Christian, do you have many clusters in production? Are there any
advantages to running several clusters vs. different pools in one
cluster? What is the right way to go: maintain one big cluster, or
several separate clusters?

 When juggling the budget for it the 12 DC3700 200GB SSDs of my first
 draft stood out like the proverbial sore thumb, nearly 1/6th of the total
 budget. 
 I really like those SSDs with their smooth performance and durability of
 1TB/day writes (over 5 years, same for all the other numbers below), but
 wondered if that was really needed. 

 This cluster is supposed to provide the storage for VMs (Vservers
 really) that are currently on 3 DRBD cluster pairs.
 Not particular write intensive, all of them just total about 20GB/day.
 With 2 journals per SSD that's 5GB/day of writes, well within the Intel
 specification of 20GB/day for their 530 drives (180GB version).

 However the uneven IOPS of the 530 and potential future changes in write
 patterns make this 300% safety margin still to slim for my liking.

 Alas a DC3500 240GB SSD will perform well enough at half the price of the
 DC3700 and give me enough breathing room at about 80GB/day writes, so this
 is what I will order in the end.
Did you consider the DC3700 100G, which has a similar price?

 Christian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-07 Thread Xabier Elkano
On 06/05/14 19:31, Cedric Lemarchand wrote:
 On 06/05/2014 17:07, Xabier Elkano wrote:
 the goal is the performance over the capacity.
 I am sure you already consider the full SSD option, did you ?

Yes, I considered the full-SSD option, but it is very expensive. Using
the Intel 520 series, each disk costs double its SAS equivalent.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-07 Thread Xabier Elkano
On 06/05/14 19:38, Sergey Malinin wrote:
 If you plan to scale up in the future you could consider the following config 
 to start with:

 Pool size=2
 3 x servers with OS+journal on 1 ssd, 3 journal ssds, 4 x 900 gb data disks.
 It will get you 5+ TB capacity and you will be able to increase pool size to 
 3 at some point in time.
Thanks for your response. Do you mean 1 SSD for OS and 3 journal SSDs
+ 4 SAS 900G + 5 free slots? I had in mind the OS on RAID 1, but with
2 cheap Intel 3500 SSDs. The OS disks are SSDs not for performance -
they are only 100G and they are cheap. I thought an OS failure could
be worse than a journal or a single OSD failure, because the recovery
time to restore the OS could be longer.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-07 Thread Xabier Elkano
On 06/05/14 18:40, Christian Balzer wrote:
 Hello,

 On Tue, 06 May 2014 17:07:33 +0200 Xabier Elkano wrote:

 Hi,

 I'm designing a new ceph pool with new hardware and I would like to
 receive some suggestion.
 I want to use a replica count of 3 in the pool and the idea is to buy 3
 new servers with a 10-drive 2,5 chassis each and 2 10Gbps nics. I have
 in mind two configurations:

 As Wido said, more nodes are usually better, unless you're quite aware of
 what you're doing and why.
Yes, I know that, but what is the minimum number of nodes to start with?
Isn't starting with three nodes a feasible option?
  
 1- With journal in SSDs
  
 OS: 2xSSD intel SC3500 100G Raid 1
 Journal: 2xSSD intel SC3700 100G, 3 journal for each SSD
 As I wrote just a moment ago, use at least the 200GB ones if performance
 is such an issue for you.
 If you can afford it, use 4 3700s and share OS and journal, the OS IOPS
 will not be that significant, especially if you're using a writeback cache
 controller. 
The journal can be shared with the OS, but I like RAID 1 for the OS.
I think the only drawback is that I am using two dedicated disk slots
for the OS.

 OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size
 for OSDs: 5,4TB

 2- With journal in a partition in the spinners.

 OS: 2xSSD intel SC3500 100G Raid 1
 OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each runing an OSD process and
 its journal. Total size for OSDs: 3,6TB

 I have no idea why anybody would spend money on 12Gb/s HDDs when even
 most SSDs have trouble saturating a 6Gb/s link.
 Given the double write penalty in IOPS, I think you're going to find
 this more expensive (per byte) and slower than a well rounded option 1.
But these disks are 2,5 15K drives, not chosen only for the link speed.
The other 2,5 SAS (SAS2) disks I found are only 10K. The 15K disks
should be better for random IOPS.

 The budget in both configuration is similar, but the total capacity not.
 What would be the best configuration from the point of view of
 performance? In the second configuration I know the controller write
 back cache could be very critical, the servers has a LSI 3108 controller
 with 2GB Cache. I have to plan this storage as a KVM image backend and
 the goal is the performance over the capacity.

 Writeback cache can be very helpful, however it is not a miracle cure.
 Not knowing your actual load and I/O patterns it might very well be
 enough, though.
The IO patterns are somewhat unknown. I would assume 40% read and 60%
write, but the IO size is unknown, because the storage is for KVM
images and the VMs belong to many customers with different purposes.

 Regards,

 Christian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] advice with hardware configuration

2014-05-06 Thread Xabier Elkano

Hi,

I'm designing a new ceph pool with new hardware and I would like to
receive some suggestions. I want to use a replica count of 3 in the
pool, and the idea is to buy 3 new servers, each with a 10-drive 2,5
chassis and 2 10Gbps NICs. I have two configurations in mind:

1- With journal in SSDs
 
OS: 2xSSD intel SC3500 100G Raid 1
Journal: 2xSSD intel SC3700 100G, 3 journal for each SSD
OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size
for OSDs: 5,4TB

2- With journal in a partition in the spinners.

OS: 2xSSD intel SC3500 100G Raid 1
OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each running an OSD process
and its journal. Total size for OSDs: 3,6TB

The budget for both configurations is similar, but the total capacity
is not. Which would be the best configuration from a performance point
of view? In the second configuration I know the controller write-back
cache could be critical; the servers have an LSI 3108 controller with
2GB of cache. I have to plan this storage as a KVM image backend, and
the goal is performance over capacity.

On the other hand, with this new hardware, what would be the best
choice: create a new pool in an existing cluster, or create a
completely new cluster? Are there any advantages in creating and
maintaining an isolated new cluster?

thanks in advance,
Xabier


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-06 Thread Xabier Elkano
On 06/05/14 17:57, Sergey Malinin wrote:
 My vision of a well built node is when number of journal disks is equal to 
 number of data disks. You definitely don't want to lose 3 journals at once in 
 case of single drive failure.
Thanks for your response. This is true - a single SSD failure also
means 3 OSDs failing (50% of the capacity of that node and 16% of the
total capacity), but the journal SSDs are Intel SC3700 and they should
be very reliable.
 On 6 May 2014, at 18:07, Xabier Elkano xelk...@hostinet.com wrote:


 Hi,

 I'm designing a new ceph pool with new hardware and I would like to
 receive some suggestion.
 I want to use a replica count of 3 in the pool and the idea is to buy 3
 new servers with a 10-drive 2,5 chassis each and 2 10Gbps nics. I have
 in mind two configurations:

 1- With journal in SSDs

 OS: 2xSSD intel SC3500 100G Raid 1
 Journal: 2xSSD intel SC3700 100G, 3 journal for each SSD
 OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size
 for OSDs: 5,4TB

 2- With journal in a partition in the spinners.

 OS: 2xSSD intel SC3500 100G Raid 1
 OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each runing an OSD process and
 its journal. Total size for OSDs: 3,6TB

 The budget in both configuration is similar, but the total capacity not.
 What would be the best configuration from the point of view of
 performance? In the second configuration I know the controller write
 back cache could be very critical, the servers has a LSI 3108 controller
 with 2GB Cache. I have to plan this storage as a KVM image backend and
 the goal is the performance over the capacity.

 On the other hand, with these new hardware, what would be the best
 choice: create a new pool in an existing cluster or create a complete
 new cluster? Are there any advantages in creating and maintaining an
 isolated new cluster?

 thanks in advance,
 Xabier


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-06 Thread Xabier Elkano
On 06/05/14 17:51, Wido den Hollander wrote:
 On 05/06/2014 05:07 PM, Xabier Elkano wrote:

 Hi,

 I'm designing a new ceph pool with new hardware and I would like to
 receive some suggestion.
 I want to use a replica count of 3 in the pool and the idea is to buy 3
 new servers with a 10-drive 2,5 chassis each and 2 10Gbps nics. I have
 in mind two configurations:


 Why 3 machines? That's something I would not recommend. If you want 30
 drives I'd say, go for 8 machines with 4 drives each.

 If a single machine fails it's 12.5% of the cluster size instead of 33%!

 I always advise that a failure of a single machine should be 10% or
 less of the total cluster size.

 Wido
The idea is to start with 3 nodes and scale out in the future. I am
aware that a server failure can mean 33% less performance, but if the
whole pool performs well enough with 3 replicas spread over 3 nodes,
maybe it can cope with that.

The biggest cost here is the racks and servers rather than the disks,
and I prefer to start with 3 high-density servers and scale them up
progressively.

Do you think this wouldn't be good enough for production?

 1- With journal in SSDs

 OS: 2xSSD intel SC3500 100G Raid 1
 Journal: 2xSSD intel SC3700 100G, 3 journal for each SSD
 OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total size
 for OSDs: 5,4TB

 2- With journal in a partition in the spinners.

 OS: 2xSSD intel SC3500 100G Raid 1
 OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each runing an OSD process and
 its journal. Total size for OSDs: 3,6TB

 The budget in both configuration is similar, but the total capacity not.
 What would be the best configuration from the point of view of
 performance? In the second configuration I know the controller write
 back cache could be very critical, the servers has a LSI 3108 controller
 with 2GB Cache. I have to plan this storage as a KVM image backend and
 the goal is the performance over the capacity.

 On the other hand, with these new hardware, what would be the best
 choice: create a new pool in an existing cluster or create a complete
 new cluster? Are there any advantages in creating and maintaining an
 isolated new cluster?

 thanks in advance,
 Xabier


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] advice with hardware configuration

2014-05-06 Thread Xabier Elkano
On 06/05/14 18:17, Christian Balzer wrote:
 On Tue, 6 May 2014 18:57:04 +0300 Sergey Malinin wrote:

 My vision of a well built node is when number of journal disks is equal
 to number of data disks. You definitely don't want to lose 3 journals at
 once in case of single drive failure.

 While that certainly is true not everybody is having unlimited budgets. 

 I'd expect the DC3700 to outlast the spinning rust, especially if the
 implementor is SMART enough to be replace things before something
 unforetold were to happen.

 However using a 100GB DC3700 with those drives isn't particular wise
 performance wise. I'd at least use the 200GB ones.
Hi Christian, you are right, I should use the 200GB ones at least. Thanks!

 Regards,

 Christian
 On 6 May 2014, at 18:07, Xabier Elkano xelk...@hostinet.com wrote:


 Hi,

 I'm designing a new ceph pool with new hardware and I would like to
 receive some suggestion.
 I want to use a replica count of 3 in the pool and the idea is to buy 3
 new servers with a 10-drive 2,5 chassis each and 2 10Gbps nics. I have
 in mind two configurations:

 1- With journal in SSDs

 OS: 2xSSD intel SC3500 100G Raid 1
 Journal: 2xSSD intel SC3700 100G, 3 journal for each SSD
 OSD: 6 SAS10K 900G (SAS2 6Gbps), each running an OSD process. Total
 size for OSDs: 5,4TB

 2- With journal in a partition in the spinners.

 OS: 2xSSD intel SC3500 100G Raid 1
 OSD+journal: 8 SAS15K 600G (SAS3 12Gbps), each runing an OSD process
 and its journal. Total size for OSDs: 3,6TB

 The budget in both configuration is similar, but the total capacity
 not. What would be the best configuration from the point of view of
 performance? In the second configuration I know the controller write
 back cache could be very critical, the servers has a LSI 3108
 controller with 2GB Cache. I have to plan this storage as a KVM image
 backend and the goal is the performance over the capacity.

 On the other hand, with these new hardware, what would be the best
 choice: create a new pool in an existing cluster or create a complete
 new cluster? Are there any advantages in creating and maintaining an
 isolated new cluster?

 thanks in advance,
 Xabier


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com