Re: [ceph-users] Newbie Ceph Design Questions

2014-09-21 Thread Udo Lembke
Hi Christian,

On 21.09.2014 07:18, Christian Balzer wrote:
 ...
 Personally I found ext4 to be faster than XFS in nearly all use cases and
 the lack of full, real kernel integration of ZFS is something that doesn't
 appeal to me either.
A little bit OT... what kind of ext4 mount options do you use?
I have a 5-node cluster with XFS (60 OSDs), and perhaps the performance
with ext4 would be better?!
For XFS I use osd_mount_options_xfs =
rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
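
For reference, a minimal sketch of how that option usually sits in ceph.conf; the surrounding [osd] keys below are illustrative, not taken from this thread:

[osd]
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
osd mkfs options xfs = -f -i size=2048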

regards

Udo


Re: [ceph-users] Merging two active ceph clusters: suggestions needed

2014-09-21 Thread Robin H. Johnson
On Sun, Sep 21, 2014 at 02:33:09PM +0900, Christian Balzer wrote:
  For a variety of reasons, none good anymore, we have two separate Ceph
  clusters.
  
  I would like to merge them onto the newer hardware, with as little
  downtime and data loss as possible; then discard the old hardware.
  
  Cluster A (2 hosts):
  - 3TB of S3 content, 100k files, file mtimes important
  - 500GB of RBD volumes, exported via iscsi
  
  Cluster B (4 hosts):
  - 50GiB of S3 content
  - 7TB of RBD volumes, exported via iscsi
  
  Short of finding somewhere to dump all of the data from one side, and
  re-importing it after merging with that cluster as empty; are there any
  other alternatives available to me?
  
 
 Having recently seen a similar question and the answer by the Ceph
 developers, no. 
 As in there is no way (and no plans) for merging clusters.
 
 There are export functions for RBD volumes, not sure about S3 and the
 mtimes as I don't use that functionality. 
Can somebody else comment on migrating S3 buckets with
preserved mtime data (and all of the ACLs & CORS) then?
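
On the RBD side mentioned above, a minimal sketch of a per-image migration between two clusters; the config paths and pool/image names are placeholders, and the iSCSI exports would need to be stopped first for a consistent copy:

# stream an image from cluster A straight into cluster B
rbd -c /etc/ceph/cluster-a.conf export rbd/myimage - \
  | rbd -c /etc/ceph/cluster-b.conf import - rbd/myimage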

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robb...@gentoo.org
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85


Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-21 Thread Zhang, Jian
Thanks. The results look close to our results now.

Thanks
Jian


-Original Message-
From: Alexandre DERUMIER [mailto:aderum...@odiso.com] 
Sent: Friday, September 19, 2014 8:54 PM
To: Zhang, Jian
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Thanks for this great information.
We are using Firefly. We will also try this later. 

Thanks
Jian

Oh, sorry, I made a mistake when benchmarking with fio (I forgot to fill the OSDs
with data before the read benchmark).

True results with 6 OSDs: bw=118129KB/s, iops=29532



- Original Message -

From: Jian Zhang jian.zh...@intel.com
To: Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users@lists.ceph.com
Sent: Friday, 19 September 2014 10:21:38
Subject: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Thanks for this great information. 
We are using Firefly. We will also try this later. 

Thanks 
Jian 


-Original Message- 
From: Alexandre DERUMIER [mailto:aderum...@odiso.com] 
Sent: Friday, September 19, 2014 3:00 PM 
To: Zhang, Jian 
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

I'll do benchs with 6 osd dc3500 tomorrow to compare firefly and giant. 

Here are the results (big giant improvements):

3 nodes with 2 OSDs each, replication x1
network is a 2-gigabit link (LACP) for nodes and the client




firefly: no tuning
------------------
bw=45880KB/s, iops=11469 



firefly with tuning:
--------------------
debug lockdep = 0/0 
debug context = 0/0 
debug crush = 0/0 
debug buffer = 0/0 
debug timer = 0/0 
debug journaler = 0/0 
debug osd = 0/0 
debug optracker = 0/0 
debug objclass = 0/0 
debug filestore = 0/0 
debug journal = 0/0 
debug ms = 0/0 
debug monc = 0/0 
debug tp = 0/0 
debug auth = 0/0 
debug finisher = 0/0 
debug heartbeatmap = 0/0 
debug perfcounter = 0/0 
debug asok = 0/0 
debug throttle = 0/0 
osd_op_threads = 5 
filestore_op_threads = 4 


bw=62094KB/s, iops=15523 



giant with same tuning
----------------------
bw=247073KB/s, iops=61768 ! 

I think I could reach more, but my 2 gigabit links are saturated.
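
As a side note (not from the thread itself): the settings above would normally go into ceph.conf under [global]/[osd] followed by a daemon restart; the debug levels can also be dropped at runtime for a quick before/after test, along these lines:

ceph tell osd.* injectargs '--debug-ms 0/0 --debug-osd 0/0 --debug-filestore 0/0'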



- Original Message -

From: Alexandre DERUMIER aderum...@odiso.com
To: Jian Zhang jian.zh...@intel.com
Cc: ceph-users@lists.ceph.com
Sent: Thursday, 18 September 2014 15:36:48
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Has anyone ever tested multi-volume performance on a *FULL* SSD setup?

I know that Stefan Priebe runs full SSD clusters in production and has done
benchmarks. (As far as I remember, he benched around 20k peak with Dumpling.)

We are able to get ~18K IOPS for 4K random read on a single volume with fio
(with the rbd engine) on a 12x DC3700 setup, but only able to get ~23K (peak)
IOPS even with multiple volumes.
It seems the maximum random write performance we can get on the entire cluster
is quite close to single-volume performance.
Firefly or Giant?

I'll do benchs with 6 osd dc3500 tomorrow to compare firefly and giant. 

- Original Message -

From: Jian Zhang jian.zh...@intel.com
To: Sebastien Han sebastien@enovance.com, Alexandre DERUMIER aderum...@odiso.com
Cc: ceph-users@lists.ceph.com
Sent: Thursday, 18 September 2014 08:12:32
Subject: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Has anyone ever tested multi-volume performance on a *FULL* SSD setup?
We are able to get ~18K IOPS for 4K random read on a single volume with fio
(with the rbd engine) on a 12x DC3700 setup, but only able to get ~23K (peak) IOPS
even with multiple volumes.
It seems the maximum random write performance we can get on the entire cluster is
quite close to single-volume performance.
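
For reference, a minimal sketch of the kind of fio job file used for this sort of test with the rbd engine; the pool/image names and the client name are placeholders, and the image should be fully written once before running random reads (as noted earlier in this thread):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio-test
invalidate=0
rw=randread
bs=4k
iodepth=32
runtime=60
time_based

[rbd-randread-4k]
numjobs=1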

Thanks 
Jian 


-Original Message- 
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Sebastien Han 
Sent: Tuesday, September 16, 2014 9:33 PM 
To: Alexandre DERUMIER 
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Hi, 

Thanks for keeping us updated on this subject. 
dsync is definitely killing the SSD.

I don't have much to add; I'm just surprised that you're only getting 5299 with
0.85, since I've been able to get 6.4K. Well, I was using the 200GB model, which
might explain this.
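
As a rough illustration of the dsync point above, a common way to check how an SSD handles synchronous 4k writes (destructive on the target device, so the device path here is only a placeholder):

# O_DIRECT + O_DSYNC 4k writes, the pattern a filestore journal generates
dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync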


On 12 Sep 2014, at 16:32, Alexandre DERUMIER aderum...@odiso.com wrote: 

 Here are the results for the Intel S3500.
 
 Max performance is with Ceph 0.85 + optracker disabled.
 The Intel S3500 doesn't have the d_sync problem that the Crucial has.
 
 %util shows almost 100% for read and write, so maybe the SSD disk performance
 is the limit.
 
 I have some STEC ZeusRAM 8GB in stock (I used them for ZFS ZIL); I'll try to
 bench them next week.
 
 
 
 
 
 
 INTEL s3500
 -----------
 raw disk
 --------
 
 randread: fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k --iodepth=32

Re: [ceph-users] Newbie Ceph Design Questions

2014-09-21 Thread Christian Balzer

Hello,

On Sun, 21 Sep 2014 21:00:48 +0200 Udo Lembke wrote:

 Hi Christian,
 
 On 21.09.2014 07:18, Christian Balzer wrote:
  ...
  Personally I found ext4 to be faster than XFS in nearly all use cases
  and the lack of full, real kernel integration of ZFS is something that
  doesn't appeal to me either.
 A little bit OT... what kind of ext4 mount options do you use?
 I have a 5-node cluster with XFS (60 OSDs), and perhaps the performance
 with ext4 would be better?!

Hard to tell without testing your particular load and I/O patterns.

When benchmarking directly with single disks or RAIDs it is fairly
straightforward to see:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028540.html

Also note that the actual question has never been answered by the Ceph
team, which is a shame as I venture that it would make things faster.

Now testing with Ceph on top makes things a lot more murky, as Ceph
currently is so inefficient (see the various OSD on SSD performance
threads) that the underlying FS only makes a difference in certain
scenarios. 
In the next few weeks I will probably have the chance to play with a
cluster before it goes into production and refine those findings.

I found that ext4 definitely fragments less than XFS (using RBD for VM
images only here) over long term use.

Nothing particular for mount options, but the biggest performance boost
for ext4 is to give it the largest journal possible (-J size=1024) at
mkfs time.
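
As a concrete sketch of that suggestion (the device path and the remaining options are illustrative, not a recommendation from this thread):

# large ext4 journal at mkfs time, as suggested above
mkfs.ext4 -J size=1024 /dev/sdX1

# corresponding ceph.conf keys for ext4-backed OSDs; filestore_xattr_use_omap
# is commonly set for ext4 because of its limited xattr size
[osd]
osd mkfs options ext4 = -J size=1024
osd mount options ext4 = rw,noatime,user_xattr
filestore xattr use omap = true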

 For XFS I use osd_mount_options_xfs =
 rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
 
That looks pretty much like the optimum settings for XFS indeed.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] Troubleshooting down OSDs: Invalid command: ceph osd start osd.1

2014-09-21 Thread Piers Dawson-Damer
Hi Loïc,

It seems there is another error in the documentation at
(http://ceph.com/docs/argonaut/init/stop-cluster/)

I believe

sudo service -a ceph stop

should probably read

sudo service ceph -a stop
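
For the related question of restarting a single OSD (which is what "ceph osd start osd.N" never did), a sketch of the usual per-daemon commands around Firefly; which form applies depends on the init system the node uses:

# sysvinit (e.g. /etc/init.d/ceph based installs)
sudo service ceph stop osd.1
sudo service ceph start osd.1

# upstart (e.g. Ubuntu packages)
sudo stop ceph-osd id=1
sudo start ceph-osd id=1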
Cheers


On 19 Sep 2014, at 6:33 pm, Loic Dachary l...@dachary.org wrote:

 Hi,
 
 The documentation indeed contains an example that does not work. This should 
 fix it: 
 https://github.com/dachary/ceph/commit/be97b7d5b89d7021f71695b4c1b78830bad4dab6
 
 Cheers
 
 On 19/09/2014 08:06, Piers Dawson-Damer wrote:
 Has the command for manually starting and stopping OSDs changed? 
 
 The documentation for troubleshooting OSDs 
 (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/)
 mentions restarting OSDs with the command:
 
 ceph osd start osd.{num}
 
 Yet I find, using Firefly 0.80.5
 
 piers@sol:/etc/ceph$ ceph osd start osd.1
 no valid command found; 10 closest matches:
 osd tier remove poolname poolname
 osd tier cache-mode poolname none|writeback|forward|readonly
 osd thrash int[0-]
 osd tier add poolname poolname {--force-nonempty}
 osd pool stats {name}
 osd reweight-by-utilization {int[100-]}
 osd pool set poolname 
 size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hashpspool|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|debug_fake_ec_pool|target_max_bytes|target_max_objects|cache_target_dirty_ratio|cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|auid
  val {--yes-i-really-mean-it}
 osd pool set-quota poolname max_objects|max_bytes val
 osd pool rename poolname poolname
 osd pool get poolname 
 size|min_size|crash_replay_interval|pg_num|pgp_num|crush_ruleset|hit_set_type|hit_set_period|hit_set_count|hit_set_fpp|auid
 Error EINVAL: invalid command
 
 
 
 
 
 
 -- 
 Loïc Dachary, Artisan Logiciel Libre
 
