Re: [ceph-users] rbd performance problem on kernel 3.13.6 and 3.18.11

2015-04-14 Thread Ilya Dryomov
On Tue, Apr 14, 2015 at 6:24 AM, yangruifeng.09...@h3c.com yangruifeng.09...@h3c.com wrote: Hi all! I am testing rbd performance based on the kernel rbd driver. When I compared the results of kernel 3.13.6 with 3.18.11, I was very confused. Look at the results: down by a third.

Re: [ceph-users] How to dispatch monitors in a multi-site cluster (ie in 2 datacenters)

2015-04-14 Thread Joao Eduardo Luis
On 04/14/2015 04:42 AM, Francois Lafont wrote: Joao Eduardo wrote: To be more precise, it's the lowest IP:PORT combination: 10.0.1.2:6789 = rank 0, 10.0.1.2:6790 = rank 1, 10.0.1.3:6789 = rank 2, and so on. Ok, so if there are 2 possible quorums, the quorum with the lowest IP:PORT will be
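
The rank assignment can be checked directly with ceph mon dump; the output lists one line per monitor in rank order, roughly as sketched below (the addresses reuse the example above, and the mon names are placeholders):

    ceph mon dump
    # 0: 10.0.1.2:6789/0 mon.a
    # 1: 10.0.1.2:6790/0 mon.b
    # 2: 10.0.1.3:6789/0 mon.c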

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
Yes you can. You have to write your own crushmap. At the end of the crushmap you have rulesets. Write a ruleset that selects only the OSDs you want, then assign the pool to that ruleset. I have seen examples online of people who wanted some pools only on SSD disks and other pools only
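
A minimal sketch of what that could look like, assuming a CRUSH root named ssd already groups the SSD OSDs (the root name, rule number and pool name are illustrative, not from this thread):

    # pull, decompile, edit and push back the CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # append a ruleset such as:
    #   rule ssd_only {
    #       ruleset 4
    #       type replicated
    #       min_size 1
    #       max_size 10
    #       step take ssd
    #       step chooseleaf firstn 0 type host
    #       step emit
    #   }
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    # bind the pool to the new ruleset
    ceph osd pool set mypool crush_ruleset 4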

Re: [ceph-users] Force an OSD to try to peer

2015-04-14 Thread Martin Millnert
On Tue, Mar 31, 2015 at 10:44:51PM +0300, koukou73gr wrote: On 03/31/2015 09:23 PM, Sage Weil wrote: It's nothing specific to peering (or ceph). The symptom we've seen is just that bytes stop passing across a TCP connection, usually when there are some largish messages being sent. The

[ceph-users] Re: rbd performance problem on kernel 3.13.6 and 3.18.11

2015-04-14 Thread yangruifeng.09...@h3c.com
cluster detail: ceph version 0.94, 3 hosts, 3 mons, 18 osds, 1 ssd as journal + 6 hdds per host. 1 pool, named rbd, pg_num is 1024, 3 replicas. Steps: 1. rbd create test1 -s 81920; rbd create test2 -s 81920; rbd create test3 -s 81920 2. on host1, rbd map test1, get /dev/rbd0 on kernel 3.18.11 or

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-04-14 Thread Saverio Proto
2015-03-27 18:27 GMT+01:00 Gregory Farnum g...@gregs42.com: Ceph has per-pg and per-OSD metadata overhead. You currently have 26000 PGs, suitable for use on a cluster of the order of 260 OSDs. You have placed almost 7GB of data into it (21GB replicated) and have about 7GB of additional
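
A rough sanity check of that figure, using the common target of about 100 PGs per OSD:

    26000 PGs / ~100 PGs per OSD ≈ 260 OSDs

so the fixed per-PG and per-OSD metadata of a layout sized for ~260 OSDs easily dwarfs the ~7GB of actual data, which explains the skewed MB data vs MB used ratio.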

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Vincenzo Pii
Hi Giuseppe, There is also this article from Sébastien Han that you might find useful: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ Best regards, Vincenzo. 2015-04-14 10:34 GMT+02:00 Saverio Proto ziopr...@gmail.com: Yes you can. You have to write

Re: [ceph-users] how to compute Ceph durability?

2015-04-14 Thread ghislain.chevalier
Hi All, Am I the only one with this need? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ghislain.cheval...@orange.com Sent: Friday, March 20, 2015 11:47 To: ceph-users Subject: [ceph-users] how to compute Ceph durability? Hi all, I would like to compute the durability

[ceph-users] OSD replacement

2015-04-14 Thread Corey Kovacs
I am fairly new to ceph and so far things are going great. That said, when I try to replace a failed OSD, I can't seem to get it to use the same OSD id#. I have gotten to the point where a ceph osd create does use the correct id#, but when I try to use ceph-deploy to instantiate the replacement, I

Re: [ceph-users] how to compute Ceph durability?

2015-04-14 Thread Mark Nelson
Hi Ghislain, Mark Kampe was working on durability models a couple of years ago, but I'm not sure if they were ever completed or if anyone has reviewed them. The source code is available here: https://github.com/ceph/ceph-tools/tree/master/models/reliability This was before EC was in Ceph,

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Bruce McFarland
You won't get a PG warning message from ceph -s unless you have fewer than 20 PGs per OSD in your cluster. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bruce McFarland Sent: Tuesday, April 14, 2015 10:00 AM To: Giuseppe Civitella; Saverio Proto Cc: ceph-users@lists.ceph.com
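
The threshold behind that warning is the mon_pg_warn_min_per_osd option (the default of 20 quoted here is from memory and worth verifying against your release). One way to check it, assuming the mon id matches the short hostname:

    # query a running monitor through its admin socket
    ceph daemon mon.$(hostname -s) config get mon_pg_warn_min_per_osd
    # or override it in ceph.conf under [mon]:
    #   mon_pg_warn_min_per_osd = 20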

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Mark Nelson
You may also be interested in the cbt code that does this kind of thing for creating cache tiers: https://github.com/ceph/cbt/blob/master/cluster/ceph.py#L295 The idea is that you create a parallel crush hierarchy for the SSDs and then you can assign that to the pool used for the cache tier.
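
A hedged sketch of how a pool placed on the SSD hierarchy could then be wired up as a cache tier (pool names and the target size are placeholders, not taken from cbt):

    ceph osd tier add coldpool hotpool
    ceph osd tier cache-mode hotpool writeback
    ceph osd tier set-overlay coldpool hotpool
    # basic cache-pool tuning
    ceph osd pool set hotpool hit_set_type bloom
    ceph osd pool set hotpool target_max_bytes 100000000000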

Re: [ceph-users] norecover and nobackfill

2015-04-14 Thread Robert LeBlanc
Hmmm... I've been deleting the OSD (ceph osd rm X; ceph osd crush rm osd.X) along with removing the auth key. This has caused data movement, but reading your reply and thinking about it made me think it should be done differently. I should just remove the auth key and leave the OSD in the CRUSH
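
For reference, the full removal sequence being described, and the lighter variant being considered, look roughly like this (osd.X is a placeholder):

    # full removal: frees the id, but the CRUSH change triggers rebalancing
    ceph osd crush rm osd.X
    ceph auth del osd.X
    ceph osd rm X
    # lighter variant: drop only the key and keep the CRUSH entry,
    # so the replacement disk can come back as the same osd.X
    ceph auth del osd.X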

Re: [ceph-users] v0.80.8 and librbd performance

2015-04-14 Thread Josh Durgin
I don't see any commits that would be likely to affect that between 0.80.7 and 0.80.9. Is this after upgrading an existing cluster? Could this be due to fs aging beneath your osds? How are you measuring create/delete performance? You can try increasing rbd concurrent management ops in
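
The option referred to can be set on the client side in ceph.conf, e.g. as below (20 is only an example value; the default is believed to be 10):

    [client]
        rbd_concurrent_management_ops = 20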

Re: [ceph-users] OSD replacement

2015-04-14 Thread Corey Kovacs
Vikhyat, I went through the steps as I did yesterday but with one small change. I was putting the --zap-disk option before the host:disk option. Now it works as expected. I'll try it again with the wrong syntax to see if that's really the problem but it's the only difference between working and

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Bruce McFarland
I use this to quickly check pool stats: [root@ceph-mon01 ceph]# ceph osd dump | grep pool pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0 pool 1 'metadata' replicated size

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
No error message. You just exhaust the RAM and you blow up the cluster because of too many PGs. Saverio 2015-04-14 18:52 GMT+02:00 Giuseppe Civitella giuseppe.civite...@gmail.com: Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Giuseppe Civitella
Hi Saverio, I first made a test on my test staging lab where I have only 4 OSDs. On my mon servers (which run other services) I have 16GB RAM, 15GB used but 5GB cached. On the OSD servers I have 3GB RAM, 3GB used but 2GB cached. ceph -s tells me nothing about PGs; shouldn't I get an error message from

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Giuseppe Civitella
Hi all, I've been following this tutorial to realize my setup: http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ I got this CRUSH map from my test lab: http://paste.openstack.org/show/203887/ then I modified the map and uploaded it. This is the final

Re: [ceph-users] rbd: incorrect metadata

2015-04-14 Thread Jason Dillaman
The C++ librados API uses STL strings so it can properly handle embedded NULLs. You can make a backup copy of rbd_children using 'rados cp'. However, if you don't care about the snapshots and you've already flattened all the images, you could just delete the rbd_children object so that
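
A sketch of that backup-then-remove approach with the rados CLI, assuming the images live in the default rbd pool (the backup object name is arbitrary):

    # keep a copy of the object first
    rados -p rbd cp rbd_children rbd_children.backup
    # then, only once all clones have really been flattened:
    rados -p rbd rm rbd_children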

Re: [ceph-users] rbd: incorrect metadata

2015-04-14 Thread Matthew Monaco
On 04/14/2015 08:45 AM, Jason Dillaman wrote: The C++ librados API uses STL strings so it can properly handle embedded NULLs. You can make a backup copy of rbd_children using 'rados cp'. However, if you don't care about the snapshots and you've already flattened all the images, you could

Re: [ceph-users] Binding a pool to certain OSDs

2015-04-14 Thread Saverio Proto
You only have 4 OSDs? How much RAM per server? I think you already have too many PGs. Check your RAM usage. Check the guidelines on the Ceph wiki to dimension the correct number of PGs. Remember that every time you create a new pool you add PGs to the system. Saverio 2015-04-14 17:58 GMT+02:00
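
The usual rule of thumb from the docs, worked for the 4-OSD lab mentioned here (exact rounding guidance varies between sources):

    total PGs per pool ≈ (number of OSDs x 100) / replica count
    (4 x 100) / 3 ≈ 133, usually rounded to a nearby power of two (128)

Remember that this budget is shared across all pools, so each additional pool adds its own PGs on the same OSDs.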

Re: [ceph-users] ceph data not well distributed.

2015-04-14 Thread Sage Weil
On Tue, 14 Apr 2015, Mark Nelson wrote: On 04/14/2015 08:58 PM, Yujian Peng wrote: I have a ceph cluster with 125 osds with the same weight. But I found that data is not well distributed. df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 47929224

Re: [ceph-users] v0.80.8 and librbd performance

2015-04-14 Thread shiva rkreddy
The clusters are in a test environment, so it's a new deployment of 0.80.9. The OS on the cluster nodes was reinstalled as well, so there shouldn't be any fs aging unless the disks are slowing down. The perf measurement is done by initiating multiple cinder create/delete commands and tracking the volume to

Re: [ceph-users] ceph data not well distributed.

2015-04-14 Thread GuangYang
We have a tiny script which does the CRUSH re-weight based on the PGs/OSD to achieve balance across OSDs, and we run the script right after setting up the cluster to avoid data migration after the cluster is filled up. A couple of experiences to share: 1. As suggested, it is helpful to choose a
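
Not the poster's script, but a minimal sketch of the kind of adjustment it describes, nudging the CRUSH weight of an over-subscribed OSD down (the osd id and weight are placeholders):

    # inspect per-OSD PG counts and utilization (Hammer and later)
    ceph osd df
    # lower the CRUSH weight of an over-full OSD slightly
    ceph osd crush reweight osd.17 0.95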

Re: [ceph-users] use ZFS for OSDs

2015-04-14 Thread Quenten Grasso
Hi Michal, Really nice work on the ZFS testing. I've been thinking about this myself from time to time, however I wasn't sure if ZoL was ready to use in production with Ceph. Instead of using multiple OSDs on zfs/ceph, I would like to see running, say, a z+2 across 8-12 3-4TB spinners and

[ceph-users] ceph data not well distributed.

2015-04-14 Thread Yujian Peng
I have a ceph cluster with 125 osds with the same weight. But I found that data is not well distributed. df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 47929224 2066208 43405264 5% / udev 16434372 4 16434368 1% /dev tmpfs

[ceph-users] Ceph OSD Log INFO Learning

2015-04-14 Thread Star Guo
Hi, all, There is an image attached with ceph osd log information. It prints "fault with nothing to send, going to standby". What does it mean? Thanks :) Best Regards, Star Guo ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] Ceph OSD Log INFO Learning

2015-04-14 Thread Yujian Peng
Star Guo starg@... writes: There is an image attached with ceph osd log information. It prints "fault with nothing to send, going to standby". What does it mean? Thanks :) Logs like this are OK. ___ ceph-users mailing list

Re: [ceph-users] Upgrade from Firefly to Hammer

2015-04-14 Thread Francois Lafont
Hi, Garg, Pankaj wrote: I have a small cluster of 7 machines. Can I just individually upgrade each of them (using apt-get upgrade) from Firefly to Hammer release, or is there more to it than that? Not exactly; it's the "individually" part that is not correct. ;) You should indeed apt-get upgrade on
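
A hedged outline of the order usually recommended, monitors first and then OSDs, one host at a time rather than all seven at once (package names and restart syntax depend on the distribution and init system):

    # on each monitor host, one at a time
    apt-get update && apt-get install ceph ceph-common
    restart ceph-mon-all          # or: service ceph restart mon
    # once all mons run Hammer, on each OSD host, one at a time
    apt-get update && apt-get install ceph ceph-common
    restart ceph-osd-all          # or: service ceph restart osd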

Re: [ceph-users] ceph data not well distributed.

2015-04-14 Thread Mark Nelson
On 04/14/2015 08:58 PM, Yujian Peng wrote: I have a ceph cluster with 125 osds with the same weight. But I found that data is not well distributed. df Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 47929224 2066208 43405264 5% / udev

Re: [ceph-users] ceph data not well distributed.

2015-04-14 Thread Yujian Peng
Thanks for your advices! I'll increase the number of PGs to improve the balance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] v0.80.8 and librbd performance

2015-04-14 Thread shiva rkreddy
Retried the test by setting rbd_concurrent_management_ops and rbd-concurrent-management-ops to 20 (default 10?) and didn't see any difference in the delete time. Steps: 1. Create 20 500GB volumes 2. run: rbd -n clientkey -p cindervols rbd rm $volumeId 3. run the rbd ls command with 1

Re: [ceph-users] norecover and nobackfill

2015-04-14 Thread Francois Lafont
Robert LeBlanc wrote: Hmmm... I've been deleting the OSD (ceph osd rm X; ceph osd crush rm osd.X) along with removing the auth key. This has caused data movement, Maybe, but if the noout flag is set, removing an OSD from the cluster doesn't trigger any data movement at all (I have tested with
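
For reference, the flag is toggled cluster-wide like this:

    ceph osd set noout        # down OSDs are not marked out, so no rebalancing starts
    # ... replace or maintain the OSD ...
    ceph osd unset noout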

Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket

2015-04-14 Thread Yehuda Sadeh-Weinraub
- Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 7:11:49 PM Subject: Re: [ceph-users] Radosgw: upgrade Firefly to Hammer, impossible to create bucket Hi, Yehuda Sadeh-Weinraub wrote: The 405 in this

[ceph-users] Upgrade from Firefly to Hammer

2015-04-14 Thread Garg, Pankaj
Hi, I have a small cluster of 7 machines. Can I just individually upgrade each of them (using apt-get upgrade) from Firefly to Hammer release, or is there more to it than that? Thanks, Pankaj ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] v0.80.8 and librbd performance

2015-04-14 Thread shiva rkreddy
Hi Josh, We are using firefly 0.80.9 and see both cinder create/delete numbers slow down compared to 0.80.7. I don't see any specific tuning requirements and our cluster is run pretty much on the default configuration. Do you recommend any tuning, or can you please suggest some log signatures we need to

Re: [ceph-users] how to compute Ceph durability?

2015-04-14 Thread Christian Balzer
Hello, On Tue, 14 Apr 2015 12:04:35 + ghislain.cheval...@orange.com wrote: Hi All, Am I the only one with this need? No, but for starters, there have been a number of threads about that topic on this ML, for example the "Failure probability with largish deployments" one nearly 1.5 years

Re: [ceph-users] OSD replacement

2015-04-14 Thread Vikhyat Umrao
Hi, I hope you are following this: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual After removing the osd successfully, run the following command: # ceph-deploy --overwrite-conf osd create osd-host:device-path --zap-disk It will give you the same osd id

Re: [ceph-users] Force an OSD to try to peer

2015-04-14 Thread Scott Laird
Things *mostly* work if hosts on the same network have different MTUs, at least with TCP, because the hosts will negotiate the MSS for each connection. UDP will still break, but large UDP packets are less common. You don't want to run that way for very long, but there's no need for an atomic MTU
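
A quick way to spot such a mismatch, for what it's worth (the interface name and peer address are placeholders):

    ip link show eth0 | grep mtu       # current MTU on each host
    ping -M do -s 8972 <peer-ip>       # probe a 9000-byte path without fragmentation (8972 payload + 28 header bytes)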

Re: [ceph-users] Purpose of the s3gw.fcgi script?

2015-04-14 Thread Ken Dreyer
On 04/13/2015 07:35 PM, Yehuda Sadeh-Weinraub wrote: - Original Message - From: Francois Lafont flafdiv...@free.fr To: ceph-users@lists.ceph.com Sent: Monday, April 13, 2015 5:17:47 PM Subject: Re: [ceph-users] Purpose of the s3gw.fcgi script? Hi, Yehuda Sadeh-Weinraub wrote: