Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Dan, we are already back on the kernel module, since the same problems were happening with fuse. I had no special ulimit settings for the fuse process, so that could have been an issue there. I pasted the kernel messages from such incidents here: http://pastebin.com/X5JRe1v3 I was

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Eneko Lacunza
Hi, I'm just writing to stress what others have already said, because it is very important that you take it seriously. On 20/04/15 19:17, J-P Methot wrote: On 4/20/2015 11:01 AM, Christian Balzer wrote: This is similar to another thread running right now, but since our

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Onur, actual 50, ideal 330128, fragmentation factor 0.97%, so fragmentation is not an issue here. Regards, Christian On 20.04.2015 at 16:41, Onur BEKTAS wrote: Hi, check the XFS fragmentation factor for the rbd disks, i.e. xfs_db -c frag -r /dev/sdX; if it is high, try defrag
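A minimal sketch of the check Onur suggests, assuming the OSD's data partition is /dev/sdd1 and its filesystem is mounted at /var/lib/ceph/osd/ceph-0 (both paths are hypothetical):

    # report the XFS fragmentation factor (read-only, safe on a mounted filesystem)
    xfs_db -c frag -r /dev/sdd1
    # if the factor is high, defragment the mounted filesystem online
    xfs_fsr -v /var/lib/ceph/osd/ceph-0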

Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-21 Thread Götz Reinicke - IT Koordinator
Hi Christian, On 13.04.15 at 12:54, Christian Balzer wrote: Hello, On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote: Dear ceph users, we are planning a ceph storage cluster from scratch. It might be up to 1 PB within the next 3 years, multiple buildings, new network

[ceph-users] weird issue with OSDs on admin node

2015-04-21 Thread Lee Revell
So I had extra drives in my lab cluster's admin node and decided to use them for more OSDs. The weird thing is that, unlike on all the other nodes, the OSDs don't start on boot - I have to manually activate them whenever the cluster is rebooted. Manually running service ceph-all start doesn't start them
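A sketch of the manual activation being described, assuming Hammer-era ceph-disk/udev provisioning and a hypothetical prepared partition /dev/sdb1; if the partitions lack the Ceph GPT type GUIDs, the udev rules will not trigger activation at boot:

    # show how ceph-disk classifies the local disks
    ceph-disk list
    # activate one prepared OSD partition by hand
    ceph-disk activate /dev/sdb1
    # or activate everything ceph-disk can find
    ceph-disk activate-all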

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Dan van der Ster
Hi Christian, I've never debugged the kernel client either, so I don't know how to increase debugging (I don't see any useful params on the kernel modules). Your log looks like the client just stops communicating with the ceph cluster. Is iptables getting in the way? Cheers, Dan On Tue, Apr

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
Hi Dan, nope, we have no iptables rules on those hosts, and the gateway is on the same subnet as the ceph cluster. I will see if I can find some information on how to debug the rbd kernel module (any suggestions are appreciated :)) Regards, Christian On 21.04.2015 at 10:20, Dan van der
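One commonly used way to get more output from the rbd/libceph kernel modules is dynamic debug; a sketch, assuming the kernel was built with CONFIG_DYNAMIC_DEBUG and debugfs is available:

    mount -t debugfs none /sys/kernel/debug 2>/dev/null
    echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    # the extra messages land in the kernel log
    dmesg -w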

Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer

2015-04-21 Thread Samuel Just
Yep, you have hit bug 11429. At some point, you removed a pool and then restarted these OSDs. Due to the original bug, 10617, those OSDs never actually removed the PGs in that pool. I'm working on a fix, or you can manually remove the PGs corresponding to pools which no longer exist from the
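A cautious sketch of the manual cleanup Sam describes, with the affected OSD stopped; osd.12 and pool id 37 are purely illustrative, and moving the directories aside is safer than deleting them outright:

    ceph osd lspools                                        # ids of the pools that still exist
    service ceph stop osd.12                                # stop the OSD however your init system does it
    ls /var/lib/ceph/osd/ceph-12/current/ | grep '^37\.'    # PG directories left over from removed pool 37
    mkdir -p /root/stray-pgs
    mv /var/lib/ceph/osd/ceph-12/current/37.* /root/stray-pgs/
    service ceph start osd.12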

Re: [ceph-users] Network redundancy pro and cons, best practice, suggestions?

2015-04-21 Thread Christian Balzer
Hello, On Tue, 21 Apr 2015 08:33:21 +0200 Götz Reinicke - IT Koordinator wrote: Hi Christian, On 13.04.15 at 12:54, Christian Balzer wrote: Hello, On Mon, 13 Apr 2015 11:03:24 +0200 Götz Reinicke - IT Koordinator wrote: Dear ceph users, we are planning a ceph storage

Re: [ceph-users] XFS extsize

2015-04-21 Thread Ilya Dryomov
On Tue, Apr 21, 2015 at 3:43 PM, Ilya Dryomov idryo...@gmail.com wrote: On Tue, Apr 21, 2015 at 11:49 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hi, while running firefly I've seen that each OSD log prints: 2015-04-21 10:24:49.498048 7fa9e925d780 0

Re: [ceph-users] CephFS concurrency question

2015-04-21 Thread Neville
Hi Huseyin, thanks for responding. I did spend quite a bit of time trying to sort out the nova and libvirt uid/gid, so hopefully they are correct. That is also one of the reasons why I resorted to testing with a basic text file as root, to rule that out. Host 2 :

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Mark Nelson
On 04/21/2015 03:04 AM, Andrei Mikhailovsky wrote: Hi, I have been testing the Samsung 840 Pro (128GB) for quite some time and I can also confirm that this drive is unsuitable for an osd journal. The performance and latency that I get from these drives (according to ceph osd perf) are between 10 -

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread J-P Methot
Thank you everyone for your replies. We are currently in the process of selecting new drives for journaling to replace the Samsung drives. We're running our own tests using dd and the command found here:
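The link is truncated above, but the test usually referenced for journal SSDs is a small-block direct, synchronous write; a sketch, with a hypothetical target path on the SSD under test:

    # a journal workload is small sequential writes with O_DSYNC; unsuitable drives collapse here
    dd if=/dev/zero of=/mnt/ssd/journal-test bs=4k count=100000 oflag=direct,dsync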

[ceph-users] decrease pg number

2015-04-21 Thread Pavel V. Kaygorodov
Hi! I have updated my cluster to Hammer and got the warning too many PGs per OSD (2240 max 300). I know that there is no way to decrease the number of placement groups, so I want to re-create my pools with a lower PG count, move all my data to them, delete the old pools and rename the new pools to the old names.
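A sketch of that pool swap, assuming the pool is named data, 256 PGs is the new target, and no clients are writing during the copy (rados cppool does not copy snapshots):

    ceph osd pool create data_new 256 256
    rados cppool data data_new
    ceph osd pool delete data data --yes-i-really-really-mean-it
    ceph osd pool rename data_new data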

Re: [ceph-users] CephFS concurrency question

2015-04-21 Thread Hüseyin Çotuk
Dear Neville, could you please share the output of ls -l /var/lib/nova/instances on both hosts? The user IDs of nova on the two hosts are probably different. You can check the /etc/passwd files on both hosts. Regards, Huseyin COTUK On 21-04-2015 15:43, Neville wrote: I'm trying to setup live migration in
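A quick way to compare the IDs Huseyin is asking about, run on both hosts:

    ls -ln /var/lib/nova/instances   # numeric uid/gid of the existing files
    id nova                          # uid/gid the nova user actually has on this host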

[ceph-users] Still CRUSH problems with 0.94.1 ?

2015-04-21 Thread f...@univ-lr.fr
Hi all, could there be a problem with the CRUSH function during a 'from scratch' installation of 0.94.1-0? This has been tested many times, with ceph-deploy-1.5.22-0 or ceph-deploy-1.5.23-0. Platform: RHEL7. Each time, the new cluster ends up in a weird state never seen on my previous

[ceph-users] XFS extsize

2015-04-21 Thread Stefan Priebe - Profihost AG
Hi, while running firefly I've seen that each OSD log prints: 2015-04-21 10:24:49.498048 7fa9e925d780 0 xfsfilestorebackend(/ceph/osd.49/) detect_feature: extsize is disabled by conf The firefly release notes for v0.80.4 say: osd: disable XFS extsize hint by default (#8830, Samuel Just) But hammer
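For reference, the hint can be inspected and toggled per OSD; a sketch, assuming the admin socket for osd.49 is reachable, with the ceph.conf option it maps to shown as a comment:

    ceph daemon osd.49 config get filestore_xfs_extsize
    # to re-enable the hint, set this under [osd] in ceph.conf and restart the OSD:
    #   filestore xfs extsize = true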

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Andrei Mikhailovsky
Hi, I have been testing the Samsung 840 Pro (128GB) for quite some time and I can also confirm that this drive is unsuitable for an osd journal. The performance and latency that I get from these drives (according to ceph osd perf) are between 10 - 15 times slower compared to the Intel 520. The
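For context, ceph osd perf reports per-OSD journal commit and apply latencies, which is the number being compared here; in Hammer the output looks roughly like this:

    ceph osd perf
    # columns: osd  fs_commit_latency(ms)  fs_apply_latency(ms)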

Re: [ceph-users] Is CephFS ready for production?

2015-04-21 Thread Sage Weil
On Tue, 21 Apr 2015, Ray Sun wrote: Cephers, many people told me ceph is ready for production except for CephFS. Is this true? And why is that? Can anyone explain this to me? Thanks a lot. Ready for production is a subjective judgement call. The reason why we are cautious when recommending it

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-21 Thread Colin Corr
On 04/20/2015 04:18 PM, Robert LeBlanc wrote: You usually won't end up with more than 'size' replicas, even in a failure situation. Although technically more than 'size' OSDs may have the data (if an OSD comes back into service, the journal may be used to quickly get

Re: [ceph-users] CephFS concurrency question

2015-04-21 Thread Neville
Hi Robert, I'm running the latest version of Icehouse - 2014.1.4. I know there was some talk of patching nova, but I'm not sure if this was ever done; if it was, I assume it wasn't backported to Icehouse. I'm not specifically worried about needing the instances folder on shared storage to get it

Re: [ceph-users] CephFS concurrency question

2015-04-21 Thread Robert LeBlanc
I think you are using an old version of OpenStack. I seem to remember a discussion about a patch to remove the requirement of shared storage for live migration on Ceph RBD. Are you using librbd in OpenStack? Robert LeBlanc Sent from a mobile device, please excuse any typos. On Apr 21, 2015 6:43

[ceph-users] CephFS concurrency question

2015-04-21 Thread Neville
I'm trying to set up live migration in OpenStack using Ceph RBD-backed volumes. From what I understand, I also need to put the libvirt folder /var/lib/nova/instances on shared storage for it to work, as Nova tests for this as part of the migration process. I decided to look at using CephFS for
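A sketch of the shared-directory setup described here, using the CephFS kernel client; the monitor address and secret file path are hypothetical, and the directory must be writable by the nova user on every compute node:

    mkdir -p /var/lib/nova/instances
    mount -t ceph 10.0.0.1:6789:/ /var/lib/nova/instances \
          -o name=admin,secretfile=/etc/ceph/admin.secret
    chown nova:nova /var/lib/nova/instances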

Re: [ceph-users] XFS extsize

2015-04-21 Thread Ilya Dryomov
On Tue, Apr 21, 2015 at 11:49 AM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hi, while running firefly I've seen that each OSD log prints: 2015-04-21 10:24:49.498048 7fa9e925d780 0 xfsfilestorebackend(/ceph/osd.49/) detect_feature: extsize is disabled by conf The firefly

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-21 Thread Colin Corr
On 04/21/2015 09:08 AM, Robert LeBlanc wrote: Your logic isn't quite right, and from what I understand, this is how it works: step choose firstn 2 type rack # Choose two racks from the CRUSH map (my CRUSH map only has two, so select both of them) step chooseleaf firstn 2 type host #

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-21 Thread Colin Corr
Thanks for elaborating on those facilities, Greg. It's all starting to make more sense if I think of it from an osd tree view and the type hierarchy. Figures... you will eliminate them right around the time that I fully understand how to use them effectively. On 04/21/2015 09:52 AM, Gregory

Re: [ceph-users] Possible improvements for a slow write speed (excluding independent SSD journals)

2015-04-21 Thread Alex Moore
Just want to add my own experience, as I'm using consumer Samsung SSDs at the moment (Ceph 0.87.1, replication 3, 16 Gbps InfiniBand). Originally I only had Samsung 840 EVO 1TB SSDs, which I partitioned with a small initial partition for the journal and the rest for the OSD (using XFS). I

Re: [ceph-users] Is CephFS ready for production?

2015-04-21 Thread Ray Sun
Sage, I have the same question about the roadmap. I remember that at the OpenStack Hong Kong summit there was a roadmap showing that Ceph would support VMware in Q3 last year, but after Red Hat acquired Inktank, I think the roadmap changed. Could you please provide more detail about the roadmap? Thanks.

Re: [ceph-users] Is CephFS ready for production?

2015-04-21 Thread Mohamed Pakkeer
Hi Sage, when can we expect the fully functional fsck for CephFS? Can we get it in the next major release? Is there any roadmap or time frame for the fully functional fsck release? Thanks Regards K.Mohamed Pakkeer On 21 Apr 2015 20:57, Sage Weil s...@newdream.net wrote: On Tue, 21 Apr 2015, Ray

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-21 Thread Gregory Farnum
The CRUSH min and max sizes are part of the ruleset facilities that we're slowly removing, because they turned out to have no utility and to be overly complicated to understand. You should probably just set them all to 1 and 10. The intention behind them was that you could have a single ruleset which

Re: [ceph-users] CRUSH rule for 3 replicas across 2 hosts

2015-04-21 Thread Robert LeBlanc
Your logic isn't quite right, and from what I understand, this is how it works: step choose firstn 2 type rack # Choose two racks from the CRUSH map (my CRUSH map only has two, so select both of them) step chooseleaf firstn 2 type host # From the set chosen previously (two racks), select a leaf
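Putting Robert's two steps (and Greg's suggested min/max of 1 and 10) into a complete rule, as a sketch; the rule name, ruleset id and root bucket are illustrative, and the map is assumed to contain two racks:

    # decompile the current map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # add a rule like this to crushmap.txt:
    rule replicated_two_racks {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 2 type rack
            step chooseleaf firstn 2 type host
            step emit
    }

    # recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

With pool size 3, CRUSH takes the first three of the (up to) four OSDs this rule produces, giving two copies in one rack and one in the other.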

[ceph-users] ceph.com documentation suggestions

2015-04-21 Thread Chad William Seys
Hi, I've recently seen some confusion on the mailing list over the number of PGs per pool versus per cluster. I also set too many PGs per pool because of this confusion. IMO, it is fairly confusing to talk about PGs on the pool page, but only vaguely talk about the number of PGs for the
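The rule of thumb from the docs is roughly 100 PGs per OSD across the whole cluster, divided among the pools, not 100 per pool; a small sketch of the arithmetic with purely illustrative cluster numbers (24 OSDs, replication 3, 4 similarly sized pools):

    OSDS=24; SIZE=3; POOLS=4
    echo $(( OSDS * 100 / SIZE / POOLS ))   # ~200 per pool, then round to a power of two (256)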