Re: [ceph-users] CEPH cluster to meet 5 msec latency

2016-10-20 Thread Christian Balzer
Hello, re-adding the ML, so everybody benefits from this. On Thu, 20 Oct 2016 14:03:56 +0530 Subba Rao K wrote: > Hi Christian, > > I have seen one of your responses in CEPH user group and wanted some help > from you. > > Can you please share HW configuration of the CEPH cluster which can

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Christian Balzer
Hello, On Thu, 20 Oct 2016 15:03:02 +0200 Oliver Dzombic wrote: > Hi Christian, > > thank you for your time. > > The problem is deep scrub only. > > Jewel 10.2.2 is used. > Hmm, I was under the impression that the unified queue in Jewel was supposed to stop scrubs from eating all the I/O

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-20 Thread Christian Balzer
Hello, On Thu, 20 Oct 2016 15:45:34 + Jim Kilborn wrote: Good to know. You may be able to squeeze some more 4K write IOPS out of this by cranking the CPUs to full speed; see the relevant recent threads about this. As for the 120GB (there is no 128GB SM863 model according to Samsung) SSDs
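
For reference, "cranking the CPUs to full speed" usually means pinning the frequency governor; a minimal sketch for an OSD node, assuming the cpupower tool is installed (not part of the original thread):

  # switch every core to the performance governor
  cpupower frequency-set -g performance
  # confirm the governor and current clock speed
  cpupower frequency-info | grep -E 'governor|current CPU'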

[ceph-users] Ceph recommendations for ALL SSD

2016-10-20 Thread Ramakrishna Nishtala (rnishtal)
Hi, Any suggestions/recommendations on all-SSD for Ceph? I see SSD freezes occasionally on SATA drives, thus creating spikes in latency at times. The drive recovers after a brief pause of 20-30 secs. Any best practices like colocated journals or not, schedulers, hdparm settings, etc. appreciated. Working on 1.3.
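
A hedged starting point for the scheduler/hdparm questions above; device names are placeholders and defaults vary by distribution:

  # simple elevators tend to behave better on SSDs
  echo noop > /sys/block/sdb/queue/scheduler
  # check whether the drive's volatile write cache is enabled
  hdparm -W /dev/sdb
  # watch per-device await to catch the 20-30 sec freezes
  iostat -x 5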

Re: [ceph-users] Issue with Ceph padding files out to ceph.dir.layout.stripe_unit size

2016-10-20 Thread Kate Ward
All are relatively recent Ubuntu 16.04.1 kernels. I upgraded ka05 last night, but still see an issue. I'm happy to upgrade the rest. $ for h in ka00 ka01 ka02 ka03 ka04 ka05; do ssh $h uname -a; done Linux ka00 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:00:59 UTC 2016 i686 i686 i686 GNU/Linux

Re: [ceph-users] Issue with Ceph padding files out to ceph.dir.layout.stripe_unit size

2016-10-20 Thread John Spray
On Thu, Oct 20, 2016 at 10:15 PM, Kate Ward wrote: > I have a strange problem that began manifesting after I rebuilt my cluster a > month or so back. A tiny subset of my files on CephFS are being zero-padded > out to the length of ceph.dir.layout.stripe_unit when the

[ceph-users] Issue with Ceph padding files out to ceph.dir.layout.stripe_unit size

2016-10-20 Thread Kate Ward
I have a strange problem that began manifesting after I rebuilt my cluster a month or so back. A tiny subset of my files on CephFS are being zero-padded out to the length of ceph.dir.layout.stripe_unit when the files are later *read* (not when they are written). Tonight I realized the padding
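
For anyone trying to reproduce this, the layout in question is visible through the virtual xattrs CephFS exposes; the paths below are placeholders:

  # show the directory layout, including stripe_unit
  getfattr -n ceph.dir.layout /mnt/cephfs/somedir
  # compare a suspect file's layout against its reported size
  getfattr -n ceph.file.layout /mnt/cephfs/somedir/somefile
  stat -c '%s %n' /mnt/cephfs/somedir/somefile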

[ceph-users] Announcing the ceph-large mailing list

2016-10-20 Thread Stillwell, Bryan J
Do you run a large Ceph cluster? Do you find that you run into issues that you didn't have when your cluster was smaller? If so, we have a new mailing list for you! Announcing the new ceph-large mailing list. This list is targeted at experienced Ceph operators with cluster(s) over 500 OSDs to

[ceph-users] Memory leak in radosgw

2016-10-20 Thread Trey Palmer
I've been trying to test radosgw multisite and have a pretty bad memory leak. It appears to be associated only with multisite sync. Multisite works well for a small number of objects. However, it all fell over when I wrote 8M 64K objects to two buckets overnight for testing (via
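
A crude way to confirm the growth over time, assuming the gateway runs as a radosgw process (a sketch, not the poster's actual method):

  # sample radosgw resident memory and uptime once a minute
  while true; do ps -o rss=,etime=,cmd= -C radosgw; sleep 60; done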

Re: [ceph-users] ceph on two data centers far away

2016-10-20 Thread German Anders
Thanks, that's too far actually lol. And how are things going with rbd mirroring? *German* 2016-10-20 14:49 GMT-03:00 yan cui : > The two data centers are actually cross-US. One is in the west, and the > other in the east. > We try to sync rbd images using RBD mirroring. > >

Re: [ceph-users] removing image of rbd mirroring

2016-10-20 Thread yan cui
Thanks Jason, I will try to use your method. 2016-10-19 17:23 GMT-07:00 Jason Dillaman : > On Wed, Oct 19, 2016 at 6:52 PM, yan cui wrote: > > 2016-10-19 15:46:44.843053 7f35c9925d80 -1 librbd: cannot obtain > exclusive > > lock - not removing > > Are
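
One plausible sequence for the "cannot obtain exclusive lock" error, hedged since the quoted reply is cut off; pool/image names are placeholders:

  # see who holds the exclusive lock
  rbd lock ls rbd/myimage
  # disable per-image mirroring so the rbd-mirror daemon releases it
  rbd mirror image disable rbd/myimage
  rbd rm rbd/myimage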

Re: [ceph-users] ceph on two data centers far away

2016-10-20 Thread German Anders
Out of curiosity, I wanted to ask what kind of network topology you are trying to use across the cluster. In this type of scenario you really need an ultra-low-latency network; how far are the sites from each other? Best, *German* 2016-10-18 16:22 GMT-03:00 Sean Redmond : > Maybe

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-20 Thread Jim Kilborn
The chart obviously didn’t go well. Here it is again:
fio --direct=1 --sync=1 --rw={write,randwrite,read,randread} --bs={4M,4K} --numjobs=1 --iodepth=1 --runtime=60 --size=5G --time_based --group_reporting --name=journal-test
FIO Test Local disk

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-20 Thread Jim Kilborn
Thanks Christian for the additional information and comments. - Upgraded the kernels, but still had poor performance. - Removed all the pools and recreated with just a replication of 3, with the two pools for the data and metadata. No cache tier pool. - Turned back on

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Frédéric Nass
- On 20 Oct 16, at 15:03, Oliver Dzombic wrote: > Hi Christian, > thank you for your time. > The problem is deep scrub only. > Jewel 10.2.2 is used. > Thank you for your hint with manual deep scrubs on specific OSD's. I > didn't come up with that idea. > -

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Paweł Sadowski
You can inspect the source code or do: ceph --admin-daemon /var/run/ceph/ceph-osd.OSD_ID.asok config show | grep scrub # or similar And then check in the source code :) On 10/20/2016 03:03 PM, Oliver Dzombic wrote: > Hi Christian, > > thank you for your time. > > The problem is deep scrub only. > >

Re: [ceph-users] Kernel Versions for KVM Hypervisors

2016-10-20 Thread Ilya Dryomov
On Thu, Oct 20, 2016 at 2:45 PM, David Riedl wrote: > Hi cephers, > > I want to use the newest features of jewel on my cluster. I already updated > all kernels on the OSD nodes to the following version: > 4.8.2-1.el7.elrepo.x86_64. > > The KVM hypervisors are running the

Re: [ceph-users] qemu-rbd and ceph striping

2016-10-20 Thread Jason Dillaman
On Thu, Oct 20, 2016 at 1:51 AM, Ahmed Mostafa wrote: > different OSDs PGs -- but more or less correct since the OSDs will process requests for a particular PG sequentially and not in parallel. -- Jason

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Oliver Dzombic
Hi Christian, thank you for your time. The problem is deep scrub only. Jewel 10.2.2 is used. Thank you for your hint with manual deep scrubs on specific OSD's. I didn't come up with that idea. - Where do you know osd_scrub_sleep from? I saw here lately on the mailing list multiple

[ceph-users] Kernel Versions for KVM Hypervisors

2016-10-20 Thread David Riedl
Hi cephers, I want to use the newest features of jewel on my cluster. I already updated all kernels on the OSD nodes to the following version: 4.8.2-1.el7.elrepo.x86_64. The KVM hypervisors are running the CentOS 7 stock kernel (3.10.0-327.22.2.el7.x86_64). If I understand it correctly,
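
Worth noting, hedged since the message is cut off: qemu/KVM guests normally go through librbd in userspace, so the hypervisor's kernel version matters less than the installed librbd. Something like the following can confirm what a CentOS 7 hypervisor has:

  # check the userspace librbd version
  rpm -q librbd1
  # confirm qemu was built with rbd support
  qemu-img --help | grep rbd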

Re: [ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Christian Balzer
Hello, On Thu, 20 Oct 2016 11:23:54 +0200 Oliver Dzombic wrote: > Hi, > > we have here globally: > > osd_client_op_priority = 63 > osd_disk_thread_ioprio_class = idle > osd_disk_thread_ioprio_priority = 7 > osd_max_scrubs = 1 > If you google for osd_max_scrubs you will find plenty of
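
For experimenting with these knobs without restarting OSDs, they can be injected at runtime; a sketch, and which values actually help is workload-dependent:

  # sleep between scrub chunks to leave room for client I/O
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
  # keep concurrent scrubs per OSD at one
  ceph tell osd.* injectargs '--osd_max_scrubs 1'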

Re: [ceph-users] RBD with SSD journals and SAS OSDs

2016-10-20 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > William Josefsson > Sent: 20 October 2016 10:25 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] RBD with SSD journals and SAS OSDs > > On Mon,

[ceph-users] Snapshot size and cluster usage

2016-10-20 Thread Stefan Heitmüller
We have two ceph (9.2.1) clusters, where one is sending snapshots of pools to the other one for backup purposes. Snapshots are fine; however, the ceph pool gets blown up by sizes not matching the snapshots. Here's the size of a snapshot and the resulting cluster usage afterwards. The snapshot is
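
When chasing this kind of discrepancy it can help to compare per-pool accounting on both clusters; a hedged sketch, since the actual numbers are cut off above:

  # per-pool object and byte counts, including raw usage
  ceph df detail
  # rados-level per-pool accounting for cross-checking
  rados df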

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-20 Thread mj
Hi, Interesting reading! Any chance you could share some of the lessons (if any) you learned? I can, for example, imagine your situation would have been much better with a replication factor of three instead of two? MJ On 10/20/2016 12:09 AM, Kostis Fardelas wrote: Hello cephers,
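
For reference, moving an existing pool from size 2 to 3 is a one-liner, though it triggers a rebalance; the pool name is a placeholder:

  ceph osd pool set mypool size 3
  ceph osd pool set mypool min_size 2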

Re: [ceph-users] RBD with SSD journals and SAS OSDs

2016-10-20 Thread William Josefsson
On Mon, Oct 17, 2016 at 6:16 PM, Nick Fisk wrote: > Did you also set/check the c-states, this can have a large impact as well? Hi Nick. I did try intel_idle.max_cstate=0, and got quite a significant improvement, as attached below. Thanks for this advice! This is still with
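
For anyone wanting to reproduce this, the parameter goes on the kernel command line; a sketch for an EL7 host, since tools and paths differ per distribution:

  # /etc/default/grub: append to GRUB_CMDLINE_LINUX
  GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=0"
  # regenerate the grub config and reboot
  grub2-mkconfig -o /boot/grub2/grub.cfg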

[ceph-users] effectively reducing scrub io impact

2016-10-20 Thread Oliver Dzombic
Hi, we have here globally: osd_client_op_priority = 63 osd_disk_thread_ioprio_class = idle osd_disk_thread_ioprio_priority = 7 osd_max_scrubs = 1 to influence the scrubbing performance and osd_scrub_begin_hour = 1 osd_scrub_end_hour = 7 to influence the scrubbing time frame Now, as it

Re: [ceph-users] Yet another hardware planning question ...

2016-10-20 Thread Christian Balzer
Hello, On Thu, 20 Oct 2016 07:56:55 + Patrik Martinsson wrote: > Hi Christian, > > Thanks for your very detailed and thorough explanation, very much > appreciated. > You're welcome. > We have definitely thought of a design where we have dedicated nvme- > pools for 'high-performance' as

Re: [ceph-users] Yet another hardware planning question ...

2016-10-20 Thread Patrik Martinsson
Hi Christian, Thanks for your very detailed and thorough explanation, very much appreciated. We have definitely thought of a design where we have dedicated nvme-pools for 'high-performance' as you say. At the same time I *thought* that having the journal offloaded to another device *always*
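
For context on journal placement: with jewel-era filestore the journal device is picked at OSD creation time; a hedged sketch with placeholder device names, not taken from the thread:

  # filestore OSD with data on sdb and its journal on an NVMe device
  ceph-disk prepare --cluster ceph /dev/sdb /dev/nvme0n1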

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-20 Thread Kris Gillespie
Kostis, Excellent article mate. This is the kind of war story that can really help people out. Learning through (others') adversity. Kris > On 20 Oct 2016, at 00:09, Kostis Fardelas wrote: > > Hello cephers, > this is the blog post on our Ceph cluster's outage we

Re: [ceph-users] Surviving a ceph cluster outage: the hard way

2016-10-20 Thread Kostis Fardelas
We pulled leveldb from upstream and fired leveldb.RepairDB against the OSD omap directory using a simple python script. Ultimately, that didn't move things forward. We resorted to checking every object's timestamp/md5sum/attributes on the crashed OSD against the replicas in the cluster and at last
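
The RepairDB call mentioned is a one-liner with the py-leveldb bindings; a sketch assuming a stock filestore omap path and a hypothetical OSD id, and the directory should be backed up first:

  python -c "import leveldb; leveldb.RepairDB('/var/lib/ceph/osd/ceph-12/current/omap')"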