Re: [ceph-users] CEPH pool statistics MAX AVAIL

2019-06-25 Thread Mohamad Gebai
MAX AVAIL is the amount of data you can still write to the cluster before *any one of your OSDs* becomes near full. If MAX AVAIL is not what you expect it to be, look at the data distribution using ceph osd tree and make sure you have a uniform distribution. Mohamad On 6/25/19 11:46 AM, Davis
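For context, a quick way to spot the uneven distribution mentioned above is to compare per-OSD utilization against the per-pool MAX AVAIL; a minimal sketch, assuming a Luminous-or-later cluster:

  $> ceph osd df tree   # per-OSD %USE laid out along the CRUSH hierarchy
  $> ceph df            # per-pool MAX AVAIL, derived from the fullest OSD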

Re: [ceph-users] rbd cache limiting IOPS

2019-03-07 Thread Mohamad Gebai
Hi Florian, On 3/7/19 10:27 AM, Florian Engelmann wrote: > So the settings are recognized and used by qemu. But any cache size larger than the default (32MB) leads to strange IOPS results. IOPS are very consistent at 32MB (~20,000-23,000), but if we define a bigger cache size
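For reference, the cache being discussed is the client-side librbd cache, configured in the [client] section of ceph.conf on the hypervisor; an illustrative snippet with the default values (not a tuning recommendation):

  [client]
  rbd cache = true
  rbd cache size = 33554432                  # 32 MB, the default mentioned above
  rbd cache max dirty = 25165824             # must stay below rbd cache size
  rbd cache writethrough until flush = true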

Re: [ceph-users] rbd space usage

2019-02-28 Thread Mohamad Gebai
On 2/27/19 4:57 PM, Marc Roos wrote: > They are 'thin provisioned', meaning if you create a 10GB rbd, it does > not use 10GB at the start. (afaik) You can use 'rbd -p rbd du' to see how much of these devices is actually provisioned and whether it matches what you expect. Mohamad
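For context, the command mentioned above reports provisioned versus actually used space; a minimal example, assuming a pool named 'rbd' and a hypothetical image 'myimage':

  $> rbd du -p rbd          # PROVISIONED vs USED for every image in the pool
  $> rbd du rbd/myimage     # the same, for a single image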

Re: [ceph-users] Mimic Bluestore memory optimization

2019-02-25 Thread Mohamad Gebai
Hi Glen, On 2/24/19 9:21 PM, Glen Baars wrote: > I am tracking down a performance issue with some of our mimic 13.2.4 OSDs. It > feels like a lack of memory but I have no real proof of the issue. I have > used the memory profiling ( pprof tool ) and the OSDs are maintaining their > 4GB
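For reference, Mimic auto-tunes the BlueStore cache against osd_memory_target (4 GB by default); a sketch of how to inspect it on a running OSD, assuming osd.0:

  $> ceph daemon osd.0 config get osd_memory_target   # the per-OSD memory target being discussed
  $> ceph daemon osd.0 dump_mempools                   # memory accounting per subsystem inside the OSD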

Re: [ceph-users] Hardware difference in the same Rack

2019-02-21 Thread Mohamad Gebai
On 2/21/19 1:22 PM, Fabio Abreu wrote: > Hi Everybody, > > Is it recommended to mix different hardware in the same rack? > > For example, I have a SATA rack with Apollo 4200 storage and I will get > another hardware type to expand this rack, HP 380 Gen10. > > I have run a lot of tests to understand

Re: [ceph-users] BlueStore / OpenStack Rocky performance issues

2019-02-21 Thread Mohamad Gebai
> From: Mohamad Gebai > Date: Thursday, February 21, 2019 at 9:44 AM > To: "Smith, Eric", Sinan Polat, "ceph-users@lists.ceph.com" > Subject: Re: [ceph-users] BlueStore / OpenStack Rocky performance issues > > What is you

Re: [ceph-users] BlueStore / OpenStack Rocky performance issues

2019-02-21 Thread Mohamad Gebai
What is your setup with Bluestore? Standalone OSDs? Or do they have their WAL/DB partitions on another device? How does it compare to your Filestore setup for the journal? On a separate note, these look like they're consumer SSDs, which makes them not a great fit for Ceph. Mohamad On 2/21/19
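For context on the WAL/DB question: a BlueStore OSD can be standalone (data, DB and WAL on one device) or have its DB/WAL placed on a faster device; illustrative deployment commands, with device names as placeholders:

  $> ceph-volume lvm create --bluestore --data /dev/sdb                              # standalone OSD
  $> ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1    # DB (and WAL) on NVMe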

[ceph-users] Performance issue due to tuned

2019-01-24 Thread Mohamad Gebai
Hi all, I want to share a performance issue I just encountered on a test cluster of mine, specifically related to tuned. I started by setting the "throughput-performance" tuned profile on my OSD nodes and ran some benchmarks. I then applied that same profile to my client node, which intuitively
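For reference, the tuned profile in question is applied and verified like this (profile names as shipped with tuned):

  $> tuned-adm list                              # available profiles
  $> tuned-adm profile throughput-performance    # the profile applied on the OSD nodes
  $> tuned-adm active                            # confirm which profile is in effect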

Re: [ceph-users] monitor cephfs mount io's

2019-01-22 Thread Mohamad Gebai
> > -Original Message- > From: Mohamad Gebai [mailto:mge...@suse.de] > Sent: 17 January 2019 15:57 > To: Marc Roos; ceph-users > Subject: Re: [ceph-users] monitor cephfs mount io's > > You can do that either straight from your client, or by querying the >

Re: [ceph-users] monitor cephfs mount io's

2019-01-17 Thread Mohamad Gebai
You can do that either straight from your client, or by querying the perf dump if you're using ceph-fuse. Mohamad On 1/17/19 6:19 AM, Marc Roos wrote: > > How / where can I monitor the ios on cephfs mount / client?
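For context, a sketch of querying the perf dump through the ceph-fuse admin socket; the socket path and PID below are placeholders and vary per host:

  $> ls /var/run/ceph/                                              # locate the client admin socket
  $> ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf dump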

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-31 Thread Mohamad Gebai
us know. > > I wish you all a happy new year. > > Regards > Marcus > >> Mohamad Gebai <mailto:mge...@suse.de> >> 28 December 2018 at 16:10 >> Hi Marcus, >> >> On 12/27/18 4:21 PM, Marcus Murwall wrote: >>> Hey Mohamad >>>

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-28 Thread Mohamad Gebai
might help. Is there anything suspicious in the logs? Also, do you get the same throughput when benchmarking the replicated pool compared to the EC pool? Mohamad > > > Regards > Marcus > >> Mohamad Gebai <mailto:mge...@suse.de> >> 26 December 2018 at 18:27 >>

Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-26 Thread Mohamad Gebai
What is happening on the individual nodes when you reach that point (iostat -x 1 on the OSD nodes)? Also, what throughput do you get when benchmarking the replicated pool? I guess one way to start would be by looking at ongoing operations at the OSD level: ceph daemon osd.X dump_blocked_ops ceph
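For context, a sketch of the checks suggested above, assuming an OSD id of 0 and placeholder pool names 'rep_pool' and 'ec_pool':

  $> iostat -x 1                            # on each OSD node, watch %util and await
  $> ceph daemon osd.0 dump_blocked_ops     # currently blocked operations
  $> ceph daemon osd.0 dump_ops_in_flight   # all in-flight operations
  $> rados bench -p rep_pool 60 write       # baseline on the replicated pool
  $> rados bench -p ec_pool 60 write        # compare against the EC pool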

Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host

2018-12-18 Thread Mohamad Gebai
Last I heard (read) was that the RDMA implementation is somewhat experimental. Search for "troubleshooting ceph rdma performance" on this mailing list for more info. (Adding Roman in CC who has been working on this recently.) Mohamad On 12/18/18 11:42 AM, Michael Green wrote: > I don't know.  >

[ceph-users] How are you using tuned

2018-07-12 Thread Mohamad Gebai
Hi all, I was wondering how people were using tuned with Ceph, if at all. I think it makes sense to enable the throughput-performance profile on OSD nodes, and maybe the network-latency profile on mon and mgr nodes. Is anyone using a similar configuration, and do you have any thoughts on this

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Mohamad Gebai
On 05/16/2018 07:18 AM, Uwe Sauter wrote: > Hi Mohamad, > >> >> I think this is what you're looking for: >> >> $> ceph daemon osd.X dump_historic_slow_ops >> >> which gives you recent slow operations, as opposed to >> >> $> ceph daemon osd.X dump_blocked_ops >> >> which returns current blocked

Re: [ceph-users] in retrospect get OSD for "slow requests are blocked" ? / get detailed health status via librados?

2018-05-16 Thread Mohamad Gebai
Hi, On 05/16/2018 04:16 AM, Uwe Sauter wrote: > Hi folks, > > I'm currently chewing on an issue regarding "slow requests are blocked". I'd > like to identify the OSD that is causing those events > once the cluster is back to HEALTH_OK (as I have no monitoring yet that would > get this info in

Re: [ceph-users] Questions regarding hardware design of an SSD only cluster

2018-04-23 Thread Mohamad Gebai
On 04/23/2018 09:24 PM, Christian Balzer wrote: > >> If anyone has some ideas/thoughts/pointers, I would be glad to hear them. >> > RAM, you'll need a lot of it, even more with Bluestore given the current > caching. > I'd say 1GB per TB storage as usual and 1-2GB extra per OSD. Does that still
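As a worked example of that rule of thumb (illustrative numbers only): a node with 12 x 4 TB OSDs would need roughly

  12 OSDs x 4 TB x 1 GB/TB = 48 GB      (storage-size allowance)
  12 OSDs x 1-2 GB         = 12-24 GB   (per-OSD overhead)
  total                    ~ 60-72 GB of RAM per node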

Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-10 Thread Mohamad Gebai
Just to be clear about the issue: You have a 3 servers setup, performance is good. You add a server (with 1 OSD?) and performance goes down, is that right? Can you give us more details? What's your complete setup? How many OSDs per node, bluestore/filestore, WAL/DB setup, etc. You're talking

Re: [ceph-users] What do you use to benchmark your rgw?

2018-04-03 Thread Mohamad Gebai
On 03/28/2018 11:11 AM, Mark Nelson wrote: > Personally I usually use a modified version of Mark Seger's getput > tool here: > > https://github.com/markhpc/getput/tree/wip-fix-timing > > The difference between this version and upstream is primarily to make > getput more accurate/useful when using

Re: [ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
On 10/17/2017 09:57 AM, Sage Weil wrote: > On Tue, 17 Oct 2017, Mohamad Gebai wrote: >> >> Thanks Sage. I assume that's the card you're referring to: >> https://trello.com/c/SAtGPq0N/65-use-time-span-monotonic-for-durations >> >> I can take care of that one i

Re: [ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
On 10/17/2017 09:27 AM, Sage Weil wrote: > On Tue, 17 Oct 2017, Mohamad Gebai wrote: > >> It would be good to know if there are any, and maybe prepare for them? > Adam added a new set of clock primitives that include a monotonic clock > option that should be used in all

[ceph-users] Unstable clock

2017-10-17 Thread Mohamad Gebai
Hi, I am looking at the following issue: http://tracker.ceph.com/issues/21375 In summary, during a 'rados bench', impossible latency values (e.g. 9.00648e+07) are suddenly reported. I looked briefly at the code; it seems CLOCK_REALTIME is used, which means that wall clock changes would affect

Re: [ceph-users] Backup VM (Base image + snapshot)

2017-10-15 Thread Mohamad Gebai
Hi, I'm not answering your questions, but I just want to point out that you might be using the documentation for an older version of Ceph: On 10/14/2017 12:25 PM, Oscar Segarra wrote: > > http://docs.ceph.com/docs/giant/rbd/rbd-snapshot/ > If you're not using the 'giant' version of Ceph (which

Re: [ceph-users] BlueStore Cache Ratios

2017-10-11 Thread Mohamad Gebai
Hi Jorge, On 10/10/2017 07:23 AM, Jorge Pinilla López wrote: > Are .99 KV, .01 MetaData and .0 Data ratios right? They seem a little > too disproportionate. Yes, this is correct. > Also .99 KV and a cache of 3GB for SSD means that almost the whole 3GB would > be used for KV, but there is also another
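For reference, the ratios discussed above map to these Luminous BlueStore options (defaults shown, not a tuning recommendation):

  bluestore_cache_size_ssd = 3221225472   # 3 GB total cache for an SSD-backed OSD
  bluestore_cache_kv_ratio = 0.99         # share of the cache given to RocksDB
  bluestore_cache_meta_ratio = 0.01       # share given to BlueStore onode metadata
  # the remainder (0 here) is used for caching object data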

Re: [ceph-users] Luminous BlueStore EC performance

2017-09-12 Thread Mohamad Gebai
Sorry for the delay. We used the default k=2 and m=1. Mohamad On 09/07/2017 06:22 PM, Christian Wuerdig wrote: > What type of EC config (k+m) was used if I may ask? > > On Fri, Sep 8, 2017 at 1:34 AM, Mohamad Gebai <mge...@suse.de> wrote: >> Hi, >> >> These num
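For context, a k=2/m=1 layout like the one used here can be created as follows (profile and pool names are placeholders):

  $> ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
  $> ceph osd pool create ecpool 128 128 erasure ec21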

Re: [ceph-users] Luminous BlueStore EC performance

2017-09-07 Thread Mohamad Gebai
Hi, These numbers are probably not as detailed as you'd like, but it's something. They show the overhead of reading and/or writing to EC pools as compared to 3x replicated pools using 1, 2, 8 and 16 threads (single client): Rep EC Diff Slowdown IOPS IOPS

Re: [ceph-users] RBD journaling benchmarks

2017-07-10 Thread Mohamad Gebai
On 07/10/2017 01:51 PM, Jason Dillaman wrote: > On Mon, Jul 10, 2017 at 1:39 PM, Maged Mokhtar wrote: >> These are significant differences, to the point where it may not make sense >> to use rbd journaling / mirroring unless there is only 1 active client. > I interpreted

[ceph-users] RBD journaling benchmarks

2017-07-10 Thread Mohamad Gebai
Resending as my first try seems to have disappeared. Hi, We ran some benchmarks to assess the overhead caused by enabling client-side RBD journaling in Luminous. The tests consist of: - Create an image with journaling enabled (--image-feature journaling) - Run randread, randwrite and randrw
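For context, a sketch of a comparable setup; the benchmark tool is not specified in the post, so fio's rbd engine and the image name below are assumptions:

  $> rbd create bench-img --size 10G -p rbd --image-feature exclusive-lock --image-feature journaling
  $> fio --name=randwrite --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench-img \
         --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based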