Re: [ceph-users] Required caps for cephfs

2019-04-30 Thread Darius Kasparavičius
Hi, Only available in mimic and up. To create or delete snapshots, clients require the ‘s’ flag in addition to ‘rw’. Note that when the capability string also contains the ‘p’ flag, the ‘s’ flag must appear after it (all flags except ‘rw’ must be specified in alphabetical order).
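As a sketch, such a cap string might look like the following (the filesystem name cephfs_a, the path /bar and client.0 are placeholders; if the ‘p’ flag is also granted, the order becomes ‘rwps’):

  ceph auth get-or-create client.0 \
      mon 'allow r' \
      osd 'allow rw tag cephfs data=cephfs_a' \
      mds 'allow rw, allow rws path=/bar'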

Re: [ceph-users] HW failure cause client IO drops

2019-04-16 Thread Darius Kasparavičius
Hello, Are you using a BBU-backed RAID controller? It sounds more like your write cache is acting up if you are using one. Can you check what your raid controller is showing? I have sometimes seen raid controllers performing consistency checks or patrol reads on single-drive RAID0 volumes. You can disable
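A rough sketch of how to check this, assuming a Broadcom/LSI controller managed with storcli on controller 0 (command names from memory, adjust for your tooling):

  storcli64 /c0 show patrolread        # is a patrol read running or scheduled?
  storcli64 /c0 show cc                # consistency check schedule and state
  storcli64 /c0/vall show all          # cache policy (WB/WT) of the single-drive RAID0 VDs
  storcli64 /c0 set patrolread=off     # disable patrol read if it lines up with the IO drops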

Re: [ceph-users] problems with pg down

2019-03-10 Thread Darius Kasparavičius
Hi, Check your osd.14 logs for information; it's currently stuck and not providing I/O for replication. And what happened to OSDs 102 and 121? On Sun, Mar 10, 2019 at 7:44 PM Fabio Abreu wrote: > > Hi Everybody. > > I have a pg with down+peering state and that has requests blocked > impacting my
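A quick sketch of the usual digging, with <pgid> standing in for the affected placement group:

  ceph health detail                          # which PGs are down/peering and what is blocked
  ceph pg <pgid> query                        # shows which OSDs peering is still waiting on
  ceph osd tree | egrep 'osd\.(14|102|121)'   # current state of the involved OSDs
  less /var/log/ceph/ceph-osd.14.log          # on the OSD host: why is it stuck?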

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Darius Kasparavičius
; > Simon > > On 06/03/2019 15:26, Darius Kasparavičius wrote: > > Hi, > > > > there it's 1.2% not 1200%. > > > > On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside > wrote: > >> Hi, > >> > >> I'm still seeing this issue during failure testi

Re: [ceph-users] objects degraded higher than 100%

2019-03-06 Thread Darius Kasparavičius
Hi, there it's 1.2%, not 1200%. On Wed, Mar 6, 2019 at 4:36 PM Simon Ironside wrote: > > Hi, > > I'm still seeing this issue during failure testing of a new Mimic 13.2.4 > cluster. To reproduce: > > - Working Mimic 13.2.4 cluster > - Pull a disk > - Wait for recovery to complete (i.e. back to

Re: [ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Darius Kasparavičius
, probably the most important thing you > can spend money on with NVMe drives is getting high write endurance > (DWPD) if you expect even a moderately high write workload. > > > Mark > > > On 3/5/19 3:49 AM, Darius Kasparavičius wrote: > > Hello, > > > > &g

[ceph-users] Ceph cluster on AMD based system.

2019-03-05 Thread Darius Kasparavičius
Hello, I was thinking of using an AMD-based system for my new NVMe-based cluster. In particular I'm looking at https://www.supermicro.com/Aplus/system/1U/1113/AS-1113S-WN10RT.cfm and https://www.amd.com/en/products/cpu/amd-epyc-7451 CPUs. Has anyone tried running it on this particular hardware?

Re: [ceph-users] Right way to delete OSD from cluster?

2019-03-01 Thread Darius Kasparavičius
Hi, Setting the crush weight to 0 removes the OSD's weight from the crushmap by modifying the host's total weight, which forces rebalancing of data across the whole cluster. Setting an OSD to out only modifies its "REWEIGHT" status, which rebalances data inside the same host. On Fri, Mar 1, 2019 at 12:25 PM Paul
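For reference, a sketch of the two operations being compared, using a hypothetical osd.12:

  ceph osd crush reweight osd.12 0   # drops its weight from the crushmap, so data rebalances cluster-wide
  ceph osd out 12                    # only sets REWEIGHT to 0; per the above, data moves within the same host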

Re: [ceph-users] Ceph cluster stability

2019-02-25 Thread Darius Kasparavičius
that exact situation to > arise. Still it left me paranoid about mon DBs and HDDs. > > > > -- aad > > > > > > > > > > But ceph recommendation is to use VM (not even the HW node > > > recommended). will try to change the mon disk as SSD and HW node.

Re: [ceph-users] Ceph cluster stability

2019-02-22 Thread Darius Kasparavičius
e keeping up with requests during a recovery, I could > >> > see that impacting client io. What disks are they running on? CPU? Etc. > >> > > >> > On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy > >> > wrote: > >> >> > >>

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Darius Kasparavičius
Hello, Check your CPU usage when you are doing those kinds of operations. We had a similar issue where our CPU monitoring was reporting fine (< 40% usage), but the load on the nodes was high, mid 60-80. If possible, try disabling HT and see the actual CPU usage. If you are hitting CPU limits you
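A sketch of how to look at it, assuming a reasonably recent kernel for the runtime SMT knob:

  mpstat -P ALL 1                                  # per-core usage; averages can hide pegged cores
  uptime                                           # load average vs. number of physical cores
  cat /sys/devices/system/cpu/smt/active           # is HT/SMT currently on?
  echo off > /sys/devices/system/cpu/smt/control   # disable it at runtime (kernel 4.19+ or backports)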

[ceph-users] Ceph mimic issue with snaptrimming.

2019-01-30 Thread Darius Kasparavičius
Hello, I have recently updated a cluster to mimic. After the upgrade I started converting nodes to bluestore one by one. While ceph was rebalancing I slapped a "nosnaptrim" on the cluster to save a bit of IO. After the rebalancing was done I enabled the snaptrim and my OSDs started flapping
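The flag and the throttles involved, as a sketch (option names per Mimic-era defaults, values to taste):

  ceph osd set nosnaptrim       # pause snap trimming cluster-wide
  ceph osd unset nosnaptrim     # resume once rebalancing is done
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 1 --osd_max_trimming_pgs 1'   # soften the trim backlog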

Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Darius Kasparavičius
Hi, Most likely the issue is with your consumer-grade journal SSD. Run this against your SSD to check how it performs: fio --filename= --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test On Tue, Nov 27, 2018 at 2:06 AM Cody

Re: [ceph-users] Ceph pure ssd strange performance.

2018-11-20 Thread Darius Kasparavičius
Update. So I rebuilt the OSD with a separate DB partition on the SSD drive, and I/O to the disks is what I expected, about ~3x the client I/O. On Tue, Nov 20, 2018 at 11:30 AM Darius Kasparavičius wrote: > > Hello, > > > I'm running some tests on pure SSD pool with mimic and blues
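Roughly what such a rebuild can look like, with hypothetical device names (the DB partition sits on the same SSD as the data):

  sgdisk -n 1:0:+50G -n 2:0:0 /dev/sdb                                # small DB partition plus the rest for data
  ceph-volume lvm create --bluestore --data /dev/sdb2 --block.db /dev/sdb1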

[ceph-users] Ceph pure ssd strange performance.

2018-11-20 Thread Darius Kasparavičius
Hello, I'm running some tests on a pure SSD pool with mimic and bluestore. The strange thing is that when running fio against rbd images I'm seeing a huge difference between client and disk I/O. For pure write performance I'm seeing about ~20k iops on the client side and about ~300k on the SSD side. I have
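For context, a sketch of the kind of test that shows this (pool and image names are placeholders), plus watching the disk side while it runs:

  fio --ioengine=rbd --clientname=admin --pool=ssdpool --rbdname=testimg \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
      --runtime=60 --time_based --group_reporting --name=rbd-4k-write
  iostat -x 1        # on the OSD hosts: compare per-device w/s with the client-side iops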

Re: [ceph-users] Best handling network maintenance

2018-10-05 Thread Darius Kasparavičius
Hello, I would have risked the nodown option for this short downtime. We had a similar experience when we updated a bonded switch and had to reboot it. Some of the connections dropped and the whole cluster started marking some OSDs as down. Due to this almost all OSDs were marked as down, but none of the
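A sketch of the flags around a short planned outage:

  ceph osd set nodown     # monitors stop marking OSDs down on missed heartbeats
  ceph osd set noout      # nothing gets rebalanced if an OSD is marked down anyway
  # ... switch maintenance / reboot ...
  ceph osd unset nodown
  ceph osd unset noout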

Re: [ceph-users] Mimic offline problem

2018-10-03 Thread Darius Kasparavičius
nfo_ignore_history_les = true ? > Is that be usefull here? There is such a less information about it. > > Goktug Yildirim şunları yazdı (2 Eki 2018 22:11): > > Hi, > > Indeed I left ceph-disk to decide the wal and db partitions when I read > somewhere that that will do the

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Darius Kasparavičius
stucked or unfound PGs as > you advise. > > Any thought would be greatly appreciated. > > > > On 2 Oct 2018, at 18:16, Darius Kasparavičius wrote: > > > > Hello, > > > > Currently you have 15 objects missing. I would recommend finding them > > a

Re: [ceph-users] Mimic offline problem

2018-10-02 Thread Darius Kasparavičius
Hello, Currently you have 15 objects missing. I would recommend finding them and making backups of them. Ditch all the other OSDs that are failing to start and concentrate on bringing online those that hold the missing objects. Then slowly turn off nodown and noout on the cluster and see if it
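A sketch of finding and backing up the affected objects before any destructive steps (the OSD id, paths and <pgid> are placeholders):

  ceph health detail                 # lists the PGs with unfound/missing objects
  ceph pg <pgid> list_unfound        # object names and which OSDs might still hold them
  systemctl stop ceph-osd@12         # the export tool needs the OSD stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid <pgid> --op export --file /root/<pgid>.export
  ceph osd unset nodown              # later, one flag at a time, while watching ceph -s
  ceph osd unset noout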

Re: [ceph-users] Slow requests blocked. No rebalancing

2018-09-20 Thread Darius Kasparavičius
Hello, 2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update: 249 PGs pending on creation (PENDING_CREATING_PGS) This error might indicate that you are hitting a PG limit per OSD. Here is some information on it: https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
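A sketch of checking and, if it really is the blocker, temporarily loosening the limit (Luminous-era option name):

  ceph osd df                                             # PGS column: how many PGs each OSD already holds
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 400'   # Luminous default is 200; raise temporarily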

Re: [ceph-users] help needed

2018-09-06 Thread Darius Kasparavičius
Hello, I'm currently running a similar setup. It's running bluestore OSDs with one NVMe device for the db/wal. That NVMe device is not large enough to support a 160GB db partition per OSD, so I'm stuck with 50GB each. Currently I haven't had any issues with slowdowns or crashes. The cluster is
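Roughly how the 50GB DB volumes can be carved out of a shared NVMe device, with hypothetical VG/LV and device names:

  vgcreate ceph-db /dev/nvme0n1                  # one VG on the shared NVMe
  lvcreate -L 50G -n db-osd0 ceph-db             # one 50G DB LV per OSD
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db/db-osd0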