Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Irek Fasikhov
ceph tell osd.* injectargs '--osd_recovery_delay_start 30' 2018-01-11 10:31 GMT+03:00 shadow_lin : > Hi, > Mine is purely backfilling (removing an osd from the cluster) and it > started at 600Mb/s and ended at about 3MB/s. > How is your recovery made up? Is it backfill or log replay pg recov

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi, mine is purely backfilling (removing an osd from the cluster) and it started at 600Mb/s and ended at about 3MB/s. How is your recovery made up? Is it backfill, or log replay pg recovery, or both? 2018-01-11 shadow_lin From: Josef Zelenka Sent: 2018-01-11 15:26 Subject: Re: [ceph-users] How

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Josef Zelenka
Hi, our recovery slowed down significantly towards the end, however it was still about five times faster than the original speed. We suspected that this is somehow caused by threading (more objects transferred - more threads used), but this is only an assumption. On 11/01/18 05:02, shadow_lin

[ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-10 Thread Reed Dier
Hi all, Does anyone have any idea if the influx plugin for ceph-mgr is stable in 12.2.2? Would love to ditch collectd and report directly from ceph if that is the case. The documentation says it was added in Mimic/13.x, however from an earlier ML post it looks like it would be coming to Lu
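
Assuming the plugin is actually shipped in the installed 12.2.2 build (which is what the question hinges on), enabling it and pointing it at an InfluxDB server looks roughly like the sketch below. The config-key names follow the pattern used by other mgr modules and are assumptions to verify against the plugin documentation; the hostname and credentials are placeholders.

  ceph mgr module enable influx
  # connection settings read from config-key (key names assumed, values are placeholders)
  ceph config-key set mgr/influx/hostname influxdb.example.com
  ceph config-key set mgr/influx/username ceph
  ceph config-key set mgr/influx/password secret
  ceph config-key set mgr/influx/database ceph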

Re: [ceph-users] Ceph 10.2.10 - SegFault in ms_pipe_read

2018-01-10 Thread Dyweni - Ceph-Users
I moved the drive from the crashing 10.2.10 OSD node into a different 10.2.10 OSD and everything is working fine now. On 2018-01-10 20:42, Dyweni - Ceph-Users wrote: Hi, My cluster has 12.2.2 Mons and Mgrs, and 10.2.10 OSDs. I tried adding a new 12.2.2 OSD into the mix and it crashed (expec

[ceph-users] Ceph 10.2.10 - SegFault in ms_pipe_read

2018-01-10 Thread Dyweni - Ceph-Users
Hi, My cluster has 12.2.2 Mons and Mgrs, and 10.2.10 OSDs. I tried adding a new 12.2.2 OSD into the mix and it crashed (expected). However, now one of my existing 10.2.10 OSDs is crashing. I've not had any issues with the 10.2.10 OSDs to date. What is strange is that both the 10.2.10 and 1

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Sergey Malinin
It is also worth looking at the osd_recovery_sleep option. From: ceph-users on behalf of Josef Zelenka Sent: Thursday, January 11, 2018 12:07:45 AM To: shadow_lin Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] How to speed up backfill On 10/01/18 21:53,
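
For reference, osd_recovery_sleep can be inspected and changed at runtime; 0 disables the sleep entirely, and the 0.1 below is only an illustrative value, not a recommendation.

  # check the current value on one OSD (run on that OSD's host)
  ceph daemon osd.0 config get osd_recovery_sleep
  # inject a new value on all OSDs (0 = no sleep, higher = slower but gentler recovery)
  ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'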

Re: [ceph-users] ceph-volume does not support upstart

2018-01-10 Thread 赵赵贺东
Hello, I am sorry for the delay. Thank you for your suggestion. It seems better to either update the system or keep using ceph-disk, in fact. Thank you Alfredo Deza & Cary. > On Jan 8, 2018, at 11:41 PM, Alfredo Deza wrote: > > ceph-volume relies on systemd, it will not work with upstart. Going > the fstab way might wo

Re: [ceph-users] OSDs going down/up at random

2018-01-10 Thread Brad Hubbard
On Wed, Jan 10, 2018 at 8:32 PM, Mike O'Connor wrote: > On 10/01/2018 4:48 PM, Mike O'Connor wrote: >> On 10/01/2018 4:24 PM, Sam Huracan wrote: >>> Hi Mike, >>> >>> Could you show system log at moment osd down and up? > So now I know it's a crash, what's my next step? As soon as I put the > system u

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi David, Thanks for your email, they are connected inside a Dell R730XD (2.5 inch 24 disk model) in non-RAID mode via a PERC RAID card. The version of ceph is Jewel, with kernel 4.13.x and Ubuntu 16.04. Thanks for your feedback on the HGST disks. Thanks On Wed, Jan 10, 2018 at 10:55 PM, David H

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread David Herselman
Hi Sean, No, Intel’s feedback has been… Pathetic… I have yet to receive anything more than a request to ‘sign’ a non-disclosure agreement, to obtain beta firmware. No official answer as to whether or not one can logically unlock the drives, no answer to my question whether or not Intel publish

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-01-10 Thread Sean Redmond
Hi, I have a case where 3 out of 12 of these Intel S4600 2TB model drives failed within a matter of days after being burn-in tested and then placed into production. I am interested to know, did you ever get any further feedback from the vendor on your issue? Thanks On Thu, Dec 21, 2017 at 1:38 PM, David

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
Ugh, that’s what I was hoping to avoid. OSD 13 is still in the server, I wonder if I could somehow bring it back in as OSD 13 to see if it has the missing data. I was looking into using the ceph-objectstore tool, but the only instructions I can find online are sparse and mostly in this list's
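
For reference, the usual ceph-objectstore-tool approach for this kind of recovery is to export the PG from the old OSD's data directory and import it into another OSD, with both daemons stopped. The paths and the PG id below are placeholders, not values from this thread.

  # on the stopped source OSD (the old osd.13 disk)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-13 \
      --journal-path /var/lib/ceph/osd/ceph-13/journal \
      --op export --pgid 1.2f --file /tmp/pg.1.2f.export
  # on a stopped target OSD that should hold the PG
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
      --journal-path /var/lib/ceph/osd/ceph-21/journal \
      --op import --file /tmp/pg.1.2f.export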

Re: [ceph-users] How to speed up backfill

2018-01-10 Thread Josef Zelenka
On 10/01/18 21:53, Josef Zelenka wrote: Hi, i had the same issue a few days back, i tried playing around with these two: ceph tell 'osd.*' injectargs '--osd-max-backfills ' ceph tell 'osd.*' injectargs '--osd-recovery-max-active ' and it helped greatly (increased our recovery speed 20x),
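
For illustration only (the actual values used in the thread are not shown above, and the right numbers depend on your hardware), filled-in injections look like:

  ceph tell 'osd.*' injectargs '--osd-max-backfills 4'
  ceph tell 'osd.*' injectargs '--osd-recovery-max-active 8'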

[ceph-users] Cluster crash - FAILED assert(interval.last > last)

2018-01-10 Thread Josef Zelenka
Hi, today we had a disastrous crash - we are running a 3-node cluster with 24 OSDs in total (8 per node), with SSDs for the blockdb and HDDs for bluestore data. This cluster is used as a radosgw backend, for storing a big number of thumbnails for a file hosting site - around 110m files in total. We were adding

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Gregory Farnum
On Wed, Jan 10, 2018 at 11:14 AM, Brent Kennedy wrote: > I adjusted “osd max pg per osd hard ratio” to 50.0 and left “mon max pg per > osd” at 5000 just to see if things would allow data movement. This worked, > the new pool I created finished its creation and spread out. I was able to > then c

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Sage Weil
On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote: > Am 10.01.2018 um 16:38 schrieb Sage Weil: > > On Wed, 10 Jan 2018, John Spray wrote: > >> On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG > >> wrote: > >>> Hello, > >>> > >>> since upgrading to luminous i get the following er

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Stefan Priebe - Profihost AG
Am 10.01.2018 um 16:38 schrieb Sage Weil: > On Wed, 10 Jan 2018, John Spray wrote: >> On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG >> wrote: >>> Hello, >>> >>> since upgrading to luminous i get the following error: >>> >>> HEALTH_ERR full ratio(s) out of order >>> OSD_OUT_OF_ORDER

[ceph-users] issue adding OSDs

2018-01-10 Thread Luis Periquito
Hi, I'm running a cluster with 12.2.1 and adding more OSDs to it. Everything is running version 12.2.1 and require_osd is set to luminous. one of the pools is replicated with size 2 min_size 1, and is seemingly blocking IO while recovering. I have no slow requests, looking at the output of "ceph

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
I adjusted “osd max pg per osd hard ratio” to 50.0 and left “mon max pg per osd” at 5000 just to see if things would allow data movement. This worked, the new pool I created finished its creation and spread out. I was able to then copy the data from the existing pool into the new pool and del

Re: [ceph-users] How to "reset" rgw?

2018-01-10 Thread Casey Bodley
On 01/10/2018 04:34 AM, Martin Emrich wrote: Hi! As I cannot find any solution for my broken rgw pools, the only way out is to give up and "reset". How do I throw away all rgw data from a ceph cluster? Just delete all rgw pools? Or are some parts stored elsewhere (monitor, ...)? Thanks,

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Brent Kennedy
I changed “mon max pg per osd” to 5000 because when I changed it to zero, which was supposed to disable it, it caused an issue where I couldn’t create any pools. It would say 0 was larger than the minimum. I imagine that’s a bug; if I wanted it disabled, then it shouldn’t use the calculation.

[ceph-users] How to speed up backfill

2018-01-10 Thread shadow_lin
Hi all, I am playing with backfill settings to try to find out how to control the speed of backfill. So far I have only found that "osd max backfills" has an effect on the backfill speed. But once all the PGs that need backfilling have begun backfilling, I can't find any way to speed up backfills. Especially when it c

Re: [ceph-users] Open Compute (OCP) servers for Ceph

2018-01-10 Thread Wes Dillingham
Not OCP, but regarding 12x 3.5" drives in 1U with a decent CPU, QCT makes the following: https://www.qct.io/product/index/Server/rackmount-server/1U-Rackmount-Server/QuantaGrid-S51G-1UL and they have a few other models with some additional SSDs included in addition to the 3.5" drives. Both of those compared here: htt

Re: [ceph-users] Bad crc causing osd hang and block all request.

2018-01-10 Thread shadow_lin
Thanks for your advice. I rebuilt the osd and haven't had this happen again, so it could have been corruption on the HDDs. 2018-01-11 lin.yunfan From: Konstantin Shalygin Sent: 2018-01-09 12:11 Subject: Re: [ceph-users] Bad crc causing osd hang and block all request. To: "ceph-users" Cc: > What could ca

[ceph-users] Changing device-class using crushtool

2018-01-10 Thread Wido den Hollander
Hi, Is there a way to easily modify the device-class of devices on an offline CRUSHMap? I know I can decompile the CRUSHMap and do it, but that's a lot of work in a large environment. In larger environments I'm a fan of downloading the CRUSHMap, modifying it to my needs, testing it and inje
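
For reference, the offline round-trip looks like the sketch below. In a Luminous decompiled map the class appears on the device lines (e.g. "device 0 osd.0 class hdd"), which is the part that would be edited by hand; the filenames are placeholders.

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit the "class ..." annotations on the device lines in crushmap.txt
  crushtool -c crushmap.txt -o crushmap.new
  crushtool -i crushmap.new --test --show-statistics   # sanity-check the edited map before injecting
  ceph osd setcrushmap -i crushmap.new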

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Sage Weil
On Wed, 10 Jan 2018, John Spray wrote: > On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG > wrote: > > Hello, > > > > since upgrading to luminous i get the following error: > > > > HEALTH_ERR full ratio(s) out of order > > OSD_OUT_OF_ORDER_FULL full ratio(s) out of order > > backf

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread John Spray
On Wed, Jan 10, 2018 at 2:11 PM, Stefan Priebe - Profihost AG wrote: > Hello, > > since upgrading to luminous i get the following error: > > HEALTH_ERR full ratio(s) out of order > OSD_OUT_OF_ORDER_FULL full ratio(s) out of order > backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased >

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread David Turner
Why oh why would you run with such lean settings? You very well might not be able to recover your cluster if something happened while you were at 94% full without even a nearfull warning on anything. Nearfull should at least be brought down as it's just a warning in ceph's output to tell you to get

Re: [ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Webert de Souza Lima
Good to know. I don't think this should trigger HEALTH_ERR though, but HEALTH_WARN makes sense. It makes sense to keep the backfillfull_ratio greater than nearfull_ratio as one might need backfilling to avoid OSD getting full on reweight operations. Regards, Webert Lima DevOps Engineer at MAV Te

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
On Wed, Jan 10, 2018 at 12:44 PM, Mark Schouten wrote: > > Thanks, that's a good suggestion. Just one question, will this affect RBD > > access from the same (client) host? I'm sorry that this didn't help. No, it does not affect rbd clients, as MDS is related only to cephfs. Regards, Webert

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Mark Schouten
On Wednesday, January 10, 2018 14:15:19 CET, Mark Schouten wrote: > On Wednesday, January 10, 2018 08:42:04 CET, Webert de Souza Lima wrote: > > try to kick out (evict) that cephfs client from the mds node, see > > http://docs.ceph.com/docs/master/cephfs/eviction/ > > Thanks, that's a good suggestion. Jus

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Yan, Zheng
On Wed, Jan 10, 2018 at 10:59 AM, Mark Schouten wrote: > Hi, > > While upgrading a server with a CephFS mount tonight, it stalled on installing > a new kernel, because it was waiting for `sync`. > > I'm pretty sure it has something to do with the CephFS filesystem which caused > some issues last w

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Jens-U. Mozdzen
Hi Alfredo, thank you for your comments: Quoting Alfredo Deza: On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen wrote: Dear *, has anybody been successful migrating Filestore OSDs to Bluestore OSDs, keeping the OSD number? There have been a number of messages on the list, reporting proble

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Alfredo Deza
On Wed, Jan 10, 2018 at 8:57 AM, Jens-U. Mozdzen wrote: > Dear *, > > has anybody been successful migrating Filestore OSDs to Bluestore OSDs, > keeping the OSD number? There have been a number of messages on the list, > reporting problems, and my experience is the same. (Removing the existing > OS

[ceph-users] luminous: HEALTH_ERR full ratio(s) out of order

2018-01-10 Thread Stefan Priebe - Profihost AG
Hello, since upgrading to luminous i get the following error: HEALTH_ERR full ratio(s) out of order OSD_OUT_OF_ORDER_FULL full ratio(s) out of order backfillfull_ratio (0.9) < nearfull_ratio (0.95), increased but ceph.conf has: mon_osd_full_ratio = .97 mon_osd_nearfull_ratio
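
In Luminous these ratios are stored in the OSDMap and can be changed at runtime rather than only through ceph.conf; a minimal example using the stock default values (0.85 / 0.90 / 0.95, shown for illustration rather than as the thread's settings) looks like:

  ceph osd set-nearfull-ratio 0.85
  ceph osd set-backfillfull-ratio 0.90
  ceph osd set-full-ratio 0.95
  ceph osd dump | grep ratio    # verify what the cluster actually has stored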

[ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-10 Thread Jens-U. Mozdzen
Dear *, has anybody been successful migrating Filestore OSDs to Bluestore OSDs, keeping the OSD number? There have been a number of messages on the list, reporting problems, and my experience is the same. (Removing the existing OSD and creating a new one does work for me.) I'm working on
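
The intended "keep the OSD id" flow is roughly the sketch below (cf. the replacing-an-OSD documentation); whether --osd-id is honored cleanly on 12.2.x is exactly the kind of problem this thread reports, and the device path is a placeholder.

  systemctl stop ceph-osd@13
  ceph osd destroy 13 --yes-i-really-mean-it          # keeps the id and CRUSH position
  ceph-volume lvm zap /dev/sdX                        # wipe the old filestore device
  ceph-volume lvm create --bluestore --data /dev/sdX --osd-id 13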

Re: [ceph-users] rbd: map failed

2018-01-10 Thread Lenz Grimmer
On 01/09/2018 07:46 PM, Karun Josy wrote: > We have a user "testuser" with below permissions : > > $ ceph auth get client.testuser > exported keyring for client.testuser > [client.testuser] >         key = == >         caps mon = "profile rbd" >         caps osd = "profile rbd pool=ecpool, pr
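
For comparison, the caps can be inspected and re-applied with ceph auth caps; the second pool name below is an assumption, since the original osd cap line is truncated in the archive.

  ceph auth get client.testuser
  ceph auth caps client.testuser \
      mon 'profile rbd' \
      osd 'profile rbd pool=ecpool, profile rbd pool=rbd'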

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Mark Schouten
On Wednesday, January 10, 2018 08:42:04 CET, Webert de Souza Lima wrote: > try to kick out (evict) that cephfs client from the mds node, see > http://docs.ceph.com/docs/master/cephfs/eviction/ Thanks, that's a good suggestion. Just one question, will this affect RBD access from the same (client) host

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-10 Thread Alfredo Deza
On Wed, Jan 10, 2018 at 2:10 AM, Fabian Grünbichler wrote: > On Tue, Jan 09, 2018 at 02:14:51PM -0500, Alfredo Deza wrote: >> On Tue, Jan 9, 2018 at 1:35 PM, Reed Dier wrote: >> > I would just like to mirror what Dan van der Ster’s sentiments are. >> > >> > As someone attempting to move an OSD to

Re: [ceph-users] 'lost' cephfs filesystem?

2018-01-10 Thread Webert de Souza Lima
try to kick out (evict) that cephfs client from the mds node, see http://docs.ceph.com/docs/master/cephfs/eviction/ Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, Jan 10, 2018 at 12:59 AM, Mark Schouten wrote: > Hi, > > While up
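
The eviction itself boils down to listing the sessions on the active MDS and evicting the offending client by id, along the lines of the linked docs; the MDS name and client id below are placeholders.

  ceph tell mds.0 client ls
  ceph tell mds.0 client evict id=4305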

Re: [ceph-users] OSDs going down/up at random

2018-01-10 Thread Mike O'Connor
On 10/01/2018 4:48 PM, Mike O'Connor wrote: > On 10/01/2018 4:24 PM, Sam Huracan wrote: >> Hi Mike, >> >> Could you show system log at moment osd down and up? So now I know it's a crash, what's my next step? As soon as I put the system under write load, OSDs start crashing. Mike

[ceph-users] How to "reset" rgw?

2018-01-10 Thread Martin Emrich
Hi! As I cannot find any solution for my broken rgw pools, the only way out is to give up and "reset". How do I throw away all rgw data from a ceph cluster? Just delete all rgw pools? Or are some parts stored elsewhere (monitor, ...)? Thanks, Martin
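
If the data really does live only in the pools, the brute-force "reset" is to delete every rgw pool. The pool names below are the Luminous defaults and may differ per zone; note that pool deletion on Luminous also requires mon_allow_pool_delete = true.

  ceph osd pool ls | grep rgw
  ceph osd pool delete .rgw.root .rgw.root --yes-i-really-really-mean-it
  ceph osd pool delete default.rgw.buckets.data default.rgw.buckets.data --yes-i-really-really-mean-it
  # ...repeat for the remaining default.rgw.* pools (control, meta, log, buckets.index, ...)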

Re: [ceph-users] MDS cache size limits

2018-01-10 Thread stefan
Quoting John Spray (jsp...@redhat.com): > On Mon, Jan 8, 2018 at 8:02 PM, Marc Roos wrote: > > > > I guess the mds cache holds files, attributes etc but how many files > > will the default "mds_cache_memory_limit": "1073741824" hold? > > We always used to get asked how much memory a given mds_cac
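
For reference, the limit is in bytes (the default quoted above is 1 GiB) and can be checked or raised per MDS over the admin socket; the MDS name and the 4 GiB value below are only examples.

  # inspect the current value on the local MDS daemon
  ceph daemon mds.a config get mds_cache_memory_limit
  # raise it at runtime (example: 4 GiB); persist the same value in ceph.conf under [mds]
  ceph daemon mds.a config set mds_cache_memory_limit 4294967296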

Re: [ceph-users] Incomplete pgs and no data movement ( cluster appears readonly )

2018-01-10 Thread Janne Johansson
2018-01-10 8:51 GMT+01:00 Brent Kennedy : > As per a previous thread, my pgs are set too high. I tried adjusting the > “mon max pg per osd” up higher and higher, which did clear the > error (restarted monitors and managers each time), but it seems that data > simply won't move around the cluster.
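
For reference, the two knobs involved are plain config options; the values below are illustrative only, not the ones used in the thread, and on 12.2.x the mon/mgr daemons were restarted for the change to take effect.

  # ceph.conf on the mons/mgrs -- values are examples, not recommendations
  [global]
  mon_max_pg_per_osd = 400
  osd_max_pg_per_osd_hard_ratio = 5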