Re: [ceph-users] Upgrading journals to BlueStore: a conundrum

2018-08-08 Thread Gregory Farnum
You could try flushing out the FileStore journals off the SSD and creating new ones elsewhere (eg, colocated). This will obviously have a substantial impact on performance but perhaps that’s acceptable during your upgrade window? On Mon, Aug 6, 2018 at 12:32 PM Robert Stanford wrote: > >
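A rough sketch of what that flush-and-recreate step could look like for one FileStore OSD, assuming osd.3 with its journal symlinked to an SSD partition (the id and paths are placeholders, not from the thread):

    systemctl stop ceph-osd@3
    ceph-osd -i 3 --flush-journal               # write out and close the old journal on the SSD
    rm /var/lib/ceph/osd/ceph-3/journal         # drop the symlink pointing at the SSD partition
    ceph-osd -i 3 --mkjournal                   # recreate the journal as a colocated file at the default path
    systemctl start ceph-osd@3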

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-08 Thread Thode Jocelyn
Hi Erik, The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph file to determine which configuration file to use (from CLUSTER_NAME). So you need to set this to the name you chose for rbd-mirror to work. However setting this CLUSTER_NAME variable in /etc/sysconfig/ceph makes it
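For readers following along, a minimal sketch of the sysconfig change being described, assuming a secondary cluster named "backup" (the thread calls the variable CLUSTER_NAME; on RPM-based installs the file typically uses CLUSTER):

    # /etc/sysconfig/ceph
    CLUSTER=backup    # the systemd units, including rbd-mirror, then read /etc/ceph/backup.conf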

Re: [ceph-users] permission errors rolling back ceph cluster to v13

2018-08-08 Thread Raju Rangoju
Thanks Greg. I think I have to re-install ceph v13 from scratch then. -Raju From: Gregory Farnum Sent: 09 August 2018 01:54 To: Raju Rangoju Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] permission errors rolling back ceph cluster to v13 On Tue, Aug 7, 2018 at 6:27 PM Raju Rangoju

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
If, in the above case, osd 13 was not too busy to respond (i.e. there was no resource shortage), then you need to find out what else prevented osd 5, etc. from contacting it. On Wed, Aug 8, 2018 at 6:47 PM, Josef Zelenka wrote: > Checked the system load on the host with the OSD that is suiciding currently > and it's fine,
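A hedged sketch of how one might start checking that from the host of osd.5 (the id, address, and port here are placeholders; the address is taken from the log excerpt later in this digest):

    ceph osd find 13          # shows the host and addresses osd.13 is bound to
    ping -c 3 10.12.3.17      # hypothetical cluster/public address of osd.13's host
    nc -zv 10.12.3.17 6804    # hypothetical messenger port reported by 'ceph osd find'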

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-08-08 Thread Alexandre DERUMIER
Hi, I upgraded to 12.2.7 two weeks ago, and I don't see the memory increase anymore! (I can't confirm that it was related to your patch.) Thanks again for helping! Regards, Alexandre Derumier - Original Message - From: "Zheng Yan" To: "aderumier" Cc: "ceph-users" Sent: Tuesday 29

Re: [ceph-users] Slack-IRC integration

2018-08-08 Thread Gregory Farnum
I looked at this a bit and it turns out anybody who's already in the slack group can invite people with unrestricted domains. I think it's just part of Slack that you need to specify which domains are allowed in by default? Patrick set things up a couple years ago so I suppose our next community

[ceph-users] removing auids and auid-based cephx capabilities

2018-08-08 Thread Sage Weil
There is an undocumented part of the cephx authentication framework called the 'auid' (auth uid) that assigns an integer identifier to cephx users and to rados pools and allows you to craft cephx capabilities that apply to those pools. This is leftover infrastructure from an ancient time in
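Purely as an illustration (not part of Sage's message): the usual replacement for an auid-scoped capability is a cap scoped to an explicit pool name, e.g. for a hypothetical client and pool:

    ceph auth caps client.app mon 'allow r' osd 'allow rw pool=app-pool'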

Re: [ceph-users] permission errors rolling back ceph cluster to v13

2018-08-08 Thread Gregory Farnum
On Tue, Aug 7, 2018 at 6:27 PM Raju Rangoju wrote: > Hi, > > > > I have been running into some connection issues with the latest ceph-14 > version, so we thought the feasible solution would be to roll back the > cluster to previous version (ceph-13.0.1) where things are known to work > properly.

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread John Spray
On Wed, Aug 8, 2018 at 4:46 PM Jake Grimmett wrote: > > Hi John, > > With regard to memory pressure; Does the cephfs fuse client also cause a > deadlock - or is this just the kernel client? TBH, I'm not expert enough on the kernel-side implementation of fuse to say. Ceph does have the

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Jake Grimmett
Hi John, With regard to memory pressure; Does the cephfs fuse client also cause a deadlock - or is this just the kernel client? We run the fuse client on ten OSD nodes, and use parsync (parallel rsync) to backup two beegfs systems (~1PB). Ordinarily fuse works OK, but any OSD problems can cause

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
You can only try to remount the cephfs dir. It will probably not work, giving you I/O errors, so the fallback would be to use a FUSE mount. If I recall correctly, you could do a lazy umount on the current dir (umount -fl /mountdir) and remount it using the FUSE client. It will work for new sessions
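A minimal sketch of that fallback, assuming the /mountdir path from the message and a client keyring already in place:

    umount -f -l /mountdir                  # lazy/forced unmount of the hung kernel mount
    ceph-fuse -n client.admin /mountdir     # remount via the FUSE client for new sessions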

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Zhenshi Zhou
Hi, Is there any other way except rebooting the server when the client hangs? If the server is in a production environment, I can't restart it every time. Webert de Souza Lima wrote on Wed, 8 Aug 2018 at 22:33: > Hi Zhenshi, > > if you still have the client mount hanging but no session is connected, > you

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-08 Thread Erik McCormick
I'm not using this feature, so maybe I'm missing something, but from the way I understand cluster naming to work... I still don't understand why this is blocking for you. Unless you are attempting to mirror between two clusters running on the same hosts (why would you do this?) then systemd

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
Hi Zhenshi, if you still have the client mount hanging but no session is connected, you probably have some PID waiting with blocked IO from the cephfs mount. I face that now and then, and the only solution is to reboot the server, as you won't be able to kill a process with pending IO. Regards,
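If you want to confirm that, a quick hypothetical check for processes stuck in uninterruptible sleep (D state), which is what blocked cephfs IO usually looks like:

    ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'    # list D-state tasks and what they are waiting on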

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Zhenshi Zhou
Hi Webert, That command shows the current sessions, whereas the server from which I got those files (osdc, mdsc, monc) has been disconnected for a long time, so I cannot get useful information from the command you provided. Thanks. Webert de Souza Lima wrote on Wed, 8 Aug 2018 at 22:10: > You could also see open sessions at the

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread Will Marley
Hi again Frederic, It may be worth looking at a recovery sleep.
osd recovery sleep
Description: Time in seconds to sleep before next recovery or backfill op. Increasing this value will slow down recovery operation while client operations will be less impacted.
Type: Float
Default: 0
osd
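A hedged example of changing it at runtime across the cluster (the value is only an illustration):

    ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'    # seconds to sleep between recovery/backfill ops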

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Webert de Souza Lima
You could also see open sessions at the MDS server by issuing `ceph daemon mds.XX session ls` Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Wed, Aug 8, 2018 at 5:08 AM Zhenshi Zhou wrote: > Hi, I find an old server which mounted

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread Webert de Souza Lima
So your OSDs are really too busy to respond to heartbeats. You'll be facing this for some time, until the cluster load gets lower. I would set `ceph osd set nodeep-scrub` until the heavy disk IO stops. Maybe you can schedule it so deep scrubs are enabled during the night and disabled in the morning. Regards, Webert
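A sketch of one way to schedule that, assuming cron on a host with an admin keyring (the times are arbitrary examples):

    ceph osd set nodeep-scrub                  # stop new deep scrubs immediately
    # crontab: allow deep scrubs only overnight
    0 22 * * *  ceph osd unset nodeep-scrub
    0 6  * * *  ceph osd set nodeep-scrub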

Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-08 Thread Thode Jocelyn
Hi, We are still blocked by this problem on our end. Glen, did you or someone else figure something out for this? Regards Jocelyn Thode From: Glen Baars [mailto:g...@onsitecomputers.com.au] Sent: Thursday, 2 August 2018 05:43 To: Erik McCormick Cc: Thode Jocelyn ; Vasu Kulkarni ;

Re: [ceph-users] Whole cluster flapping

2018-08-08 Thread CUZA Frédéric
Thanks for the command line. I did take a look at it, but I don't really know what to search for, my bad. All this flapping is due to deep-scrub: when it starts on an OSD, things start to go bad. I set out all the OSDs that were flapping the most (one by one, after each rebalancing) and it looks better
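For anyone following the thread, the "set out" step being described is roughly this, one OSD at a time (the id is hypothetical):

    ceph osd out 12    # mark the flapping OSD out so its PGs rebalance elsewhere
    ceph -s            # wait for recovery/backfill to finish before doing the next one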

Re: [ceph-users] CephFS - Mounting a second Ceph file system

2018-08-08 Thread John Spray
On Tue, Aug 7, 2018 at 11:41 PM Scott Petersen wrote: > We are using kernel 4.15.17 and we keep receiving this error > mount.ceph: unrecognized mount option "mds_namespace", passing to kernel. > That message is harmless -- it just means that the userspace mount.ceph utility doesn't do anything
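For context, a hedged example of the mount line being discussed, with a hypothetical monitor address and a second filesystem named "fs2":

    mount -t ceph 192.168.1.1:6789:/ /mnt/fs2 \
        -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=fs2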

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-08 Thread Jakub Jaszewski
Hi All, exactly the same story today: same 8 OSDs and a lot of garbage collection objects to process. Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key=" entries per OSD log file:
hostA: /var/log/ceph/ceph-osd.58.log  1826467
hostB: /var/log/ceph/ceph-osd.88.log  2924241
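Counts like these can be produced with a simple grep per log file, e.g.:

    grep -c 'cls_rgw.cc:3284: gc_iterate_entries end_key=' /var/log/ceph/ceph-osd.58.log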

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
Do you see "internal heartbeat not healthy" messages in the log of the osd that suicides? On Wed, Aug 8, 2018 at 5:45 PM, Brad Hubbard wrote: > What is the load like on the osd host at the time and what does the > disk utilization look like? > > Also, what does the transaction look like from one

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Brad Hubbard
What is the load like on the osd host at the time and what does the disk utilization look like? Also, what does the transaction look like from one of the osds that sends the "you died" message with debugging osd 20 and ms 1 enabled? On Wed, Aug 8, 2018 at 5:34 PM, Josef Zelenka wrote: > Thank
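A hedged way to enable that at runtime on one of the reporting OSDs (osd.5 is just the id used earlier in this thread):

    ceph tell osd.5 injectargs '--debug_osd 20 --debug_ms 1'
    # and to revert once the transaction has been captured:
    ceph tell osd.5 injectargs '--debug_osd 1/5 --debug_ms 0/5'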

Re: [ceph-users] cephfs kernel client hangs

2018-08-08 Thread Zhenshi Zhou
Hi, I find an old server which mounted cephfs and has the debug files.
# cat osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS
# cat monc
have monmap 2 want 3+
have osdmap 3507
have fsmap.user 0
have mdsmap 55 want 56+
fs_cluster_id -1
# cat mdsc
194 mds0 getattr #1036ae3
What does
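For reference, those files come from the ceph kernel client's debugfs tree; a sketch of where to read them on a client host (the fsid/client-id directory name will differ per mount):

    cat /sys/kernel/debug/ceph/*/osdc
    cat /sys/kernel/debug/ceph/*/mdsc
    cat /sys/kernel/debug/ceph/*/monc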

Re: [ceph-users] pg count question

2018-08-08 Thread Sébastien VIGNERON
The formula seems correct for a 100 pg/OSD target. > Le 8 août 2018 à 04:21, Satish Patel a écrit : > > Thanks! > > Do you have any comments on Question: 1 ? > > On Tue, Aug 7, 2018 at 10:59 AM, Sébastien VIGNERON > wrote: >> Question 2: >> >> ceph osd pool set-quota max_objects|max_bytes
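As a worked example of that target with hypothetical numbers (20 OSDs, replicated size 3, one main pool):

    # total PGs ~ (OSDs * 100) / replica size, then round to a power of two
    echo $(( 20 * 100 / 3 ))    # ~666 -> choose 512 (or 1024) for pg_num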

Re: [ceph-users] OSD had suicide timed out

2018-08-08 Thread Josef Zelenka
Thank you for your suggestion, tried it; it really seems like the other osds think the osd is dead (if I understand this right), however the networking seems absolutely fine between the nodes (no issues in graphs etc).
-13> 2018-08-08 09:13:58.466119 7fe053d41700  1 -- 10.12.3.17:0/706864 <==