[ceph-users] Re: CephFS hangs with access denied

2020-02-13 Thread thoralf schulze
hi Dietmar, were the osds really down, or was this just the perception of the hung client? playing around with mds_session_blacklist_on_timeout, mds_session_blacklist_on_evict to allow the clients to actually reconnect, and mds_cap_revoke_eviction_timeout to forcibly evict hung clients might be wor
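A minimal sketch of how those knobs could be adjusted at runtime via the centralized config store (values are illustrative only, not recommendations; assumes a Mimic/Nautilus-era cluster):

  # let timed-out sessions reconnect instead of being blacklisted
  ceph config set mds mds_session_blacklist_on_timeout false
  # keep blacklisting sessions that are explicitly evicted
  ceph config set mds mds_session_blacklist_on_evict true
  # auto-evict clients that have not released revoked caps after 300 s
  ceph config set mds mds_cap_revoke_eviction_timeout 300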

[ceph-users] Re: CephFS hangs with access denied

2020-02-13 Thread Dietmar Rieder
Hi, the attachments got removed from my previous message, here the pastebins: client vmcore-dmesg: https://pastebin.com/AFZgkpaK mds.log: https://pastebin.com/FUU6hyya Best Dietmar On 2020-02-13 05:00, Dietmar Rieder wrote: > Hi, > > now we got a kernel crash (Oops) probably related to the

[ceph-users] Cleanup old messages in ceph health

2020-02-13 Thread Thomas Schneider
Hi, the current output of ceph -s reports a warning: 9 daemons have recently crashed root@ld3955:~# ceph -s   cluster:     id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae     health: HEALTH_WARN     9 daemons have recently crashed     2 slow ops, oldest one blocked for 347335 sec, mon.

[ceph-users] Ceph and Windows - experiences or suggestions

2020-02-13 Thread Lars Täuber
Hi there! I got the task to connect a Windows client to our existing ceph cluster. I'm looking for experiences or suggestions from the community. There came two possibilities to my mind: 1. iSCSI Target on RBD exported to Windows 2. NFS-Ganesha on CephFS exported to Windows Is there a third way e

[ceph-users] Identify slow ops

2020-02-13 Thread Thomas Schneider
Hi, the current output of ceph -s reports a warning: 2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops This time is increasing. root@ld3955:~# ceph -s   cluster:     id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae     health: HEALTH_WARN     9 daemons have recently crash

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Marc Roos
Via smb, much discussed here -Original Message- Sent: 13 February 2020 09:33 To: ceph-users@ceph.io Subject: [ceph-users] Ceph and Windows - experiences or suggestions Hi there! I got the task to connect a Windows client to our existing ceph cluster. I'm looking for experiences or sugge

[ceph-users] Re: CephFS hangs with access denied

2020-02-13 Thread Dietmar Rieder
Hi, they were not down as far as I can tell from the affected osd logs at the time in question. I'll try to play with those values, thanks. Is there anything else that might help? The kernel crash is something that makes me nervous. Dietmar On 2020-02-13 09:16, thoralf schulze wrote: > hi Dietma

[ceph-users] Re: Cleanup old messages in ceph health

2020-02-13 Thread Eugen Block
Hi Thomas, there is a crash command for ceph: ceph crash ls ceph crash prune Regards, Eugen Quoting Thomas Schneider <74cmo...@gmail.com>: Hi, the current output of ceph -s reports a warning: 9 daemons have recently crashed root@ld3955:~# ceph -s   cluster:     id: 6b1b5117-6e08-4843-

[ceph-users] Re: Cleanup old messages in ceph health

2020-02-13 Thread Dan van der Ster
Hi, For the crashes, you can run `ceph crash prune 0`. For the mon slow op, it is probably https://tracker.ceph.com/issues/43893 (and you can see how to clear it up in that issue). Cheers, Dan On Thu, Feb 13, 2020 at 9:33 AM Thomas Schneider <74cmo...@gmail.com> wrote: > > Hi, > > the curr
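For reference, a sketch of the crash-module workflow Eugen and Dan refer to (the crash id is a placeholder):

  ceph crash ls                 # list recorded crash reports
  ceph crash info <crash-id>    # inspect a report before discarding it
  ceph crash archive-all        # acknowledge all reports, clears the health warning
  ceph crash prune 0            # drop all stored crash reports older than 0 days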

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Lars Täuber
I don't have samba experiences. Isn't the installation and administration of a samba server just for one "share" overkill? Thu, 13 Feb 2020 09:36:31 +0100 "Marc Roos" ==> ceph-users , taeuber : > Via smb, much discussed here > > -Original Message- > Sent: 13 February 2020 09:33 > To:

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Janne Johansson
In the larger scheme of things, "one smb server", "one nfs server" or "one cephfs MDS" doesn't differ much. You need some kind of box to translate from object storage (regardless of if this is iscsi, ceph or something else) to a kind of filesystem that can give you some extra guarantees (like bein

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Lenz Grimmer
On 2020-02-13 10:26, Janne Johansson wrote: > In the larger scheme of things, "one smb server", "one nfs server" or "one > cephfs MDS" doesn't differ much. > > You need some kind of box to translate from object storage (regardless of > if this is iscsi, ceph or something else) to a kind of file

[ceph-users] Re: extract disk usage stats from running ceph cluster

2020-02-13 Thread mj
Hi, I would like to understand why the OSD HDDs on node2 of my three identical ceph hosts claim to have processed 10 times more reads/writes than the other two nodes. OSD weights are all similar, disk usage in terms of space as well, same disk sizes, same reported disk usage hours, etc. All data
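A few standard places to look when chasing that kind of per-node asymmetry (a sketch using plain Ceph/Linux tooling; device names are examples):

  ceph osd df tree              # per-OSD weight, utilization and PG count
  ceph osd perf                 # per-OSD commit/apply latency
  ceph pg dump pgs_brief        # acting sets: are primaries evenly spread across the nodes?
  iostat -x 5 /dev/sd[a-j]      # on each node: raw per-disk read/write rates

Reads are served by the primary OSD of each PG, so an uneven primary distribution across the three hosts would show up as exactly this kind of imbalance.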

[ceph-users] Re: CephFS hangs with access denied

2020-02-13 Thread Toby Darling
Hi Dietmar +1 We've been experiencing the exact same variations of hang / Permission denied / Oops as you, with cephfs 14.2.6 kernel client on Scientific Linux 7.[67] (3.10.0-1062.7.1.el7 and 3.10.0-957.21.3.el7). The mds.log shows the same sequence of denied reconnect attempt Evicting (

[ceph-users] Ceph standby-replay metadata server: MDS internal heartbeat is not healthy

2020-02-13 Thread Martin Palma
Hi all, today we observed that all of a sudden our standby-replay metadata server continuously writes the following logs: 2020-02-13 11:56:50.216102 7fd2ad229700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15 2020-02-13 11:56:50.287699 7fd2ad229700 0 mds.beacon.dcucmds401 Skipping
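A sketch of how the affected daemon could be inspected (mds.dcucmds401 is taken from the log snippet above; the ceph daemon calls have to run on the host where that daemon lives):

  ceph fs status                                  # rank assignment and the standby-replay daemon
  ceph daemon mds.dcucmds401 status               # internal state of the stuck daemon
  ceph daemon mds.dcucmds401 perf dump mds_log    # journal replay counters - is replay making progress?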

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Paul Emmerich
On Thu, Feb 13, 2020 at 10:04 AM Lars Täuber wrote: > > I don't have samba experiences. Isn't the installation and administration of > a samba server just for one "share" overkill? Check out our software, you can deploy a pair of highly available Samba servers with just a few clicks: https://cro

[ceph-users] Changing the failure-domain of an erasure coded pool

2020-02-13 Thread Neukum, Max (ETP)
Hi ceph enthusiasts, We have a ceph cluster with cephfs and two pools: one replicated for metadata on ssd and one with ec (4+2) on hdd. Recently, we upgraded from 4 to 7 nodes and now want to change the failure domain of the erasure coded pool from 'OSD' to 'HOST'. What we did was to create a

[ceph-users] Re: Changing the failure-domain of an erasure coded pool

2020-02-13 Thread Paul Emmerich
The CRUSH-related information from the ec profile is only used for the initial creation of the crush rule for the pool. You can just change the crush rule and everything else will happen automatically. Or you can create a new crush rule and assign it to the pool like you did, that's also fine. Unrelat
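For reference, a sketch of the "create a new rule and assign it" route (profile, rule and pool names are placeholders):

  # profile with the desired failure domain; used to derive the new rule
  ceph osd erasure-code-profile set ec42-host k=4 m=2 crush-failure-domain=host
  ceph osd crush rule create-erasure ec42-host-rule ec42-host
  ceph osd pool set cephfs_data crush_rule ec42-host-rule
  ceph -s    # watch the remapped/backfilling PGs until the data has moved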

[ceph-users] Very bad performance on a ceph rbd pool via iSCSI to VMware esx

2020-02-13 Thread Salsa
I have a 3-host, 10 x 4TB HDDs per host ceph storage setup. I defined a 3-replica rbd pool and some images and presented them to a VMware host via iSCSI, but the write performance is so bad that I managed to freeze a VM doing a big rsync to a datastore inside ceph and had to reboot its host (seem
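To separate Ceph-side performance from the iSCSI/ESXi path, a baseline run directly against the pool and the image can help (a sketch; pool and image names are placeholders):

  rados bench -p rbd_pool 30 write -b 4096 -t 16
  rbd bench --io-type write --io-pattern rand --io-size 4K --io-threads 16 --io-total 1G rbd_pool/test_image

If these numbers are already poor, the problem is in the cluster itself rather than in the gateway layer.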

[ceph-users] Re: Changing the failure-domain of an erasure coded pool

2020-02-13 Thread Neukum, Max (ETP)
This is good news! Thanks for the fast reply. We will now wait for Ceph to place all objects correctly and then check if we are happy with the setup. Cheers Max From: Paul Emmerich Sent: Thursday, February 13, 2020 2:54 PM To: Neukum, Max (ETP) Cc: ceph-

[ceph-users] Re: Identify slow ops

2020-02-13 Thread Stefan Kooman
Quoting Thomas Schneider (74cmo...@gmail.com): > Hi, > > the current output of ceph -s reports a warning: > 2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops > This time is increasing. This is a bug: Ceph did not track all ops correctly, and in certain cases this might lead to
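To see what the blocked op actually is, and to clear it once it is confirmed to be this stale-tracking bug (a sketch; the mon name is taken from the report above, and the restart assumes systemd packaging):

  ceph daemon mon.ld5505 ops            # run on the host of mon.ld5505: dumps in-flight/blocked ops
  systemctl restart ceph-mon@ld5505     # restarting the mon discards the stale op tracking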

[ceph-users] EC Pools w/ RBD - IOPs

2020-02-13 Thread Anthony Brandelli (abrandel)
Hi Ceph Community, Wondering what experiences good/bad you have with EC pools for iops intensive workloads (IE: 4Kish random IO from things like VMWare ESXi). I realize that EC pools are a tradeoff between more usable capacity, and having larger latency/lower iops, but in my testing the tradeof
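For that kind of comparison, a 4K random-write fio run through the rbd engine is a common yardstick (a sketch; pool and image names are placeholders, and fio needs to be built with rbd support):

  fio --name=4k-randwrite --ioengine=rbd --clientname=admin \
      --pool=ec_test_pool --rbdname=bench_image \
      --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
      --runtime=60 --time_based --direct=1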

[ceph-users] Re: EC Pools w/ RBD - IOPs

2020-02-13 Thread Martin Verges
Hello, please do not even think about using an EC pool (k=2, m=1). See other posts here, just don't. EC works quite well and we have a lot of users with EC based VMs, often with proxmox (rbd) or vmware (iscsi) hypervisors. Performance depends on the hardware and is definitely slower than replica
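The usual objection to k=2, m=1 is that it only survives a single failure, and with the default min_size of k+1 on recent releases a single OSD going down already blocks writes; lowering min_size to k risks data loss during recovery. A quick way to check what a pool is configured with (names are placeholders):

  ceph osd erasure-code-profile get myprofile    # shows k, m and crush-failure-domain
  ceph osd pool get my_ec_pool min_size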

[ceph-users] Ceph MDS ASSERT In function 'MDRequestRef'

2020-02-13 Thread Stefan Kooman
Hi, We hit the following assert: -10001> 2020-02-13 17:42:35.543 7f11b5669700 -1 /build/ceph-13.2.8/src/mds/MDCache.cc: In function 'MDRequestRef MDCache::request_get(metareqid_t)' thread 7f11b5669700 time 2020-02-13 17:42:35.545815 /build/ceph-13.2.8/src/mds/MDCache.cc: 9523: FAILED assert(p

[ceph-users] Re: EC Pools w/ RBD - IOPs

2020-02-13 Thread Anthony Brandelli (abrandel)
I should mention this is solely meant as a test cluster, and unfortunately I only have four OSD nodes in it. I guess I’ll go see if I can dig up another node so I can better mirror what might eventually go to production. I would imagine that latency is only going to increase as we increase k tho

[ceph-users] Re: Ceph MDS ASSERT In function 'MDRequestRef'

2020-02-13 Thread Patrick Donnelly
Hello Stefan, On Thu, Feb 13, 2020 at 9:19 AM Stefan Kooman wrote: > > Hi, > > We hit the following assert: > > -10001> 2020-02-13 17:42:35.543 7f11b5669700 -1 > /build/ceph-13.2.8/src/mds/MDCache.cc: In function 'MDRequestRef > MDCache::request_get(metareqid_t)' thread 7f11b5669700 time 2020-0

[ceph-users] Re: Very bad performance on a ceph rbd pool via iSCSI to VMware esx

2020-02-13 Thread Andrew Ferris
Hi Salsa, More information about your Ceph cluster and VMware infrastructure is pretty much required. What Ceph version? Ceph cluster info - i.e. how many Monitors, OSD hosts, iSCSI gateways, and are these components HW or VMs? Do the Ceph components meet recommended hardware levels for CPU, R

[ceph-users] Excessive write load on mons after upgrade from 12.2.13 -> 14.2.7

2020-02-13 Thread Peter Woodman
Hey, I've been running a ceph cluster of arm64 SOCs on Luminous for the past year or so, with no major problems. I recently upgraded to 14.2.7, and the stability of the cluster immediately suffered. Seemed like any mon activity was subject to long pauses, and the cluster would hang frequently. Loo
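A couple of quick checks that might narrow down where the extra mon writes go (a sketch; paths assume default packaging and that the mon id matches the short hostname):

  du -sh /var/lib/ceph/mon/ceph-*/store.db            # size of the mon's RocksDB store
  ceph daemon mon.$(hostname -s) perf dump rocksdb    # RocksDB counters on the local mon
  ceph daemon mon.$(hostname -s) perf dump paxos      # proposal/commit rates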

[ceph-users] Re: Very bad performance on a ceph rbd pool via iSCSI to VMware esx

2020-02-13 Thread Salsa
I'm currently running Nautilus: [root@ceph01 ~]# ceph -s cluster: id: eb4aea44-0c63-4202-b826-e16ea60ed54d health: HEALTH_WARN too few PGs per OSD (25 < min 30) services: mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 8d) mgr: ceph02(active, s
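The "too few PGs per OSD" warning is worth addressing in its own right, since too few PGs also limits parallelism. In Nautilus it can be fixed either by raising pg_num on the pool or by enabling the autoscaler (a sketch; the pool name is a placeholder):

  ceph osd pool set rbd_pool pg_num 512
  # or let Nautilus size it automatically:
  ceph mgr module enable pg_autoscaler
  ceph osd pool set rbd_pool pg_autoscale_mode on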

[ceph-users] Re: Excessive write load on mons after upgrade from 12.2.13 -> 14.2.7

2020-02-13 Thread peter woodman
Almost forgot, here's a graph of the change in write rate: https://shortbus.org/x/ceph-mon-io.png

[ceph-users] Re: Ceph MDS ASSERT In function 'MDRequestRef'

2020-02-13 Thread Stefan Kooman
Quoting Patrick Donnelly (pdonn...@redhat.com): > > Thanks for the information. It looks like this bug: > https://tracker.ceph.com/issues/42467#note-7 Yup, looks like it. As soon as the client is gone the MDS hits the assert. > > Do you have logs you can share? You can use ceph-post-file [1] t
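For reference, the upload Patrick suggests would look roughly like this (the log path and description are examples):

  ceph-post-file -d "mds assert in MDCache::request_get (tracker #42467)" /var/log/ceph/ceph-mds.*.log

The command prints a tag that can then be shared in the tracker issue so the developers can locate the upload.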

[ceph-users] Re: Ceph and Windows - experiences or suggestions

2020-02-13 Thread Stefan Kooman
Quoting Lars Täuber (taeu...@bbaw.de): > Hi there! > > I got the task to connect a Windows client to our existing ceph cluster. > I'm looking for experiences or suggestions from the community. > There came two possibilities to my mind: > 1. iSCSI Target on RBD exported to Windows > 2. NFS-Ganesha

[ceph-users] Re: EC Pools w/ RBD - IOPs

2020-02-13 Thread Vitaliy Filippov
please do not even think about using an EC pool (k=2, m=1). See other posts here, just don't. Why not? -- With best regards, Vitaliy Filippov

[ceph-users] Re: slow using ISCSI - Help-me

2020-02-13 Thread Gesiel Galvão Bernardes
Hi On Sun, Feb 9, 2020 at 18:27, Mike Christie wrote: > On 02/08/2020 11:34 PM, Gesiel Galvão Bernardes wrote: > > Hi, > > > > On Thu, Feb 6, 2020 at 18:56, Mike Christie > > wrote: > > > > On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote