Re: [ceph-users] MDS: obscene buffer_anon memory use when scanning lots of files

2020-01-21 Thread Patrick Donnelly
> …cronjob that restarts the active MDS when it reaches swap just to keep the cluster alive. This looks like it will be fixed by https://tracker.ceph.com/issues/42943. That fix will be available in v14.2.7. -- Patrick Donnelly

Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread Patrick Donnelly
> …rather be "1 MDSs report *undersized* cache"? Weird. No. It means the MDS cache is larger than its target, so the MDS cannot trim its cache back under the limit. This could be for many reasons, but is probably due to clients not releasing capabilities, perhaps because of a bug. -- Patrick Donnelly

Re: [ceph-users] POOL_TARGET_SIZE_BYTES_OVERCOMMITTED and POOL_TARGET_SIZE_RATIO_OVERCOMMITTED

2019-11-23 Thread Patrick Donnelly
> …pools ['rbd', 'vms'] overcommit available storage by 1.308x due to target_size_ratio 0.000 on pools []. This will be fixed in 14.2.5: https://tracker.ceph.com/issues/42260 -- Patrick Donnelly

Re: [ceph-users] Revert a CephFS snapshot?

2019-11-14 Thread Patrick Donnelly
On Wed, Nov 13, 2019 at 6:36 PM Jerry Lee wrote: > On Thu, 14 Nov 2019 at 07:07, Patrick Donnelly wrote: > > On Wed, Nov 13, 2019 at 2:30 AM Jerry Lee wrote: > > > Recently, I'm evaluating the snapshot feature of CephFS from the kernel cli…

Re: [ceph-users] Revert a CephFS snapshot?

2019-11-13 Thread Patrick Donnelly
> …the feature is not provided? Any insights or ideas are appreciated. Please provide more information about what you tried to do (commands run) and how it surprised you. -- Patrick Donnelly

Re: [ceph-users] cephfs 1 large omap objects

2019-10-30 Thread Patrick Donnelly
> …should we be worried about the size of these OMAP objects? No. There are only a few of these objects, and they haven't caused problems in any other cluster so far. -- Patrick Donnelly

Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-28 Thread Patrick Donnelly
…data may be lost through this command. - Run `cephfs-data_scan scan_links` again; this should repair any duplicate inodes (by dropping the older dentries). - Then you can try marking the rank as repaired. Good luck! [1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation
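A rough sketch of those last two steps (substitute your file system name; the full sequence is in [1]):
    cephfs-data_scan scan_links            # second pass: drops the older of any duplicate dentries
    ceph mds repaired <fs_name>:0          # mark rank 0 repaired so an MDS can claim it again
    ceph -s                                # watch for the rank to come back up:active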

Re: [ceph-users] kernel cephfs - too many caps used by client

2019-10-24 Thread Patrick Donnelly
> "(ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 203 } } Lei Liu wrote on Sat, 19 Oct 2019 at 10:09: >> Thanks for your reply. Yes, already set it: >>> [mds] >>> mds_max_caps_per_client = 10485760 # defaul…

Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-24 Thread Patrick Donnelly
py_head > _structures.py_head > markers.py_head > requirements.py_head > specifiers.py_head > utils.py_head > version.py_head > lima:/home/neale$ ceph -- rados -p cephfs_metadata rmomapkey > 1995e63. _compat.py_head > lima:/home/neale$ ceph -- rados

Re: [ceph-users] kernel cephfs - too many caps used by client

2019-10-18 Thread Patrick Donnelly
> …_limit = 53687091200 > I want to know if some ceph configuration can solve this problem? The mds_max_caps_per_client option is new in Luminous 12.2.12; see [1]. You need to upgrade. [1] https://tracker.ceph.com/issues/38130 -- Patrick Donnelly

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-09 Thread Patrick Donnelly
> …fh_desc=…, handle=0x7f0470fd4900, attrs_out=0x0) at /usr/src/debug/nfs-ganesha-2.7.3/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1578

Re: [ceph-users] MDS Stability with lots of CAPS

2019-10-05 Thread Patrick Donnelly
> …size using a script. I'm assuming you need to restart the MDS to make "mds_cache_memory_limit" effective, is that correct? No, it is respected at runtime. -- Patrick Donnelly
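For example, either of these should take effect immediately on a live MDS (the 4 GB value is purely illustrative):
    ceph config set mds mds_cache_memory_limit 4294967296                  # cluster-wide (Mimic and later)
    ceph daemon mds.<name> config set mds_cache_memory_limit 4294967296    # one daemon, via its admin socket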

Re: [ceph-users] mds directory pinning, status display

2019-09-13 Thread Patrick Donnelly
…Therefore, you need to gather the subtrees from all ranks and merge them to see the entire distribution. This could be made simpler by showing this information in the upcoming `ceph fs top` display. I've created a tracker ticket: https://tracker.ceph.com/issues/41824 -- Patrick Donnelly
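Until `ceph fs top` exists, a sketch of gathering it by hand is to query each active rank's admin socket and merge the output:
    ceph daemon mds.a get subtrees       # repeat for every active MDS (mds.a, mds.b, ...)
    ceph daemon mds.b get subtrees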

Re: [ceph-users] Mutliple CephFS Filesystems Nautilus (14.2.2)

2019-08-21 Thread Patrick Donnelly
…ture version that makes upgrading more difficult. > While we're on the subject, is it possible to assign a different active MDS to each filesystem? The monitors do the assignment; you cannot specify which file system an MDS serves. -- Patrick Donnelly

Re: [ceph-users] How does CephFS find a file?

2019-08-19 Thread Patrick Donnelly
> …the stripe size to know how many objects to fetch for the whole object. The file layout and the inode number determine where a particular block can be found; this is all encoded in the name of the object within the data pool. [1] https://docs.ceph.com/docs/master/cephfs/file-layouts/
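As an illustration (assuming the default layout and a data pool named cephfs_data): objects are named <inode-in-hex>.<object-index-in-hex>, so the first object of a file can be located with something like:
    ino=$(stat -c %i /mnt/cephfs/path/to/file)        # inode number, as seen by the client
    obj=$(printf '%x.%08x' "$ino" 0)                  # e.g. 10000000000.00000000
    rados -p cephfs_data stat "$obj"                  # find/stat that object in the data pool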

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Patrick Donnelly
> …After re-adding the two active MDSs again, I am back at higher numbers, although not quite as much as before. But I seem to remember that it took several minutes, if not more, until all MDSs received approximately equal load the last time. Try pinning, if possible, in each parallel r…
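A pinning sketch, assuming each parallel job works under its own top-level directory of the mount:
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/job1    # pin job1's subtree to rank 0
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/job2    # pin job2's subtree to rank 1
    ceph daemon mds.<name> get subtrees               # verify which rank is authoritative for each tree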

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-06 Thread Patrick Donnelly
…expect such extreme latency issues. Please share the output of `ceph config dump` and `ceph daemon mds.X cache status`, plus the two perf dumps one second apart again, please. Also, you said you removed the aggressive recall changes. I assume you didn't reset them to the defaults, right? Just the first suggested change…

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-08-05 Thread Patrick Donnelly
…not aggressive enough. Please let us know if that helps. It shouldn't be necessary to do this, so I'll file a tracker ticket once we confirm that's the issue. -- Patrick Donnelly

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-07-25 Thread Patrick Donnelly
> …this point and the number is slowly going up. Can you share two captures of `ceph daemon mds.X perf dump` about 1 second apart? You can also try increasing the aggressiveness of the MDS recall, but I'm surprised it's still a problem with the settings I gave you: ceph config set mds mds_recall_max_c…
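The recall knobs in question are along these lines (Nautilus-era option names; the values are only illustrative, not tuned recommendations):
    ceph config set mds mds_recall_max_caps 30000          # caps recalled from a session per recall event
    ceph config set mds mds_recall_max_decay_rate 1.5      # decay rate of the recall throttle
    ceph config show mds.<name> | grep mds_recall          # confirm what the daemon actually runs with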

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-07-25 Thread Patrick Donnelly
…;). Please unset that config option. Your mds_cache_memory_limit is apparently ~19GB. There is another limit, mds_max_caps_per_client (default 1M), which the client is hitting. That's why the MDS is recalling caps from the client, not because any cache memory limit is hit. It is not recommended you i…

Re: [ceph-users] how to power off a cephfs cluster cleanly

2019-07-25 Thread Patrick Donnelly
…s, and leave the file system down until manually brought back up, even if there are standby MDSs available. -- Patrick Donnelly
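On Nautilus that looks roughly like (a sketch; substitute your file system name):
    ceph fs set cephfs max_mds 1       # optionally shrink to one rank first
    ceph fs set cephfs down true       # journals are flushed and all ranks are taken down
    ceph status                        # wait until the MDSs have stopped
    ceph fs set cephfs down false      # later: bring the ranks back up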

Re: [ceph-users] [Ceph-users] Re: MDS failing under load with large cache sizes

2019-07-24 Thread Patrick Donnelly
…the parallel create workload will greatly benefit from it. [1] https://ceph.com/community/nautilus-cephfs/ -- Patrick Donnelly

Re: [ceph-users] Migrating a cephfs data pool

2019-07-02 Thread Patrick Donnelly
For those interested, there's a ticket [1] to perform file layout migrations in the MDS in an automated way. Not sure if it'll get done for Octopus though. [1] http://tracker.ceph.com/issues/40285 -- Patrick Donnelly

Re: [ceph-users] CephFS damaged and cannot recover

2019-06-19 Thread Patrick Donnelly
> …| cephfs_metadata | metadata | 2843M | 34.9T | … | cephfs_data | data | 2580T | 731T | … Standby MDS: n31-023-227, n31-023-226, n31-023-228. Are there failovers occurring while all the ranks are in up:…

Re: [ceph-users] Meaning of Ceph MDS / Rank in "Stopped" state.

2019-06-03 Thread Patrick Donnelly
> …path to move out of that state. Perhaps I am reading it wrong. Well, we didn't document the transitions for rank "states" in this diagram, so we don't show that. The path out of "down:stopped" is to increase max_mds so the rank is reactivated. -- Patrick Donnelly

Re: [ceph-users] Quotas with Mimic (CephFS-FUSE) clients in a Luminous Cluster

2019-06-03 Thread Patrick Donnelly
> …clients and a Luminous cluster? The release notes of Mimic only mention that quota support was added to the kernel client, but nothing else quota-related catches my eye. Unfortunately this wasn't adequately tested. But yes, Mimic ceph-fuse clients will not be able to interact with quotas…

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2019-05-31 Thread Patrick Donnelly
Hi Stefan, Sorry I couldn't get back to you sooner. On Mon, May 27, 2019 at 5:02 AM Stefan Kooman wrote: > > Quoting Stefan Kooman (ste...@bit.nl): > > Hi Patrick, > > > > Quoting Stefan Kooman (ste...@bit.nl): > > > Quoting Stefan Kooman (ste...@bit.nl): > &

Re: [ceph-users] pool migration for cephfs?

2019-05-15 Thread Patrick Donnelly
On Wed, May 15, 2019 at 5:05 AM Lars Täuber wrote: > Is there a way to migrate a cephfs to a new data pool like there is for rbd on Nautilus? > https://ceph.com/geen-categorie/ceph-pool-migration/ No, this isn't possible. -- Patrick Donnelly

Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-09 Thread Patrick Donnelly
…it to be replaced by the monitors. This is one of the reasons the changes were made. I'm not really sure how quickly the MDS will chew through 5M cap releases. -- Patrick Donnelly

Re: [ceph-users] Clients failing to respond to cache pressure

2019-05-08 Thread Patrick Donnelly
…to limit the number of caps held by clients to 1M. Additionally, trimming the cache and recalling caps is now throttled. This may help a lot for your workload. Note that these fixes haven't been backported to Mimic yet. -- Patrick Donnelly

Re: [ceph-users] Inodes on /cephfs

2019-04-30 Thread Patrick Donnelly
naturally rather cumbersome. This is fallout from [1]. See discussion on setting f_free to 0 here [2]. In summary, userland tools are trying to be too clever by looking at f_free. [I could be convinced to go back to f_free = ULONG_MAX if there are other instances of this.] [1] https://github.com/cep

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-03 Thread Patrick Donnelly
> …the same number of objects was created in the data pool. So the raw usage is again at more than 500 GB. Even for inline files, there is one object created in the data pool to hold backtrace information (an xattr of the object) used for hard links and disaster recovery. -- Patrick Donnelly

Re: [ceph-users] CephFS and many small files

2019-03-29 Thread Patrick Donnelly
> …such opposing concepts that it is simply not worth the effort? You should not have had issues growing to that number of files. Please post more information about your cluster, including configuration changes and `ceph osd df`. -- Patrick Donnelly

Re: [ceph-users] How To Scale Ceph for Large Numbers of Clients?

2019-03-07 Thread Patrick Donnelly
…the metadata pool on a separate set of OSDs. Also, you're not going to saturate a 1.9TB NVMe SSD with one OSD. You must partition it and set up multiple OSDs. This ends up being positive for you, since you can then put the metadata pool on its own set of OSDs. [1] https://ceph.com/…
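One way to express that, assuming the flash OSDs carry the ssd device class and the metadata pool is named cephfs_metadata:
    ceph osd crush rule create-replicated meta-ssd default host ssd    # CRUSH rule restricted to ssd-class OSDs
    ceph osd pool set cephfs_metadata crush_rule meta-ssd              # move the metadata pool onto those OSDs
    ceph osd pool get cephfs_metadata crush_rule                       # verify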

Re: [ceph-users] How To Scale Ceph for Large Numbers of Clients?

2019-03-06 Thread Patrick Donnelly
…with that number of clients and mds_cache_memory_limit=17179869184 (16GB). -- Patrick Donnelly

Re: [ceph-users] MDS_SLOW_METADATA_IO

2019-02-28 Thread Patrick Donnelly
> …ops in flight on the MDSes, but all ops that are printed finished in a split second (duration: 0.000152, flag_point: "acquired locks"). I believe you're looking at the wrong "ops" dump. You want to check "objecter_requests". -- Patrick Donnelly
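Both dumps come from the MDS admin socket, e.g.:
    ceph daemon mds.<name> objecter_requests    # RADOS operations the MDS has in flight (metadata I/O)
    ceph daemon mds.<name> ops                  # for comparison: the client-request ops dump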

Re: [ceph-users] faster switch to another mds

2019-02-20 Thread Patrick Donnelly
…does inform the monitors if it has been shut down. If you pull the plug or SIGKILL, it does not. :) -- Patrick Donnelly

Re: [ceph-users] Understanding EC properties for CephFS / small files.

2019-02-18 Thread Patrick Donnelly
…data stat 13c. cephfs.a.data/13c. mtime 2019-02-18 14:02:11.00, size 211224. So the object holding "grep" still only uses ~200KB and not 4MB. -- Patrick Donnelly

Re: [ceph-users] CephFS - read latency.

2019-02-18 Thread Patrick Donnelly
as a median, 32ms average is still on the high side, > but way, way better. I'll use this opportunity to point out that serial archive programs like tar are terrible for distributed file systems. It would be awesome if someone multithreaded tar or extended it for asynchronous I/O.

Re: [ceph-users] Update / upgrade cluster with MDS from 12.2.7 to 12.2.11

2019-02-11 Thread Patrick Donnelly
…tly think something went wrong. If you don't mind seeing those errors and you're using 1 active MDS, then don't worry about it. Good luck! -- Patrick Donnelly

Re: [ceph-users] Multi-filesystem wthin a cluster

2019-01-17 Thread Patrick Donnelly
On Thu, Jan 17, 2019 at 2:44 AM Dan van der Ster wrote: > > On Wed, Jan 16, 2019 at 11:17 PM Patrick Donnelly wrote: > > > > On Wed, Jan 16, 2019 at 1:21 AM Marvin Zhang wrote: > > > Hi CephFS experts, > > > From document, I know multi-fs within a cluste

Re: [ceph-users] How to do multiple cephfs mounts.

2019-01-17 Thread Patrick Donnelly
…using 2 kernel mounts on CentOS 7.6. It's unlikely this changes anything unless you also split the workload into two. That may allow the kernel to do parallel requests? -- Patrick Donnelly

Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-15 Thread Patrick Donnelly
…the NFS server is expected to have a lot of load, breaking out the exports can have a positive impact on performance. If there are hard links, then the clients associated with the exports will potentially fight over capabilities, which will add to request latency.) -- Patrick Donnelly

Re: [ceph-users] tuning ceph mds cache settings

2019-01-09 Thread Patrick Donnelly
a "daemon mds.blah > cache drop". The performance bump lasts for quite a long time--far longer > than it takes for the cache to "fill" according to the stats. What version of Ceph are you running? Can you expand on what this performance im

Re: [ceph-users] [Ceph-maintainers] v13.2.4 Mimic released

2019-01-08 Thread Patrick Donnelly
>> …already running v13.2.2, upgrading to v13.2.3 does not require special action. > Any special action for upgrading from 13.2.1? No special actions for CephFS are required for the upgrade. -- Patrick Donnelly

Re: [ceph-users] CephFS MDS optimal setup on Google Cloud

2019-01-07 Thread Patrick Donnelly
> …is it a single lock per MDS or is it a global distributed lock for all MDSs? Per-MDS. -- Patrick Donnelly

Re: [ceph-users] mon:failed in thread_name:safe_timer

2018-11-21 Thread Patrick Donnelly
> …confused about it. How did you restart the MDSs? If you used `ceph mds fail` then the executable version (v12.2.8) will not change. Also, the monitor failure requires updating the monitor to v12.2.9. What version are the mons? -- Patrick Donnelly

Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-20 Thread Patrick Donnelly
You either need to accept that reads/writes will land in different data centers, ensure the primary OSD for a given pool is always in the desired data center, or use some other non-Ceph solution, which will have either expensive, eventual, or false consistency. On Fri, Nov 16, 2018, 10:07 AM Vlad Kopylov… This…

Re: [ceph-users] mon:failed in thread_name:safe_timer

2018-11-19 Thread Patrick Donnelly
…the OSDs): https://tracker.ceph.com/issues/35848 -- Patrick Donnelly

Re: [ceph-users] Removing MDS

2018-10-30 Thread Patrick Donnelly
…tivation. To get more help, you need to describe your environment, version of Ceph in use, relevant log snippets, etc. -- Patrick Donnelly

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-17 Thread Patrick Donnelly
> …upgrade to 13.2.2? Or better to wait for 13.2.3? Or install 13.2.1 for now? Upgrading to 13.2.1 would be safe. -- Patrick Donnelly

Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-10-08 Thread Patrick Donnelly
…for the detailed notes. It looks like the MDS is stuck somewhere where it's not even outputting any log messages. If possible, it'd be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or, if you're comfortable with gdb, a backtrace of any threads that look suspicious (e.g. not waiting on a futex…
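A minimal sketch of grabbing such a backtrace, assuming the ceph-mds debuginfo/dbg packages are installed:
    gdb -p "$(pidof ceph-mds)"        # attach to the running MDS (it is paused while attached)
    (gdb) thread apply all bt         # backtrace of every thread; look for ones not waiting on a futex
    (gdb) detach
    (gdb) quit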

Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs

2018-10-08 Thread Patrick Donnelly
…are also affected but do not require immediate action. A procedure for handling upgrades of fresh deployments from 13.2.2 to 13.2.3 will be included in the release notes for 13.2.3. -- Patrick Donnelly

Re: [ceph-users] CephFS performance.

2018-10-04 Thread Patrick Donnelly
…striping via layouts: http://docs.ceph.com/docs/master/cephfs/file-layouts/ -- Patrick Donnelly

Re: [ceph-users] omap vs. xattr in librados

2018-09-11 Thread Patrick Donnelly
> …over omap (outside of ease of use in the API), correct? You may prefer xattrs on bluestore if the metadata is small, and you may need to store the xattrs on an EC pool: omap is not supported on EC pools. -- Patrick Donnelly
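For a quick feel of the two interfaces from the CLI (the librados calls mirror these; pool and object names are just examples):
    rados -p mypool setxattr myobj owner alice       # xattr: small metadata stored with the object
    rados -p mypool getxattr myobj owner
    rados -p mypool setomapval myobj key1 value1     # omap: key/value pairs; not usable on EC pools
    rados -p mypool listomapvals myobj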

Re: [ceph-users] CephFS Quota and ACL support

2018-08-27 Thread Patrick Donnelly
…Linux v4.17+. See also: https://github.com/ceph/ceph/pull/23728/files -- Patrick Donnelly

Re: [ceph-users] Secure way to wipe a Ceph cluster

2018-07-27 Thread Patrick Donnelly
…the firmware may retire bad blocks and make them inaccessible. It may not be possible for the device to physically destroy those blocks, even with SMART directives. You may be stuck with an industrial shredder to be compliant if the rules are stringent…

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-23 Thread Patrick Donnelly
. How fast does your MDS reach 15GB? Your MDS cache size should be configured to 1-8GB (depending on your preference) so it's disturbing to see you set it so low. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ce

Re: [ceph-users] mds daemon damaged

2018-07-12 Thread Patrick Donnelly
On Thu, Jul 12, 2018 at 3:55 PM, Patrick Donnelly wrote: >> Recommends fixing error by hand. Tried running deep scrub on pg 2.4, it >> completes but still have the same issue above >> >> Final option is to attempt removing mds.ds27. If mds.ds29 was a standby and >>

Re: [ceph-users] mds daemon damaged

2018-07-12 Thread Patrick Donnelly
> …has data it should become live. If it was not, I assume we will lose the filesystem at this point. > Why didn't the standby MDS fail over? > Just looking for any way to recover the cephfs, thanks! I think it's time to do a scrub on the PG containing that object. -- Patrick Donnelly
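For the PG mentioned earlier in the thread that would be something like:
    ceph pg deep-scrub 2.4       # deep-scrub the PG holding the damaged object
    ceph -w                      # watch for scrub errors and inconsistencies it reports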

[ceph-users] Mimic (13.2.0) Release Notes Bug on CephFS Snapshot Upgrades

2018-06-07 Thread Patrick Donnelly
…d/ [3] https://github.com/ceph/ceph/pull/22445/files -- Patrick Donnelly

Re: [ceph-users] CephFS "move" operation

2018-05-25 Thread Patrick Donnelly
…crossing quota boundaries (I think). It may be possible to allow the rename in the MDS and check quotas there. I've filed a tracker ticket here: http://tracker.ceph.com/issues/24305 -- Patrick Donnelly

Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-18 Thread Patrick Donnelly
…load for you without trying to micromanage things via pins. You can use pinning to isolate metadata load from other ranks as a stop-gap measure. [1] https://github.com/ceph/ceph/pull/21412 -- Patrick Donnelly

Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Patrick Donnelly
> …become standby? > I've run `ceph fs set cephfs max_mds 2`, which set max_mds from 3 to 2 but has no effect on my running configuration. http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks Note: the behavior is changing in Mimic to be automatic after red…
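On Luminous the surplus rank also has to be stopped by hand, roughly (a sketch; this extra step goes away with the Mimic behavior mentioned above):
    ceph fs set cephfs max_mds 2      # lower the target number of ranks
    ceph mds deactivate cephfs:2      # stop the now-surplus rank explicitly
    ceph fs status                    # the stopped daemon should reappear as a standby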

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-05-10 Thread Patrick Donnelly
> …ng that limit. But the mds process is using over 100GB RAM in my 128GB host. I thought I was playing it safe by configuring at 80. What other things consume a lot of RAM for this process? Let me know if I need to create a new thread. The cache size measurement is imprecise pre…

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-05-10 Thread Patrick Donnelly
…configured limit. If the cache size is larger than the limit (use the `cache status` admin socket command) then we'd be interested in seeing a few seconds of the MDS debug log with higher debugging set (`config set debug_mds 20`). -- Patrick Donnelly
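That is, something along these lines against the active MDS:
    ceph daemon mds.<name> cache status              # compare the reported cache size to mds_cache_memory_limit
    ceph daemon mds.<name> config set debug_mds 20   # raise MDS debugging while capturing a few seconds of log
    ceph daemon mds.<name> config set debug_mds 1/5  # then drop it back down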

Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-04-30 Thread Patrick Donnelly
ing that would be a godsend! Thanks for keeping the list apprised of your efforts. Since this is so easily reproduced for you, I would suggest that you next get higher debug logs (debug_mds=20/debug_ms=1) from the MDS. And, since this is a segmentation fault, a backtrace with debug symbols from gdb

Re: [ceph-users] Multi-MDS Failover

2018-04-27 Thread Patrick Donnelly
t cannot obtain the necessary locks. No metadata is lost. No inconsistency is created between clients. Full availability will be restored when the lost ranks come back online. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
by throwing an error or becoming unavailable -- when the standbys exist to make the system available. There's nothing to enforce. A warning is sufficient for the operator that (a) they didn't configure any standbys or (b) MDS daemon processes/boxes are going away and not coming back as standbys (i.e

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
> …we were still able to access the ceph folder and everything seems to be running. It depends(tm) on how the metadata is distributed and what locks are held by each MDS. Standbys are not optional in any production cluster. -- Patrick Donnelly

Re: [ceph-users] cephfs luminous 12.2.4 - multi-active MDSes with manual pinning

2018-04-24 Thread Patrick Donnelly
wer than those on the test MDS VMs. As Dan said, this is simply a spurious log message. Nothing is being exported. This will be fixed in 12.2.6 as part of several fixes to the load balancer: https://github.com/ceph/ceph/pull/21412/commits/cace918dd044b979cd0d54b16a6296094c8a9f90 -- Patrick Donnelly __

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-12 Thread Patrick Donnelly
On Thu, Apr 12, 2018 at 5:05 AM, Mark Schouten <m...@tuxis.nl> wrote: > On Wed, 2018-04-11 at 17:10 -0700, Patrick Donnelly wrote: >> No longer recommended. See: >> http://docs.ceph.com/docs/master/cephfs/upgrading/#upgrading-the-mds-cluster > Shouldn't d…

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-11 Thread Patrick Donnelly
…fs/upgrading/#upgrading-the-mds-cluster -- Patrick Donnelly

Re: [ceph-users] cephfs snapshot format upgrade

2018-04-10 Thread Patrick Donnelly
> …(allows_multimds() || in.size() > 1)) && latest_scrubbed_version < mimic. This sounds like the right approach to me. The mons should also be capable of performing the same test and raising a health error that pre-Mimic MDSs must be started and the number of actives reduced to 1. -- Patrick Donnelly

Re: [ceph-users] Cephfs hardlink snapshot

2018-04-05 Thread Patrick Donnelly
…of having to copy it. Hardlink handling for snapshots will be in Mimic. -- Patrick Donnelly

Re: [ceph-users] ceph-fuse segfaults

2018-04-02 Thread Patrick Donnelly

Re: [ceph-users] Is it possible to suggest the active MDS to move to a datacenter ?

2018-03-29 Thread Patrick Donnelly
…t yet exist for NFS-Ganesha+CephFS outside of OpenStack Queens deployments. -- Patrick Donnelly

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-03-27 Thread Patrick Donnelly
> …"o": 7510, "traverse_lock": 86236, "load_cent": 144401980319, "q": 49, "exported": 0, "exported_inodes": 0, "imported": 0, "imported_inodes": 0 } } Can you also share `ceph daemon mds.2 cache status`, the full `ceph daemon mds.2 perf dump`, and `ceph status`? Note [1] will be in 12.2.5 and may help with your issue. [1] https://github.com/ceph/ceph/pull/20527 -- Patrick Donnelly

Re: [ceph-users] Cephfs and number of clients

2018-03-20 Thread Patrick Donnelly
> …folders, for example: /vol1, /vol2, /vol3, /vol4. At the moment the root of the cephfs filesystem is mounted on each web server. The query is: would there be a benefit to having separate mount points for each folder like the above? Performance benefit? No. Data isol…

Re: [ceph-users] rctime not tracking inode ctime

2018-03-14 Thread Patrick Donnelly
…changes to directory inodes. Traditionally, modifying a file (truncate, write) does not involve metadata changes to a directory inode. Whether that is the intended behavior is a good question. Perhaps it should be changed? -- Patrick Donnelly

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-03-14 Thread Patrick Donnelly
On Wed, Mar 14, 2018 at 5:48 AM, Lars Marowsky-Bree <l...@suse.com> wrote: > On 2018-02-28T02:38:34, Patrick Donnelly <pdonn...@redhat.com> wrote: > >> I think it will be necessary to reduce the actives to 1 (max_mds -> 1; >> deactivate other ranks), shutdown st

Re: [ceph-users] Don't use ceph mds set max_mds

2018-03-07 Thread Patrick Donnelly
> …the "default" filesystem) if called. > The multi-fs stuff went in for Jewel, so maybe we should think about removing the old commands in Mimic: any thoughts, Patrick? These commands have already been removed (obsoleted) in master/Mimic. You can no longer use…

Re: [ceph-users] Ceph-Fuse and mount namespaces

2018-02-28 Thread Patrick Donnelly
-note, once I exit the container (and hence close the mount > namespace), the "old" helper is finally freed. Once the last mount point is unmounted, FUSE will destroy the userspace helper. [1] http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=configura

Re: [ceph-users] Updating standby mds from 12.2.2 to 12.2.4 caused up:active 12.2.2 mds's to suicide

2018-02-28 Thread Patrick Donnelly
…it will be necessary to reduce the actives to 1 (max_mds -> 1; deactivate other ranks), shut down the standbys, upgrade the single active, then upgrade/start the standbys. Unfortunately this didn't get flagged in upgrade testing. Thanks for the report, Dan. -- Patrick Donnelly
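Sketched out for a file system named cephfs (Luminous-era commands; adjust names and hosts to your cluster):
    ceph fs set cephfs max_mds 1               # single active rank for the duration of the upgrade
    ceph mds deactivate cephfs:1               # repeat for every rank > 0
    systemctl stop ceph-mds@<standby-host>     # on each standby host, so none can take over mid-upgrade
    # upgrade and restart the remaining active MDS, then upgrade and start the standbys again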

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Patrick Donnelly
On Mon, Feb 26, 2018 at 7:59 AM, Patrick Donnelly <pdonn...@redhat.com> wrote: > It seems in the above test you're using about 1KB per inode (file). > Using that you can extrapolate how much space the data pool needs. s/data pool/metadata pool/ -- Patrick Donnelly

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Patrick Donnelly
…above test you're using about 1KB per inode (file). Using that, you can extrapolate how much space the data pool needs based on your file system usage. (If all you're doing is filling the file system with empty files, of course you're going to need an unusually large metadata pool.) -- Patrick Donnelly
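A back-of-the-envelope example of that extrapolation (numbers are illustrative only):
    # ~1 KB of metadata-pool usage per inode (from the test above)
    # 50 million files   ->  50e6 x 1 KB  ~= 50 GB of metadata
    # 3x replication     ->  ~150 GB of raw capacity for the metadata pool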

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Patrick Donnelly
…the inode count in cache) by collecting a `perf dump` via the admin socket. Then you can begin to find out what's consuming all of the MDS memory. Additionally, I concur with John on digging into why the MDS is missing heartbeats by collecting debug logs (`debug mds = 15`) at that time. It may also shed light on the issue. Thanks for performing the test and letting us know the results. -- Patrick Donnelly

Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-23 Thread Patrick Donnelly
…However, if you don't have any quotas then there is no added load on the client/MDS. -- Patrick Donnelly

Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-22 Thread Patrick Donnelly
> …my server files are most of the time read-only, so MDS data can also be cached for a while. The MDS issues capabilities that allow clients to coherently cache metadata. -- Patrick Donnelly

Re: [ceph-users] Balanced MDS, all as active and recomended client settings.

2018-02-21 Thread Patrick Donnelly
> …a good client configuration like cache size, and maybe something to lower the metadata servers' load. >> [mds] >> mds_cache_size = 25 >> mds_cache_memory_limit = 792723456 You should only specify one of those. See also: http://docs.ceph.com/docs/master…
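In other words, keep only the byte-based limit, e.g.:
    [mds]
    # mds_cache_size (an inode count) is superseded by the memory limit; do not set both
    mds_cache_memory_limit = 792723456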

Re: [ceph-users] client with uid

2018-02-06 Thread Patrick Donnelly
3: (Client::_ll_drop_pins()+0x67) [0x558336e5dea7] > 4: (Client::unmount()+0x943) [0x558336e67323] > 5: (main()+0x7ed) [0x558336e02b0d] > 6: (__libc_start_main()+0xea) [0x7efc7a892f2a] > 7: (_start()+0x2a) [0x558336e0b73a] > ceph-fuse [25154]: (33) Numerical argument out

Re: [ceph-users] client with uid

2018-01-25 Thread Patrick Donnelly
ration. [1] http://tracker.ceph.com/issues/22802 [2] http://tracker.ceph.com/issues/22801 -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-18 Thread Patrick Donnelly
…`debug ms = 1`. Feel free to create a tracker ticket and use ceph-post-file [1] to share logs. [1] http://docs.ceph.com/docs/hammer/man/8/ceph-post-file/ -- Patrick Donnelly

Re: [ceph-users] MDS cache size limits

2018-01-05 Thread Patrick Donnelly
On Fri, Jan 5, 2018 at 3:54 AM, Stefan Kooman <ste...@bit.nl> wrote: > Quoting Patrick Donnelly (pdonn...@redhat.com): >> >> It's expected but not desired: http://tracker.ceph.com/issues/21402 >> >> The memory usage tracking is off by a constant factor. I'd sugg

Re: [ceph-users] MDS cache size limits

2018-01-04 Thread Patrick Donnelly
…http://tracker.ceph.com/issues/21402 The memory usage tracking is off by a constant factor. I'd suggest just lowering the limit so it's about where it should be for your system. -- Patrick Donnelly

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Patrick Donnelly
On Thu, Dec 14, 2017 at 4:44 PM, Webert de Souza Lima <webert.b...@gmail.com> wrote: > Hi Patrick, > > On Thu, Dec 14, 2017 at 7:52 PM, Patrick Donnelly <pdonn...@redhat.com> > wrote: >> >> >> It's likely you're a victim of a kernel backport that removed a

Re: [ceph-users] cephfs mds millions of caps

2017-12-14 Thread Patrick Donnelly
by default: https://github.com/ceph/ceph/pull/17925 I suggest setting that config manually to false on all of your clients and ensuring each client can remount itself to trim dentries (i.e. it's being run as root or with sufficient capabilities), which is a fallback mechanism.

Re: [ceph-users] CephFS log jam prevention

2017-12-05 Thread Patrick Donnelly
> …db(mds.0): Behind on trimming (36252/30), max_segments: 30, num_segments: 36252. See also: http://tracker.ceph.com/issues/21975 You can try doubling (several times if necessary) the MDS configs `mds_log_max_segments` and `mds_log_max_expiring` to make it trim its journal more aggressively.
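For example, via injectargs on the active MDS (the values are simply double the defaults visible in the warning, not a tuned recommendation):
    ceph tell mds.0 injectargs '--mds_log_max_segments=60 --mds_log_max_expiring=40'
    ceph daemon mds.<name> perf dump mds_log     # watch the journal segment count come back down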

Re: [ceph-users] strange error on link() for nfs over cephfs

2017-11-29 Thread Patrick Donnelly
> …transitioning right now; a number of machines still auto-mount users' home directories from that nfsd. You need to try a newer kernel, as there have been many fixes since 4.4 which probably have not been backported to your distribution's kernel. -- Patrick Donnelly

Re: [ceph-users] luminous ceph-fuse crashes with "failed to remount for kernel dentry trimming"

2017-11-27 Thread Patrick Donnelly
…ntly. It may be related to [1]. Are you running out of memory on these machines? [1] http://tracker.ceph.com/issues/17517 -- Patrick Donnelly
