Re: [ceph-users] Ceph MDS randomly hangs with no useful error message

2020-01-20 Thread Yan, Zheng
historic_ops' and 'ceph daemon mds.xxx perf reset; ceph daemon mds.xxx perf dump'. send the outputs to us. > > On 17/01/2020 13:07, Yan, Zheng wrote: > > On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff > > wrote: > >> Hi, > >> > >> We have a CephFS in our clus
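For reference, the diagnostics being requested look roughly like this; mds.a and the output file names are placeholders, not from the thread:
  # capture the slowest recent MDS operations
  ceph daemon mds.a dump_historic_ops > historic_ops.json
  # reset the perf counters, wait for the problem to reappear, then dump them
  ceph daemon mds.a perf reset all
  ceph daemon mds.a perf dump > perf_dump.json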

Re: [ceph-users] Ceph MDS randomly hangs with no useful error message

2020-01-17 Thread Yan, Zheng
On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff wrote: > > Hi, > > We have a CephFS in our cluster with 3 MDS to which > 300 clients > connect at any given time. The FS contains about 80 TB of data and many > million files, so it is important that meta data operations work > smoothly even when

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-12-04 Thread Yan, Zheng
, int)' thread > 7fd436ca7700 time 2019-12-04 20:28:34.939048 > /build/ceph-13.2.6/src/mds/OpenFileTable.cc: 476: FAILED assert(omap_num_objs > <= MAX_OBJECTS) > > mds.0.openfiles omap_num_objs 1025 <- ... just 1 higher than 1024? > Coincidence? > > Gr. Stefan > Please c

Re: [ceph-users] user and group acls on cephfs mounts

2019-11-06 Thread Yan, Zheng
does 'group dev' have the same id on two VMs? do the VMs use the same 'ceph auth name' to mount cephfs? On Wed, Nov 6, 2019 at 4:12 PM Alex Litvak wrote: > > Plot thickens. > > I create a new user sam2 and group sam2 both uid and gid = 1501. User sam2 > is a member of group dev. When I

Re: [ceph-users] user and group acls on cephfs mounts

2019-11-05 Thread Yan, Zheng
On Wed, Nov 6, 2019 at 5:47 AM Alex Litvak wrote: > > Hello Cephers, > > > I am trying to understand how uid and gid are handled on the shared cephfs > mount. I am using 14.2.2 and cephfs kernel based client. > I have 2 client vms with following uid gid > > vm1 user dev (uid=500) group dev

Re: [ceph-users] cephfs 1 large omap objects

2019-10-29 Thread Yan, Zheng
see https://tracker.ceph.com/issues/42515. just ignore the warning for now On Mon, Oct 7, 2019 at 7:50 AM Nigel Williams wrote: > > Out of the blue this popped up (on an otherwise healthy cluster): > > HEALTH_WARN 1 large omap objects > LARGE_OMAP_OBJECTS 1 large omap objects > 1 large
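To confirm which object actually triggered the warning before ignoring it, something along these lines usually works (log path assumes a default package install):
  ceph health detail                                         # names the pool holding the large omap object
  grep -i "large omap object" /var/log/ceph/ceph-osd.*.log   # the scrubbing OSD logs the object name and key count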

Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Yan, Zheng
> CephFS worked well for approximately 3 hours and then our MDS crashed again, > apparently due to the bug described at https://tracker.ceph.com/issues/38452 > does the method in issue #38452 work for you? if not, please set debug_mds to 10, and send the log around the crash to us Yan, Zheng
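A sketch of the debug-logging step being asked for, with mds.0 as a placeholder; remember to lower the level again afterwards:
  ceph tell mds.0 injectargs '--debug_mds 10 --debug_ms 1'
  # reproduce the crash, then collect /var/log/ceph/ceph-mds.*.log from around that time
  ceph tell mds.0 injectargs '--debug_mds 1 --debug_ms 0'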

Re: [ceph-users] Crashed MDS (segfault)

2019-10-22 Thread Yan, Zheng
er mds restart can fix the incorrect stat. > On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng wrote: >> >> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini >> wrote: >> > >> > Hi Zheng, >> > the cluster is running ceph mimic. This warning abo

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Mon, Oct 21, 2019 at 7:58 PM Stefan Kooman wrote: > > Quoting Yan, Zheng (uker...@gmail.com): > > > I double checked the code, but didn't find any clue. Can you compile > > mds with a debug patch? > > Sure, I'll try to do my best to get a properly packaged Ceph Mi

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Mon, Oct 21, 2019 at 4:33 PM Stefan Kooman wrote: > > Quoting Yan, Zheng (uker...@gmail.com): > > > delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank > > of the crashed mds) > > OK, MDS crashed again, restarted. I stopped it, deleted the obj
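A rough sketch of the workaround discussed in this thread, assuming rank 0 is the crashing MDS and the metadata pool is called cephfs_metadata (both placeholders); the daemon should be stopped before the object is removed:
  systemctl stop ceph-mds@<host>
  rados -p cephfs_metadata rm mds0_openfiles.0
  systemctl start ceph-mds@<host>
The open-file table is only a warm-cache hint for rejoin, so removing it is generally safe, though the next MDS start may be a bit slower.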

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-10-21 Thread Yan, Zheng
On Sun, Oct 20, 2019 at 1:53 PM Stefan Kooman wrote: > > Dear list, > > Quoting Stefan Kooman (ste...@bit.nl): > > > I wonder if this situation is more likely to be hit on Mimic 13.2.6 than > > on any other system. > > > > Any hints / help to prevent this from happening? > > We have had this

Re: [ceph-users] Crashed MDS (segfault)

2019-10-21 Thread Yan, Zheng
could variable "newparent" be NULL at > https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is there > a way to fix this? > try 'cephfs-data-scan init'. It will setup root inode's snaprealm. > On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng wrote: >>
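For context, a minimal sketch of that suggestion; cephfs-data-scan must run while no MDS is active, and the filesystem name and exact down/joinable flags vary slightly between releases:
  ceph fs set cephfs down true     # take the fs offline (older releases use cluster_down)
  cephfs-data-scan init            # recreates the root/mdsdir inodes, including the root snaprealm
  ceph fs set cephfs down false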

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
ddbc > ceph@deployer:~$ > > Could a journal reset help with this? > > I could snapshot all FS pools and export the journal before to guarantee a > rollback to this state if something goes wrong with jounal reset. > > On Thu, Oct 17, 2019, 09:07 Yan, Zheng wrote: >>

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Yan, Zheng
On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini wrote: > > Dear ceph users, > we're experiencing a segfault during MDS startup (replay process) which is > making our FS inaccessible. > > MDS log messages: > > Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201 > 7f3c08f49700

Re: [ceph-users] mds fail ing to start 14.2.2

2019-10-11 Thread Yan, Zheng
On Sat, Oct 12, 2019 at 1:10 AM Kenneth Waegeman wrote: > Hi all, > > After solving some pg inconsistency problems, my fs is still in > trouble. my mds's are crashing with this error: > > > > -5> 2019-10-11 19:02:55.375 7f2d39f10700 1 mds.1.564276 rejoin_start > > -4> 2019-10-11

Re: [ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-19 Thread Yan, Zheng
> ?? > You are right. Sorry for the bug. For now, please go back to 14.2.2 (just mds) or compile ceph-mds from source Yan, Zheng > Did you already try going back to v14.2.2 (on the MDS's only) ?? > > -- dan > > On Thu, Sep 19, 2019 at 4:59 PM Kenneth Waegeman > wrot

Re: [ceph-users] CephFS deletion performance

2019-09-17 Thread Yan, Zheng
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote: > > On 13/09/2019 16.25, Hector Martin wrote: > > Is this expected for CephFS? I know data deletions are asynchronous, but > > not being able to delete metadata/directories without an undue impact on > > the whole filesystem performance is

Re: [ceph-users] Stray count increasing due to snapshots (?)

2019-09-05 Thread Yan, Zheng
On Thu, Sep 5, 2019 at 4:31 PM Hector Martin wrote: > > I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe > this is caused by snapshots. The backup process for this filesystem > consists of creating a snapshot and rsyncing it over daily, and > snapshots are kept locally in

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-26 Thread Yan, Zheng
On Mon, Aug 26, 2019 at 9:25 PM thoralf schulze wrote: > > hi Zheng - > > On 8/26/19 2:55 PM, Yan, Zheng wrote: > > I tracked down the bug > > https://tracker.ceph.com/issues/41434 > > wow, that was quick - thank you for investigating. we are looking > forward fo

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-26 Thread Yan, Zheng
On Mon, Aug 26, 2019 at 6:57 PM thoralf schulze wrote: > > hi Zheng, > > On 8/21/19 4:32 AM, Yan, Zheng wrote: > > Please enable debug mds (debug_mds=10), and try reproducing it again. > > please find the logs at > https://www.user.tu-berlin.de/thoralf.schulze/ceph-de

Re: [ceph-users] cephfs-snapshots causing mds failover, hangs

2019-08-20 Thread Yan, Zheng
the mds > daemons on these machines have to be manually restarted. more often than > we wish, the failover fails altogether, resulting in an unresponsive cephfs. > Please enable debug mds (debug_mds=10), and try reproducing it again. Regards Yan, Zheng > this is with mimic 13.2.6 and a

Re: [ceph-users] MDS corruption

2019-08-13 Thread Yan, Zheng
nautilus version (14.2.2) of ‘cephfs-data-scan scan_links’ can fix snaptable. hopefully it will fix your issue. you don't need to upgrade whole cluster. Just install nautilus in a temp machine or compile ceph from source. On Tue, Aug 13, 2019 at 2:35 PM Adam wrote: > > Pierre Dittes helped

Re: [ceph-users] Error Mounting CephFS

2019-08-07 Thread Yan, Zheng
On Wed, Aug 7, 2019 at 3:46 PM wrote: > > All; > > I have a server running CentOS 7.6 (1810), that I want to set up with CephFS > (full disclosure, I'm going to be running samba on the CephFS). I can mount > the CephFS fine when I use the option secret=, but when I switch to > secretfile=, I
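For comparison, the two mount variants look roughly like this; the monitor address, auth name and key path are placeholders. Two common gotchas: secretfile= needs the mount.ceph helper from the ceph-common package, and the file must contain only the bare base64 key (no "key =" prefix):
  mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=samba,secret=AQD...==
  mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/samba.secret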

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
On Mon, Jul 29, 2019 at 9:54 PM Dan van der Ster wrote: > > On Mon, Jul 29, 2019 at 3:47 PM Yan, Zheng wrote: > > > > On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster > > wrote: > > > > > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > >

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster wrote: > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > > > > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster > > wrote: > > > > > > Hi all, > > > > > > Last night we h

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Yan, Zheng
adata rmomapkey 617. 10006289992_head. I suggest running 'cephfs-data-scan scan_links' after taking down cephfs (either use 'mds set down true' or 'flush all journals and kill all mds') Regards Yan, Zheng > > Thanks! > > Dan > _
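A sketch of that sequence with placeholder names; scan_links has to run while every MDS is stopped:
  ceph fs set cephfs down true        # or flush all MDS journals and stop all MDS daemons
  cephfs-data-scan scan_links
  ceph fs set cephfs down false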

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 3:13 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > > which version? > > Nautilus, 14.2.2. > I mean kernel version > try mounting cephfs on a machine/vm with small memory (4G~8G), then rsync > your data into the mount point of that machine. > > I could try

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-24 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 1:58 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Ceph-fuse ? > > No, I am using the kernel module. > > which version? > > Was there "Client xxx failing to respond to cache pressure" health warning? > > > At first, yes (at least with the Mimic client).

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 4:06 AM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Thanks for your reply. > > On 23/07/2019 21:03, Nathan Fish wrote: > > What Ceph version? Do the clients match? What CPUs do the MDS servers > > have, and how is their CPU usage when this occurs? > >

Re: [ceph-users] Mark CephFS inode as lost

2019-07-23 Thread Yan, Zheng
please create a ticket at http://tracker.ceph.com/projects/cephfs and upload mds log with debug_mds =10 On Tue, Jul 23, 2019 at 6:00 AM Robert LeBlanc wrote: > > We have a Luminous cluster which has filled up to 100% multiple times and > this causes an inode to be left in a bad state. Doing

Re: [ceph-users] HEALTH_WARN 1 MDSs report slow metadata IOs

2019-07-17 Thread Yan, Zheng
Check if there is any hang request in 'ceph daemon mds.xxx objecter_requests' On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder wrote: > > On 7/16/19 4:11 PM, Dietmar Rieder wrote: > > Hi, > > > > We are running ceph version 14.1.2 with cephfs only. > > > > I just noticed that one of our pgs had
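For reference, that check is a single admin-socket call on the MDS host (daemon name is a placeholder); a healthy MDS shows empty ops lists:
  ceph daemon mds.xxx objecter_requests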

Re: [ceph-users] writable snapshots in cephfs? GDPR/DSGVO

2019-07-10 Thread Yan, Zheng
On Wed, Jul 10, 2019 at 4:16 PM Lars Täuber wrote: > > Hi everbody! > > Is it possible to make snapshots in cephfs writable? > We need to remove files because of this General Data Protection Regulation > also from snapshots. > It's possible (only delete data), but need to modify both mds and

Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-29 Thread Yan, Zheng
On Fri, Jun 28, 2019 at 11:42 AM Hector Martin wrote: > > On 12/06/2019 22.33, Yan, Zheng wrote: > > I have tracked down the bug. thank you for reporting this. 'echo 2 > > /proc/sys/vm/drop_caches' should fix the hang. If you can compile ceph > > from source, please

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-06-21 Thread Yan, Zheng
On Fri, Jun 21, 2019 at 6:10 PM Frank Schilder wrote: > > Dear Yan, Zheng, > > does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a > link to the issue tracker? > no. https://tracker.ceph.com/issues/39987 > Thanks and best regards, > > ===

Re: [ceph-users] How does cephfs ensure client cache consistency?

2019-06-18 Thread Yan, Zheng
On Tue, Jun 18, 2019 at 4:25 PM ?? ?? wrote: > > > > There are 2 clients, A and B. There is a directory /a/b/c/d/. > > Client A create a file /a/b/c/d/a.txt. > > Client B move the folder d to /a/. > > Now, this directory looks like this:/a/b/c/ and /a/d/. > > /a/b/c/d is not exist any more. > >

Re: [ceph-users] MDS getattr op stuck in snapshot

2019-06-12 Thread Yan, Zheng
On Wed, Jun 12, 2019 at 3:26 PM Hector Martin wrote: > > Hi list, > > I have a setup where two clients mount the same filesystem and > read/write from mostly non-overlapping subsets of files (Dovecot mail > storage/indices). There is a third client that takes backups by > snapshotting the

Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-05 Thread Yan, Zheng
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia wrote: > > We have been testing a new installation of ceph (mimic 13.2.2) mostly > using cephfs (for now). The current test is just setting up a filesystem > for backups of our other filesystems. After rsyncing data for a few > days, we started getting

Re: [ceph-users] CEPH MDS Damaged Metadata - recovery steps

2019-06-03 Thread Yan, Zheng
On Mon, Jun 3, 2019 at 3:06 PM James Wilkins wrote: > > Hi all, > > After a bit of advice to ensure we’re approaching this the right way. > > (version: 12.2.12, multi-mds, dirfrag is enabled) > > We have corrupt meta-data as identified by ceph > > health: HEALTH_ERR > 2 MDSs

Re: [ceph-users] Quotas with Mimic (CephFS-FUSE) clients in a Luminous Cluster

2019-05-27 Thread Yan, Zheng
-mean-it") but sadly, the max_bytes attribute is still not > > there > > (also not after remounting on the client / using the file creation and > > deletion trick). > > That's interesting - it suddenly started to work for one directory after > creating a

Re: [ceph-users] CephFS msg length greater than osd_max_write_size

2019-05-22 Thread Yan, Zheng
On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll wrote: > > Hi all, > > We recently encountered an issue where our CephFS filesystem unexpectedly was > set to read-only. When we look at some of the logs from the daemons I can see > the following: > > On the MDS: > ... > 2019-05-18 16:34:24.341

Re: [ceph-users] Cephfs client evicted, how to unmount the filesystem on the client?

2019-05-22 Thread Yan, Zheng
try 'umount -f' On Tue, May 21, 2019 at 4:41 PM Marc Roos wrote: > > > > > > [@ceph]# ps -aux | grep D > USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND > root 12527 0.0 0.0 123520 932 pts/1D+ 09:26 0:00 umount > /home/mail-archive > root 14549 0.2
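If even the forced unmount hangs because the client was evicted, a lazy detach is the usual fallback (path taken from the quoted ps output):
  umount -f /home/mail-archive
  umount -l /home/mail-archive    # lazy detach if -f still blocks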

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-20 Thread Yan, Zheng
frag.zip?l; . It's a bit > more than 100MB. > MDS cache dump shows a snapshot is involved. Please avoid using snapshots until we fix the bug. Regards Yan, Zheng > The active MDS failed over to the standby after or during the dump cache > operation. Is this expected? As a result,

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-16 Thread Yan, Zheng
nations, to reproduce the issue I will create a directory with many > entries and execute a test with the many-clients single-file-read load on it. > try setting mds_bal_split_rd and mds_bal_split_wr to a very large value, which prevents the mds from splitting hot dirfrags Regards Yan, Zheng > I
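A hedged example of that tuning; the values below are just arbitrarily large numbers, and the change can be persisted in ceph.conf once it proves helpful:
  ceph tell mds.* injectargs '--mds_bal_split_rd 1000000000 --mds_bal_split_wr 1000000000'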

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-15 Thread Yan, Zheng
On Wed, May 15, 2019 at 9:34 PM Frank Schilder wrote: > > Dear Stefan, > > thanks for the fast reply. We encountered the problem again, this time in a > much simpler situation; please see below. However, let me start with your > questions first: > > What bug? -- In a single-active MDS set-up,

Re: [ceph-users] "Failed to authpin" results in large number of blocked requests

2019-04-03 Thread Yan, Zheng
http://tracker.ceph.com/issues/25131 may relieve the issue. please try ceph version 13.2.5. Regards Yan, Zheng On Thu, Mar 28, 2019 at 6:02 PM Zoë O'Connell wrote: > > We're running a Ceph mimic (13.2.4) cluster which is predominantly used > for CephFS. We have recently switched

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-02 Thread Yan, Zheng
Looks like http://tracker.ceph.com/issues/37399. which version of ceph-mds do you use? On Tue, Apr 2, 2019 at 7:47 AM Sergey Malinin wrote: > > These steps pretty well correspond to > http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/ > Were you able to replay journal manually with no

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:10 PM Paul Emmerich wrote: > > On Tue, Apr 2, 2019 at 3:05 PM Yan, Zheng wrote: > > > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > > > Hi! > > > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:05 PM Yan, Zheng wrote: > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > Hi! > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > > There's also some metadata overhead etc. You might want to consider >

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
don't have a plan to mark this feature stable. (probably we will remove this feature in the future). Yan, Zheng > $ ceph fs dump | grep inline_data > dumped fsmap epoch 1224 > inline_data enabled > > I have reduced the size of the bonnie-generated files to 1 byte. But >

Re: [ceph-users] MDS stuck at replaying status

2019-04-02 Thread Yan, Zheng
please set debug_mds=10, and try again On Tue, Apr 2, 2019 at 1:01 PM Albert Yue wrote: > > Hi, > > This happens after we restart the active MDS, and somehow the standby MDS > daemon cannot take over successfully and is stuck at up:replaying. It is > showing the following log. Any idea on how

Re: [ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Yan, Zheng
On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster wrote: > > Hi all, > > We have been benchmarking a hyperconverged cephfs cluster (kernel > clients + osd on same machines) for awhile. Over the weekend (for the > first time) we had one cephfs mount deadlock while some clients were > running ior. >

Re: [ceph-users] Ceph MDS laggy

2019-03-25 Thread Yan, Zheng
On Mon, Mar 25, 2019 at 6:36 PM Mark Schouten wrote: > > On Mon, Jan 21, 2019 at 10:17:31AM +0800, Yan, Zheng wrote: > > It's http://tracker.ceph.com/issues/37977. Thanks for your help. > > > > I think I've hit this bug. Ceph MDS using 100% CPU and reporting as > lagg

Re: [ceph-users] CephFS: effects of using hard links

2019-03-20 Thread Yan, Zheng
ged. hard link in cephfs is a magic symbolic link. Its main overhead is at open. Regards Yan, Zheng 2. Is there any performance (dis)advantage? Generally not once the file is open. 3. When using hard links, is there an actual space savings, or is there some trickery happening?

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 9:50 PM Dylan McCulloch wrote: > > > >please run following command. It will show where is 4. > > > >rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent > >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json > > > > $

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
please run following command. It will show where is 4. rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch wrote: > > >> >> >cephfs does not
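Written out cleanly, the requested check looks like this; 4.00000000 stands in for the truncated object name in the thread, and the pool name is taken from the quoted commands:
  rados -p hpcfs_metadata getxattr 4.00000000 parent > /tmp/parent
  ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json   # prints the path the backtrace points at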

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
please check if 4. has omap header and xattrs rados -p hpcfs_data listxattr 4. rados -p hpcfs_data getomapheader 4. On Mon, Mar 18, 2019 at 7:37 PM Dylan McCulloch wrote: > > >> > > >> >cephfs does not create/use object "4.". Please show us some > >> >of its

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
On Mon, Mar 18, 2019 at 6:05 PM Dylan McCulloch wrote: > > > > > >cephfs does not create/use object "4.". Please show us some > >of its keys. > > > > https://pastebin.com/WLfLTgni > Thanks > Is the object recently modified? rados -p hpcfs_metadata stat 4. > >On Mon, Mar 18,

Re: [ceph-users] CephFS - large omap object

2019-03-18 Thread Yan, Zheng
cephfs does not create/use object "4.". Please show us some of its keys. On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch wrote: > > Hi all, > > We have a large omap object warning on one of our Ceph clusters. > The only reports I've seen regarding the "large omap objects" warning from

Re: [ceph-users] Can CephFS Kernel Client Not Read & Write at the Same Time?

2019-03-07 Thread Yan, Zheng
CephFS kernel mount blocks reads while another client has dirty data in its page cache. Cache coherency rule looks like: state 1 - only one client opens a file for read/write. the client can use page cache state 2 - multiple clients open a file for read, no client opens the file for write.

Re: [ceph-users] Cephfs recursive stats | rctime in the future

2019-02-28 Thread Yan, Zheng
On Thu, Feb 28, 2019 at 5:33 PM David C wrote: > > On Wed, Feb 27, 2019 at 11:35 AM Hector Martin wrote: >> >> On 27/02/2019 19:22, David C wrote: >> > Hi All >> > >> > I'm seeing quite a few directories in my filesystem with rctime years in >> > the future. E.g >> > >> > ]# getfattr -d -m
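For reference, the recursive stats discussed here are exposed as virtual xattrs on any directory of a CephFS mount (path is a placeholder):
  getfattr -n ceph.dir.rctime /mnt/cephfs/somedir
  getfattr -n ceph.dir.rbytes /mnt/cephfs/somedir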

Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Yan, Zheng
On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian wrote: > > Hi! > > >mon_max_pg_per_osd = 400 > > > >In the ceph.conf and then restart all the services / or inject the config > >into the running admin > > I restarted each server (MONs and OSDs weren’t enough) and now the health > warning is

Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Yan, Zheng
On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian wrote: > > Dear Community, > > > > we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During > setup, we made the mistake of configuring the OSDs on RAID Volumes. Initially > our cluster consisted of 3 nodes, each housing 1

Re: [ceph-users] MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

2019-02-11 Thread Yan, Zheng
> mds_cache_size = 8589934592 > mds_cache_memory_limit = 17179869184 > > Should these values be left in our configuration? No. you'd better change them back to the original values. > > again thanks for the assistance, > > Jake > > On 2/11/19 8:17 AM, Yan, Zheng wrote:

Re: [ceph-users] Controlling CephFS hard link "primary name" for recursive stat

2019-02-11 Thread Yan, Zheng
On Sat, Feb 9, 2019 at 8:10 AM Hector Martin wrote: > > Hi list, > > As I understand it, CephFS implements hard links as effectively "smart > soft links", where one link is the primary for the inode and the others > effectively reference it. When it comes to directories, the size for a >

Re: [ceph-users] MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

2019-02-11 Thread Yan, Zheng
On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett wrote: > > Dear All, > > Unfortunately the MDS has crashed on our Mimic cluster... > > First symptoms were rsync giving: > "No space left on device (28)" > when trying to rename or delete > > This prompted me to try restarting the MDS, as it reported

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
On Tue, Jan 29, 2019 at 9:05 PM Jonathan Woytek wrote: > > On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng wrote: >> >> Looks like you have 5 active mds. I suspect your issue is related to >> load balancer. Please try disabling mds load balancer (add >> "mds_bal_m

Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
2 59=0+59), dirfrag has f(v0 > m2019-01-28 14:46:47.983292 58=0+58) > log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6, > inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag > has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58) > ... > > any

Re: [ceph-users] cephfs constantly strays ( num_strays)

2019-01-29 Thread Yan, Zheng
Nothing to worry about. On Sun, Jan 27, 2019 at 10:13 PM Marc Roos wrote: > > > I constantly have strays. What are strays? Why do I have them? Is this > bad? > > > > [@~]# ceph daemon mds.c perf dump| grep num_stray > "num_strays": 25823, > "num_strays_delayed": 0, >

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
and use 'export_pin' to manually pin directories to mds (https://ceph.com/community/new-luminous-cephfs-subtree-pinning/) > > On Wed, Jan 9, 2019 at 9:10 PM Yan, Zheng wrote: >> >> [...] >> Could you please run following command (for each active mds) when >> ope
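A minimal sketch of that pinning mechanism; the rank and path are placeholders, and -v -1 removes the pin again:
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects     # pin this subtree to MDS rank 1
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projects    # back to normal balancing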

Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
upgraded from which version? have you tried downgrading ceph-mds to an old version? On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski wrote: > > hi folks we need some help with our cephfs, all mds keep crashing > > starting mds.mds02 at - > terminate called after throwing an instance of >

Re: [ceph-users] how to debug a stuck cephfs?

2019-01-27 Thread Yan, Zheng
http://docs.ceph.com/docs/master/cephfs/troubleshooting/ For your case, it's likely the client got evicted by the mds. On Mon, Jan 28, 2019 at 9:50 AM Sang, Oliver wrote: > > Hello, > > > > Our cephfs looks just stuck. If I run some command such as ‘mkdir’, > ‘touch’ a new file, it just stuck

Re: [ceph-users] MDS performance issue

2019-01-27 Thread Yan, Zheng
On Mon, Jan 28, 2019 at 10:34 AM Albert Yue wrote: > > Hi Yan Zheng, > > Our clients are also complaining about operations like 'du' or 'ncdu' being > very slow. Is there any alternative tool for such kind of operation on > CephFS? Thanks! > 'du' traverse whole directory t
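One commonly suggested alternative, since the MDS already maintains recursive statistics: read the directory's virtual xattrs instead of walking the whole tree (path is a placeholder):
  getfattr -n ceph.dir.rbytes /mnt/cephfs/somedir      # recursive byte count, returned immediately
  getfattr -n ceph.dir.rentries /mnt/cephfs/somedir    # recursive file/dir count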

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-23 Thread Yan, Zheng
On Wed, Jan 23, 2019 at 6:07 PM Marc Roos wrote: > > Yes sort of. I do have an inconsistent pg for a while, but it is on a > different pool. But I take it this is related to a networking issue I > currently have with rsync and broken pipe. > > Where exactly does it go wrong? The cephfs kernel

Re: [ceph-users] MDS performance issue

2019-01-22 Thread Yan, Zheng
00G metadata, mds may need 1T or more memory. > On Tue, Jan 22, 2019 at 5:48 PM Yan, Zheng wrote: >> >> On Tue, Jan 22, 2019 at 10:49 AM Albert Yue >> wrote: >> > >> > Hi Yan Zheng, >> > >> > In your opinion, can we resolve this issue by mo

Re: [ceph-users] Broken CephFS stray entries?

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster wrote: > > On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote: > > > > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster > > wrote: > > > > > > Hi Zheng, > > > > > > We also just

Re: [ceph-users] cephfs performance degraded very fast

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 8:24 PM renjianxinlover wrote: > > hi, >at some time, as cache pressure or caps release failure, client apps mount > got stuck. >my use case is in kubernetes cluster and automatic kernel client mount in > nodes. >is anyone faced with same issue or has related

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-22 Thread Yan, Zheng
On Wed, Jan 23, 2019 at 5:50 AM Marc Roos wrote: > > > I got one again > > [] wait_on_page_bit_killable+0x83/0xa0 > [] __lock_page_or_retry+0xb2/0xc0 > [] filemap_fault+0x3b7/0x410 > [] ceph_filemap_fault+0x13c/0x310 [ceph] > [] __do_fault+0x4c/0xc0 > [] do_read_fault.isra.42+0x43/0x130 > []

Re: [ceph-users] Broken CephFS stray entries?

2019-01-22 Thread Yan, Zheng
> } else { > - clog->error() << "unmatched fragstat on " << ino() << ", inode > has " > + clog->warn() << "unmatched fragstat on " << ino() << ", inode has > " >

Re: [ceph-users] MDS performance issue

2019-01-22 Thread Yan, Zheng
On Tue, Jan 22, 2019 at 10:49 AM Albert Yue wrote: > > Hi Yan Zheng, > > In your opinion, can we resolve this issue by moving MDS to a 512GB or 1TB > memory machine? > The problem is from the client side, especially clients with large memory. I don't think enlarging the mds cache size is

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote: > > Dear Ceph Users, > > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB > harddisk. Ceph version is luminous 12.2.5. We created one data pool with > these hard disks and created another meta data pool with 3 ssd. We

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-21 Thread Yan, Zheng
no, there is no config for request timeout > > -Original Message- > From: Yan, Zheng [mailto:uker...@gmail.com] > Sent: 21 January 2019 02:50 > To: Marc Roos > Cc: ceph-users > Subject: Re: [ceph-users] Process stuck in D+ on cephfs mount > > check /proc/<pid>/stack to f

Re: [ceph-users] MDS performance issue

2019-01-21 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 12:12 PM Albert Yue wrote: > > Hi Yan Zheng, > > 1. mds cache limit is set to 64GB > 2. we get the size of meta data pool by running `ceph df` and saw meta data > pool just used 200MB space. > That's very strange. One file uses about 1k metadat

Re: [ceph-users] MDS performance issue

2019-01-20 Thread Yan, Zheng
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote: > > Dear Ceph Users, > > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB > harddisk. Ceph version is luminous 12.2.5. We created one data pool with > these hard disks and created another meta data pool with 3 ssd. We

Re: [ceph-users] Ceph MDS laggy

2019-01-20 Thread Yan, Zheng
It's http://tracker.ceph.com/issues/37977. Thanks for your help. Regards Yan, Zheng On Sun, Jan 20, 2019 at 12:40 AM Adam Tygart wrote: > > It worked for about a week, and then seems to have locked up again. > > Here is the back trace from the threads on the mds: > http://pe

Re: [ceph-users] Process stuck in D+ on cephfs mount

2019-01-20 Thread Yan, Zheng
check /proc/<pid>/stack to find where it is stuck On Mon, Jan 21, 2019 at 5:51 AM Marc Roos wrote: > > > I have a process stuck in D+ writing to cephfs kernel mount. Anything > can be done about this? (without rebooting) > > > CentOS Linux release 7.5.1804 (Core) > Linux 3.10.0-514.21.2.el7.x86_64 >
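For reference, the check looks like this, with <pid> being the PID of the process stuck in D state:
  cat /proc/<pid>/stack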

Re: [ceph-users] mds0: Metadata damage detected

2019-01-15 Thread Yan, Zheng
ic\/video\/3h\/3hG6X7\/screen-msmall"}] > Looks like object 1005607c727. in the cephfs metadata pool is corrupted. please run the following commands and send the mds.0 log to us ceph tell mds.0 injectargs '--debug_mds 10' ceph tell mds.0 damage rm 3472877204 ls
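A sketch of those steps; listing the damage table first is a reasonable sanity check, and the damage ID 3472877204 comes from this thread:
  ceph tell mds.0 injectargs '--debug_mds 10'
  ceph tell mds.0 damage ls
  ceph tell mds.0 damage rm 3472877204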

Re: [ceph-users] Ceph MDS laggy

2019-01-13 Thread Yan, Zheng
g on' and 'thread apply all bt' inside gdb. and send the output to us Yan, Zheng > -- > Adam > > On Sat, Jan 12, 2019 at 7:53 PM Adam Tygart wrote: > > > > On a hunch, I shutdown the compute nodes for our HPC cluster, and 10 > > minutes after that restarted the

Re: [ceph-users] tuning ceph mds cache settings

2019-01-09 Thread Yan, Zheng
. > Could you please run following command (for each active mds) when operations are fast and when operations are slow - for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops > mds.xxx.$i; sleep 1; done Then send the results to us Regards Yan, Zheng > There are ma

Re: [ceph-users] cephfs : rsync backup create cache pressure on clients, filling caps

2019-01-06 Thread Yan, Zheng
On Fri, Jan 4, 2019 at 11:40 AM Alexandre DERUMIER wrote: > > Hi, > > I'm currently doing cephfs backup, through a dedicated clients mounting the > whole filesystem at root. > others clients are mounting part of the filesystem. (kernel cephfs clients) > > > I have around 22millions inodes, > >

Re: [ceph-users] MDS uses up to 150 GByte of memory during journal replay

2019-01-06 Thread Yan, Zheng
likely caused by http://tracker.ceph.com/issues/37399. Regards Yan, Zheng On Sat, Jan 5, 2019 at 5:44 PM Matthias Aebi wrote: > > Hello everyone, > > We are running a small cluster on 5 machines with 48 OSDs / 5 MDSs / 5 MONs > based on Luminous 12.2.10 and Debian Stretch

Re: [ceph-users] CephFS client df command showing raw space after adding second pool to mds

2019-01-03 Thread Yan, Zheng
On Fri, Jan 4, 2019 at 1:53 AM David C wrote: > > Hi All > > Luminous 12.2.12 > Single MDS > Replicated pools > > A 'df' on a CephFS kernel client used to show me the usable space (i.e the > raw space with the replication overhead applied). This was when I just had a > single cephfs data pool.

Re: [ceph-users] cephfs client operation record

2019-01-01 Thread Yan, Zheng
On Wed, Jan 2, 2019 at 11:12 AM Zhenshi Zhou wrote: > > Hi all, > > I have a cluster on Luminous(12.2.8). > Is there a way I can check clients' operation records? > There is no way to do that > Thanks > ___ > ceph-users mailing list > ceph-users@lists.ceph.com >

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Yan, Zheng
that it's > the RADOS object size? > > I'm thinking of modifying the cephfs filesystem driver to add a mount option > to specify a fixed block size to be reported for all files, and using 4K or > 64K. Would that break something? mou

Re: [ceph-users] mds lost very frequently

2018-12-13 Thread Yan, Zheng
On Fri, Dec 14, 2018 at 12:05 PM Sang, Oliver wrote: > > Thanks a lot, Yan Zheng! > > I enabled only 2 MDS - node1(active) and node2. Then I modified ceph.conf of > node2 to have - > debug_mds = 10/10 > > At 08:35:28, I observed degradation, the node1 was not a MDS

Re: [ceph-users] mds lost very frequently

2018-12-13 Thread Yan, Zheng
On Thu, Dec 13, 2018 at 9:25 PM Sang, Oliver wrote: > > Thanks a lot, Yan Zheng! > > Regarding the " set debug_mds =10 for standby mds (change debug_mds to 0 > after mds becomes active)." > Could you please explain the purpose? Just want to collect debug log, or it

Re: [ceph-users] mds lost very frequently

2018-12-12 Thread Yan, Zheng
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > > The full log is also attached. Could you please help us? Thanks! > > Please try

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
l on the ceph storage > server side. > > > Anyway,I will have a try. > > — > Best Regards > Li, Ning > > > > > On Dec 6, 2018, at 11:41, Yan, Zheng wrote: > > > > On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > >> > >> Hi all, > >

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > > Hi all, > > We found that some process writing cephfs will hang for a long time (> 120s) > when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app node's > cephfs mountpoint. > > This problem is not always reproduciable. But when

Re: [ceph-users] 【cephfs】cephfs hung when scp/rsync large files

2018-12-05 Thread Yan, Zheng
Is the cephfs mount on the same machine that runs the OSDs? On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote: > > Hi all, > > We found that some process writing cephfs will hang for a long time (> 120s) > when uploading(scp/rsync) large files(totally 50G ~ 300G)to the app node's > cephfs mountpoint. >

Re: [ceph-users] [cephfs] Kernel outage / timeout

2018-12-04 Thread Yan, Zheng
On Tue, Dec 4, 2018 at 6:55 PM wrote: > > Hi, > > I have some wild freeze using cephfs with the kernel driver > For instance: > [Tue Dec 4 10:57:48 2018] libceph: mon1 10.5.0.88:6789 session lost, > hunting for new mon > [Tue Dec 4 10:57:48 2018] libceph: mon2 10.5.0.89:6789 session established
