historic_ops' and 'ceph daemon mds.xxx perf reset; ceph
daemon mds.xxx perf dump'. Send the outputs to us.
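For example (a minimal sketch, assuming the active mds daemon is mds.0 and the commands are run on its host via the admin socket):

ceph daemon mds.0 dump_historic_ops > mds.0.historic_ops.txt
ceph daemon mds.0 perf reset all
sleep 60
ceph daemon mds.0 perf dump > mds.0.perf_dump.json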
>
> On 17/01/2020 13:07, Yan, Zheng wrote:
> > On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff
> > wrote:
> >> Hi,
> >>
> >> We have a CephFS in our clus
On Fri, Jan 17, 2020 at 4:47 PM Janek Bevendorff
wrote:
>
> Hi,
>
> We have a CephFS in our cluster with 3 MDSs to which > 300 clients
> connect at any given time. The FS contains about 80 TB of data and many
> millions of files, so it is important that metadata operations work
> smoothly even when
, int)' thread
> 7fd436ca7700 time 2019-12-04 20:28:34.939048
> /build/ceph-13.2.6/src/mds/OpenFileTable.cc: 476: FAILED assert(omap_num_objs
> <= MAX_OBJECTS)
>
> mds.0.openfiles omap_num_objs 1025 <- ... just 1 higher than 1024?
> Coincidence?
>
> Gr. Stefan
>
Please c
Does 'group dev' have the same id on the two VMs? Do the VMs use the
same 'ceph auth name' to mount cephfs?
On Wed, Nov 6, 2019 at 4:12 PM Alex Litvak wrote:
>
> Plot thickens.
>
> I created a new user sam2 and group sam2, both with uid and gid = 1501. User sam2
> is a member of group dev. When I
On Wed, Nov 6, 2019 at 5:47 AM Alex Litvak wrote:
>
> Hello Cephers,
>
>
> I am trying to understand how uid and gid are handled on the shared cephfs
> mount. I am using 14.2.2 and the cephfs kernel-based client.
> I have 2 client VMs with the following uid/gid
>
> vm1 user dev (uid=500) group dev
See https://tracker.ceph.com/issues/42515. Just ignore the warning for now.
On Mon, Oct 7, 2019 at 7:50 AM Nigel Williams
wrote:
>
> Out of the blue this popped up (on an otherwise healthy cluster):
>
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large
> CephFS worked well for approximately 3 hours and then our MDS crashed again,
> apparently due to the bug described at https://tracker.ceph.com/issues/38452
>
Does the method in issue #38452 work for you? If not, please set
debug_mds to 10, and send the log around the crash to us.
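For example (a sketch, assuming the crashing daemon is mds.0 and logs go to the default location):

ceph tell mds.0 injectargs '--debug_mds 10'
# reproduce the crash, then collect /var/log/ceph/ceph-mds.*.log around the crash time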
Yan, Zheng
er mds
restart can fix the incorrect stat.
> On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng wrote:
>>
>> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini
>> wrote:
>> >
>> > Hi Zheng,
>> > the cluster is running ceph mimic. This warning abo
On Mon, Oct 21, 2019 at 7:58 PM Stefan Kooman wrote:
>
> Quoting Yan, Zheng (uker...@gmail.com):
>
> > I double checked the code, but didn't find any clue. Can you compile
> > mds with a debug patch?
>
> Sure, I'll try to do my best to get a properly packaged Ceph Mi
On Mon, Oct 21, 2019 at 4:33 PM Stefan Kooman wrote:
>
> Quoting Yan, Zheng (uker...@gmail.com):
>
> > delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> > of the crashed mds)
>
> OK, MDS crashed again, restarted. I stopped it, deleted the obj
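For reference, the deletion step quoted above can be done with rados, e.g. (a sketch, assuming rank 0 is the crashed mds and the metadata pool is named cephfs_metadata; adjust both to your setup):

rados -p cephfs_metadata rm mds0_openfiles.0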
On Sun, Oct 20, 2019 at 1:53 PM Stefan Kooman wrote:
>
> Dear list,
>
> Quoting Stefan Kooman (ste...@bit.nl):
>
> > I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
> > on any other system.
> >
> > Any hints / help to prevent this from happening?
>
> We have had this
could variable "newparent" be NULL at
> https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is there
> a way to fix this?
>
Try 'cephfs-data-scan init'. It will set up the root inode's snaprealm.
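A sketch of that step (run it while all mds daemons are stopped):

cephfs-data-scan init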
> On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng wrote:
>>
ddbc
> ceph@deployer:~$
>
> Could a journal reset help with this?
>
> I could snapshot all FS pools and export the journal beforehand to guarantee a
> rollback to this state if something goes wrong with the journal reset.
>
> On Thu, Oct 17, 2019, 09:07 Yan, Zheng wrote:
>>
On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini wrote:
>
> Dear ceph users,
> we're experiencing a segfault during MDS startup (replay process) which is
> making our FS inaccessible.
>
> MDS log messages:
>
> Oct 15 03:41:39.894584 mds1 ceph-mds: -472> 2019-10-15 00:40:30.201
> 7f3c08f49700
On Sat, Oct 12, 2019 at 1:10 AM Kenneth Waegeman
wrote:
> Hi all,
>
> After solving some pg inconsistency problems, my fs is still in
> trouble. my mds's are crashing with this error:
>
>
> > -5> 2019-10-11 19:02:55.375 7f2d39f10700 1 mds.1.564276 rejoin_start
> > -4> 2019-10-11
> ??
>
You are right. Sorry for the bug. For now, please go back to 14.2.2
(just the mds) or compile ceph-mds from source.
Yan, Zheng
> Did you already try going back to v14.2.2 (on the MDS's only) ??
>
> -- dan
>
> On Thu, Sep 19, 2019 at 4:59 PM Kenneth Waegeman
> wrot
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous, but
> > not being able to delete metadata/directories without an undue impact on
> > the whole filesystem performance is
On Thu, Sep 5, 2019 at 4:31 PM Hector Martin wrote:
>
> I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe
> this is caused by snapshots. The backup process for this filesystem
> consists of creating a snapshot and rsyncing it over daily, and
> snapshots are kept locally in
On Mon, Aug 26, 2019 at 9:25 PM thoralf schulze wrote:
>
> hi Zheng -
>
> On 8/26/19 2:55 PM, Yan, Zheng wrote:
> > I tracked down the bug
> > https://tracker.ceph.com/issues/41434
>
> wow, that was quick - thank you for investigating. we are looking
> forward fo
On Mon, Aug 26, 2019 at 6:57 PM thoralf schulze wrote:
>
> hi Zheng,
>
> On 8/21/19 4:32 AM, Yan, Zheng wrote:
> > Please enable debug mds (debug_mds=10), and try reproducing it again.
>
> please find the logs at
> https://www.user.tu-berlin.de/thoralf.schulze/ceph-de
the mds
> daemons on these machines have to be manually restarted. more often than
> we wish, the failover fails altogether, resulting in an unresponsive cephfs.
>
Please enable debug mds (debug_mds=10), and try reproducing it again.
Regards
Yan, Zheng
> this is with mimic 13.2.6 and a
The nautilus version (14.2.2) of ‘cephfs-data-scan scan_links’ can fix the
snaptable. Hopefully it will fix your issue.
You don't need to upgrade the whole cluster. Just install nautilus on a
temp machine or compile ceph from source.
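A sketch of that procedure (assuming the temp machine has the cluster's ceph.conf plus an admin keyring, and all mds daemons are stopped):

# on the temp machine with nautilus >= 14.2.2 installed
cephfs-data-scan scan_links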
On Tue, Aug 13, 2019 at 2:35 PM Adam wrote:
>
> Pierre Dittes helped
On Wed, Aug 7, 2019 at 3:46 PM wrote:
>
> All;
>
> I have a server running CentOS 7.6 (1810), that I want to set up with CephFS
> (full disclosure, I'm going to be running samba on the CephFS). I can mount
> the CephFS fine when I use the option secret=, but when I switch to
> secretfile=, I
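One thing worth checking (my assumption, not confirmed in this thread): secret= is parsed by the kernel directly, while secretfile= relies on the mount.ceph helper shipped in ceph-common. A sketch, with hypothetical host and paths:

yum install ceph-common
mount -t ceph mon1:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/samba.secret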
On Mon, Jul 29, 2019 at 9:54 PM Dan van der Ster wrote:
>
> On Mon, Jul 29, 2019 at 3:47 PM Yan, Zheng wrote:
> >
> > On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster
> > wrote:
> > >
> > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote:
> >
On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster wrote:
>
> On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote:
> >
> > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster
> > wrote:
> > >
> > > Hi all,
> > >
> > > Last night we h
adata rmomapkey 617. 10006289992_head.
I suggest running 'cephfs-data-scan scan_links' after taking down cephfs
(either use 'mds set down true' or 'flush all journals and
kill all mds').
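A sketch of that sequence (assuming the filesystem is named cephfs and a Ceph version where 'fs set ... down' is available; otherwise flush the journals and stop the mds daemons manually):

ceph fs set cephfs down true
cephfs-data-scan scan_links
ceph fs set cephfs down false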
Regards
Yan, Zheng
>
> Thanks!
>
> Dan
On Wed, Jul 24, 2019 at 3:13 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
>
> which version?
>
> Nautilus, 14.2.2.
>
I mean the kernel version.
> try mounting cephfs on a machine/vm with small memory (4G~8G), then rsync
> your data into the mount point of that machine.
>
> I could try
On Wed, Jul 24, 2019 at 1:58 PM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
> Ceph-fuse ?
>
> No, I am using the kernel module.
>
>
which version?
>
> Was there "Client xxx failing to respond to cache pressure" health warning?
>
>
> At first, yes (at least with the Mimic client).
On Wed, Jul 24, 2019 at 4:06 AM Janek Bevendorff <
janek.bevendo...@uni-weimar.de> wrote:
> Thanks for your reply.
>
> On 23/07/2019 21:03, Nathan Fish wrote:
> > What Ceph version? Do the clients match? What CPUs do the MDS servers
> > have, and how is their CPU usage when this occurs?
>
>
Please create a ticket at http://tracker.ceph.com/projects/cephfs and
upload the mds log with debug_mds=10.
On Tue, Jul 23, 2019 at 6:00 AM Robert LeBlanc wrote:
>
> We have a Luminous cluster which has filled up to 100% multiple times and
> this causes an inode to be left in a bad state. Doing
Check if there is any hung request in 'ceph daemon mds.xxx objecter_requests'.
On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder
wrote:
>
> On 7/16/19 4:11 PM, Dietmar Rieder wrote:
> > Hi,
> >
> > We are running ceph version 14.1.2 with cephfs only.
> >
> > I just noticed that one of our pgs had
On Wed, Jul 10, 2019 at 4:16 PM Lars Täuber wrote:
>
> Hi everybody!
>
> Is it possible to make snapshots in cephfs writable?
> We need to remove files because of this General Data Protection Regulation
> also from snapshots.
>
It's possible (only deleting data), but it needs modifications to both the mds and
On Fri, Jun 28, 2019 at 11:42 AM Hector Martin wrote:
>
> On 12/06/2019 22.33, Yan, Zheng wrote:
> > I have tracked down the bug. Thank you for reporting this. 'echo 2 >
> > /proc/sys/vm/drop_caches' should fix the hang. If you can compile ceph
> > from source, please
On Fri, Jun 21, 2019 at 6:10 PM Frank Schilder wrote:
>
> Dear Yan, Zheng,
>
> does mimic 13.2.6 fix the snapshot issue? If not, could you please send me a
> link to the issue tracker?
>
no
https://tracker.ceph.com/issues/39987
> Thanks and best regards,
>
> ===
On Tue, Jun 18, 2019 at 4:25 PM ?? ?? wrote:
>
>
>
> There are 2 clients, A and B. There is a directory /a/b/c/d/.
>
> Client A creates a file /a/b/c/d/a.txt.
>
> Client B moves the folder d to /a/.
>
> Now, the directory tree looks like this: /a/b/c/ and /a/d/.
>
> /a/b/c/d does not exist any more.
>
>
On Wed, Jun 12, 2019 at 3:26 PM Hector Martin wrote:
>
> Hi list,
>
> I have a setup where two clients mount the same filesystem and
> read/write from mostly non-overlapping subsets of files (Dovecot mail
> storage/indices). There is a third client that takes backups by
> snapshotting the
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia wrote:
>
> We have been testing a new installation of ceph (mimic 13.2.2) mostly
> using cephfs (for now). The current test is just setting up a filesystem
> for backups of our other filesystems. After rsyncing data for a few
> days, we started getting
On Mon, Jun 3, 2019 at 3:06 PM James Wilkins
wrote:
>
> Hi all,
>
> After a bit of advice to ensure we’re approaching this the right way.
>
> (version: 12.2.12, multi-mds, dirfrag is enabled)
>
> We have corrupt meta-data as identified by ceph
>
> health: HEALTH_ERR
> 2 MDSs
-mean-it") but sadly, the max_bytes attribute is still not
> > there
> > (also not after remounting on the client / using the file creation and
> > deletion trick).
>
> That's interesting - it suddenly started to work for one directory after
> creating a
On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll
wrote:
>
> Hi all,
>
> We recently encountered an issue where our CephFS filesystem unexpectedly was
> set to read-only. When we look at some of the logs from the daemons I can see
> the following:
>
> On the MDS:
> ...
> 2019-05-18 16:34:24.341
try 'umount -f'
On Tue, May 21, 2019 at 4:41 PM Marc Roos wrote:
>
>
>
>
>
> [@ceph]# ps -aux | grep D
> USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START TIME COMMAND
> root     12527  0.0  0.0 123520   932 pts/1 D+   09:26 0:00 umount
> /home/mail-archive
> root 14549 0.2
frag.zip?l; . It's a bit
> more than 100MB.
>
The MDS cache dump shows that a snapshot is involved. Please avoid using
snapshots until we fix the bug.
Regards
Yan, Zheng
> The active MDS failed over to the standby after or during the dump cache
> operation. Is this expected? As a result,
nations, to reproduce the issue I will create a directory with many
> entries and execute a test with the many-clients single-file-read load on it.
>
Try setting mds_bal_split_rd and mds_bal_split_wr to a very large value,
which prevents the mds from splitting hot dirfrags.
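For example (a sketch; the values are arbitrary, just large enough that the split thresholds are never hit; run on each active mds):

ceph daemon mds.<name> config set mds_bal_split_rd 1000000000
ceph daemon mds.<name> config set mds_bal_split_wr 1000000000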
Regards
Yan, Zheng
> I
On Wed, May 15, 2019 at 9:34 PM Frank Schilder wrote:
>
> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time in a
> much simpler situation; please see below. However, let me start with your
> questions first:
>
> What bug? -- In a single-active MDS set-up,
http://tracker.ceph.com/issues/25131 may relieve the issue. Please try
ceph version 13.2.5.
Regards
Yan, Zheng
On Thu, Mar 28, 2019 at 6:02 PM Zoë O'Connell wrote:
>
> We're running a Ceph mimic (13.2.4) cluster which is predominantly used
> for CephFS. We have recently switched
Looks like http://tracker.ceph.com/issues/37399. Which version of
ceph-mds do you use?
On Tue, Apr 2, 2019 at 7:47 AM Sergey Malinin wrote:
>
> These steps pretty well correspond to
> http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
> Were you able to replay journal manually with no
On Tue, Apr 2, 2019 at 9:10 PM Paul Emmerich wrote:
>
> On Tue, Apr 2, 2019 at 3:05 PM Yan, Zheng wrote:
> >
> > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote:
> > >
> > > Hi!
> > >
> > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich
On Tue, Apr 2, 2019 at 9:05 PM Yan, Zheng wrote:
>
> On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote:
> >
> > Hi!
> >
> > Am 29.03.2019 um 23:56 schrieb Paul Emmerich:
> > > There's also some metadata overhead etc. You might want to consider
> &g
We don't have a plan to mark this feature
stable (probably we will remove this feature in the future).
Yan, Zheng
> $ ceph fs dump | grep inline_data
> dumped fsmap epoch 1224
> inline_data enabled
>
> I have reduced the size of the bonnie-generated files to 1 byte. But
>
Please set debug_mds=10, and try again.
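A sketch of one way to do that (in ceph.conf on the mds host, then restart the daemon; it can also be injected at runtime):

[mds]
    debug_mds = 10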
On Tue, Apr 2, 2019 at 1:01 PM Albert Yue wrote:
>
> Hi,
>
> This happens after we restart the active MDS, and somehow the standby MDS
> daemon cannot take over successfully and is stuck at up:replaying. It is
> showing the following log. Any idea on how
On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + osd on same machines) for awhile. Over the weekend (for the
> first time) we had one cephfs mount deadlock while some clients were
> running ior.
>
On Mon, Mar 25, 2019 at 6:36 PM Mark Schouten wrote:
>
> On Mon, Jan 21, 2019 at 10:17:31AM +0800, Yan, Zheng wrote:
> > It's http://tracker.ceph.com/issues/37977. Thanks for your help.
> >
>
> I think I've hit this bug. Ceph MDS using 100% ceph and reporting as
> lagg
ged. A hard link in cephfs is a magic symbolic link. Its main
overhead is at open.
Regards
Yan, Zheng
2. Is there any performance (dis)advantage?
Generally not once the file is open.
3. When using hard links, is there an actual space savings, or is
there some trickery happening?
On Mon, Mar 18, 2019 at 9:50 PM Dylan McCulloch wrote:
>
>
> >Please run the following commands. They will show where 4. is.
> >
> >rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent
> >ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
> >
>
> $
Please run the following commands. They will show where 4. is.
rados -p hpcfs_metadata getxattr 4. parent >/tmp/parent
ceph-dencoder import /tmp/parent type inode_backtrace_t decode dump_json
On Mon, Mar 18, 2019 at 8:15 PM Dylan McCulloch wrote:
>
> >> >> >cephfs does not
Please check if 4. has an omap header and xattrs:
rados -p hpcfs_data listxattr 4.
rados -p hpcfs_data getomapheader 4.
On Mon, Mar 18, 2019 at 7:37 PM Dylan McCulloch wrote:
>
> >> >
> >> >cephfs does not create/use object "4.". Please show us some
> >> >of its
On Mon, Mar 18, 2019 at 6:05 PM Dylan McCulloch wrote:
>
>
> >
> >cephfs does not create/use object "4.". Please show us some
> >of its keys.
> >
>
> https://pastebin.com/WLfLTgni
> Thanks
>
Was the object modified recently?
rados -p hpcfs_metadata stat 4.
> >On Mon, Mar 18,
cephfs does not create/use object "4.". Please show us some
of its keys.
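To dump a sample of the keys, something like this (a sketch; use the full object name and the pool that reported the large omap warning):

rados -p <metadata-pool> listomapkeys <object-name> | head -20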
On Mon, Mar 18, 2019 at 4:16 PM Dylan McCulloch wrote:
>
> Hi all,
>
> We have a large omap object warning on one of our Ceph clusters.
> The only reports I've seen regarding the "large omap objects" warning from
CephFS kernel mount blocks reads while another client has dirty data in
its page cache. The cache coherency rules look like:
state 1 - only one client opens a file for read/write. That client can
use its page cache.
state 2 - multiple clients open a file for read, and no client opens the
file for write.
On Thu, Feb 28, 2019 at 5:33 PM David C wrote:
>
> On Wed, Feb 27, 2019 at 11:35 AM Hector Martin wrote:
>>
>> On 27/02/2019 19:22, David C wrote:
>> > Hi All
>> >
>> > I'm seeing quite a few directories in my filesystem with rctime years in
>> > the future. E.g
>> >
>> > ]# getfattr -d -m
On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian
wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the config
> >into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health
> warning is
On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian
wrote:
>
> Dear Community,
>
>
>
> we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During
> setup, we made the mistake of configuring the OSDs on RAID Volumes. Initially
> our cluster consisted of 3 nodes, each housing 1
> mds_cache_size = 8589934592
> mds_cache_memory_limit = 17179869184
>
> Should these values be left in our configuration?
No, you'd better change them back to the original values.
>
> again thanks for the assistance,
>
> Jake
>
> On 2/11/19 8:17 AM, Yan, Zheng wrote:
&
On Sat, Feb 9, 2019 at 8:10 AM Hector Martin wrote:
>
> Hi list,
>
> As I understand it, CephFS implements hard links as effectively "smart
> soft links", where one link is the primary for the inode and the others
> effectively reference it. When it comes to directories, the size for a
>
On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett wrote:
>
> Dear All,
>
> Unfortunately the MDS has crashed on our Mimic cluster...
>
> First symptoms were rsync giving:
> "No space left on device (28)"
> when trying to rename or delete
>
> This prompted me to try restarting the MDS, as it reported
On Tue, Jan 29, 2019 at 9:05 PM Jonathan Woytek wrote:
>
> On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng wrote:
>>
>> Looks like you have 5 active mds. I suspect your issue is related to the
>> load balancer. Please try disabling the mds load balancer (add
>> "mds_bal_m
2 59=0+59), dirfrag has f(v0
> m2019-01-28 14:46:47.983292 58=0+58)
> log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6,
> inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag
> has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
> ...
>
> any
Nothing to worry about.
On Sun, Jan 27, 2019 at 10:13 PM Marc Roos wrote:
>
>
> I constantly have strays. What are strays? Why do I have them? Is this
> bad?
>
>
>
> [@~]# ceph daemon mds.c perf dump| grep num_stray
> "num_strays": 25823,
> "num_strays_delayed": 0,
>
and use 'export_pin'
to manually pin directories to mds
(https://ceph.com/community/new-luminous-cephfs-subtree-pinning/)
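For example (a sketch; the path is any directory on a mounted cephfs client):

setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir     # pin this subtree to mds rank 1
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/some/dir    # back to inheriting from the parent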
>
> On Wed, Jan 9, 2019 at 9:10 PM Yan, Zheng wrote:
>>
>> [...]
>> Could you please run following command (for each active mds) when
>> ope
Upgraded from which version? Have you tried downgrading ceph-mds to the old version?
On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
wrote:
>
> Hi folks, we need some help with our cephfs, all mds keep crashing
>
> starting mds.mds02 at -
> terminate called after throwing an instance of
>
http://docs.ceph.com/docs/master/cephfs/troubleshooting/
For your case, it's likely the client got evicted by the mds.
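To confirm an eviction, something like this (a sketch, assuming the active mds is mds.0):

ceph daemon mds.0 session ls     # check the client's session state
ceph osd blacklist ls            # evicted clients are usually blacklisted here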
On Mon, Jan 28, 2019 at 9:50 AM Sang, Oliver wrote:
>
> Hello,
>
>
>
> Our cephfs looks stuck. If I run some command like ‘mkdir’, or
> ‘touch’ a new file, it just gets stuck
On Mon, Jan 28, 2019 at 10:34 AM Albert Yue wrote:
>
> Hi Yan Zheng,
>
> Our clients are also complaining about operations like 'du' or 'ncdu' being
> very slow. Is there any alternative tool for such kind of operation on
> CephFS? Thanks!
>
'du' traverses the whole directory tree.
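A lighter-weight alternative for directory sizes is CephFS's recursive statistics (a sketch; rstats can lag a little behind recent writes):

getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir
getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir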
On Wed, Jan 23, 2019 at 6:07 PM Marc Roos wrote:
>
> Yes sort of. I do have an inconsistent pg for a while, but it is on a
> different pool. But I take it this is related to a networking issue I
> currently have with rsync and broken pipe.
>
> Where exactly does it go wrong? The cephfs kernel
00G metadata, mds may need 1T or more memory.
> On Tue, Jan 22, 2019 at 5:48 PM Yan, Zheng wrote:
>>
>> On Tue, Jan 22, 2019 at 10:49 AM Albert Yue
>> wrote:
>> >
>> > Hi Yan Zheng,
>> >
>> > In your opinion, can we resolve this issue by mo
On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster wrote:
>
> On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote:
> >
> > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster
> > wrote:
> > >
> > > Hi Zheng,
> > >
> > > We also just
On Tue, Jan 22, 2019 at 8:24 PM renjianxinlover wrote:
>
> hi,
>At some point, due to cache pressure or a caps release failure, client apps' mounts
> got stuck.
>My use case is a kubernetes cluster with automatic kernel client mounts on the
> nodes.
>Has anyone faced the same issue or has related
On Wed, Jan 23, 2019 at 5:50 AM Marc Roos wrote:
>
>
> I got one again
>
> [] wait_on_page_bit_killable+0x83/0xa0
> [] __lock_page_or_retry+0xb2/0xc0
> [] filemap_fault+0x3b7/0x410
> [] ceph_filemap_fault+0x13c/0x310 [ceph]
> [] __do_fault+0x4c/0xc0
> [] do_read_fault.isra.42+0x43/0x130
> []
> } else {
> - clog->error() << "unmatched fragstat on " << ino() << ", inode
> has "
> + clog->warn() << "unmatched fragstat on " << ino() << ", inode has
> "
>
On Tue, Jan 22, 2019 at 10:49 AM Albert Yue wrote:
>
> Hi Yan Zheng,
>
> In your opinion, can we resolve this issue by move MDS to a 512GB or 1TB
> memory machine?
>
The problem is on the client side, especially clients with large memory.
I don't think enlarging the mds cache size is
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote:
>
> Dear Ceph Users,
>
> We have set up a cephFS cluster with 6 osd machines, each with 16 8TB
> harddisk. Ceph version is luminous 12.2.5. We created one data pool with
> these hard disks and created another meta data pool with 3 ssd. We
No, there is no config for request timeout.
>
> -Original Message-
> From: Yan, Zheng [mailto:uker...@gmail.com]
> Sent: 21 January 2019 02:50
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Process stuck in D+ on cephfs mount
>
> check /proc/<pid>/stack to f
On Mon, Jan 21, 2019 at 12:12 PM Albert Yue wrote:
>
> Hi Yan Zheng,
>
> 1. mds cache limit is set to 64GB
> 2. we get the size of meta data pool by running `ceph df` and saw meta data
> pool just used 200MB space.
>
That's very strange. One file uses about 1k of metadata
On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote:
>
> Dear Ceph Users,
>
> We have set up a cephFS cluster with 6 osd machines, each with 16 8TB
> harddisk. Ceph version is luminous 12.2.5. We created one data pool with
> these hard disks and created another meta data pool with 3 ssd. We
It's http://tracker.ceph.com/issues/37977. Thanks for your help.
Regards
Yan, Zheng
On Sun, Jan 20, 2019 at 12:40 AM Adam Tygart wrote:
>
> It worked for about a week, and then seems to have locked up again.
>
> Here is the back trace from the threads on the mds:
> http://pe
Check /proc/<pid>/stack to find where it is stuck.
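For example (a sketch):

ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'    # list processes stuck in D state
cat /proc/<pid>/stack                            # kernel stack of the stuck process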
On Mon, Jan 21, 2019 at 5:51 AM Marc Roos wrote:
>
>
> I have a process stuck in D+ writing to a cephfs kernel mount. Can anything
> be done about this (without rebooting)?
>
>
> CentOS Linux release 7.5.1804 (Core)
> Linux 3.10.0-514.21.2.el7.x86_64
>
ic\/video\/3h\/3hG6X7\/screen-msmall"}]
>
Looks like object 1005607c727. in the cephfs metadata pool is
corrupted. Please run the following commands and send the mds.0 log to us:
ceph tell mds.0 injectargs '--debug_mds 10'
ceph tell mds.0 damage rm 3472877204
ls
g on' and 'thread apply all bt' inside gdb, and send the output
to us.
Yan, Zheng
> --
> Adam
>
> On Sat, Jan 12, 2019 at 7:53 PM Adam Tygart wrote:
> >
> > On a hunch, I shutdown the compute nodes for our HPC cluster, and 10
> > minutes after that restarted the
.
>
Could you please run the following command (for each active mds) when
operations are fast and when operations are slow:
- for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops >
mds.xxx.$i; sleep 1; done
Then send the results to us
Regards
Yan, Zheng
> There are ma
On Fri, Jan 4, 2019 at 11:40 AM Alexandre DERUMIER wrote:
>
> Hi,
>
> I'm currently doing cephfs backup, through a dedicated clients mounting the
> whole filesystem at root.
> others clients are mounting part of the filesystem. (kernel cephfs clients)
>
>
> I have around 22millions inodes,
>
>
likely caused by http://tracker.ceph.com/issues/37399.
Regards
Yan, Zheng
On Sat, Jan 5, 2019 at 5:44 PM Matthias Aebi wrote:
>
> Hello everyone,
>
> We are running a small cluster on 5 machines with 48 OSDs / 5 MDSs / 5 MONs
> based on Luminous 12.2.10 and Debian Stretch
On Fri, Jan 4, 2019 at 1:53 AM David C wrote:
>
> Hi All
>
> Luminous 12.2.12
> Single MDS
> Replicated pools
>
> A 'df' on a CephFS kernel client used to show me the usable space (i.e the
> raw space with the replication overhead applied). This was when I just had a
> single cephfs data pool.
On Wed, Jan 2, 2019 at 11:12 AM Zhenshi Zhou wrote:
>
> Hi all,
>
> I have a cluster on Luminous(12.2.8).
> Is there a way I can check clients' operation records?
>
There is no way to do that.
> Thanks
that it's
> the RADOS object size?
>
> I'm thinking of modifying the cephfs filesystem driver to add a mount option
> to specify a fixed block size to be reported for all files, and using 4K or
> 64K. Would that break something?
mou
On Fri, Dec 14, 2018 at 12:05 PM Sang, Oliver wrote:
>
> Thanks a lot, Yan Zheng!
>
> I enabled only 2 MDS - node1(active) and node2. Then I modified ceph.conf of
> node2 to have -
> debug_mds = 10/10
>
> At 08:35:28, I observed degradation, the node1 was not a MDS
On Thu, Dec 13, 2018 at 9:25 PM Sang, Oliver wrote:
>
> Thanks a lot, Yan Zheng!
>
> Regarding the " set debug_mds =10 for standby mds (change debug_mds to 0
> after mds becomes active)."
> Could you please explain the purpose? Just want to collect debug log, or it
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
>
> The full log is also attached. Could you please help us? Thanks!
>
>
Please try
l on the ceph storage
> server side.
>
>
> Anyway, I will give it a try.
>
> —
> Best Regards
> Li, Ning
>
>
>
> > On Dec 6, 2018, at 11:41, Yan, Zheng wrote:
> >
> > On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote:
> >>
> >> Hi all,
> >
On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote:
>
> Hi all,
>
> We found that some processes writing to cephfs will hang for a long time (> 120s)
> when uploading (scp/rsync) large files (totally 50G ~ 300G) to the app node's
> cephfs mountpoint.
>
> This problem is not always reproducible. But when
Is the cephfs mount on the same machine that runs OSDs?
On Wed, Dec 5, 2018 at 2:33 PM NingLi wrote:
>
> Hi all,
>
> We found that some processes writing to cephfs will hang for a long time (> 120s)
> when uploading (scp/rsync) large files (totally 50G ~ 300G) to the app node's
> cephfs mountpoint.
>
On Tue, Dec 4, 2018 at 6:55 PM wrote:
>
> Hi,
>
> I have some wild freezes using cephfs with the kernel driver
> For instance:
> [Tue Dec 4 10:57:48 2018] libceph: mon1 10.5.0.88:6789 session lost,
> hunting for new mon
> [Tue Dec 4 10:57:48 2018] libceph: mon2 10.5.0.89:6789 session established