Quoting Stefan Kooman (ste...@bit.nl):
> 13.2.6 with this patch is running production now. We will continue the
> cleanup process that *might* have triggered this tomorrow morning.
For what it's worth ... that process completed successfully ... Time will
tell if it's really fixed, but it looks pro
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> Please check if https://github.com/ceph/ceph/pull/32020 works
Thanks!
13.2.6 with this patch is running production now. We will continue the
cleanup process that *might* have triggered this tomorrow morning.
Gr. Stefan
--
| BIT BV https://www.bi
On Thu, Dec 5, 2019 at 4:40 AM Stefan Kooman wrote:
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > and it crashed again (and again) ... until we stopped the mds and
> > deleted the mds0_openfiles.0 from the metadata pool.
> >
> > Here is the (debug) output:
> >
> > A specific workload that *m
Quoting Stefan Kooman (ste...@bit.nl):
> and it crashed again (and again) ... until we stopped the mds and
> deleted the mds0_openfiles.0 from the metadata pool.
>
> Here is the (debug) output:
>
> A specific workload that *might* have triggered this: recursively deleting a
> long list of
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> > please apply following patch, thanks.
> >
> > diff --git a/src/mds/OpenFileTable.cc b/src/mds/OpenFileTable.cc
> > index c0f72d581d..2ca737470d 100644
> > --- a/src/mds/OpenFileTable.cc
> > +++ b/src/mds/OpenFileTable.cc
> > @@ -470,7 +470,11 @@ voi
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> > > I double checked the code, but didn't find any clue. Can you compile
> > > mds with a debug patch?
> >
> > Sure, I'll try to do my best to get a properly packaged Ceph Mimic
> > 13.2.6 with the debug patch in it (and / or get help to get it built)
On Mon, Oct 21, 2019 at 7:58 PM Stefan Kooman wrote:
>
> Quoting Yan, Zheng (uker...@gmail.com):
>
> > I double checked the code, but didn't find any clue. Can you compile
> > mds with a debug patch?
>
> Sure, I'll try to do my best to get a properly packaged Ceph Mimic
> 13.2.6 with the debug pat
Quoting Yan, Zheng (uker...@gmail.com):
> I double checked the code, but didn't find any clue. Can you compile
> mds with a debug patch?
Sure, I'll try to do my best to get a properly packaged Ceph Mimic
13.2.6 with the debug patch in it (and / or get help to get it built).
Do you already have th
On Mon, Oct 21, 2019 at 4:33 PM Stefan Kooman wrote:
>
> Quoting Yan, Zheng (uker...@gmail.com):
>
> > delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> > of the crashed mds)
>
> OK, MDS crashed again, restarted. I stopped it, deleted the object and
> restarted the MDS. It b
Quoting Yan, Zheng (uker...@gmail.com):
> delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> of the crashed mds)
OK, MDS crashed again, restarted. I stopped it, deleted the object and
restarted the MDS. It became active right away.
Any idea on why the openfiles list (object
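For readers following the workaround above (stop the MDS, delete the openfiles object, restart), here is a minimal sketch of how the object name and the matching `rados` invocation could be assembled. The pool name `cephfs_metadata` is an assumption (check your own metadata pool name), and the helper names are hypothetical. The sketch only builds the command; it does not run anything against a cluster.

```python
import shlex
from typing import List


def openfiles_object(rank: int, index: int = 0) -> str:
    """Name of the open file table object for an MDS rank, e.g. mds0_openfiles.0."""
    return f"mds{rank}_openfiles.{index}"


def rados_rm_command(pool: str, rank: int) -> List[str]:
    """Build (but do not execute) the rados command that removes the object."""
    return ["rados", "-p", pool, "rm", openfiles_object(rank)]


if __name__ == "__main__":
    # "cephfs_metadata" is an assumed pool name, not taken from the thread.
    cmd = rados_rm_command("cephfs_metadata", 0)
    # Print the command for review; run it by hand only while the MDS is stopped.
    print(shlex.join(cmd))
```

As in the thread, the MDS of that rank should be stopped before the object is removed.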
Quoting Yan, Zheng (uker...@gmail.com):
> delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> of the crashed mds)
Just to make sure I understand correctly. Current status is that the MDS
is active (no standby for now) and not in a "crashed" state (although it
has been crashin
On Sun, Oct 20, 2019 at 1:53 PM Stefan Kooman wrote:
>
> Dear list,
>
> Quoting Stefan Kooman (ste...@bit.nl):
>
> > I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
> > on any other system.
> >
> > Any hints / help to prevent this from happening?
>
> We have had this happe
Dear list,
Quoting Stefan Kooman (ste...@bit.nl):
> I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
> on any other system.
>
> Any hints / help to prevent this from happening?
We have had this happening another two times now. In both cases the MDS
recovers, becomes acti
Dear list,
Today our active MDS crashed with an assert:
2019-10-19 08:14:50.645 7f7906cb7700 -1
/build/ceph-13.2.6/src/mds/OpenFileTable.cc: In function 'void
OpenFileTable::commit(MDSInternalContextBase*, uint64_t, int)' thread
7f7906cb7700 time 2019-10-19 08:14:50.648559
/build/ceph-13.2.6/s
On Mon, Feb 11, 2019 at 8:01 PM Jake Grimmett wrote:
>
> Hi Zheng,
>
> Many, many thanks for your help...
>
> Your suggestion of setting large values for mds_cache_size and
> mds_cache_memory_limit stopped our MDS crashing :)
>
> The values in ceph.conf are now:
>
> mds_cache_size = 8589934592
> m
Hi Zheng,
Sorry - I've just re-read your email and saw your instruction to restore
the mds_cache_size and mds_cache_memory_limit to original values if the
MDS does not crash - I have now done this...
thanks again for your help,
best regards,
Jake
On 2/11/19 12:01 PM, Jake Grimmett wrote:
> Hi
Hi Zheng,
Many, many thanks for your help...
Your suggestion of setting large values for mds_cache_size and
mds_cache_memory_limit stopped our MDS crashing :)
The values in ceph.conf are now:
mds_cache_size = 8589934592
mds_cache_memory_limit = 17179869184
Should these values be left in our co
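To put the two values above in perspective, a small arithmetic sketch. It assumes, as far as I know from the Mimic-era documentation, that mds_cache_memory_limit is expressed in bytes while mds_cache_size is an inode count rather than a byte size; treat that reading as an assumption and verify it against the docs for your release.

```python
GIB = 2 ** 30  # bytes per GiB

# Values quoted from the ceph.conf in the message above.
mds_cache_size = 8589934592           # assumed to be an inode count, not bytes
mds_cache_memory_limit = 17179869184  # assumed to be bytes


def bytes_to_gib(n: int) -> float:
    """Convert a byte count to GiB."""
    return n / GIB


if __name__ == "__main__":
    print("memory limit in GiB:", bytes_to_gib(mds_cache_memory_limit))
    print("cache size is 2**33:", mds_cache_size == 2 ** 33)
```

Under that reading, the memory limit works out to 16 GiB, while the inode count is set so high that it is unlikely to be the limit that applies.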
On Sat, Feb 9, 2019 at 12:36 AM Jake Grimmett wrote:
>
> Dear All,
>
> Unfortunately the MDS has crashed on our Mimic cluster...
>
> First symptoms were rsync giving:
> "No space left on device (28)"
> when trying to rename or delete
>
> This prompted me to try restarting the MDS, as it reported l
Dear All,
Unfortunately the MDS has crashed on our Mimic cluster...
First symptoms were rsync giving:
"No space left on device (28)"
when trying to rename or delete
This prompted me to try restarting the MDS, as it reported laggy.
Restarting the MDS shows this error in the log before the cr
Thanks for the tips, John. I'll increase the debug level as suggested.
On 25 Feb 2018 20:56, "John Spray" wrote:
> On Sat, Feb 24, 2018 at 10:13 AM, David C wrote:
> > Hi All
> >
> > I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't
> > know what caused the issue. Sc
On Sat, Feb 24, 2018 at 10:13 AM, David C wrote:
> Hi All
>
> I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't
> know what caused the issue. Scrubs are scheduled to start at 23:00 on this
> cluster but this appears to have started a minute before.
>
> Can anyone help me
Hi All
I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't
know what caused the issue. Scrubs are scheduled to start at 23:00 on this
cluster but this appears to have started a minute before.
Can anyone help me with diagnosing this please. Here's the relevant bit
from the
On Tue, Aug 16, 2016 at 6:29 AM, Randy Orr wrote:
> Hi Patrick,
>
> We continue to hit this bug. Just a couple of questions:
>
> 1. I see that http://tracker.ceph.com/issues/16983 has been updated and you
> believe it is related to http://tracker.ceph.com/issues/16013. It looks like
> this fix is
Hi Patrick,
We continue to hit this bug. Just a couple of questions:
1. I see that http://tracker.ceph.com/issues/16983 has been updated and you
believe it is related to http://tracker.ceph.com/issues/16013. It looks
like this fix is scheduled to be backported to Jewel at some point... is
there a
Patrick,
We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on
the client side with plans to move away from the 3.19 kernel where/when we
can.
-Randy
On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly
wrote:
> Randy, are you using ceph-fuse or the kernel client (or something
Randy, are you using ceph-fuse or the kernel client (or something else)?
On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr wrote:
> Great, thank you. Please let me know if I can be of any assistance in
> testing or validating a fix.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly
> wro
Great, thank you. Please let me know if I can be of any assistance in
testing or validating a fix.
-Randy
On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly
wrote:
> Hello Randy,
>
> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr wrote:
> > mds/Locker.cc: In function 'bool Locker::check_inode_max_
Hello Randy,
On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr wrote:
> mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*, bool,
> bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time
> 2016-08-09 18:51:50.626630
> mds/Locker.cc: 2190: FAILED assert(in->is_file())
>
>
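When collecting crash reports like the one quoted above, it can help to pull the source file, line number and failed condition out of the "FAILED assert" line so that reports can be grouped. A minimal sketch follows; the regular expression is my own guess at the line shape based only on the single line quoted here, so other Ceph releases may format it differently.

```python
import re
from typing import Optional, Tuple

# Matches lines shaped like: "mds/Locker.cc: 2190: FAILED assert(in->is_file())"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+):\s*(?P<line>\d+):\s*FAILED assert\((?P<cond>.*)\)\s*$"
)


def parse_failed_assert(text: str) -> Optional[Tuple[str, int, str]]:
    """Return (source file, line number, condition), or None if no match."""
    m = ASSERT_RE.match(text.strip())
    if m is None:
        return None
    return m.group("file"), int(m.group("line")), m.group("cond")


if __name__ == "__main__":
    sample = "mds/Locker.cc: 2190: FAILED assert(in->is_file())"
    print(parse_failed_assert(sample))
```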
Hello,
We have recently had some failures with our MDS processes. We are running
Jewel 10.2.1. The two MDS services are on dedicated hosts running in
active/standby on Ubuntu 14.04.3 with kernel 3.19.0-56-generic. I have
searched the mailing list and open tickets without much luck so far.
The fir
hi,
that appears to have worked. The mds are now stable and I can read and
write correctly.
thanks for the help and have a good day.
On 29/05/15 12:25, John Spray wrote:
On 29/05/2015 11:41, Peter Tiernan wrote:
ok, thanks. I wasn’t aware of this. Should this command fix
everything or is
On 29/05/2015 11:41, Peter Tiernan wrote:
ok, thanks. I wasn’t aware of this. Should this command fix everything
or do I need to delete cephfs and pools and start again:
> ceph osd tier cache-mode CachePool writeback
It might well work, give it a try.
John
_
ok, thanks. I wasn’t aware of this. Should this command fix everything
or do I need to delete cephfs and pools and start again:
> ceph osd tier cache-mode CachePool writeback
On 29/05/15 11:37, John Spray wrote:
On 29/05/2015 11:34, Peter Tiernan wrote:
ok, that's interesting. I had issues
On 29/05/2015 11:34, Peter Tiernan wrote:
ok, that's interesting. I had issues before this crash where files were
being garbled. I followed what I thought was the correct procedure for
erasure coded pool with cache tier:
> ceph osd pool create ECpool 800 800 erasure default
> ceph osd pool crea
ok, that's interesting. I had issues before this crash where files were
being garbled. I followed what I thought was the correct procedure for
erasure coded pool with cache tier:
> ceph osd pool create ECpool 800 800 erasure default
> ceph osd pool create CachePool 4096 4096
> ceph osd tier add
On 29/05/2015 09:46, Peter Tiernan wrote:
-16> 2015-05-29 09:28:23.106541 7f78c53a9700 10 mds.0.objecter in
handle_osd_op_reply
-15> 2015-05-29 09:28:23.106543 7f78c53a9700 7 mds.0.objecter
handle_osd_op_reply 28 ondisk v 0'0 uv 0 in 11.5ce99960 attempt 1
-14> 2015-05-29 09:28:23.106
Thank you for your reply
I had read the 'mds crashing' thread and I don't think I'm seeing that bug
(http://tracker.ceph.com/issues/10449).
I have enabled "debug objecter = 10" and here is the full log on
starting mds : http://pastebin.com/dbk0uLYy
Here is the last part of log:
-35> 20
(This came up as in-reply-to to the previous "mds crashing" thread --
it's better to start threads with a fresh message)
On 28/05/2015 16:58, Peter Tiernan wrote:
Hi all,
I have been testing cephfs with erasure coded pool and cache tier. I
have 3 mds running on the same physical server as
Hi all,
I have been testing cephfs with erasure coded pool and cache tier. I
have 3 mds running on the same physical server as 3 mons. The cluster is
in ok state otherwise, rbd is working and all pg are active+clean. I'm
running v0.87.2 Giant on all nodes and Ubuntu 14.04.2.
The cluster was
For the question of OSD failures causing MDS crashes, there are many
places where the MDS asserts that OSD operations succeeded (grep the
code for "assert(r == 0)") -- we could probably do a better job of
handling these, e.g. log the OSD error and respawn rather than
assert'ing.
John
On Sat, Jul
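The "grep the code" suggestion above can be sketched as a small script that lists where the literal text `assert(r == 0)` appears in a source tree. This is only an illustration: the function name and the choice of file extensions are my own assumptions, and any path you pass in is a placeholder for wherever your Ceph checkout lives.

```python
import os
from typing import List, Tuple


def find_assert_sites(root: str, needle: str = "assert(r == 0)") -> List[Tuple[str, int]]:
    """Return (path, line number) for every line under root containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Only C++ sources and headers are scanned (an assumption).
            if not name.endswith((".cc", ".h")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, text in enumerate(fh, start=1):
                    if needle in text:
                        hits.append((path, lineno))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_asserts.py /path/to/ceph/src/mds
    for path, lineno in find_assert_sites(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{path}:{lineno}")
```

Each hit is a candidate for the gentler handling John describes (log the OSD error and respawn), though whether that is safe has to be judged case by case.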
It crashed on an OSD reply. What's the output of "ceph -s"?
-Greg
On Wednesday, July 9, 2014, Florent B wrote:
> Hi all,
>
> I run a Firefly cluster with a MDS server for a while without any problem.
>
> I would like to setup a second one to get a failover server.
>
> To minimize downtime in cas
There is a memory leak bug in the standby replay code; your issue is likely
caused by it.
Yan, Zheng
On Wed, Jul 9, 2014 at 4:49 PM, Florent B wrote:
> Hi all,
>
> I run a Firefly cluster with a MDS server for a while without any problem.
>
> I would like to setup a second one to get a failover server
On Wednesday, June 11, 2014, Florent B wrote:
> Hi every one,
>
> Sometimes my MDS crashes... sometimes after a few hours, sometimes after
> a few days.
>
> I know I could enable debugging and so on to get more information. But
> if it crashes after a few days, it generates gigabytes of debugging
I'm not sure; I will re-test and tell you ;)
On 04/02/2014 04:14 PM, Gregory Farnum wrote:
> A *clean* shutdown? That sounds like a different issue; hjcho616's
> issue only happens when a client wakes back up again.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Wed
A *clean* shutdown? That sounds like a different issue; hjcho616's
issue only happens when a client wakes back up again.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Wed, Apr 2, 2014 at 6:34 AM, Florent B wrote:
> Can someone confirm that this issue is also in Emperor re
Yes, Zheng's fix for the MDS crash is in current mainline and will be
in the next Firefly RC release.
Sage, is there something else we can/should be doing when a client
goes to sleep that we aren't already? (ie, flushing out all dirty data
or something and disconnecting?)
-Greg
Software Engineer #
wait a bit more.
Regards,
Hong
From: Gregory Farnum
To: hjcho616
Cc: Mohd Bazli Ab Karim; "Yan, Zheng";
Sage Weil; "ceph-users@lists.ceph.com"
Sent: Tuesday, March 25, 2014 11:05 AM
Subject: Re: [ceph-users] MDS crash when client goe
ts.ceph.com"
Sent: Tuesday, March 25, 2014 12:59 PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
On Tue, Mar 25, 2014 at 9:56 AM, hjcho616 wrote:
> I am merely putting the client to sleep and waking it up. When it is up,
> I run ls on the mounted directory. A
On Tue, Mar 25, 2014 at 9:56 AM, hjcho616 wrote:
> I am merely putting the client to sleep and waking it up. When it is up,
> I run ls on the mounted directory. As far as I am concerned, at a very high
> level I am doing the same thing. All are running the 3.13 kernel Debian
> provides.
>
> When tha
Regards,
Hong
From: Gregory Farnum
To: hjcho616
Cc: Mohd Bazli Ab Karim; "Yan, Zheng";
Sage Weil; "ceph-users@lists.ceph.com"
Sent: Tuesday, March 25, 2014 11:05 AM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
On Mon, Mar 24, 2014 at 6:26
On Mon, Mar 24, 2014 at 6:26 PM, hjcho616 wrote:
> I tried the patch twice. First time, it worked. There was no issue.
> Connected back to MDS and was happily running. All three MDS daemons were
> running ok.
>
> Second time though... all three daemons were alive. Health was reported OK.
> Howev
steps, just in case if it happens again in future.
Many thanks.
Bazli
-----Original Message-----
From: Yan, Zheng [mailto:uker...@gmail.com]
Sent: Sunday, March 23, 2014 2:53 PM
To: Sage Weil
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to slee
uld it able to mount to the filesystem now? It
>> > looks
>> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
>> >
>> > However, you need to collect some logs to confirm this.
>> >
>> >
>> >
>> > Thanks.
>
;s the client now? Would it able to mount to the filesystem now? It looks
> > similar to our case, http://www.spinics.net/lists/ceph-devel/msg18395.html
> >
> > However, you need to collect some logs to confirm this.
> >
> >
> >
> > Thanks.
> >
> >
>
ou need to collect some logs to confirm this.
>
>
>
> Thanks.
>
>
>
>
>
> From: hjcho616 [mailto:hjcho...@yahoo.com]
> Sent: Friday, March 21, 2014 2:30 PM
>
>
> To: Luke Jing Yuan
> Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
> Subject: Re: [
ch 21, 2014 2:30 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep
Luke,
Not sure what a flapping ceph-mds daemon means, but when I connected to the MDS when
this happened there was no longer any process with ceph-mds w
im ; "ceph-users@lists.ceph.com"
Sent: Friday, March 21, 2014 1:17 AM
Subject: RE: [ceph-users] MDS crash when client goes to sleep
Hi Hong,
That's interesting, for Mr. Bazli and I, we ended with MDS stuck in (up:replay)
and a flapping ceph-mds daemon, but then again we are usin
ds,
Luke
From: hjcho616 [mailto:hjcho...@yahoo.com]
Sent: Friday, 21 March, 2014 12:09 PM
To: Luke Jing Yuan
Cc: Mohd Bazli Ab Karim; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] MDS crash when client goes to sleep
Nope just these segfaults.
[149884.709608] ceph-mds[17366]: segfault at 200 ip 00
PM
Subject: Re: [ceph-users] MDS crash when client goes to sleep
Did you see any messages in dmesg saying ceph-mds respawning or stuff like
that?
Regards,
Luke
On Mar 21, 2014, at 11:09 AM, "hjcho616" wrote:
On client, I was no longer able to access the filesystem. It would
@lists.ceph.com"
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep
Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to
recover from replay
: hjcho616 ; "ceph-users@lists.ceph.com"
Sent: Thursday, March 20, 2014 9:40 PM
Subject: RE: [ceph-users] MDS crash when client goes to sleep
Hi Hong,
May I know what has happened to your MDS once it crashed? Was it able to
recover from replay?
We are also facing this issue and I am int
hjcho616
Sent: Friday, March 21, 2014 10:29 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] MDS crash when client goes to sleep
When CephFS is mounted on a client and the client goes to sleep, the MDS
segfaults. Has anyone seen this? Below is a part of the MDS log. This happened
in
When CephFS is mounted on a client and the client goes to sleep, the MDS
segfaults. Has anyone seen this? Below is a part of the MDS log. This happened
in Emperor and the recent 0.77 release. I am running Debian Wheezy with the testing
kernel 3.13. What can I do to not crash the whole system if
Ah, looks like it was.
I've got a gitbuilder build of the mds running and it seems to be working.
Thanks!
Mike
On 30 April 2013 16:56, Kevin Decherf wrote:
> On Tue, Apr 30, 2013 at 03:10:00PM +0100, Mike Bryant wrote:
>> All of my MDS daemons have begun crashing when I start them up, and
>> the
On Tue, Apr 30, 2013 at 03:10:00PM +0100, Mike Bryant wrote:
> All of my MDS daemons have begun crashing when I start them up, and
> they try to begin recovery.
Hi,
It seems to be the same bug as #4644
http://tracker.ceph.com/issues/4644
--
Kevin Decherf - @Kdecherf
GPG C610 FE73 E706 F968 612B
All of my MDS daemons have begun crashing when I start them up, and
they try to begin recovery.
Log attached
Mike
--
Mike Bryant | Systems Administrator | Ocado Technology
mike.bry...@ocado.com | 01707 382148 | www.ocado.com
--
Notice: This email is confidential and may contain copyright mater
On Mon, Feb 25, 2013 at 8:44 AM, Sage Weil wrote:
> On Mon, 25 Feb 2013, Steffen Thorhauer wrote:
>> Hi,
>> I've found out what I did wrong: I stopped the cluster and forgot a client
>> which was mounting the cephfs. I simply forgot the client.
>> With a
>> ceph mds newfs 0 1 --yes-i-really-mean-it
On Mon, 25 Feb 2013, Steffen Thorhauer wrote:
> Hi,
> I've found out what I did wrong: I stopped the cluster and forgot a client
> which was mounting the cephfs. I simply forgot the client.
> With a
> ceph mds newfs 0 1 --yes-i-really-mean-it
Oh... the 'newfs' resets the MDSMap in the monitor, but
Hi,
I've found out what I did wrong: I stopped the cluster and forgot a client which
was mounting the cephfs. I simply forgot the client.
With a
ceph mds newfs 0 1 --yes-i-really-mean-it
(I don't really know what the parameters are), but the mds is restarting with
an empty fs.
I tried the patch version