[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-03-16 Thread Frédéric Nass
' node out of the 'default' root and probably fixed your problem instantly. Regards, Frédéric. - Original message - From: Anthony To: nguyenvandiep Cc: ceph-users Sent: Saturday, February 24, 2024, 16:24 CET Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering T

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-26 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Glad to hear it all worked out for you! From: nguyenvand...@baoviet.com.vn At: 02/26/24 05:32:32 UTC-5:00 To: ceph-users@ceph.io Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony My System is UP. Thank you so

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-26 Thread nguyenvandiep
Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony, my system is UP. Thank you so much. We got so much support from all of you. Amazing, kind support from top professionals in Ceph. Hope we have a chance to cooperate in the future. And if you travel to Vietnam in the future, let me know. I'll be your

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Once recovery is underway, simply restarting the RGWs should be enough to reset them and get your object store back up. Bloomberg doesn't use CephFS, so hopefully David's suggestions work, or someone else in the community can chip in for that part. Sent from Bloomberg Professional for

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread David C.
If rebalancing tasks have been launched, it's not a big deal, but I don't think that's the priority. The priority is to get the MDS back on its feet. I haven't seen an answer to this question: can you stop/unmount the CephFS clients or not? There are other solutions, but as you are not comfortable I

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you so much, Matthew. Please keep an eye on my thread. You and Mr Anthony made my day.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you so much, Sir. You made my day T.T

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> Low space hindering backfill (add storage if this doesn't resolve > itself): 21 pgs backfill_toofull ^^^ Ceph even told you what you need to do ;) If you have recovery taking place and the numbers of misplaced objects and *full PGs/pools keep decreasing, then yes, wait. As for
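For reference, the PGs holding things up and the OSDs they are waiting on can be listed with standard commands; a minimal sketch:

# List the PGs currently flagged backfill_toofull
ceph health detail | grep backfill_toofull

# Show per-OSD utilization to spot the full destination OSDs
ceph osd df tree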

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
sudo watch ceph -s You should see stats on the recovery and see PGs transition from all the backfill* states to active+clean. Once you get everything active+clean, then we can focus on your RGWs and MDSs. Sent from Bloomberg Professional for iPhone - Original Message - From:
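A minimal sketch of that monitoring loop, assuming an admin node with the client keyring:

# Refresh the cluster status every 10 seconds
sudo watch -n 10 ceph -s

# Or just a one-line PG state summary
sudo ceph pg stat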

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you, Matthew. I'm following the guidance from Mr Anthony and now my recovery speed is much faster. I will update my case day by day. Thank you so much.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, forget it, the OSD is UP and the recovery speed is 10x. Amazing. And now we just wait, right?

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Yes, Sir. We added a 10 TiB disk to the cephosd02 node. Now the disk is IN, but in the DOWN state. What should we do now :( Additionally, the recovery speed is 10x :)

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Anthony is correct; this is what I was getting at as well after seeing your ceph -s output. There are more details in the Ceph docs if you want to understand why you need to balance your nodes: https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/ But you need to get

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Your recovery is stuck because there are no OSDs that have enough space to accept data. Your second OSD host appears to only have 9 OSDs currently, so you should be able to add a 10TB OSD there without removing anything. That will enable data to move to all three of your 10TB OSDs. > On Feb
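On a cephadm-managed cluster like this one, adding that drive is a single orchestrator command; a sketch, where /dev/sdX stands in for the actual device name:

# Turn the new 10 TB drive on cephosd02 into an OSD (device name hypothetical)
ceph orch daemon add osd cephosd02:/dev/sdX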

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
HolySh*** First, we changed mon_max_pg_per_osd to 1000. About adding a disk to cephosd02: for more detail, what is TO, sir? I'll talk it over with my boss. To be honest, I'm thinking that the volume recovery progress will hit a problem...

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You aren't going to be able to finish recovery without having somewhere to recover TO. > On Feb 24, 2024, at 10:33 AM, nguyenvand...@baoviet.com.vn wrote: > > Thank you, Sir. But I think I'll wait for the PG BACKFILLFULL to finish; my boss is > very angry now and will not allow me to add one more

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
You also might want to increase mon_max_pg_per_osd since you have a wide spread of OSD sizes. Default is 250. Set it to 1000. > On Feb 24, 2024, at 10:30 AM, Anthony D'Atri wrote: > > Add a 10tb HDD to the third node as I suggested, that will help your cluster. > > >> On Feb 24, 2024, at
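A sketch of that change via the cluster config database (the value is the one suggested above):

# Raise the per-OSD PG limit from the default of 250
ceph config set mon mon_max_pg_per_osd 1000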

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Thank you, Sir. But I think I'll wait for the PG BACKFILLFULL to finish; my boss is very angry now and will not allow me to add one more disk (this action makes him think that Ceph will take more time for recovering and rebalancing). We want to wait for the volume recovery progress to finish

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
Add a 10 TB HDD to the third node as I suggested; that will help your cluster. > On Feb 24, 2024, at 10:29 AM, nguyenvand...@baoviet.com.vn wrote: > > I will correct some small things: > > we have 6 nodes, 3 OSD nodes and 3 gateway nodes (which run the RGW, MDS and NFS > services) > you are correct, 2/3

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
And sure, we have one more 10 TiB disk, which cephosd02 will get.

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
I will correct some small things: we have 6 nodes, 3 OSD nodes and 3 gateway nodes (which run the RGW, MDS and NFS services). You are correct: 2 of the 3 OSD nodes each have one new 10 TiB disk. About your suggestion to add another OSD host: we will. But we need to end this nightmare; my NFS folder, which has 10 TiB of data

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

Read the four sections here: https://docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-out-of-order-full > On Feb 24, 2024, at 10:12 AM, nguyenvand...@baoviet.com.vn wrote: > > Hi Mr Anthony,
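Per those docs, the thresholds can be raised temporarily so backfill can proceed while data drains off the full OSDs; a sketch with illustrative values that should be reverted once recovery completes:

# Loosen the full thresholds slightly (temporary measure only)
ceph osd set-nearfull-ratio 0.9
ceph osd set-backfillfull-ratio 0.92
ceph osd set-full-ratio 0.96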

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
There ya go. You have 4 hosts, one of which appears to be down and to have a single OSD that is so small as to not be useful. Whatever cephgw03 is, it looks like a mistake. OSDs much smaller than, say, 1 TB often aren't very useful. Your pools appear to be replicated, size=3. So each of your

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, could you tell me more details about raising the full and backfillfull thresholds? Is it: ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6 ?
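(For clarity: those injectargs options throttle how aggressively backfill runs; the full/backfillfull thresholds are separate ratios. A sketch of the two side by side:)

# Throttles: how many concurrent backfills/recoveries each OSD runs
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6

# Threshold: the utilization above which an OSD refuses backfill
ceph osd set-backfillfull-ratio 0.92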

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Mr Anthony, please check the output: https://anotepad.com/notes/s7nykdmc

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Matthew, 1) We had 2 MDS services running before this nightmare. Now we are trying to apply MDS on 3 nodes, but all of them stop within 2 minutes. 2) You are correct. We just added two 10 TiB disks to the cluster (which currently has 27 x 4 TiB disks); all of them have weight 1.0. About volume

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi David, I'll follow your suggestion. Do you have Telegram? If yes, could you please add my Telegram: +84989177619. Thank you so much

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Anthony D'Atri
> > 2) It looks like you might have an interesting CRUSH map. Allegedly you have > 41 TiB of space, but you can't finish recovering: you have lots of PGs stuck as > their destination is too full. Are you running homogeneous hardware or do you > have different drive sizes? Are all the weights set

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
It looks like you have quite a few problems; I'll try to address them one by one. 1) Looks like you had a bunch of crashes; from the ceph -s it looks like you don't have enough MDS daemons running for a quorum, so you'll need to restart the crashed containers. 2) It looks like you might

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread David C.
Do you have the possibility to stop/unmount the CephFS clients? If so, do that and restart the MDS. It should restart. Have the clients restart one by one and check that the MDS does not crash (by monitoring the logs). Regards, *David
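A minimal sketch of that sequence, assuming kernel mounts at /mnt/cephfs and a cephadm MDS service named mds.cephfs (both names hypothetical):

# On each client host: unmount the filesystem
umount /mnt/cephfs

# On the admin node: restart the MDS daemons for the filesystem
ceph orch restart mds.cephfs

# Confirm an MDS goes active before remounting clients one by one
ceph fs status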

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread nguyenvandiep
Hi Matthew, please check my ceph -s:

ceph -s
  cluster:
    id:     258af72a-cff3-11eb-a261-d4f5ef25154c
    health: HEALTH_WARN
            3 failed cephadm daemon(s)
            1 filesystem is degraded
            insufficient standby MDS daemons available
            1 nearfull osd(s)

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Can you send sudo ceph -s and sudo ceph health detail? Sent from Bloomberg Professional for iPhone - Original Message - From: nguyenvand...@baoviet.com.vn To: ceph-users@ceph.io At: 02/23/24 20:27:53 UTC-05:00 Could you please guide me in more detail :( I'm very new to Ceph :(

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread nguyenvandiep
Could you please guide me in more detail :( I'm very new to Ceph :(

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread David C.
Look at ALL CephFS kernel clients (no effect on RGW). On Fri, Feb 23, 2024 at 16:38, wrote: > And we don't have a parameters folder > > cd /sys/module/ceph/ > [root@cephgw01 ceph]# ls > coresize holders initsize initstate notes refcnt rhelversion > sections srcversion taint uevent > >

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread nguyenvandiep
And we don't have a parameters folder:

cd /sys/module/ceph/
[root@cephgw01 ceph]# ls
coresize  holders  initsize  initstate  notes  refcnt  rhelversion  sections  srcversion  taint  uevent

My Ceph is 16.2.4
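Whether the running kernel's ceph module supports that parameter at all can be checked with modinfo; a sketch (no output would suggest the kernel predates disable_send_metrics):

# List the ceph module's parameters and look for the metrics switch
modinfo ceph | grep -i send_metrics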

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread nguyenvandiep
Hi David, could you please help me understand: does it affect the RGW service? And if something goes bad, how can I roll back?

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread nguyenvandiep
Thank you for your time :) Have a good day, sir

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread David C.
Hi, the problem seems to come from the clients (reconnect). Test by disabling metrics on all clients:

echo Y > /sys/module/ceph/parameters/disable_send_metrics

Regards, *David CASIER*
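To apply that across every client host, something like the following sketch works, assuming SSH access and sudo rights; the hostnames are placeholders:

# Disable CephFS client metrics on each client (hostnames hypothetical)
for host in client01 client02 client03; do
    ssh "$host" 'echo Y | sudo tee /sys/module/ceph/parameters/disable_send_metrics'
done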

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread Eugen Block
This seems to be the relevant stack trace:
---snip---
Feb 23 15:18:39 cephgw02 conmon[2158052]: debug -1> 2024-02-23T08:18:39.609+ 7fccc03c0700 -1

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread nguyenvandiep
https://drive.google.com/file/d/1OIN5O2Vj0iWfEMJ2fyHN_xV6fpknBmym/view?usp=sharing Please check my MDS log, which was generated by the command: cephadm logs --name mds.cephfs.cephgw02.qqsavr --fsid 258af72a-cff3-11eb-a261-d4f5ef25154c

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread Eugen Block
You still haven't provided any details (logs) of what happened. The short excerpt from yesterday isn't useful as it only shows the startup of the daemon. Quoting nguyenvand...@baoviet.com.vn: Could you please help me understand the status of the volume: recovering? What is it? And do we need to

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Could you please help me understand the status of the volume: recovering? What is it? And do we need to wait for the volume recovery progress to finish?

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
If it crashes after two minutes, you have your time window to look for. Restart the MDS daemon and capture everything after that until the crash. Quoting nguyenvand...@baoviet.com.vn: It's such a long log; could you please guide me on how to grep/filter the important things in the logs?
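One way to capture exactly that window, reusing the fsid and daemon name posted earlier in this thread (a sketch; run it on the host where that MDS lives):

# Follow the MDS unit's journal live and keep a copy until the crash
journalctl -fu ceph-258af72a-cff3-11eb-a261-d4f5ef25154c@mds.cephfs.cephgw02.qqsavr | tee mds-crash.log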

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
Feb 22 13:39:43 cephgw02 conmon[1340927]: log_file /var/lib/ceph/crash/2024-02-22T06:39:43.618845Z_78ee38bc-9115-4bc6-8c3a-4bf42284c970/log
Feb 22 13:39:43 cephgw02 conmon[1340927]: --- end dump of recent events ---
Feb 22 13:39:45 cephgw02 systemd[1]:

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
It's such a long log; could you please guide me on how to grep/filter the important things in the logs?

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
There are a couple of ways. Find your MDS daemon with:

ceph fs status   -> should show you the to-be-active MDS

On that host run:

cephadm logs --name mds.{MDS}

or alternatively:

cephadm ls --no-detail | grep mds
journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log

Quoting
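Once the log file exists, the interesting part is usually right around the assertion or signal; a sketch with common MDS crash markers (patterns are illustrative):

# Show context around typical crash markers in the captured log
grep -n -B5 -A20 -E 'FAILED ceph_assert|Caught signal|begin dump of recent events' {MDS}.log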

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread nguyenvandiep
How can we get the log of the MDS? Please guide me T_T

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-22 Thread Eugen Block
What does the MDS log when it crashes? Quoting nguyenvand...@baoviet.com.vn: We have 6 nodes (3 OSD nodes and 3 service nodes); 2/3 OSD nodes were powered off and we got a big problem. Please check the ceph -s result below. Now we cannot start the MDS service (we tried to start it but it stopped after 2