' node out of the 'default' root and probably
fixed your problem instantly.
Regards,
Frédéric.
----- Original Message -----
From: Anthony
To: nguyenvandiep
Cc: ceph-users
Sent: Saturday, 24 February 2024, 16:24 CET
Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering
Glad to hear it all worked out for you!
From: nguyenvand...@baoviet.com.vn At: 02/26/24 05:32:32 UTC-5:00To:
ceph-users@ceph.io
Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in
recovering
Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony
My System is UP.
Thank you so
Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony
My System is UP.
Thank you so much. We got a lot of support from all of you. Amazing, kind
support from top professionals in Ceph.
Hope we have a chance to cooperate in the future. And if you travel to Vietnam
in the future, let me know. I'll be your
Once recovery is underway, simply restarting the RGWs should be enough to
reset them and get your object store back up.
Bloomberg doesn’t use cephfs so hopefully David’s suggestions work or if anyone
else in the community can chip in for that part.
Sent from Bloomberg Professional for
If rebalancing tasks have been launched, it's not a big deal, but I don't
think that's the priority.
The priority is to get the MDS back on its feet.
I haven't seen an answer to this question: can you stop/unmount the cephfs
clients or not?
There are other solutions but as you are not comfortable I
Thank you so much, Matthew. Pls keep an eye on my thread.
You and Mr Anthony made my day.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
Thank you so much, Sir. You make my day T.T
> Low space hindering backfill (add storage if this doesn't resolve
> itself): 21 pgs backfill_toofull
^^^ Ceph even told you what you need to do ;)
If you have recovery taking place and the numbers of misplaced objects and
*full PGs/pools keep decreasing, then yes, wait.
As for
sudo watch ceph -s
You should see stats on the recovery and see PGs transition from all the
backfill* states to active+clean
Once you get everything active+clean, then we can focus on your RGWs and MDSs
Sent from Bloomberg Professional for iPhone
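For reference, a minimal way to keep an eye on recovery from an admin node (a sketch; assumes passwordless sudo and the admin keyring on that host):

```shell
# Refresh cluster status every 10 s; watch for the backfill*/recovery PG
# states to drain away and PGs to return to active+clean.
watch -n 10 'sudo ceph -s; sudo ceph pg stat'
```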
- Original Message -
From:
Thank you Matthew.
I'm following guidance from Mr Anthony and now my recovery progress speed is
much faster.
I will update my case day by day.
Thank you so much
Hi Mr Anthony,
Forget it, the OSD is UP and the recovery speed is 10x
Amazing
And now we just wait, right ?
Yes, Sir.
We added a 10TiB disk to the cephosd02 node. Now the disk is IN, but in DOWN state.
What should we do now :(
Additionally, the recovery speed is 10x :)
Anthony is correct; this is what I was getting at as well when seeing your ceph
-s output. More detail in the Ceph docs here if you want to understand why you
need to balance your nodes.
https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/
But you need to get
Your recovery is stuck because there are no OSDs that have enough space to
accept data.
Your second OSD host appears to only have 9 OSDs currently, so you should be
able to add a 10TB OSD there without removing anything.
That will enable data to move to all three of your 10TB OSDs.
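As a sketch of how that disk could be brought in on a cephadm-managed cluster like this one (the device path /dev/sdX is a placeholder, not from the thread):

```shell
# See which devices cephadm considers available on the host
sudo ceph orch device ls cephosd02

# Create an OSD on the new 10 TB drive (replace /dev/sdX with the real device)
sudo ceph orch daemon add osd cephosd02:/dev/sdX
```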
> On Feb
HolySh***
First, we changed mon_max_pg_per_osd to 1000.
About adding a disk to cephosd02, in more detail, what is 'TO', sir? I'll have
a conversation with my boss. To be honest, I'm thinking that the volume recovery
progress will hit a problem...
You aren’t going to be able to finish recovery without having somewhere to
recover TO.
> On Feb 24, 2024, at 10:33 AM, nguyenvand...@baoviet.com.vn wrote:
>
> Thank you, Sir. But i think i ll wait for PG BACKFILLFULL finish, my boss is
> very angry now and will not allow me to add one more
You also might want to increase mon_max_pg_per_osd since you have a wide spread
of OSD sizes.
Default is 250. Set it to 1000.
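That change can be made with the centralized config store (a sketch; the value 1000 is the one suggested above):

```shell
# Raise the per-OSD PG limit from the default of 250 to 1000
sudo ceph config set mon mon_max_pg_per_osd 1000

# Verify the new value
sudo ceph config get mon mon_max_pg_per_osd
```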
> On Feb 24, 2024, at 10:30 AM, Anthony D'Atri wrote:
>
> Add a 10tb HDD to the third node as I suggested, that will help your cluster.
>
>
>> On Feb 24, 2024, at
Thank you, Sir. But I think I'll wait for the PG BACKFILLFULL to finish; my boss
is very angry now and will not allow me to add one more disk (this action makes
him think that Ceph will take more time for recovering and rebalancing). We want
to wait for the volume recovery progress to finish.
Add a 10tb HDD to the third node as I suggested, that will help your cluster.
> On Feb 24, 2024, at 10:29 AM, nguyenvand...@baoviet.com.vn wrote:
>
> I will correct some small things:
>
> we have 6 nodes, 3 OSD node and 3 gateway node ( which run RGW, MDS and NFS
> service)
> you are correct, 2/3
And sure, we have one more 10TiB disk, which cephosd02 will get.
I will correct some small things:
we have 6 nodes, 3 OSD nodes and 3 gateway nodes (which run the RGW, MDS and NFS
services)
you are correct, 2/3 OSD nodes have ONE NEW 10TiB disk
About your suggestion to add another OSD host, we will. But we need to end this
nightmare; my NFS folder has 10TiB of data
# ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
Read the four sections here:
https://docs.ceph.com/en/quincy/rados/operations/health-checks/#osd-out-of-order-full
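For the record, the ratios shown above can be raised temporarily so stuck backfill can proceed (a sketch; the values are illustrative, and they should be reverted to the defaults once recovery completes):

```shell
# Temporarily raise the full thresholds (keep nearfull < backfillfull < full)
sudo ceph osd set-nearfull-ratio 0.90
sudo ceph osd set-backfillfull-ratio 0.92
sudo ceph osd set-full-ratio 0.96

# Confirm the new thresholds
sudo ceph osd dump | grep ratio
```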
> On Feb 24, 2024, at 10:12 AM, nguyenvand...@baoviet.com.vn wrote:
>
> Hi Mr Anthony,
There ya go.
You have 4 hosts, one of which appears to be down and have a single OSD that is
so small as to not be useful. Whatever cephgw03 is, it looks like a mistake.
OSDs much smaller than, say, 1TB often aren’t very useful.
Your pools appear to be replicated, size=3.
So each of your
Hi Mr Anthony, could you tell me more details about raising the full and
backfillfull thresholds?
is it
ceph tell 'osd.*' injectargs --osd-max-backfills=2 --osd-recovery-max-active=6
??
Hi Mr Anthony
pls check the output
https://anotepad.com/notes/s7nykdmc
Hi Mathew,
1) We had 2 MDS services running before this nightmare. Now we are trying to run
MDS on 3 nodes, but all of them stop within 2 minutes.
2) You are correct. We just added two 10TiB disks to the cluster (which currently
has 27 x 4TiB disks); all of them have weight 1.0
About volume
Hi David,
I ll follow your suggestion. Do you have Telegram ? If yes, could you pls add
my Telegram, +84989177619. Thank you so much
>
> 2) It looks like you might have an interesting crush map. Allegedly you have
> 41TiB of space but you can't finish recovering; you have lots of PGs stuck as
> their destination is too full. Are you running homogeneous hardware or do you
> have different drive sizes? Are all the weights set
It looks like you have quite a few problems I’ll try and address them one by
one.
1) Looks like you had a bunch of crashes, from the ceph -s it looks like you
don’t have enough MDS daemons running for a quorum. So you’ll need to restart
the crashed containers.
2) It looks like you might
Do you have the possibility to stop/unmount cephfs clients ?
If so, do that and restart the MDS.
It should restart.
Have the clients restart one by one and check that the MDS does not crash
(by monitoring the logs)
Regards,
*David
Hi Mathew
Pls check my ceph -s
ceph -s
cluster:
id: 258af72a-cff3-11eb-a261-d4f5ef25154c
health: HEALTH_WARN
3 failed cephadm daemon(s)
1 filesystem is degraded
insufficient standby MDS daemons available
1 nearfull osd(s)
Can you send sudo ceph -s and sudo ceph health detail
Sent from Bloomberg Professional for iPhone
- Original Message -
From: nguyenvand...@baoviet.com.vn
To: ceph-users@ceph.io
At: 02/23/24 20:27:53 UTC-05:00
Could you pls guide me more detail :( im very newbie in Ceph :(
Could you pls guide me more detail :( im very newbie in Ceph :(
look at ALL cephfs kernel clients (no effect on RGW)
On Fri, 23 Feb 2024 at 16:38, wrote:
> And we dont have parameter folder
>
> cd /sys/module/ceph/
> [root@cephgw01 ceph]# ls
> coresize holders initsize initstate notes refcnt rhelversion
> sections srcversion taint uevent
>
>
And we dont have parameter folder
cd /sys/module/ceph/
[root@cephgw01 ceph]# ls
coresize holders initsize initstate notes refcnt rhelversion sections
srcversion taint uevent
My Ceph is 16.2.4
Hi David,
Could you pls help me understand: does it affect the RGW service? And if
something goes bad, how can I roll back?
Thank you for your time :) Have a good day, sir
Hi,
The problem seems to come from the clients (reconnect).
Test by disabling metrics on all clients:
echo Y > /sys/module/ceph/parameters/disable_send_metrics
Regards,
*David CASIER*
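If that change needs to be rolled back later, the toggle is symmetric (a sketch; disable_send_metrics is a runtime kernel-module parameter on the cephfs client, so it is not persisted across reboots):

```shell
# Re-enable client metrics on a cephfs kernel client
echo N > /sys/module/ceph/parameters/disable_send_metrics

# Check the current value
cat /sys/module/ceph/parameters/disable_send_metrics
```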
This seems to be the relevant stack trace:
---snip---
Feb 23 15:18:39 cephgw02 conmon[2158052]: debug -1>
2024-02-23T08:18:39.609+ 7fccc03c0700 -1
https://drive.google.com/file/d/1OIN5O2Vj0iWfEMJ2fyHN_xV6fpknBmym/view?usp=sharing
Pls check my mds log which generate by command
cephadm logs --name mds.cephfs.cephgw02.qqsavr --fsid
258af72a-cff3-11eb-a261-d4f5ef25154c
You still haven't provided any details (logs) of what happened. The
short excerpt from yesterday isn't useful as it only shows the startup
of the daemon.
Zitat von nguyenvand...@baoviet.com.vn:
Could you pls help me explain the status of volume: recovering ?
what is it ? and do we need to
Could you pls help me explain the status of the volume: recovering? What is it?
And do we need to wait for the volume recovery progress to finish?
If it crashes after two minutes you have your time window to look for.
Restart the mds daemon and capture everything after that until the
crash.
Zitat von nguyenvand...@baoviet.com.vn:
it suck too long log, could you pls guide me how to grep/filter
important things in logs ?
Feb 22 13:39:43 cephgw02 conmon[1340927]: log_file
/var/lib/ceph/crash/2024-02-22T06:39:43.618845Z_78ee38bc-9115-4bc6-8c3a-4bf42284c970/log
Feb 22 13:39:43 cephgw02 conmon[1340927]: --- end dump of recent events ---
Feb 22 13:39:45 cephgw02 systemd[1]:
The log is too long; could you pls guide me how to grep/filter the important
things in the logs?
There a couple of ways, find your MDS daemon with:
ceph fs status -> should show you the to-be-active MDS
On that host run:
cephadm logs --name mds.{MDS}
or alternatively:
cephadm ls --no-detail | grep mds
journalctl -u ceph-{FSID}@mds.{MDS} --no-pager > {MDS}.log
Zitat von
How can we get the logs of the MDS? Pls guide me T_T
What does the MDS log when it crashes?
Zitat von nguyenvand...@baoviet.com.vn:
We have 6 nodes (3 OSD nodes and 3 service nodes); 2/3 OSD nodes were
powered off and we got a big problem.
Pls check the ceph -s result below.
Now we cannot start the MDS service (we tried to start it but it stopped
after 2