For the record, I have tried this:
ceph config set osd bluestore_rocksdb_options_annex option1=8,option2=4
But I am not sure if it is necessary to restart the OSDs, because
ceph config dump
shows
...
osd advanced option1=8,option2=4 *
...
The "*" is shown in the "RO" column.
Does anyone else receive unsolicited replies from sender "Chip Cox
" to e-mails posted on this list?
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
I concur, having this heavily co-located set-up will not perform any better
than you observe. Do you really have 2 MDS daemons per host? I just saw that you
have only 2 disks, probably 1 per node. In this set-up, you cannot really
expect good fail-over times due to the amount of simultaneous
Once the dashboard was activated, I tried to import certificates, but it fails:
$ ceph dashboard set-ssl-certificate-key -i /data/ceph/conf/ceph.key
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1337, in _handle_command
return
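For reference, a minimal sketch of how the import is usually done: both the certificate and the key need to be set, and the dashboard module is restarted afterwards (the .crt path below is an assumption mirroring the key path; only the key path appears above):
ceph dashboard set-ssl-certificate -i /data/ceph/conf/ceph.crt
ceph dashboard set-ssl-certificate-key -i /data/ceph/conf/ceph.key
# restart the dashboard so it picks up the new certificate
ceph mgr module disable dashboard
ceph mgr module enable dashboard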
Following up on this and other comments, there are 2 different time delays: (1)
is the time it takes from killing an MDS until a stand-by is made an
active rank, and (2) the time it takes for the new active rank to restore all
client sessions. My experience is that (1) takes close to 0
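For reference, delay (1) is mostly bounded by how long the mons wait for MDS beacons before failing the rank; a sketch of the relevant knobs (10 is only an illustrative value, not a recommendation):
# grace period the mons allow before marking an active MDS as failed
ceph config get mon mds_beacon_grace
# how often the MDS sends its beacon
ceph config get mds mds_beacon_interval
# example: lower the grace period to speed up failure detection
ceph config set mon mds_beacon_grace 10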
We haven't found a more 'elegant' way, but the process we follow is: we
pre-create all the pools prior to creating the realm/zonegroup/zone, then
we do a period apply, then we remove the default zonegroup/zone, do another
period apply, then remove the default pools.
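A rough sketch of that sequence with radosgw-admin (realm/zonegroup/zone and pool names are placeholders, pg counts are examples, and "period apply" is written here as the usual period update --commit):
# pre-create the pools the new zone will use
ceph osd pool create myzone.rgw.buckets.index 32
ceph osd pool create myzone.rgw.buckets.data 128
# create realm/zonegroup/zone and commit the period
radosgw-admin realm create --rgw-realm=myrealm --default
radosgw-admin zonegroup create --rgw-zonegroup=mygroup --master --default
radosgw-admin zone create --rgw-zonegroup=mygroup --rgw-zone=myzone --master --default
radosgw-admin period update --commit
# remove the default zone/zonegroup and commit again
radosgw-admin zone delete --rgw-zone=default
radosgw-admin zonegroup delete --rgw-zonegroup=default
radosgw-admin period update --commit
# finally drop the default pools, e.g.
ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it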
Hope this is at least somewhat helpful,
David
On
I wouldn't recommend a colocated MDS in a production environment.
Quoting Lokendra Rathour:
Hello Frank,
Thanks for your inputs.
*Responding to your queries, kindly refer below:*
- *Do you have services co-located?*
- [loke] : Yes, they are co-located:
- Cephnode1 : MDS,MGR,MON,RGW,OSD,MDS
- Cephnode2: MDS,MGR,MON,RGW,OSD,MDS
- Cephnode3: MON
-
Yes Patrick,
In the process of killing the MDS we are also *killing the Monitor along with
the OSD, Mgr and RGW*. We are powering off/rebooting the complete node (with the
MDS, Mon, RGW, OSD and Mgr daemons).
Cluster: 2 Nodes with MDS|Mon|RGW|OSD each and third node with 1 Mon.
Note : when I am only stopping the MDS
Hi Rob.
I think I wasn't clear enough with the first mail.
I'm having issues with the RGW. radosgw-admin or S3 cannot access
some objects in the bucket. These objects exist in RADOS and
I can export them with "rados get -p $pool $object".
But the problem is the 4M chunks and multiparts. I have
Hi Dan,
just restarted all MONs, no change though :(
Thanks for looking at this. I will wait until tomorrow. My plan is to get the
disk up again with the same OSD ID and would expect that this will eventually
allow the message to be cleared.
Best regards,
=
Frank Schilder
AIT
Ok,
Will try with nautilus as well.
But we are really configuring too many variables to achieve 10 seconds of
failover time.
Is it possible for you to share the setup details?
We are using a 2-node Ceph cluster in HEALTH_OK (configured replication factor
and related variables).
Hardware is HP,
Hi Vladimir,
thanks for your reply. I did, the cluster is healthy:
[root@gnosis ~]# ceph status
cluster:
id: ---
health: HEALTH_WARN
430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
services:
mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
On Mon, May 3, 2021 at 6:36 AM Lokendra Rathour
wrote:
>
> Hi Team,
> I was setting up the ceph cluster with
>
>- Node Details:3 Mon,2 MDS, 2 Mgr, 2 RGW
>- Deployment Type: Active Standby
>- Testing Mode: Failover of MDS Node
>- Setup : Octopus (15.2.7)
>- OS: centos 8.3
>
Hi,
Yes we tried ceph-standby-replay but could not see much difference in the
handover time. It was coming in at 35 to 40 seconds in either case.
Did you also change these variables (as mentioned above) along with the
hot-standby?
no, we barely differ from the default configs and haven't
Dear all,
I'm having a hard time troubleshooting a file-system failure on my 3 node
cluster (deployed with cephadm + docker). After moving some files between
folders, the cluster became laggy and Metadata Servers started failing and got
stuck in rejoin state. Of course I already tried to
Hello Eugen,
Thank you for the response.
Yes we tried ceph-standby-replay but could not see much difference in the
handover time. It was coming in at 35 to 40 seconds in either case.
Did you also change these variables (as mentioned above) along with the
hot-standby?
Couple of seconds is
Created BugTicket : https://tracker.ceph.com/issues/50616
> On Mon May 03 2021 21:49:41 GMT+0800 (Singapore Standard Time), Ashley
> Merrick wrote:
> Just checked cluster logs and they are full of:cephadm exited with an error
> code: 1, stderr:Reconfig daemon osd.16 ... Traceback (most recent
Also there's a difference between 'standby-replay' (hot standby) and
just 'standby'. We have been using CephFS for a couple of years now with
standby-replay and the failover takes a couple of seconds max,
depending on the current load. Have you tried to enable the
standby-replay config and tested the
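For reference, a minimal sketch of enabling it (cephfs is just an example filesystem name):
# let a standby daemon replay the active MDS journal ("hot standby")
ceph fs set cephfs allow_standby_replay true
# check which daemon ends up in standby-replay
ceph fs status cephfs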
Just checked cluster logs and they are full of:cephadm exited with an error
code: 1, stderr:Reconfig daemon osd.16 ... Traceback (most recent call last):
File
"/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
line
hello
perhaps you should have more than one MDS active.
mds: cephfs:3 {0=cephfs-d=up:active,1=cephfs-e=up:active,2=cephfs-a=up:active} 1 up:standby-replay
I have 3 active MDS and one standby.
I'm using Rook in Kubernetes for this setup.
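In case it is useful, a minimal sketch of the setting behind that status line (cephfs is an example filesystem name; with Rook the same effect comes from the activeCount field of the CephFilesystem resource, as far as I know):
# run three active MDS ranks
ceph fs set cephfs max_mds 3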
oau
On Monday, May 3, 2021 at 19:06 +0530, Lokendra
Hi Team,
I was setting up the ceph cluster with
- Node Details:3 Mon,2 MDS, 2 Mgr, 2 RGW
- Deployment Type: Active Standby
- Testing Mode: Failover of MDS Node
- Setup : Octopus (15.2.7)
- OS: centos 8.3
- hardware: HP
- Ram: 128 GB on each Node
- OSD: 2 ( 1 tb each)
-
Wait, first just restart the leader mon.
See: https://tracker.ceph.com/issues/47380 for a related issue.
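For reference, a minimal sketch of finding and restarting the leader (the mon name is an example; the orch command assumes a cephadm-managed cluster):
# the leader is reported in the quorum status
ceph quorum_status | grep quorum_leader_name
# restart that mon
ceph orch daemon restart mon.ceph-01
# (or: systemctl restart ceph-mon@ceph-01 directly on the host)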
-- dan
On Mon, May 3, 2021 at 2:55 PM Vladimir Sigunov
wrote:
>
> Hi Frank,
> Yes, I would purge the osd. The cluster looks absolutely healthy except for
> this osd.584. Probably, the purge
Hi Frank,
Yes, I would purge the osd. The cluster looks absolutely healthy except for this
osd.584. Probably, the purge will help the cluster to forget this faulty one.
Also, I would restart the monitors, too.
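A minimal sketch of what I mean (584 is the id mentioned above; purge is destructive, so only run it if the OSD really should be forgotten, and the orch restart is cephadm syntax, use systemctl otherwise):
# remove the OSD from the osdmap, crush map and auth database
ceph osd purge 584 --yes-i-really-mean-it
# then restart the mons one at a time, e.g.
ceph orch daemon restart mon.ceph-01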
With the amount of data you maintain in your cluster, I don't think your
ceph.conf
Hi Morphin,
There are multiple ways you can do this.
1. Run radosgw-admin bucket radoslist --bucket <bucket> and write that output to
a file, grep all entries containing the object name 'im034113.jpg', sort that
list and download them.
2. Run radosgw-admin object stat --bucket <bucket> --object <object>; this
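A rough sketch of option 1 (bucket and data pool names are placeholders; the object name is the one from the earlier mail):
# list all RADOS objects belonging to the bucket
radosgw-admin bucket radoslist --bucket=<bucket> > radoslist.txt
# collect the head object plus multipart/shadow parts of the damaged object
grep 'im034113.jpg' radoslist.txt | sort > parts.txt
# export each part directly from the data pool
while read -r obj; do rados -p <data-pool> get "$obj" "$obj"; done < parts.txt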
Hi Frank.
Check your cluster for inactive/incomplete placement groups. I saw similar
behavior on Octopus when some PGs got stuck in an incomplete/inactive or peering
state.
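A quick sketch of the checks meant here:
# lists stuck PGs, slow ops and other warnings in detail
ceph health detail
# PGs currently incomplete, peering or otherwise not active
ceph pg ls incomplete
ceph pg ls peering
ceph pg dump_stuck inactive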
From: Frank Schilder
Sent: Monday, May 3, 2021 3:42:48 AM
To: ceph-users@ceph.io
Subject:
Dear cephers,
I have a strange problem. An OSD went down and recovery finished. For some
reason, I have a slow ops warning for the failed OSD stuck in the system:
health: HEALTH_WARN
430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops
The OSD is auto-out:
| 580 |
On Mon, May 3, 2021 at 12:24 PM Magnus Harlander wrote:
>
> Am 03.05.21 um 11:22 schrieb Ilya Dryomov:
>
> There is a 6th osd directory on both machines, but it's empty
>
> [root@s0 osd]# ll
> total 0
> drwxrwxrwt. 2 ceph ceph 200 2. Mai 16:31 ceph-1
> drwxrwxrwt. 2 ceph ceph 200 2. Mai 16:31
On Mon, May 3, 2021 at 12:27 PM Magnus Harlander wrote:
>
> Am 03.05.21 um 12:25 schrieb Ilya Dryomov:
>
> ceph osd setmaxosd 10
>
> Bingo! Mount works again.
>
> Very strange things are going on here (-:
>
> Thanx a lot for now!! If I can help to track it down, please let me know.
Good to
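For context, a minimal sketch of checking and adjusting the max_osd value that was discussed (it only needs to be at least as large as the highest OSD id actually in use):
# current value recorded in the osdmap
ceph osd getmaxosd
# shrink it back down when stale entries inflate it
ceph osd setmaxosd 10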
On Mon, May 3, 2021 at 12:00 PM Magnus Harlander wrote:
>
> Am 03.05.21 um 11:22 schrieb Ilya Dryomov:
>
> max_osd 12
>
> I never had more than 10 OSDs on the two OSD nodes of this cluster.
>
> I was running a 3 osd-node cluster earlier with more than 10
> osds, but the current cluster has been
I created an issue during the weekend without problems:
https://tracker.ceph.com/issues/50604
On 05/03 09:36, Tobias Urdin wrote:
> Hello,
>
> Is anybody still getting this error?
>
>
> Best regards
>
> -
>
>
> Internal error
> An error occurred on the page you were trying to access.
> If you
Hello,
Is anybody still getting this error?
Best regards
-
Internal error
An error occurred on the page you were trying to access.
If you continue to experience problems please contact your Redmine
administrator for assistance.
If you are the Redmine administrator, check your log files for details
On Mon, May 3, 2021 at 9:20 AM Magnus Harlander wrote:
>
> Am 03.05.21 um 00:44 schrieb Ilya Dryomov:
>
> On Sun, May 2, 2021 at 11:15 PM Magnus Harlander wrote:
>
> Hi,
>
> I know there is a thread about problems with mounting cephfs with 5.11
> kernels.
>
> ...
>
> Hi Magnus,
>
> What is the
Hello,
Wondering if anyone had any feedback on some commands I could try to manually
update the current OSD that is down to 16.2.1, so I can at least get around this
upgrade bug and back to 100%? If there are any logs, or if it seems to be a new
bug and I should create a bugzilla report, do let me
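In case it helps, one thing that is sometimes suggested for a single stuck daemon on a cephadm deployment is a per-daemon redeploy pinned to the target image; this is only a sketch, the daemon id and image are examples, and the exact redeploy syntax may differ between releases:
# redeploy just this OSD daemon with the 16.2.1 container image
ceph orch daemon redeploy osd.16 quay.io/ceph/ceph:v16.2.1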