[ceph-users] Re: pg deep-scrub issue

2023-05-04 Thread Janne Johansson
>undergo deepscrub and regular scrub cannot be completed in a timely manner. I 
>have noticed that these PGs appear to be concentrated on a single OSD. I am 
>seeking your guidance on how to address this issue and would appreciate any 
>insights or suggestions you may have.
>

The usual checks apply: see if there are SMART errors on the drive,
check dmesg for this drive, and see if this OSD has much larger
latencies* than the other similar drives. If any of these are true,
take it out of the cluster and replace it with a new working drive.

*) Perhaps with iostat, checking the service time and utilization%; perhaps
with "# ceph daemon osd.X perf dump" on the host running this OSD, or with
"ceph osd perf", to see if this one OSD is an outlier in terms of latencies.
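
A sketch of those checks (osd id X, device sdX and the OSD's host are
placeholders):

```
# on the host carrying the suspect OSD: service time and utilization
iostat -x sdX 5

# kernel errors and SMART status for the device
dmesg -T | grep -i sdX
smartctl -a /dev/sdX

# latency counters of the OSD daemon itself
ceph daemon osd.X perf dump | grep -i -A 3 latency

# cluster-wide view: commit/apply latency per OSD, look for the outlier
ceph osd perf
```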


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to restart mds - mds crashes almost immediately after finishing recovery

2023-05-04 Thread Xiubo Li

Hi Emmanuel,

This should be the known issue https://tracker.ceph.com/issues/58392,
and there is a fix in https://github.com/ceph/ceph/pull/49652.


Could you just stop all the clients first, then set 'max_mds' to 1, and then
restart the MDS daemons?
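
A rough sketch of that sequence, assuming the file system is called "cephfs"
and the MDS daemons are managed through systemd (adjust the names to your
deployment):

```
# after stopping/evicting the clients, reduce to a single active MDS
ceph fs set cephfs max_mds 1

# restart the MDS daemons, e.g. on each MDS host
systemctl restart ceph-mds@<name>.service

# watch the ranks come back
ceph fs status cephfs
```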


Thanks

On 5/3/23 16:01, Emmanuel Jaep wrote:

Hi,

I just inherited a Ceph storage cluster. Therefore, my level of confidence with
the tool is certainly less than ideal.

We currently have an mds server that refuses to come back online. While 
reviewing the logs, I can see that, upon mds start, the recovery goes well:
```
-10> 2023-05-03T08:12:43.632+0200 7f345d00b700  1 mds.4.2638711 cluster 
recovered.
  12: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, 
int)+0xbf) [0x558323d602df]
```

However, right after this message, ceph handles a couple of client requests:
```
 -9> 2023-05-03T08:12:43.632+0200 7f345d00b700  4 mds.4.2638711 
set_osd_epoch_barrier: epoch=249241
 -8> 2023-05-03T08:12:43.632+0200 7f3459003700  2 mds.4.cache Memory usage: 
 total 2739784, rss 2321188, heap 348412, baseline 315644, 0 / 765023 inodes have 
caps, 0 caps, 0 caps per inode
 -7> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server 
handle_client_request client_request(client.108396030:57271 lookup 
#0x70001516236/012385530.npy 2023-05-02T20:37:19.675666+0200 RETRY=6 
caller_uid=135551, caller_gid=11157{0,4,27,11157,}) v5
 -6> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server 
handle_client_request client_request(client.104073212:5109945 readdir 
#0x70001516236 2023-05-02T20:36:29.517066+0200 RETRY=6 caller_uid=180090, 
caller_gid=11157{0,4,27,11157,}) v5
 -5> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server 
handle_client_request client_request(client.104288735:3008344 readdir 
#0x70001516236 2023-05-02T20:36:29.520801+0200 RETRY=6 caller_uid=135551, 
caller_gid=11157{0,4,27,11157,}) v5
 -4> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server 
handle_client_request client_request(client.8558540:46306346 readdir 
#0x700019ba15e 2023-05-01T21:26:34.303697+0200 RETRY=49 caller_uid=0, 
caller_gid=0{}) v2
 -3> 2023-05-03T08:12:43.688+0200 7f3458802700  4 mds.4.server 
handle_client_request client_request(client.96913903:2156912 create 
#0x1000b37db9a/street-photo-3.png 2023-05-01T17:27:37.454042+0200 RETRY=59 
caller_uid=271932, caller_gid=30034{}) v2
 -2> 2023-05-03T08:12:43.688+0200 7f345d00b700  5 mds.icadmin006 
handle_mds_map old map epoch 2638715 <= 2638715, discarding
```

and crashes:
```
 -1> 2023-05-03T08:12:43.692+0200 7f345d00b700 -1 
/build/ceph-16.2.10/src/mds/Server.cc: In function 'void 
Server::handle_client_open(MDRequestRef&)' thread 7f345d00b700 time 
2023-05-03T08:12:43.694660+0200
/build/ceph-16.2.10/src/mds/Server.cc: 4240: FAILED ceph_assert(cur->is_auth())

  ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific 
(stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x152) [0x7f3462533d65]
  2: /usr/lib/ceph/libceph-common.so.2(+0x265f6d) [0x7f3462533f6d]
  3: (Server::handle_client_open(boost::intrusive_ptr&)+0x1834) 
[0x558323c89f04]
  4: (Server::handle_client_openc(boost::intrusive_ptr&)+0x28f) 
[0x558323c925ef]
  5: 
(Server::dispatch_client_request(boost::intrusive_ptr&)+0xa45) 
[0x558323cc3575]
  6: (MDCache::dispatch_request(boost::intrusive_ptr&)+0x3d) 
[0x558323d7460d]
  7: (MDSContext::complete(int)+0x61) [0x558323f68681]
  8: (MDCache::_open_remote_dentry_finish(CDentry*, inodeno_t, MDSContext*, 
bool, int)+0x3e) [0x558323d3edce]
  9: (C_MDC_OpenRemoteDentry::finish(int)+0x3e) [0x558323de6cce]
  10: (MDSContext::complete(int)+0x61) [0x558323f68681]
  11: (MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, 
int)+0xcf) [0x558323d5ff2f]
  12: (MDCache::_open_ino_traverse_dir(inodeno_t, MDCache::open_ino_info_t&, 
int)+0xbf) [0x558323d602df]
  13: (MDSContext::complete(int)+0x61) [0x558323f68681]
  14: (MDSRank::_advance_queues()+0x88) [0x558323c23c38]
  15: (MDSRank::_dispatch(boost::intrusive_ptr const&, 
bool)+0x1fa) [0x558323c24a1a]
  16: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr 
const&)+0x5e) [0x558323c254fe]
  17: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr const&)+0x1d6) 
[0x558323bfd906]
  18: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr 
const&)+0x460) [0x7f34627854e0]
  19: (DispatchQueue::entry()+0x58f) [0x7f3462782d7f]
  20: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f346284eee1]
  21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f3462278609]
  22: clone()

  0> 2023-05-03T08:12:43.700+0200 7f345d00b700 -1 *** Caught signal 
(Aborted) **
  in thread 7f345d00b700 thread_name:ms_dispatch

  ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific 
(stable)
  1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7f34622843c0]
  2: gsignal()
  3: abort()
  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-04 Thread Xiubo Li


On 5/1/23 17:35, Frank Schilder wrote:

Hi all,

I think we might be hitting a known problem 
(https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet, 
because we have troubles with older kclients that miss the mds restart and hold 
on to cache entries referring to the killed instance, leading to hanging jobs 
on our HPC cluster.


Will this cause any issue in your case ?


I have seen this issue before and there was a process in D-state that 
dead-locked itself. Usually, killing this process succeeded and resolved the 
issue. However, this time I can't find such a process.


BTW, what's the D-state process ? A ceph one ?

Thanks


The tracker mentions that one can delete the file/folder. I have the inode 
number, but really don't want to start a find on a 1.5PB file system. Is there 
a better way to find what path is causing the issue (ask the MDS directly, look 
at a cache dump, or similar)? Is there an alternative to deletion or MDS fail?
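
For reference, one way to resolve an inode number to a path without a full
find is sketched below; it assumes the MDS supports the "dump inode" tell
command and, for the second variant, that the inode belongs to a regular file:

```
# ask the MDS rank that holds the inode; prints the inode including its path
ceph tell mds.<fsname>:0 dump inode <inode number in decimal>

# or read the backtrace stored with the file's first object in the data pool
rados -p <data pool> getxattr <inode number in hex>.00000000 parent > /tmp/parent
ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json
```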

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Change in DMARC handling for the list

2023-05-04 Thread Dan Mick
Several users have complained for some time that our DMARC/DKIM handling 
is not correct.  I've recently had time to go study DMARC, DKIM, SPF, 
SRS, and other tasty morsels of initialisms, and have thus made a change 
to how Mailman handles DKIM signatures for the list:


If a domain advertises that it will reject or quarantine messages that 
fail DKIM (through its DMARC policy in the DNS text record 
_dmarc.<domain>), the message will be rewritten to be "From" ceph.io, 
and SPF should be correct.  I do not know if it will regenerate a DKIM 
signature in that case for what is now its own message.  The From: 
address will say something like "From Original Sender via ceph-users 
<ceph-users@ceph.io>" so it's somewhat clear who first sent the message, 
and Reply-To will be set to Original Sender.


Again, this will only happen for senders from domains that advertise a 
strict DMARC policy.  This does not include gmail.com, surprisingly.
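
For illustration, a domain's policy can be checked with a plain DNS lookup
(example.com is a placeholder):

```
# p=reject or p=quarantine here means the From: rewriting described above applies
dig +short TXT _dmarc.example.com
```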


Let me know if you notice anything that seems to have gotten worse.

Next on the list is to investigate if DKIM-signing outbound messages, or 
at least ones that don't already have an ARC-Seal, is appropriate and/or 
workable.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pg deep-scrub issue

2023-05-04 Thread Peter
Dear all,

I am writing to seek your assistance in resolving an issue with my Ceph cluster.

Currently, the cluster is experiencing a problem where the number of Placement 
Groups (PGs) that need to undergo deepscrub and regular scrub cannot be 
completed in a timely manner. I have noticed that these PGs appear to be 
concentrated on a single OSD. I am seeking your guidance on how to address this 
issue and would appreciate any insights or suggestions you may have.

Please find attached the Ceph health detail output for your reference.
HEALTH_WARN 13 pgs not deep-scrubbed in time; 7 pgs not scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 13 pgs not deep-scrubbed in time
pg 4.426 not deep-scrubbed since 2023-04-22T10:00:21.529716+0800
pg 4.f0 not deep-scrubbed since 2023-04-22T04:55:17.868881+0800
pg 4.b9 not deep-scrubbed since 2023-04-22T16:47:25.219603+0800
pg 4.87 not deep-scrubbed since 2023-04-22T20:01:02.508600+0800
pg 4.31 not deep-scrubbed since 2023-04-23T00:27:39.299893+0800
pg 4.5b9 not deep-scrubbed since 2023-04-19T21:03:47.041934+0800
pg 4.68a not deep-scrubbed since 2023-04-21T19:52:39.251293+0800
pg 4.6a4 not deep-scrubbed since 2023-04-22T16:20:51.078431+0800
pg 4.6ec not deep-scrubbed since 2023-04-21T11:20:33.661595+0800
pg 4.7a4 not deep-scrubbed since 2023-04-20T22:30:44.506420+0800
pg 4.7a2 not deep-scrubbed since 2023-04-16T12:05:56.586205+0800
pg 4.7b4 not deep-scrubbed since 2023-04-17T15:50:10.595292+0800
pg 4.7c8 not deep-scrubbed since 2023-04-19T15:10:12.673655+0800
[WRN] PG_NOT_SCRUBBED: 7 pgs not scrubbed in time
pg 4.31e not scrubbed since 2023-04-24T13:34:26.103257+0800
pg 4.5b9 not scrubbed since 2023-04-24T07:20:53.891175+0800
pg 4.68a not scrubbed since 2023-04-24T03:37:58.070854+0800
pg 4.7a4 not scrubbed since 2023-04-24T02:55:25.912789+0800
pg 4.7b4 not scrubbed since 2023-04-24T10:04:46.889422+0800
pg 4.7c8 not scrubbed since 2023-04-24T13:36:07.284271+0800
pg 4.7d2 not scrubbed since 2023-04-24T14:47:19.365551+0800
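
To confirm that these PGs really share an OSD, each of them can be mapped to
its acting set; a sketch using the PG ids from the output above:

```
for pg in 4.426 4.f0 4.b9 4.87 4.31 4.5b9 4.68a 4.6a4 4.6ec 4.7a4 4.7a2 4.7b4 4.7c8; do
    ceph pg map $pg
done

# or list all PGs of pool 4 with their acting sets in one go
ceph pg dump pgs_brief | grep '^4\.'
```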

Peter

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS Scrub Questions

2023-05-04 Thread Patrick Donnelly
On Thu, May 4, 2023 at 11:35 AM Chris Palmer  wrote:
>
> Hi
>
> Grateful if someone could clarify some things about CephFS Scrubs:
>
> 1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start
> / recursive" only triggers a forward scrub (not a backward scrub)?

The naming here that has become conventional is unfortunate. Forward
scrub really just means metadata scrub. There is no data integrity
checking.

cephfs-data-scan ("backward" scrub) is just attempting to recover
metadata from what's available on the data pool.

To answer your question: yes.
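
For completeness, a forward (metadata) scrub can be started and watched like
this (file system name "cephfs" as in the question):

```
ceph tell mds.cephfs:0 scrub start / recursive
ceph tell mds.cephfs:0 scrub status
```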

> 2) I couldn't find any reference to forward scrubs being done
> automatically and was wondering whether I should do them using cron? But
> then I saw an undated (but I think a little elderly) presentation by
> Greg Farnum that states that "forward scrub...runs continuously in the
> background". Is that still correct (for Quincy), and if so what controls
> the frequency?

He was probably referring to RADOS scrub. CephFS does not have any
continuous scrub and has no plans to introduce one.
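
So if periodic forward scrubs are wanted, they have to be triggered from the
outside; a minimal cron sketch (assuming a file system called "cephfs" and an
admin keyring on the host running the job):

```
# /etc/cron.d/cephfs-scrub: recursive forward scrub every Sunday at 03:00
0 3 * * 0 root ceph tell mds.cephfs:0 scrub start / recursive
```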

> 3) Are backward scrubs always manual, using the 3 cephfs-data-scan phases?

Technically there are 5 phases with some other steps. Please check:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects

> 4) Are regular backward scrubs recommended, or only if there is
> indication of a problem? (With due regard to the amount of time they may
> take...)

cephfs-data-scan should only be employed for disaster recovery.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-04 Thread Radoslaw Zarzynski
If we get some time, I would like to include:

  https://github.com/ceph/ceph/pull/50894.

Regards,
Radek

On Thu, May 4, 2023 at 5:56 PM Venky Shankar  wrote:
>
> Hi Yuri,
>
> On Wed, May 3, 2023 at 7:10 PM Venky Shankar  wrote:
> >
> > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  wrote:
> > >
> > > Venky, I did plan to cherry-pick this PR if you approve this (this PR
> > > was used for a rerun)
> >
> > OK. The fs suite failure is being looked into
> > (https://tracker.ceph.com/issues/59626).
>
> Fix is being tracked by
>
> https://github.com/ceph/ceph/pull/51344
>
> Once ready, it needs to be included in 16.2.13 and would require a fs
> suite re-run (although re-renning the failed tests should suffice,
> however, I'm a bit inclined in putting it through the fs suite).
>
> >
> > >
> > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar  wrote:
> > > >
> > > > Hi Yuri,
> > > >
> > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein  
> > > > wrote:
> > > > >
> > > > > Details of this release are summarized here:
> > > > >
> > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > Release Notes - TBD
> > > > >
> > > > > Seeking approvals for:
> > > > >
> > > > > smoke - Radek, Laura
> > > > > rados - Radek, Laura
> > > > >   rook - Sébastien Han
> > > > >   cephadm - Adam K
> > > > >   dashboard - Ernesto
> > > > >
> > > > > rgw - Casey
> > > > > rbd - Ilya
> > > > > krbd - Ilya
> > > > > fs - Venky, Patrick
> > > >
> > > > There are a couple of new failures which are qa/test related - I'll
> > > > have a look at those (they _do not_ look serious).
> > > >
> > > > Also, Yuri, do you plan to merge
> > > >
> > > > https://github.com/ceph/ceph/pull/51232
> > > >
> > > > into the pacific-release branch although it's tagged with one of your
> > > > other pacific runs?
> > > >
> > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > > > > upgrade/pacific-p2p - Laura
> > > > > powercycle - Brad (SELinux denials)
> > > > > ceph-volume - Guillaume, Adam K
> > > > >
> > > > > Thx
> > > > > YuriW
> > > > > ___
> > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > > >
> > > > --
> > > > Cheers,
> > > > Venky
> > > >
> > >
> >
> >
> > --
> > Cheers,
> > Venky
>
>
>
> --
> Cheers,
> Venky
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-04 Thread Yuri Weinstein
In summary:

Release Notes:  https://github.com/ceph/ceph/pull/51301

We plan to finish this release next week and we have the following PRs
planned to be added:

https://github.com/ceph/ceph/pull/51232 -- Venky approved
https://github.com/ceph/ceph/pull/51344  -- Venky in progress
https://github.com/ceph/ceph/pull/51200 -- Casey approved
https://github.com/ceph/ceph/pull/50894  -- Radek in progress

As soon as these PRs are finalized, I will cherry-pick them and
rebuild "pacific-release" and rerun appropriate suites.

On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  wrote:
>
> If we get some time, I would like to include:
>
>   https://github.com/ceph/ceph/pull/50894.
>
> Regards,
> Radek
>
> On Thu, May 4, 2023 at 5:56 PM Venky Shankar  wrote:
> >
> > Hi Yuri,
> >
> > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  wrote:
> > >
> > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  wrote:
> > > >
> > > > Venky, I did plan to cherry-pick this PR if you approve this (this PR
> > > > was used for a rerun)
> > >
> > > OK. The fs suite failure is being looked into
> > > (https://tracker.ceph.com/issues/59626).
> >
> > Fix is being tracked by
> >
> > https://github.com/ceph/ceph/pull/51344
> >
> > Once ready, it needs to be included in 16.2.13 and would require a fs
> > suite re-run (although re-renning the failed tests should suffice,
> > however, I'm a bit inclined in putting it through the fs suite).
> >
> > >
> > > >
> > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein  
> > > > > wrote:
> > > > > >
> > > > > > Details of this release are summarized here:
> > > > > >
> > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > Release Notes - TBD
> > > > > >
> > > > > > Seeking approvals for:
> > > > > >
> > > > > > smoke - Radek, Laura
> > > > > > rados - Radek, Laura
> > > > > >   rook - Sébastien Han
> > > > > >   cephadm - Adam K
> > > > > >   dashboard - Ernesto
> > > > > >
> > > > > > rgw - Casey
> > > > > > rbd - Ilya
> > > > > > krbd - Ilya
> > > > > > fs - Venky, Patrick
> > > > >
> > > > > There are a couple of new failures which are qa/test related - I'll
> > > > > have a look at those (they _do not_ look serious).
> > > > >
> > > > > Also, Yuri, do you plan to merge
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51232
> > > > >
> > > > > into the pacific-release branch although it's tagged with one of your
> > > > > other pacific runs?
> > > > >
> > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > > > > > upgrade/pacific-p2p - Laura
> > > > > > powercycle - Brad (SELinux denials)
> > > > > > ceph-volume - Guillaume, Adam K
> > > > > >
> > > > > > Thx
> > > > > > YuriW
> > > > > > ___
> > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > >
> > > >
> > >
> > >
> > > --
> > > Cheers,
> > > Venky
> >
> >
> >
> > --
> > Cheers,
> > Venky
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Radosgw: ssl_private_key could not find the file even if it existed

2023-05-04 Thread Janne Johansson
Den tors 4 maj 2023 kl 17:07 skrev :
>
> The radosgw has been configured like this:
>
> [client.rgw.ceph1]
> host = ceph1
> rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt 
> ssl_private_key=/root/ssl/ca.key
> #rgw_frontends = beast port=8080 ssl_port=443 
> ssl_certificate=/root/ssl/ca.crt 
> ssl_private_key=config://rgw/cert/default/ca.key
> admin_socket = /var/run/ceph/ceph-client.rgw.ceph1
>
> but I'm getting this error:
>
> failed to add ssl_private_key=/root/ssl/ca.key: No such file or directory
>
> I also tried to import the key into ceph db and provided the path with 
> config://, but it doesn't work too.
>
> Anyone have any ideal? Thanks

Perhaps rgw already runs as the ceph user and can't read /root or the
key file due to normal file permissions?
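
A quick way to test that theory, using the paths from the config above:

```
# which user does the rgw process run as, and can it read the key?
ps -o user= -C radosgw
sudo -u ceph cat /root/ssl/ca.key > /dev/null && echo readable

# if not, move the cert/key somewhere the ceph user can read
# (and adjust ssl_certificate/ssl_private_key accordingly)
install -o ceph -g ceph -m 600 /root/ssl/ca.key /etc/ceph/ca.key
```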


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-04 Thread Venky Shankar
Hi Yuri,

On Wed, May 3, 2023 at 7:10 PM Venky Shankar  wrote:
>
> On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  wrote:
> >
> > Venky, I did plan to cherry-pick this PR if you approve this (this PR
> > was used for a rerun)
>
> OK. The fs suite failure is being looked into
> (https://tracker.ceph.com/issues/59626).

Fix is being tracked by

https://github.com/ceph/ceph/pull/51344

Once ready, it needs to be included in 16.2.13 and would require a fs
suite re-run (although re-renning the failed tests should suffice,
however, I'm a bit inclined in putting it through the fs suite).

>
> >
> > On Tue, May 2, 2023 at 7:51 AM Venky Shankar  wrote:
> > >
> > > Hi Yuri,
> > >
> > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein  
> > > wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/59542#note-1
> > > > Release Notes - TBD
> > > >
> > > > Seeking approvals for:
> > > >
> > > > smoke - Radek, Laura
> > > > rados - Radek, Laura
> > > >   rook - Sébastien Han
> > > >   cephadm - Adam K
> > > >   dashboard - Ernesto
> > > >
> > > > rgw - Casey
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > > fs - Venky, Patrick
> > >
> > > There are a couple of new failures which are qa/test related - I'll
> > > have a look at those (they _do not_ look serious).
> > >
> > > Also, Yuri, do you plan to merge
> > >
> > > https://github.com/ceph/ceph/pull/51232
> > >
> > > into the pacific-release branch although it's tagged with one of your
> > > other pacific runs?
> > >
> > > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > > > upgrade/pacific-p2p - Laura
> > > > powercycle - Brad (SELinux denials)
> > > > ceph-volume - Guillaume, Adam K
> > > >
> > > > Thx
> > > > YuriW
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > >
> > > --
> > > Cheers,
> > > Venky
> > >
> >
>
>
> --
> Cheers,
> Venky



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-04 Thread Adam King
For setting the user, the `ceph cephadm set-user` command should do it. I'm a
bit surprised by the second part of that, though. With passwordless sudo access
I would have expected that to start working.
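
A sketch of switching the cephadm SSH user to root (host names are
placeholders; the key distribution step assumes the standard
cephadm-generated key is used):

```
# tell cephadm to connect as root from now on
ceph cephadm set-user root

# make sure root on every host accepts the cephadm public key
ceph cephadm get-pub-key > /tmp/cephadm.pub
for h in host1 host2 host3; do ssh-copy-id -f -i /tmp/cephadm.pub root@$h; done

# verify connectivity afterwards
ceph cephadm check-host host1
```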

On Thu, May 4, 2023 at 11:27 AM Reza Bakhshayeshi 
wrote:

> Thank you.
> I don't see any more errors rather than:
>
> 2023-05-04T15:07:38.003+ 7ff96cbe0700  0 log_channel(cephadm) log
> [DBG] : Running command: sudo which python3
> 2023-05-04T15:07:38.025+ 7ff96cbe0700  0 log_channel(cephadm) log
> [DBG] : Connection to host1 failed. Process exited with non-zero exit
> status 3
> 2023-05-04T15:07:38.025+ 7ff96cbe0700  0 log_channel(cephadm) log
> [DBG] : _reset_con close host1
>
> What is the best way to safely change the cephadm user to root for the
> existing cluster? It seems "ceph cephadm set-ssh-config" is not effective
> (BTW, my cephadmin user can run "sudo which python3" without prompting
> password on other hosts now, but nothing has been solved)
>
> Best regards,
> Reza
>
> On Tue, 2 May 2023 at 19:00, Adam King  wrote:
>
>> The number of mgr daemons thing is expected. The way it works is it first
>> upgrades all the standby mgrs (which will be all but one) and then fails
>> over so the previously active mgr can be upgraded as well. After that
>> failover is when it's first actually running the newer cephadm code, which
>> is when you're hitting this issue. Are the logs still saying something
>> similar about how "sudo which python3" is failing? I'm thinking this
>> might just be a general issue with the user being used not having
>> passwordless sudo access, that sort of accidentally working in pacific, but
>> now not working any more in quincy. If the log lines confirm the same, we
>> might have to work on something in order to handle this case (making the
>> sudo optional somehow). As mentioned in the previous email, that setup
>> wasn't intended to be supported even in pacific, although if it did work,
>> we could bring something in to make it usable in quincy onward as well.
>>
>> On Tue, May 2, 2023 at 10:58 AM Reza Bakhshayeshi 
>> wrote:
>>
>>> Hi Adam,
>>>
>>> I'm still struggling with this issue. I also checked it one more time
>>> with newer versions, upgrading the cluster from 16.2.11 to 16.2.12 was
>>> successful but from 16.2.12 to 17.2.6 failed again with the same ssh errors
>>> (I checked
>>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-errors a
>>> couple of times and all keys/access are fine).
>>>
>>> [root@host1 ~]# ceph health detail
>>> HEALTH_ERR Upgrade: Failed to connect to host host2 at addr (x.x.x.x)
>>> [ERR] UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host host2 at
>>> addr (x.x.x.x)
>>> SSH connection failed to host2 at addr (x.x.x.x): Host(s) were
>>> marked offline: {'host2', 'host6', 'host9', 'host4', 'host3', 'host5',
>>> 'host1', 'host7', 'host8'}
>>>
>>> The interesting thing is that always (total number of mgrs) - 1 is
>>> upgraded, If I provision 5 MGRs then 4 of them, and for 3, 2 of them!
>>>
>>> As long as I'm in an internal environment, I also checked the process
>>> with Quincy cephadm binary file. FYI I'm using stretch mode on this cluster.
>>>
>>> I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you
>>> have any more hints I would be really glad to hear.
>>>
>>> Best regards,
>>> Reza
>>>
>>>
>>>
>>> On Wed, 12 Apr 2023 at 17:18, Adam King  wrote:
>>>
 Ah, okay. Someone else had opened an issue about the same thing after
 the 17.2.5 release I believe. It's changed in 17.2.6 at least to only use
 sudo for non-root users
 https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/cephadm/ssh.py#L148-L153.
 But it looks like you're also using a non-root user anyway. We've required
 passwordless sudo access for custom ssh users for a long time I think (e.g.
 it's in pacific docs
 https://docs.ceph.com/en/pacific/cephadm/install/#further-information-about-cephadm-bootstrap,
 see the point on "--ssh-user"). Did this actually work for you before in
 pacific with a non-root user that doesn't have sudo privileges? I had
 assumed that had never worked.

 On Wed, Apr 12, 2023 at 10:38 AM Reza Bakhshayeshi <
 reza.b2...@gmail.com> wrote:

> Thank you Adam for your response,
>
> I tried all your comments and the troubleshooting link you sent. From
> the Quincy mgrs containers, they can ssh into all other Pacific nodes
> successfully by running the exact command in the log output and vice 
> versa.
>
> Here are some debug logs from the cephadm while updating:
>
> 2023-04-12T11:35:56.260958+ mgr.host8.jukgqm (mgr.4468627) 103 :
> cephadm [DBG] Opening connection to cephadmin@x.x.x.x with ssh
> options '-F /tmp/cephadm-conf-2bbfubub -i /tmp/cephadm-identity-7x2m8gvr'
> 2023-04-12T11:35:56.525091+ mgr.host8.jukgqm (mgr.4468627) 144 :
> cephadm [DBG] _run_cephadm : command = ls
> 

[ceph-users] CephFS Scrub Questions

2023-05-04 Thread Chris Palmer

Hi

Grateful if someone could clarify some things about CephFS Scrubs:

1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start 
/ recursive" only triggers a forward scrub (not a backward scrub)?


2) I couldn't find any reference to forward scrubs being done 
automatically and was wondering whether I should do them using cron? But 
then I saw an undated (but I think a little elderly) presentation by 
Greg Farnum that states that "forward scrub...runs continuously in the 
background". Is that still correct (for Quincy), and if so what controls 
the frequency?


3) Are backward scrubs always manual, using the 3 cephfs-data-scan phases?

4) Are regular backward scrubs recommended, or only if there is 
indication of a problem? (With due regard to the amount of time they may 
take...)


Thanks for any advice.

Regards, Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-04 Thread Reza Bakhshayeshi
Thank you.
I don't see any more errors other than:

2023-05-04T15:07:38.003+ 7ff96cbe0700  0 log_channel(cephadm) log [DBG]
: Running command: sudo which python3
2023-05-04T15:07:38.025+ 7ff96cbe0700  0 log_channel(cephadm) log [DBG]
: Connection to host1 failed. Process exited with non-zero exit status 3
2023-05-04T15:07:38.025+ 7ff96cbe0700  0 log_channel(cephadm) log [DBG]
: _reset_con close host1

What is the best way to safely change the cephadm user to root for the
existing cluster? It seems "ceph cephadm set-ssh-config" is not effective.
(BTW, my cephadmin user can now run "sudo which python3" on the other hosts
without being prompted for a password, but nothing has been solved.)

Best regards,
Reza

On Tue, 2 May 2023 at 19:00, Adam King  wrote:

> The number of mgr daemons thing is expected. The way it works is it first
> upgrades all the standby mgrs (which will be all but one) and then fails
> over so the previously active mgr can be upgraded as well. After that
> failover is when it's first actually running the newer cephadm code, which
> is when you're hitting this issue. Are the logs still saying something
> similar about how "sudo which python3" is failing? I'm thinking this
> might just be a general issue with the user being used not having
> passwordless sudo access, that sort of accidentally working in pacific, but
> now not working any more in quincy. If the log lines confirm the same, we
> might have to work on something in order to handle this case (making the
> sudo optional somehow). As mentioned in the previous email, that setup
> wasn't intended to be supported even in pacific, although if it did work,
> we could bring something in to make it usable in quincy onward as well.
>
> On Tue, May 2, 2023 at 10:58 AM Reza Bakhshayeshi 
> wrote:
>
>> Hi Adam,
>>
>> I'm still struggling with this issue. I also checked it one more time
>> with newer versions, upgrading the cluster from 16.2.11 to 16.2.12 was
>> successful but from 16.2.12 to 17.2.6 failed again with the same ssh errors
>> (I checked
>> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-errors a
>> couple of times and all keys/access are fine).
>>
>> [root@host1 ~]# ceph health detail
>> HEALTH_ERR Upgrade: Failed to connect to host host2 at addr (x.x.x.x)
>> [ERR] UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host host2 at
>> addr (x.x.x.x)
>> SSH connection failed to host2 at addr (x.x.x.x): Host(s) were marked
>> offline: {'host2', 'host6', 'host9', 'host4', 'host3', 'host5', 'host1',
>> 'host7', 'host8'}
>>
>> The interesting thing is that always (total number of mgrs) - 1 is
>> upgraded, If I provision 5 MGRs then 4 of them, and for 3, 2 of them!
>>
>> As long as I'm in an internal environment, I also checked the process
>> with Quincy cephadm binary file. FYI I'm using stretch mode on this cluster.
>>
>> I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you
>> have any more hints I would be really glad to hear.
>>
>> Best regards,
>> Reza
>>
>>
>>
>> On Wed, 12 Apr 2023 at 17:18, Adam King  wrote:
>>
>>> Ah, okay. Someone else had opened an issue about the same thing after
>>> the 17.2.5 release I believe. It's changed in 17.2.6 at least to only use
>>> sudo for non-root users
>>> https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/cephadm/ssh.py#L148-L153.
>>> But it looks like you're also using a non-root user anyway. We've required
>>> passwordless sudo access for custom ssh users for a long time I think (e.g.
>>> it's in pacific docs
>>> https://docs.ceph.com/en/pacific/cephadm/install/#further-information-about-cephadm-bootstrap,
>>> see the point on "--ssh-user"). Did this actually work for you before in
>>> pacific with a non-root user that doesn't have sudo privileges? I had
>>> assumed that had never worked.
>>>
>>> On Wed, Apr 12, 2023 at 10:38 AM Reza Bakhshayeshi 
>>> wrote:
>>>
 Thank you Adam for your response,

 I tried all your comments and the troubleshooting link you sent. From
 the Quincy mgrs containers, they can ssh into all other Pacific nodes
 successfully by running the exact command in the log output and vice versa.

 Here are some debug logs from the cephadm while updating:

 2023-04-12T11:35:56.260958+ mgr.host8.jukgqm (mgr.4468627) 103 :
 cephadm [DBG] Opening connection to cephadmin@x.x.x.x with ssh options
 '-F /tmp/cephadm-conf-2bbfubub -i /tmp/cephadm-identity-7x2m8gvr'
 2023-04-12T11:35:56.525091+ mgr.host8.jukgqm (mgr.4468627) 144 :
 cephadm [DBG] _run_cephadm : command = ls
 2023-04-12T11:35:56.525406+ mgr.host8.jukgqm (mgr.4468627) 145 :
 cephadm [DBG] _run_cephadm : args = []
 2023-04-12T11:35:56.525571+ mgr.host8.jukgqm (mgr.4468627) 146 :
 cephadm [DBG] mon container image my-private-repo/quay-io/ceph/ceph@sha256
 :1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add
 2023-04-12T11:35:56.525619+ mgr.host8.jukgqm (mgr.4468627) 147 :

[ceph-users] Radosgw: ssl_private_key could not find the file even if it existed

2023-05-04 Thread viplanghe6
The radosgw has been configured like this:

[client.rgw.ceph1]
host = ceph1
rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt 
ssl_private_key=/root/ssl/ca.key
#rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=/root/ssl/ca.crt 
ssl_private_key=config://rgw/cert/default/ca.key
admin_socket = /var/run/ceph/ceph-client.rgw.ceph1

but I'm getting this error:

failed to add ssl_private_key=/root/ssl/ca.key: No such file or directory

I also tried to import the key into the ceph db and provided the path with 
config://, but that doesn't work either.

Anyone have any idea? Thanks
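
If the config:// route is what you want, the key has to be stored in the mon
config-key store under exactly the path the frontend option references; a
sketch based on the commented-out line above:

```
ceph config-key set rgw/cert/default/ca.crt -i /root/ssl/ca.crt
ceph config-key set rgw/cert/default/ca.key -i /root/ssl/ca.key

# then in ceph.conf:
#   rgw_frontends = beast port=8080 ssl_port=443 ssl_certificate=config://rgw/cert/default/ca.crt ssl_private_key=config://rgw/cert/default/ca.key
```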
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
I uploaded the output there: 
https://nextcloud.widhalm.or.at/nextcloud/s/FCqPM8zRsix3gss


IP 192.168.23.62 is one of my OSDs that was still booting when the 
reconnect attempts happened. What makes me wonder is that it's the only one 
listed although there are a few similar ones in the cluster.


On 04.05.23 16:55, Adam King wrote:

what does specifically `ceph log last 200 debug cephadm` spit out? The log
lines you've posted so far I don't think are generated by the orchestrator
so curious what the last actions it took was (and how long ago).

On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm 
wrote:


To completely rule out hung processes, I managed to get another short
shutdown.

Now I'm seeing lots of:

mgr.server handle_open ignoring open from mds.mds01.ceph01.usujbi
v2:192.168.23.61:6800/2922006253; not ready for session (expect reconnect)
mgr finish mon failed to return metadata for mds.mds01.ceph02.otvipq:
(2) No such file or directory

log lines. Seems like it now realises that some of these informations
are stale. But it looks like it's just waiting for it to come back and
not do anything about it.

On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details what the mgr is trying and where it's
failing, hopefully. Last week this helped to identify an issue between a
lower pacific issue for me.
Do you see anything in the cephadm.log pointing to the mgr actually
trying something?


Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but
the following problem existed when I was still everywhere on 17.2.5 .

I had a major issue in my cluster which could be solved with a lot of
your help and even more trial and error. Right now it seems that most
is already fixed but I can't rule out that there's still some problem
hidden. The very issue I'm asking about started during the repair.

When I want to orchestrate the cluster, it logs the command but it
doesn't do anything. No matter if I use ceph dashboard or "ceph orch"
in "cephadm shell". I don't get any error message when I try to deploy
new services, redeploy them etc. The log only says "scheduled" and
that's it. Same when I change placement rules. Usually I use tags. But
since they don't work anymore, too, I tried host and umanaged. No
success. The only way I can actually start and stop containers is via
systemctl from the host itself.

When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
for testing being deleted (for weeks now). Ans especially a lot of old
MDS are listed as "error" or "starting". The list doesn't match
reality at all because I had to start them by hand.

I tried "ceph mgr fail" and even a complete shutdown of the whole
cluster with all nodes including all mgs, mds even osd - everything
during a maintenance window. Didn't change anything.

Could you help me? To be honest I'm still rather new to Ceph and since
I didn't find anything in the logs that caught my eye I would be
thankful for hints how to debug.

Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


OpenPGP_signature
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Adam King
what does specifically `ceph log last 200 debug cephadm` spit out? The log
lines you've posted so far I don't think are generated by the orchestrator
so curious what the last actions it took was (and how long ago).

On Thu, May 4, 2023 at 10:35 AM Thomas Widhalm 
wrote:

> To completely rule out hung processes, I managed to get another short
> shutdown.
>
> Now I'm seeing lots of:
>
> mgr.server handle_open ignoring open from mds.mds01.ceph01.usujbi
> v2:192.168.23.61:6800/2922006253; not ready for session (expect reconnect)
> mgr finish mon failed to return metadata for mds.mds01.ceph02.otvipq:
> (2) No such file or directory
>
> log lines. Seems like it now realises that some of these informations
> are stale. But it looks like it's just waiting for it to come back and
> not do anything about it.
>
> On 04.05.23 14:48, Eugen Block wrote:
> > Hi,
> >
> > try setting debug logs for the mgr:
> >
> > ceph config set mgr mgr/cephadm/log_level debug
> >
> > This should provide more details what the mgr is trying and where it's
> > failing, hopefully. Last week this helped to identify an issue between a
> > lower pacific issue for me.
> > Do you see anything in the cephadm.log pointing to the mgr actually
> > trying something?
> >
> >
> > Zitat von Thomas Widhalm :
> >
> >> Hi,
> >>
> >> I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but
> >> the following problem existed when I was still everywhere on 17.2.5 .
> >>
> >> I had a major issue in my cluster which could be solved with a lot of
> >> your help and even more trial and error. Right now it seems that most
> >> is already fixed but I can't rule out that there's still some problem
> >> hidden. The very issue I'm asking about started during the repair.
> >>
> >> When I want to orchestrate the cluster, it logs the command but it
> >> doesn't do anything. No matter if I use ceph dashboard or "ceph orch"
> >> in "cephadm shell". I don't get any error message when I try to deploy
> >> new services, redeploy them etc. The log only says "scheduled" and
> >> that's it. Same when I change placement rules. Usually I use tags. But
> >> since they don't work anymore, too, I tried host and umanaged. No
> >> success. The only way I can actually start and stop containers is via
> >> systemctl from the host itself.
> >>
> >> When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
> >> for testing being deleted (for weeks now). Ans especially a lot of old
> >> MDS are listed as "error" or "starting". The list doesn't match
> >> reality at all because I had to start them by hand.
> >>
> >> I tried "ceph mgr fail" and even a complete shutdown of the whole
> >> cluster with all nodes including all mgs, mds even osd - everything
> >> during a maintenance window. Didn't change anything.
> >>
> >> Could you help me? To be honest I'm still rather new to Ceph and since
> >> I didn't find anything in the logs that caught my eye I would be
> >> thankful for hints how to debug.
> >>
> >> Cheers,
> >> Thomas
> >> --
> >> http://www.widhalm.or.at
> >> GnuPG : 6265BAE6 , A84CB603
> >> Threema: H7AV7D33
> >> Telegram, Signal: widha...@widhalm.or.at
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequent calling monitor election

2023-05-04 Thread Frank Schilder
Hi all,

there was another election after about 2 hours. I'm trying the stop+reboot 
procedure on another mon now. Just for the record, I observe that when I stop 
one mon, another goes down as a consequence:

[root@ceph-02 ~]# docker stop ceph-mon
ceph-mon
[root@ceph-02 ~]# ceph status
  cluster:
id: e4ece518-f2cb-4708-b00f-b6bf511e91d9
health: HEALTH_WARN
2/5 mons down, quorum ceph-01,ceph-25,ceph-26

  services:
mon: 5 daemons, quorum  (age 17M), out of quorum: ceph-01, ceph-02, 
ceph-03, ceph-25, ceph-26
mgr: ceph-03(active, since 3d), standbys: ceph-26, ceph-02, ceph-25, ceph-01
mds: con-fs2:8 4 up:standby 8 up:active
osd: 1260 osds: 1260 up (since 3d), 1260 in (since 3M)

  data:
pools:   14 pools, 25065 pgs
objects: 1.88G objects, 3.3 PiB
usage:   4.1 PiB used, 9.0 PiB / 13 PiB avail
pgs: 25038 active+clean
 25active+clean+scrubbing+deep
 2 active+clean+scrubbing

  io:
client:   562 MiB/s rd, 542 MiB/s wr, 3.77k op/s rd, 3.09k op/s wr

This looks like it should not happen either.
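
A few things worth capturing when this happens (mon name ceph-01 is only an
example; on a containerized deployment the "ceph daemon" calls have to be run
inside the mon container):

```
# who is the leader and who is in quorum right now
ceph quorum_status -f json-pretty

# per-mon state and any slow/stuck ops on the suspected leader
ceph daemon mon.ceph-01 mon_status
ceph daemon mon.ceph-01 ops

# recent cluster log entries around the election calls
ceph log last 100 info cluster
```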

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, May 4, 2023 2:30 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi all,

I think I can reduce the defcon level a bit. Since I couldn't see anything in 
the mon log, I started checking whether it's a specific mon that causes trouble 
by shutting them down one by one for a while. I got lucky on the first try. 
Shutting down the leader stopped the voting from happening.

I left it down for a while and rebooted the server. Then I started the mon 
again and there has still not been a new election. It looks like the reboot 
finally cleared out the problem.

This indicates that it might be a problem with the hardware, although the 
coincidence with the MDS restart is striking and I doubt that it's just 
coincidence. Unfortunately, I can't find anything in the logs or health 
monitoring. An fsck on the mon store also gave nothing.

Since this is a recurring issue, it would be great if someone could take a look 
at the paste https://pastebin.com/hGPvVkuR if there is a clue.

Thanks a lot for your help!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, May 4, 2023 1:01 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi all,

I have to get back to this case. On Monday I had to restart an MDS to get rid 
of a stuck client caps recall. Right after that fail-over, the MONs went into a 
voting frenzy again. I already restarted all of them like last time, but this 
time this doesn't help. I might be in a different case here.

In an effort to collect debug info, I set debug_mon on the leader to 10/10 and 
its producing voluminous output. Unfortunately, while debug_mon=10/10, the 
voting frenzy is not happening. It seems that I'm a bit in the situation 
described with "Tip: When debug output slows down your system, the latency can 
hide race conditions." at 
https://docs.ceph.com/en/octopus/rados/troubleshooting/log-and-debug/.

The election frequency is significantly lower when debug_mon=10/10. I managed 
to catch one though and pasted the 20s before the election happened here: 
https://pastebin.com/hGPvVkuR . I hope there is a clue, I can't see anything 
that sticks out.

Is there anything else I can look for?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, February 9, 2023 5:29 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi Dan and Gregory,

thanks! These are good pointers. Will look into that tomorrow.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: 09 February 2023 17:12:23
To: Dan van der Ster
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Frequent calling monitor election

Also, that the current leader (ceph-01) is one of the monitors
proposing an election each time suggests the problem is with getting
commit acks back from one of its followers.

On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster  wrote:
>
> Hi Frank,
>
> Check the mon logs with some increased debug levels to find out what
> the leader is busy with.
> We have a similar issue (though, daily) and it turned out to be
> related to the mon leader timing out doing a SMART check.
> See https://tracker.ceph.com/issues/54313 for how I debugged that.
>
> Cheers, Dan
>
> On Thu, Feb 9, 2023 at 7:56 AM Frank Schilder  wrote:
> >
> > Hi all,
> >
> > our monitors 

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Frank Schilder
Yep, reading but not using LRC. Please keep it on the ceph user list for future 
reference -- thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Thursday, May 4, 2023 3:07 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin

Hi,

I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)

Thanks,
Eugen
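
For reference, the jerasure alternative mentioned further down in the quoted
discussion could look roughly like this (profile and pool names are made up,
and it assumes three datacenter buckets in the CRUSH map):

```
# k=4 data + m=5 coding chunks; with 3 chunks per DC a whole DC can fail
# while 6 chunks remain, enough to keep the pool available
ceph osd erasure-code-profile set ec-4-5-dc plugin=jerasure k=4 m=5 \
    crush-failure-domain=host

# a CRUSH rule is still needed that places 3 chunks in each of the 3
# datacenters (e.g. "choose indep 3 type datacenter" followed by
# "chooseleaf indep 3 type host") before creating the pool
ceph osd pool create ec-dc-pool 128 128 erasure ec-4-5-dc
```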

Zitat von Michel Jouvin :

> Hi,
>
> I had to restart one of my OSD server today and the problem showed
> up again. This time I managed to capture "ceph health detail" output
> showing the problem with the 2 PGs:
>
> [WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
> pg 56.1 is down, acting
> [208,65,73,206,197,193,144,155,178,182,183,133,17,NONE,36,NONE,230,NONE]
> pg 56.12 is down, acting
> [NONE,236,28,228,218,NONE,215,117,203,213,204,115,136,181,171,162,137,128]
>
> I still don't understand why, if I am supposed to survive a
> datacenter failure, I cannot survive 3 OSDs down on the same
> host hosting shards for the PG. In the second case it is only 2
> OSDs down, but I'm surprised they don't seem to be in the same "group" of
> OSDs (I'd expect all the OSDs of one datacenter to be in the
> same group of 5, if the order given really reflects the allocation
> done...
>
> Still interested in some explanation of what I'm doing wrong! Best regards,
>
> Michel
>
> Le 03/05/2023 à 10:21, Eugen Block a écrit :
>> I think I got it wrong with the locality setting, I'm still limited
>> by the number of hosts I have available in my test cluster, but as
>> far as I got with failure-domain=osd I believe k=6, m=3, l=3 with
>> locality=datacenter could fit your requirement, at least with
>> regards to the recovery bandwidth usage between DCs, but the
>> resiliency would not match your requirement (one DC failure). That
>> profile creates 3 groups of 4 chunks (3 data/coding chunks and one
>> parity chunk) across three DCs, in total 12 chunks. The min_size=7
>> would not allow an entire DC to go down, I'm afraid, you'd have to
>> reduce it to 6 to allow reads/writes in a disaster scenario. I'm
>> still not sure if I got it right this time, but maybe you're better
>> off without the LRC plugin with the limited number of hosts.
>> Instead you could use the jerasure plugin with a profile like k=4
>> m=5 allowing an entire DC to fail without losing data access (we
>> have one customer using that).
>>
>> Zitat von Eugen Block :
>>
>>> Hi,
>>>
>>> disclaimer: I haven't used LRC in a real setup yet, so there might
>>> be some misunderstandings on my side. But I tried to play around
>>> with one of my test clusters (Nautilus). Because I'm limited in
>>> the number of hosts (6 across 3 virtual DCs) I tried two different
>>> profiles with lower numbers to get a feeling for how that works.
>>>
>>> # first attempt
>>> ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc
>>> k=4 m=2 l=3 crush-failure-domain=host
>>>
>>> For every third OSD one parity chunk is added, so 2 more chunks to
>>> store ==> 8 chunks in total. Since my failure-domain is host and I
>>> only have 6 I get incomplete PGs.
>>>
>>> # second attempt
>>> ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc
>>> k=2 m=2 l=2 crush-failure-domain=host
>>>
>>> This gives me 6 chunks in total to store across 6 hosts which works:
>>>
>>> ceph:~ # ceph pg ls-by-pool lrcpool
>>> PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES*
>>> OMAP_KEYS* LOG STATESINCE VERSION REPORTED
>>> UPACTING SCRUB_STAMP
>>> DEEP_SCRUB_STAMP
>>> 50.0   10 0   0   619 0  0   1
>>> active+clean   72s 18410'1 18415:54 [27,13,0,2,25,7]p27
>>> [27,13,0,2,25,7]p27 2023-05-02 14:53:54.322135 2023-05-02
>>> 14:53:54.322135
>>> 50.1   00 0   0 0 0  0   0
>>> active+clean6m 0'0 18414:26 [27,33,22,6,13,34]p27
>>> [27,33,22,6,13,34]p27 2023-05-02 14:53:54.322135 2023-05-02
>>> 14:53:54.322135
>>> 50.2   00 0   0 0 0  0   0
>>> active+clean6m 0'0 18413:25 [1,28,14,4,31,21]p1
>>> [1,28,14,4,31,21]p1 2023-05-02 14:53:54.322135 2023-05-02
>>> 14:53:54.322135
>>> 50.3   00 0   0 0 0  0   0
>>> active+clean6m 0'0 18413:24 [8,16,26,33,7,25]p8
>>> [8,16,26,33,7,25]p8 2023-05-02 14:53:54.322135 2023-05-02
>>> 14:53:54.322135
>>>
>>> After stopping all OSDs on one host I was still able to read and
>>> write into the pool, but after stopping a second host one PG from
>>> that pool went "down". That I don't fully understand yet, but I
>>> just started to look into it.
>>> With your setup (12 hosts) I would recommend to not utilize all of
>>> them so you have capacity to recover, let's say one "spare" host
>>> per 

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm
To completely rule out hung processes, I managed to get another short 
shutdown.


Now I'm seeing lots of:

mgr.server handle_open ignoring open from mds.mds01.ceph01.usujbi 
v2:192.168.23.61:6800/2922006253; not ready for session (expect reconnect)
mgr finish mon failed to return metadata for mds.mds01.ceph02.otvipq: 
(2) No such file or directory


log lines. It seems like it now realises that some of this information 
is stale. But it looks like it's just waiting for it to come back rather 
than doing anything about it.


On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details what the mgr is trying and where it's 
failing, hopefully. Last week this helped to identify an issue between a 
lower pacific issue for me.
Do you see anything in the cephadm.log pointing to the mgr actually 
trying something?



Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but 
the following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot of 
your help and even more trial and error. Right now it seems that most 
is already fixed but I can't rule out that there's still some problem 
hidden. The very issue I'm asking about started during the repair.


When I want to orchestrate the cluster, it logs the command but it 
doesn't do anything. No matter if I use ceph dashboard or "ceph orch" 
in "cephadm shell". I don't get any error message when I try to deploy 
new services, redeploy them etc. The log only says "scheduled" and 
that's it. Same when I change placement rules. Usually I use tags. But 
since they don't work anymore, too, I tried host and umanaged. No 
success. The only way I can actually start and stop containers is via 
systemctl from the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I deployed 
for testing being deleted (for weeks now). Ans especially a lot of old 
MDS are listed as "error" or "starting". The list doesn't match 
reality at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole 
cluster with all nodes including all mgs, mds even osd - everything 
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and since 
I didn't find anything in the logs that caught my eye I would be 
thankful for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


OpenPGP_signature
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg upmap primary

2023-05-04 Thread Dan van der Ster
Hello,

After you delete the OSD, the now "invalid" upmap rule will be
automatically removed.

Cheers, Dan
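
For illustration, the primary mappings can also be inspected and removed by
hand; treat the exact command names as an assumption to verify against your
release (pg id and osd id are examples):

```
# pin the primary of a PG to a given OSD (as in the question)
ceph osd pg-upmap-primary 4.12 3

# the mappings are visible in the osdmap dump
ceph osd dump | grep upmap

# and can be removed explicitly
ceph osd rm-pg-upmap-primary 4.12
```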

__
Clyso GmbH | https://www.clyso.com


On Wed, May 3, 2023 at 10:13 PM Nguetchouang Ngongang Kevin
 wrote:
>
> Hello, I have a question: what happens when I delete a PG on which I
> set a particular OSD as primary using the pg-upmap-primary command?
>
> --
> Nguetchouang Ngongang Kevin
> ENS de Lyon
> https://perso.ens-lyon.fr/kevin.nguetchouang/
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm

Hi,

What I'm seeing a lot is this: "[stats WARNING root] cmdtag  not found 
in client metadata". I can't make anything of it, but I guess it's not 
showing the initial issue.


Now that I think of it - I started the cluster with 3 nodes which are 
now only used as OSD. Could it be there's something missing on the new 
nodes that are now used as mgr/mon?


Cheers,
Thomas

On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details what the mgr is trying and where it's 
failing, hopefully. Last week this helped to identify an issue between a 
lower pacific issue for me.
Do you see anything in the cephadm.log pointing to the mgr actually 
trying something?



Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but 
the following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot of 
your help and even more trial and error. Right now it seems that most 
is already fixed but I can't rule out that there's still some problem 
hidden. The very issue I'm asking about started during the repair.


When I want to orchestrate the cluster, it logs the command but it 
doesn't do anything. No matter if I use ceph dashboard or "ceph orch" 
in "cephadm shell". I don't get any error message when I try to deploy 
new services, redeploy them etc. The log only says "scheduled" and 
that's it. Same when I change placement rules. Usually I use tags. But 
since they don't work anymore, too, I tried host and umanaged. No 
success. The only way I can actually start and stop containers is via 
systemctl from the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I deployed 
for testing being deleted (for weeks now). Ans especially a lot of old 
MDS are listed as "error" or "starting". The list doesn't match 
reality at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole 
cluster with all nodes including all mgs, mds even osd - everything 
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and since 
I didn't find anything in the logs that caught my eye I would be 
thankful for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-04 Thread huxia...@horebdata.cn
Dear Josh,

Thanks a lot. Your clarification really gives me much courage on using pgmap 
tool set for re-balancing.

best regards,

Samuel



huxia...@horebdata.cn
 
From: Josh Baergen
Date: 2023-05-04 15:46
To: huxia...@horebdata.cn
CC: Janne Johansson; ceph-users
Subject: Re: [ceph-users] Re: Best practice for expanding Ceph cluster
Hi Samuel,
 
Both pgremapper and the CERN scripts were developed against Luminous,
and in my experience 12.2.13 has all of the upmap patches needed for
the scheme that Janne outlined to work. However, if you have a complex
CRUSH map sometimes the upmap balancer can struggle, and I think
that's true of any release so far.
 
Josh
 
On Thu, May 4, 2023 at 5:58 AM huxia...@horebdata.cn
 wrote:
>
> Janne,
>
> thanks a lot for the detailed scheme. I totally agree that the upmap approach 
> would be one of the best methods; however, my current cluster is running 
> Luminous 12.2.13, and upmap does not seem to work reliably on Luminous.
>
> samuel
>
>
>
> huxia...@horebdata.cn
>
> From: Janne Johansson
> Date: 2023-05-04 11:56
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] Best practice for expanding Ceph cluster
> Den tors 4 maj 2023 kl 10:39 skrev huxia...@horebdata.cn
> :
> > Dear Ceph folks,
> >
> > I am writing to ask for advice on best practice for expanding a Ceph cluster. 
> > We are running an 8-node Ceph cluster and RGW, and would like to add 
> > another 10 nodes, each of which has 10x 12TB HDDs. The current 8-node cluster 
> > has ca. 400TB user data.
> >
> > I am wondering whether to add the 10 nodes in one shot and let the cluster 
> > rebalance, or divide this into 5 steps, each of which adds 2 nodes and rebalances 
> > step by step?  I do not know what would be the advantages or disadvantages 
> > of the one-shot scheme vs 5 batches of adding 2 nodes step-by-step.
> >
> > Any suggestions, experience sharing or advice are highly appreciated.
>
> If you add one or two hosts, it will rebalance involving all hosts to
> even out the data. Then you add two more and it has to even all data
> again more or less. Then two more and all old hosts have to redo the
> same work again.
>
> I would suggest that you add all new hosts and make the OSDs start
> with a super-low initial weight (0.0001 or so), which means they will
> be in and up, but not receive any PGs.
>
> Then you set "noout" and "norebalance" and ceph osd crush reweight the
> new OSDs to their correct size, perhaps with a sleep 30 in between or
> so, to let the dust settle after you change weights.
>
> After all new OSDs are of the correct crush weight, there will be a
> lot of PGs misplaced/remapped but not moving. Now you grab one of the
> programs/scripts[1] which talks to upmap and tells it that every
> misplaced PG actually is where you want it to be. You might need to
> run several times, but it usually goes quite fast on the second/third
> run. Even if it never gets 100% of the PGs happy, it is quite
> sufficient if 95-99% are thinking they are at their correct place.
>
> Now, if you enable the ceph balancer (or already have it enabled) in
> upmap mode and unset "noout" and "norebalance" the mgr balancer will
> take a certain amount of PGs (some 3% by default[2] ) and remove the
> temporary "upmap" setting that says the PG is at the right place even
> when it isn't. This means that the balancer takes a small amount of
> PGs, lets them move to where they actually want to be, then picks a
> few more PGs and repeats until the final destination is correct for
> all PGs, evened out on all OSDs as you wanted.
>
> This is the method that I think has the least impact on client IO,
> scrubs and all that, should be quite safe but will take a while in
> calendar time to finish. The best part is that the admin work needed
> comes only in at the beginning, the rest is automatic.
>
> [1] Tools:
> https://raw.githubusercontent.com/HeinleinSupport/cern-ceph-scripts/master/tools/upmap/upmap-remapped.py
> https://github.com/digitalocean/pgremapper
> I think this one works too, haven't tried it:
> https://github.com/TheJJ/ceph-balancer
>
> [2] Percent to have moving at any moment:
> https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling
>
> --
> May the most significant bit of your life be positive.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-04 Thread Josh Baergen
Hi Samuel,

Both pgremapper and the CERN scripts were developed against Luminous,
and in my experience 12.2.13 has all of the upmap patches needed for
the scheme that Janne outlined to work. However, if you have a complex
CRUSH map sometimes the upmap balancer can struggle, and I think
that's true of any release so far.

Josh

On Thu, May 4, 2023 at 5:58 AM huxia...@horebdata.cn
 wrote:
>
> Janne,
>
> thanks a lot for the detailed scheme. I totally agree that the upmap approach 
> would be one of the best methods; however, my current cluster is running 
> Luminous 12.2.13, and upmap does not seem to work reliably on Luminous.
>
> samuel
>
>
>
> huxia...@horebdata.cn
>
> From: Janne Johansson
> Date: 2023-05-04 11:56
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] Best practice for expanding Ceph cluster
> Den tors 4 maj 2023 kl 10:39 skrev huxia...@horebdata.cn
> :
> > Dear Ceph folks,
> >
> > I am writing to ask for advice on best practice for expanding a Ceph cluster. 
> > We are running an 8-node Ceph cluster and RGW, and would like to add 
> > another 10 nodes, each of which has 10x 12TB HDDs. The current 8-node cluster 
> > has ca. 400TB user data.
> >
> > I am wondering whether to add the 10 nodes in one shot and let the cluster 
> > rebalance, or divide this into 5 steps, each of which adds 2 nodes and rebalances 
> > step by step?  I do not know what would be the advantages or disadvantages 
> > of the one-shot scheme vs 5 batches of adding 2 nodes step-by-step.
> >
> > Any suggestions, experience sharing or advice are highly appreciated.
>
> If you add one or two hosts, it will rebalance involving all hosts to
> even out the data. Then you add two more and it has to even all data
> again more or less. Then two more and all old hosts have to redo the
> same work again.
>
> I would suggest that you add all new hosts and make the OSDs start
> with a super-low initial weight (0.0001 or so), which means they will
> be in and up, but not receive any PGs.
>
> Then you set "noout" and "norebalance" and ceph osd crush reweight the
> new OSDs to their correct size, perhaps with a sleep 30 in between or
> so, to let the dust settle after you change weights.
>
> After all new OSDs are of the correct crush weight, there will be a
> lot of PGs misplaced/remapped but not moving. Now you grab one of the
> programs/scripts[1] which talks to upmap and tells it that every
> misplaced PG actually is where you want it to be. You might need to
> run several times, but it usually goes quite fast on the second/third
> run. Even if it never gets 100% of the PGs happy, it is quite
> sufficient if 95-99% are thinking they are at their correct place.
>
> Now, if you enable the ceph balancer (or already have it enabled) in
> upmap mode and unset "noout" and "norebalance" the mgr balancer will
> take a certain amount of PGs (some 3% by default[2] ) and remove the
> temporary "upmap" setting that says the PG is at the right place even
> when it isn't. This means that the balancer takes a small amount of
> PGs, lets them move to where they actually want to be, then picks a
> few more PGs and repeats until the final destination is correct for
> all PGs, evened out on all OSDs as you wanted.
>
> This is the method that I think has the least impact on client IO,
> scrubs and all that, should be quite safe but will take a while in
> calendar time to finish. The best part is that the admin work needed
> comes only in at the beginning, the rest is automatic.
>
> [1] Tools:
> https://raw.githubusercontent.com/HeinleinSupport/cern-ceph-scripts/master/tools/upmap/upmap-remapped.py
> https://github.com/digitalocean/pgremapper
> I think this one works too, haven't tried it:
> https://github.com/TheJJ/ceph-balancer
>
> [2] Percent to have moving at any moment:
> https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling
>
> --
> May the most significant bit of your life be positive.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm

Thanks.

I set the log level to debug, try a few steps and then come back.

On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details on what the mgr is trying and where it's 
failing, hopefully. Last week this helped me identify an issue on a 
lower Pacific release.
Do you see anything in the cephadm.log pointing to the mgr actually 
trying something?



Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but 
the following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot of 
your help and even more trial and error. Right now it seems that most 
is already fixed but I can't rule out that there's still some problem 
hidden. The very issue I'm asking about started during the repair.


When I want to orchestrate the cluster, it logs the command but it 
doesn't do anything. No matter if I use ceph dashboard or "ceph orch" 
in "cephadm shell". I don't get any error message when I try to deploy 
new services, redeploy them etc. The log only says "scheduled" and 
that's it. Same when I change placement rules. Usually I use tags. But 
since they don't work anymore either, I tried host and unmanaged. No 
success. The only way I can actually start and stop containers is via 
systemctl from the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I deployed 
for testing being deleted (for weeks now). And especially a lot of old 
MDS are listed as "error" or "starting". The list doesn't match 
reality at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole 
cluster with all nodes including all mgrs, mds, even osds - everything 
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and since 
I didn't find anything in the logs that caught my eye I would be 
thankful for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm

Thanks for the reply.

"Refreshed" is "3 weeks ago" on most lines. The running mds and 
osd.cost_capacity are both "-" in this column.


I'm already done with "mgr fail", that didn't do anything. And I even 
tried a complete shutdown during a maintenance window that was not 3 
weeks ago but last week.


So this doesn't seem to help. Thanks anyway. The only thing could be 
that the command was started by a systemd service again. But I can't 
imagine that.


On 04.05.23 15:05, Adam King wrote:
First thing I always check when it seems like orchestrator commands 
aren't doing anything is "ceph orch ps" and "ceph orch device ls" and 
check the REFRESHED column. If it's well above 10 minutes for orch ps or 
30 minutes for orch device ls, then it means the orchestrator is most 
likely hanging on some command to refresh the host information. If 
that's the case, you can follow up with a "ceph mgr fail", wait a few 
minutes and check the orch ps and device ls REFRESHED column again. If 
only certain hosts are not having their daemon/device information 
refreshed, you can go to the hosts that aren't having their info 
refreshed and check for hanging "cephadm" commands (I just check for "ps 
aux | grep cephadm").


On Thu, May 4, 2023 at 8:38 AM Thomas Widhalm wrote:


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but
the
following problem existed when I was still everywhere on 17.2.5 .

I had a major issue in my cluster which could be solved with a lot of
your help and even more trial and error. Right now it seems that
most is
already fixed but I can't rule out that there's still some problem
hidden. The very issue I'm asking about started during the repair.

When I want to orchestrate the cluster, it logs the command but it
doesn't do anything. No matter if I use ceph dashboard or "ceph
orch" in
"cephadm shell". I don't get any error message when I try to deploy new
services, redeploy them etc. The log only says "scheduled" and that's
it. Same when I change placement rules. Usually I use tags. But since
they don't work anymore either, I tried host and unmanaged. No success. The 
The
only way I can actually start and stop containers is via systemctl from
the host itself.

When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
for testing being deleted (for weeks now). And especially a lot of old 
MDS are listed as "error" or "starting". The list doesn't match reality
at all because I had to start them by hand.

I tried "ceph mgr fail" and even a complete shutdown of the whole
cluster with all nodes including all mgrs, mds, even osds - everything 
during a maintenance window. Didn't change anything.

Could you help me? To be honest I'm still rather new to Ceph and
since I
didn't find anything in the logs that caught my eye I would be thankful
for hints how to debug.

Cheers,
Thomas
-- 
http://www.widhalm.or.at 

GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at 
___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Eugen Block

Hi,

I don't think you've shared your osd tree yet, could you do that?  
Apparently nobody else but us reads this thread or nobody reading this  
uses the LRC plugin. ;-)
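
Something like the following should be enough to get the full picture (the
profile and pool names below are just examples, please adjust to yours):

  ceph osd tree
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get <your-lrc-profile>
  ceph osd pool get <your-ec-pool> crush_rule
  ceph osd crush rule dump <that-crush-rule>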


Thanks,
Eugen

Zitat von Michel Jouvin :


Hi,

I had to restart one of my OSD server today and the problem showed  
up again. This time I managed to capture "ceph health detail" output  
showing the problem with the 2 PGs:


[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
    pg 56.1 is down, acting  
[208,65,73,206,197,193,144,155,178,182,183,133,17,NONE,36,NONE,230,NONE]
    pg 56.12 is down, acting  
[NONE,236,28,228,218,NONE,215,117,203,213,204,115,136,181,171,162,137,128]


I still don't understand why, if I am supposed to survive a  
datacenter failure, I cannot survive 3 OSDs down on the same  
host hosting shards for the PG. In the second case it is only 2  
OSDs down, but I'm surprised they don't seem to be in the same "group" of  
OSDs (I'd expected all the OSDs of one datacenter to be in the  
same group of 5 if the order given really reflects the allocation  
done)...


Still interested in some explanation of what I'm doing wrong! Best regards,

Michel

Le 03/05/2023 à 10:21, Eugen Block a écrit :
I think I got it wrong with the locality setting, I'm still limited  
by the number of hosts I have available in my test cluster, but as  
far as I got with failure-domain=osd I believe k=6, m=3, l=3 with  
locality=datacenter could fit your requirement, at least with  
regards to the recovery bandwidth usage between DCs, but the  
resiliency would not match your requirement (one DC failure). That  
profile creates 3 groups of 4 chunks (3 data/coding chunks and one  
parity chunk) across three DCs, in total 12 chunks. The min_size=7  
would not allow an entire DC to go down, I'm afraid, you'd have to  
reduce it to 6 to allow reads/writes in a disaster scenario. I'm  
still not sure if I got it right this time, but maybe you're better  
off without the LRC plugin with the limited number of hosts.  
Instead you could use the jerasure plugin with a profile like k=4  
m=5 allowing an entire DC to fail without losing data access (we  
have one customer using that).


Zitat von Eugen Block :


Hi,

disclaimer: I haven't used LRC in a real setup yet, so there might  
be some misunderstandings on my side. But I tried to play around  
with one of my test clusters (Nautilus). Because I'm limited in  
the number of hosts (6 across 3 virtual DCs) I tried two different  
profiles with lower numbers to get a feeling for how that works.


# first attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc  
k=4 m=2 l=3 crush-failure-domain=host


For every third OSD one parity chunk is added, so 2 more chunks to  
store ==> 8 chunks in total. Since my failure-domain is host and I  
only have 6 I get incomplete PGs.


# second attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc  
k=2 m=2 l=2 crush-failure-domain=host


This gives me 6 chunks in total to store across 6 hosts which works:

ceph:~ # ceph pg ls-by-pool lrcpool
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES*  
OMAP_KEYS* LOG STATE    SINCE VERSION REPORTED  
UP    ACTING SCRUB_STAMP     
DEEP_SCRUB_STAMP
50.0   1    0 0   0   619 0  0   1  
active+clean   72s 18410'1 18415:54 [27,13,0,2,25,7]p27    
[27,13,0,2,25,7]p27 2023-05-02 14:53:54.322135 2023-05-02  
14:53:54.322135
50.1   0    0 0   0 0 0  0   0  
active+clean    6m 0'0 18414:26 [27,33,22,6,13,34]p27  
[27,33,22,6,13,34]p27 2023-05-02 14:53:54.322135 2023-05-02  
14:53:54.322135
50.2   0    0 0   0 0 0  0   0  
active+clean    6m 0'0 18413:25 [1,28,14,4,31,21]p1    
[1,28,14,4,31,21]p1 2023-05-02 14:53:54.322135 2023-05-02  
14:53:54.322135
50.3   0    0 0   0 0 0  0   0  
active+clean    6m 0'0 18413:24 [8,16,26,33,7,25]p8    
[8,16,26,33,7,25]p8 2023-05-02 14:53:54.322135 2023-05-02  
14:53:54.322135


After stopping all OSDs on one host I was still able to read and  
write into the pool, but after stopping a second host one PG from  
that pool went "down". That I don't fully understand yet, but I  
just started to look into it.
With your setup (12 hosts) I would recommend to not utilize all of  
them so you have capacity to recover, let's say one "spare" host  
per DC, leaving 9 hosts in total. A profile with k=3 m=3 l=2 could  
make sense here, resulting in 9 total chunks (one more parity  
chunks for every other OSD), min_size 4. But as I wrote, it  
probably doesn't have the resiliency for a DC failure, so that  
needs some further investigation.


Regards,
Eugen

Zitat von Michel Jouvin :


Hi,

No... our current setup is 3 datacenters with the same  
configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each.  
Thus a total of 12 OSD servers. As with the LRC plugin, k+m 

[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Adam King
First thing I always check when it seems like orchestrator commands aren't
doing anything is "ceph orch ps" and "ceph orch device ls" and check the
REFRESHED column. If it's well above 10 minutes for orch ps or 30 minutes
for orch device ls, then it means the orchestrator is most likely hanging
on some command to refresh the host information. If that's the case, you
can follow up with a "ceph mgr fail", wait a few minutes and check the orch
ps and device ls REFRESHED column again. If only certain hosts are not
having their daemon/device information refreshed, you can go to the hosts
that aren't having their info refreshed and check for hanging "cephadm"
commands (I just check for "ps aux | grep cephadm").
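
As a rough sketch, the checks above boil down to something like this (host
names are just examples):

  ceph orch ps           # REFRESHED column should be well under ~10 minutes
  ceph orch device ls    # REFRESHED column should be well under ~30 minutes
  ceph mgr fail          # if stale, fail over the mgr, wait a few minutes, re-check

  # on hosts whose info stays stale, look for hanging cephadm calls
  ssh <host> 'ps aux | grep cephadm'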

On Thu, May 4, 2023 at 8:38 AM Thomas Widhalm 
wrote:

> Hi,
>
> I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but the
> following problem existed when I was still everywhere on 17.2.5 .
>
> I had a major issue in my cluster which could be solved with a lot of
> your help and even more trial and error. Right now it seems that most is
> already fixed but I can't rule out that there's still some problem
> hidden. The very issue I'm asking about started during the repair.
>
> When I want to orchestrate the cluster, it logs the command but it
> doesn't do anything. No matter if I use ceph dashboard or "ceph orch" in
> "cephadm shell". I don't get any error message when I try to deploy new
> services, redeploy them etc. The log only says "scheduled" and that's
> it. Same when I change placement rules. Usually I use tags. But since
> they don't work anymore either, I tried host and unmanaged. No success. The
> only way I can actually start and stop containers is via systemctl from
> the host itself.
>
> When I run "ceph orch ls" or "ceph orch ps" I see services I deployed
> for testing being deleted (for weeks now). And especially a lot of old
> MDS are listed as "error" or "starting". The list doesn't match reality
> at all because I had to start them by hand.
>
> I tried "ceph mgr fail" and even a complete shutdown of the whole
> cluster with all nodes including all mgrs, mds, even osds - everything
> during a maintenance window. Didn't change anything.
>
> Could you help me? To be honest I'm still rather new to Ceph and since I
> didn't find anything in the logs that caught my eye I would be thankful
> for hints how to debug.
>
> Cheers,
> Thomas
> --
> http://www.widhalm.or.at
> GnuPG : 6265BAE6 , A84CB603
> Threema: H7AV7D33
> Telegram, Signal: widha...@widhalm.or.at
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-04 Thread Eugen Block

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details on what the mgr is trying and where it's  
failing, hopefully. Last week this helped me identify an issue on  
a lower Pacific release.
Do you see anything in the cephadm.log pointing to the mgr actually  
trying something?
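
For completeness, this is roughly how I enable and then follow the cephadm
debug log; the exact commands may vary a bit between releases, so treat it as
a starting point rather than a recipe:

  ceph config set mgr mgr/cephadm/log_level debug
  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph -W cephadm --watch-debug      # follow the log live
  ceph log last 200 debug cephadm    # or dump the most recent entries
  # don't forget to set the levels back to info when you're done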



Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but  
the following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot  
of your help and even more trial and error. Right now it seems that  
most is already fixed but I can't rule out that there's still some  
problem hidden. The very issue I'm asking about started during the  
repair.


When I want to orchestrate the cluster, it logs the command but it  
doesn't do anything. No matter if I use ceph dashboard or "ceph  
orch" in "cephadm shell". I don't get any error message when I try  
to deploy new services, redeploy them etc. The log only says  
"scheduled" and that's it. Same when I change placement rules.  
Usually I use tags. But since they don't work anymore either, I tried  
host and unmanaged. No success. The only way I can actually start and  
stop containers is via systemctl from the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I  
deployed for testing being deleted (for weeks now). And especially a  
lot of old MDS are listed as "error" or "starting". The list doesn't  
match reality at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole  
cluster with all nodes including all mgrs, mds, even osds - everything  
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and  
since I didn't find anything in the logs that caught my eye I would  
be thankful for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Orchestration seems not to work

2023-05-04 Thread Thomas Widhalm

Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but the 
following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot of 
your help and even more trial and error. Right now it seems that most is 
already fixed but I can't rule out that there's still some problem 
hidden. The very issue I'm asking about started during the repair.


When I want to orchestrate the cluster, it logs the command but it 
doesn't do anything. No matter if I use ceph dashboard or "ceph orch" in 
"cephadm shell". I don't get any error message when I try to deploy new 
services, redeploy them etc. The log only says "scheduled" and that's 
it. Same when I change placement rules. Usually I use tags. But since 
they don't work anymore either, I tried host and unmanaged. No success. The 
only way I can actually start and stop containers is via systemctl from 
the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I deployed 
for testing being deleted (for weeks now). And especially a lot of old 
MDS are listed as "error" or "starting". The list doesn't match reality 
at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole 
cluster with all nodes including all mgrs, mds, even osds - everything 
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and since I 
didn't find anything in the logs that caught my eye I would be thankful 
for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widha...@widhalm.or.at


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequent calling monitor election

2023-05-04 Thread Frank Schilder
Hi all,

I think I can reduce the defcon level a bit. Since I couldn't see anything in 
the mon log, I started testing whether a specific mon was causing trouble by 
shutting them down one by one for a while. I got lucky on the first try. Shutting 
down the leader stopped the voting from happening.

I left it down for a while and rebooted the server. Then I started the mon 
again and there has still not been a new election. It looks like the reboot 
finally cleared out the problem.

This indicates that it might be a problem with the hardware, although the 
coincidence with the MDS restart is striking and I doubt that it's just 
coincidence. Unfortunately, I can't find anything in the logs or health 
monitoring. Also an fsck on the mon store gave nothing.

Since this is a recurring issue, it would be great if someone could take a look 
at the paste https://pastebin.com/hGPvVkuR if there is a clue.

Thanks a lot for your help!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, May 4, 2023 1:01 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi all,

I have to get back to this case. On Monday I had to restart an MDS to get rid 
of a stuck client caps recall. Right after that fail-over, the MONs went into a 
voting frenzy again. I already restarted all of them like last time, but this 
time this doesn't help. I might be in a different case here.

In an effort to collect debug info, I set debug_mon on the leader to 10/10 and 
it's producing voluminous output. Unfortunately, while debug_mon=10/10, the 
voting frenzy is not happening. It seems that I'm a bit in the situation 
described with "Tip: When debug output slows down your system, the latency can 
hide race conditions." at 
https://docs.ceph.com/en/octopus/rados/troubleshooting/log-and-debug/.

The election frequency is significantly lower when debug_mon=10/10. I managed 
to catch one though and pasted the 20s before the election happened here: 
https://pastebin.com/hGPvVkuR . I hope there is a clue, I can't see anything 
that sticks out.

Is there anything else I can look for?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, February 9, 2023 5:29 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi Dan and Gregory,

thanks! These are good pointers. Will look into that tomorrow.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: 09 February 2023 17:12:23
To: Dan van der Ster
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Frequent calling monitor election

Also, that the current leader (ceph-01) is one of the monitors
proposing an election each time suggests the problem is with getting
commit acks back from one of its followers.

On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster  wrote:
>
> Hi Frank,
>
> Check the mon logs with some increased debug levels to find out what
> the leader is busy with.
> We have a similar issue (though, daily) and it turned out to be
> related to the mon leader timing out doing a SMART check.
> See https://tracker.ceph.com/issues/54313 for how I debugged that.
>
> Cheers, Dan
>
> On Thu, Feb 9, 2023 at 7:56 AM Frank Schilder  wrote:
> >
> > Hi all,
> >
> > our monitors have enjoyed democracy since the beginning. However, I don't 
> > share a sudden excitement about voting:
> >
> > 2/9/23 4:42:30 PM[INF]overall HEALTH_OK
> > 2/9/23 4:42:30 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:42:26 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:40:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:30:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-03 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:04 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:03 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum 

[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-04 Thread huxia...@horebdata.cn
Janne,

thanks a lot for the detailed scheme. I totally agree that the upmap approach 
would be one of the best methods; however, my current cluster is running 
Luminous 12.2.13, and upmap does not seem to work reliably on Luminous.

samuel



huxia...@horebdata.cn
 
From: Janne Johansson
Date: 2023-05-04 11:56
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] Best practice for expanding Ceph cluster
Den tors 4 maj 2023 kl 10:39 skrev huxia...@horebdata.cn
:
> Dear Ceph folks,
>
> I am writing to ask for advice on best practice for expanding a Ceph cluster. We 
> are running an 8-node Ceph cluster and RGW, and would like to add another 10 
> nodes, each of which has 10x 12TB HDDs. The current 8-node cluster has ca. 400TB 
> user data.
>
> I am wondering whether to add the 10 nodes in one shot and let the cluster 
> rebalance, or divide this into 5 steps, each of which adds 2 nodes and rebalances 
> step by step?  I do not know what would be the advantages or disadvantages 
> of the one-shot scheme vs 5 batches of adding 2 nodes step-by-step.
>
> Any suggestions, experience sharing or advice are highly appreciated.
 
If you add one or two hosts, it will rebalance involving all hosts to
even out the data. Then you add two more and it has to even all data
again more or less. Then two more and all old hosts have to redo the
same work again.
 
I would suggest that you add all new hosts and make the OSDs start
with a super-low initial weight (0.0001 or so), which means they will
be in and up, but not receive any PGs.
 
Then you set "noout" and "norebalance" and ceph osd crush reweight the
new OSDs to their correct size, perhaps with a sleep 30 in between or
so, to let the dust settle after you change weights.
 
After all new OSDs are of the correct crush weight, there will be a
lot of PGs misplaced/remapped but not moving. Now you grab one of the
programs/scripts[1] which talks to upmap and tells it that every
misplaced PG actually is where you want it to be. You might need to
run several times, but it usually goes quite fast on the second/third
run. Even if it never gets 100% of the PGs happy, it is quite
sufficient if 95-99% are thinking they are at their correct place.
 
Now, if you enable the ceph balancer (or already have it enabled) in
upmap mode and unset "noout" and "norebalance" the mgr balancer will
take a certain amount of PGs (some 3% by default[2] ) and remove the
temporary "upmap" setting that says the PG is at the right place even
when it isn't. This means that the balancer takes a small amount of
PGs, lets them move to where they actually want to be, then picks a
few more PGs and repeats until the final destination is correct for
all PGs, evened out on all OSDs as you wanted.
 
This is the method that I think has the least impact on client IO,
scrubs and all that, should be quite safe but will take a while in
calendar time to finish. The best part is that the admin work needed
comes only in at the beginning, the rest is automatic.
 
[1] Tools:
https://raw.githubusercontent.com/HeinleinSupport/cern-ceph-scripts/master/tools/upmap/upmap-remapped.py
https://github.com/digitalocean/pgremapper
I think this one works too, haven't tried it:
https://github.com/TheJJ/ceph-balancer
 
[2] Percent to have moving at any moment:
https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling
 
-- 
May the most significant bit of your life be positive.
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd map: corrupt full osdmap (-22) when

2023-05-04 Thread Ilya Dryomov
On Thu, May 4, 2023 at 11:27 AM Kamil Madac  wrote:
>
> Thanks for the info.
>
> As a solution, we used rbd-nbd, which works fine without any issues. If we 
> have time, we will also try to disable IPv4 on the cluster and try kernel 
> rbd mapping again. Are there any disadvantages to using NBD instead of 
> the kernel driver?

Ceph doesn't really support dual stack configurations.  It's not
something that is tested: even if it happens to work for some use case
today, it can very well break tomorrow.  The kernel client just makes
that very explicit ;)

rbd-nbd is less performant and historically also less stable (although
that might have changed in recent kernels as a bunch of work went into
the NBD driver upstream).  It's also heavier on resource usage but that
won't be noticeable/can be disregarded if you are not mapping dozens of
RBD images on a single node.
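
For reference, a minimal sketch of both mapping variants (pool and image
names are placeholders):

  rbd device map mypool/myimage                     # krbd, kernel client
  rbd device map --device-type nbd mypool/myimage   # rbd-nbd, userspace NBD
  rbd device unmap --device-type nbd mypool/myimage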

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Frequent calling monitor election

2023-05-04 Thread Frank Schilder
Hi all,

I have to get back to this case. On Monday I had to restart an MDS to get rid 
of a stuck client caps recall. Right after that fail-over, the MONs went into a 
voting frenzy again. I already restarted all of them like last time, but this 
time this doesn't help. I might be in a different case here.

In an effort to collect debug info, I set debug_mon on the leader to 10/10 and 
it's producing voluminous output. Unfortunately, while debug_mon=10/10, the 
voting frenzy is not happening. It seems that I'm a bit in the situation 
described with "Tip: When debug output slows down your system, the latency can 
hide race conditions." at 
https://docs.ceph.com/en/octopus/rados/troubleshooting/log-and-debug/.

The election frequency is significantly lower when debug_mon=10/10. I managed 
to catch one though and pasted the 20s before the election happened here: 
https://pastebin.com/hGPvVkuR . I hope there is a clue, I can't see anything 
that sticks out.

Is there anything else I can look for?
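
For the record, this is roughly what I'm running on the leader to collect the
debug info (commands as I understand them, corrections welcome):

  ceph tell mon.ceph-01 config set debug_mon 10/10
  ceph tell mon.ceph-01 config set debug_paxos 10/10
  ceph tell mon.ceph-01 config set debug_ms 1

  # and back to the defaults afterwards
  ceph tell mon.ceph-01 config set debug_mon 1/5
  ceph tell mon.ceph-01 config set debug_paxos 1/5
  ceph tell mon.ceph-01 config set debug_ms 0/5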

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, February 9, 2023 5:29 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Frequent calling monitor election

Hi Dan and Gregory,

thanks! These are good pointers. Will look into that tomorrow.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Gregory Farnum 
Sent: 09 February 2023 17:12:23
To: Dan van der Ster
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Frequent calling monitor election

Also, that the current leader (ceph-01) is one of the monitors
proposing an election each time suggests the problem is with getting
commit acks back from one of its followers.

On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster  wrote:
>
> Hi Frank,
>
> Check the mon logs with some increased debug levels to find out what
> the leader is busy with.
> We have a similar issue (though, daily) and it turned out to be
> related to the mon leader timing out doing a SMART check.
> See https://tracker.ceph.com/issues/54313 for how I debugged that.
>
> Cheers, Dan
>
> On Thu, Feb 9, 2023 at 7:56 AM Frank Schilder  wrote:
> >
> > Hi all,
> >
> > our monitors have enjoyed democracy since the beginning. However, I don't 
> > share a sudden excitement about voting:
> >
> > 2/9/23 4:42:30 PM[INF]overall HEALTH_OK
> > 2/9/23 4:42:30 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:42:26 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:40:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:30:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-03 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:04 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:03 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:23:59 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:23:59 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:20:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:10:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:00:00 PM[INF]overall HEALTH_OK
> > 2/9/23 3:50:00 PM[INF]overall HEALTH_OK
> > 2/9/23 3:43:13 PM[INF]overall HEALTH_OK
> > 2/9/23 3:43:13 PM[INF]mon.ceph-01 is new leader, mons 
> > ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 3:43:08 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 3:43:08 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 3:43:08 PM[INF]mon.ceph-25 calling monitor election
> >
> > We moved a switch from one rack to another and after the switch came beck 
> > up, the monitors frequently bitch about who is the alpha. How do I get them 
> > to focus more on their daily duties again?
> >
> > Thanks for any help!
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> 

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-04 Thread Michel Jouvin

Hi,

I had to restart one of my OSD server today and the problem showed up 
again. This time I managed to capture "ceph health detail" output 
showing the problem with the 2 PGs:


[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
    pg 56.1 is down, acting 
[208,65,73,206,197,193,144,155,178,182,183,133,17,NONE,36,NONE,230,NONE]
    pg 56.12 is down, acting 
[NONE,236,28,228,218,NONE,215,117,203,213,204,115,136,181,171,162,137,128]


I still don't understand why, if I am supposed to survive a 
datacenter failure, I cannot survive 3 OSDs down on the same host 
hosting shards for the PG. In the second case it is only 2 OSDs down, but 
I'm surprised they don't seem to be in the same "group" of OSDs (I'd expected 
all the OSDs of one datacenter to be in the same group of 5 if the 
order given really reflects the allocation done)...


Still interested in some explanation of what I'm doing wrong! Best regards,

Michel

Le 03/05/2023 à 10:21, Eugen Block a écrit :
I think I got it wrong with the locality setting, I'm still limited by 
the number of hosts I have available in my test cluster, but as far as 
I got with failure-domain=osd I believe k=6, m=3, l=3 with 
locality=datacenter could fit your requirement, at least with regards 
to the recovery bandwidth usage between DCs, but the resiliency would 
not match your requirement (one DC failure). That profile creates 3 
groups of 4 chunks (3 data/coding chunks and one parity chunk) across 
three DCs, in total 12 chunks. The min_size=7 would not allow an 
entire DC to go down, I'm afraid, you'd have to reduce it to 6 to 
allow reads/writes in a disaster scenario. I'm still not sure if I got 
it right this time, but maybe you're better off without the LRC plugin 
with the limited number of hosts. Instead you could use the jerasure 
plugin with a profile like k=4 m=5 allowing an entire DC to fail 
without losing data access (we have one customer using that).


Zitat von Eugen Block :


Hi,

disclaimer: I haven't used LRC in a real setup yet, so there might be 
some misunderstandings on my side. But I tried to play around with 
one of my test clusters (Nautilus). Because I'm limited in the number 
of hosts (6 across 3 virtual DCs) I tried two different profiles with 
lower numbers to get a feeling for how that works.


# first attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc k=4 
m=2 l=3 crush-failure-domain=host


For every third OSD one parity chunk is added, so 2 more chunks to 
store ==> 8 chunks in total. Since my failure-domain is host and I 
only have 6 I get incomplete PGs.


# second attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc k=2 
m=2 l=2 crush-failure-domain=host


This gives me 6 chunks in total to store across 6 hosts which works:

ceph:~ # ceph pg ls-by-pool lrcpool
PG   OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* 
LOG STATE    SINCE VERSION REPORTED UP    ACTING 
SCRUB_STAMP    DEEP_SCRUB_STAMP
50.0   1    0 0   0   619 0  0   1 
active+clean   72s 18410'1 18415:54 [27,13,0,2,25,7]p27   
[27,13,0,2,25,7]p27 2023-05-02 14:53:54.322135 2023-05-02 
14:53:54.322135
50.1   0    0 0   0 0 0  0   0 
active+clean    6m 0'0 18414:26 [27,33,22,6,13,34]p27 
[27,33,22,6,13,34]p27 2023-05-02 14:53:54.322135 2023-05-02 
14:53:54.322135
50.2   0    0 0   0 0 0  0   0 
active+clean    6m 0'0 18413:25 [1,28,14,4,31,21]p1   
[1,28,14,4,31,21]p1 2023-05-02 14:53:54.322135 2023-05-02 
14:53:54.322135
50.3   0    0 0   0 0 0  0   0 
active+clean    6m 0'0 18413:24 [8,16,26,33,7,25]p8   
[8,16,26,33,7,25]p8 2023-05-02 14:53:54.322135 2023-05-02 
14:53:54.322135


After stopping all OSDs on one host I was still able to read and 
write into the pool, but after stopping a second host one PG from 
that pool went "down". That I don't fully understand yet, but I just 
started to look into it.
With your setup (12 hosts) I would recommend to not utilize all of 
them so you have capacity to recover, let's say one "spare" host per 
DC, leaving 9 hosts in total. A profile with k=3 m=3 l=2 could make 
sense here, resulting in 9 total chunks (one more parity chunks for 
every other OSD), min_size 4. But as I wrote, it probably doesn't 
have the resiliency for a DC failure, so that needs some further 
investigation.


Regards,
Eugen

Zitat von Michel Jouvin :


Hi,

No... our current setup is 3 datacenters with the same 
configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each. 
Thus a total of 12 OSD servers. As with the LRC plugin k+m must be a 
multiple of l, I found that k=9/m=6/l=5 with 
crush-locality=datacenter was achieving my goal of being resilient 
to a datacenter failure. Because I had this, I considered that 
lowering the crush failure domain to osd was not a major issue in my 
case (as it 

[ceph-users] Re: Best practice for expanding Ceph cluster

2023-05-04 Thread Janne Johansson
Den tors 4 maj 2023 kl 10:39 skrev huxia...@horebdata.cn
:
> Dear Ceph folks,
>
> I am writing to ask for advice on best practice for expanding a Ceph cluster. We 
> are running an 8-node Ceph cluster and RGW, and would like to add another 10 
> nodes, each of which has 10x 12TB HDDs. The current 8-node cluster has ca. 400TB 
> user data.
>
> I am wondering whether to add the 10 nodes in one shot and let the cluster 
> rebalance, or divide this into 5 steps, each of which adds 2 nodes and rebalances 
> step by step?  I do not know what would be the advantages or disadvantages 
> of the one-shot scheme vs 5 batches of adding 2 nodes step-by-step.
>
> Any suggestions, experience sharing or advice are highly appreciated.

If you add one or two hosts, it will rebalance involving all hosts to
even out the data. Then you add two more and it has to even all data
again more or less. Then two more and all old hosts have to redo the
same work again.

I would suggest that you add all new hosts and make the OSDs start
with a super-low initial weight (0.0001 or so), which means they will
be in and up, but not receive any PGs.

Then you set "noout" and "norebalance" and ceph osd crush reweight the
new OSDs to their correct size, perhaps with a sleep 30 in between or
so, to let the dust settle after you change weights.

After all new OSDs are of the correct crush weight, there will be a
lot of PGs misplaced/remapped but not moving. Now you grab one of the
programs/scripts[1] which talks to upmap and tells it that every
misplaced PG actually is where you want it to be. You might need to
run several times, but it usually goes quite fast on the second/third
run. Even if it never gets 100% of the PGs happy, it is quite
sufficient if 95-99% are thinking they are at their correct place.

Now, if you enable the ceph balancer (or already have it enabled) in
upmap mode and unset "noout" and "norebalance" the mgr balancer will
take a certain amount of PGs (some 3% by default[2] ) and remove the
temporary "upmap" setting that says the PG is at the right place even
when it isn't. This means that the balancer takes a small amount of
PGs, lets them move to where they actually want to be, then picks a
few more PGs and repeats until the final destination is correct for
all PGs, evened out on all OSDs as you wanted.

This is the method that I think has the least impact on client IO,
scrubs and all that, should be quite safe but will take a while in
calendar time to finish. The best part is that the admin work needed
comes only in at the beginning, the rest is automatic.

[1] Tools:
https://raw.githubusercontent.com/HeinleinSupport/cern-ceph-scripts/master/tools/upmap/upmap-remapped.py
https://github.com/digitalocean/pgremapper
I think this one works too, haven't tried it:
https://github.com/TheJJ/ceph-balancer

[2] Percent to have moving at any moment:
https://docs.ceph.com/en/latest/rados/operations/balancer/#throttling
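
For clarity, a condensed sketch of the whole sequence; the OSD ids, the weight
and the script path are placeholders and need to be adapted to your setup (and
note that on Luminous osd_crush_initial_weight goes into ceph.conf rather than
the config database):

  # make new OSDs come in with (almost) no weight
  ceph config set osd osd_crush_initial_weight 0

  # ... deploy the new hosts / OSDs ...

  ceph osd set noout
  ceph osd set norebalance

  # bring each new OSD up to its real crush weight (e.g. ~10.9 for a 12TB drive)
  for id in $(seq 80 179); do
      ceph osd crush reweight osd.$id 10.91
      sleep 30
  done

  # tell upmap that every misplaced PG is fine where it currently is
  ./upmap-remapped.py | sh      # repeat until (almost) no PGs are misplaced

  ceph balancer mode upmap
  ceph balancer on
  ceph osd unset norebalance
  ceph osd unset noout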

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd map: corrupt full osdmap (-22) when

2023-05-04 Thread Kamil Madac
Thanks for the info.

As a solution, we used rbd-nbd, which works fine without any issues. If we
have time, we will also try to disable IPv4 on the cluster and try
kernel rbd mapping again. Are there any disadvantages to using NBD
instead of the kernel driver?

Thanks

On Wed, May 3, 2023 at 4:06 PM Ilya Dryomov  wrote:

> On Wed, May 3, 2023 at 11:24 AM Kamil Madac  wrote:
> >
> > Hi,
> >
> > We deployed pacific cluster 16.2.12 with cephadm. We experience following
> > error during rbd map:
> >
> > [Wed May  3 08:59:11 2023] libceph: mon2 (1)[2a00:da8:ffef:1433::]:6789
> > session established
> > [Wed May  3 08:59:11 2023] libceph: another match of type 1 in addrvec
> > [Wed May  3 08:59:11 2023] libceph: corrupt full osdmap (-22) epoch 200
> off
> > 1042 (9876284d of 0cb24b58-80b70596)
> > [Wed May  3 08:59:11 2023] osdmap: : 08 07 7d 10 00 00 09 01 5d
> 09
> > 00 00 a2 22 3b 86  ..}.]";.
> > [Wed May  3 08:59:11 2023] osdmap: 0010: e4 f5 11 ed 99 ee 47 75 ca
> 3c
> > ad 23 c8 00 00 00  ..Gu.<.#
> > [Wed May  3 08:59:11 2023] osdmap: 0020: 21 68 4a 64 98 d2 5d 2e 84
> fd
> > 50 64 d9 3a 48 26  !hJd..]...Pd.:H&
> > [Wed May  3 08:59:11 2023] osdmap: 0030: 02 00 00 00 01 00 00 00 00
> 00
> > 00 00 1d 05 71 01  ..q.
> > 
> >
> > Linux Kernel is 6.1.13 and the important thing is that we are using ipv6
> > addresses for connection to ceph nodes.
> > We were able to map rbd from client with kernel 5.10, but in prod
> > environment we are not allowed to use that kernel.
> >
> > What could be the reason for such behavior on newer kernels and how to
> > troubleshoot it?
> >
> > Here is output of ceph osd dump:
> >
> > # ceph osd dump
> > epoch 200
> > fsid a2223b86-e4f5-11ed-99ee-4775ca3cad23
> > created 2023-04-27T12:18:41.777900+
> > modified 2023-05-02T12:09:40.642267+
> > flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
> > crush_version 34
> > full_ratio 0.95
> > backfillfull_ratio 0.9
> > nearfull_ratio 0.85
> > require_min_compat_client luminous
> > min_compat_client jewel
> > require_osd_release pacific
> > stretch_mode_enabled false
> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0
> > object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 183
> > flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application
> > mgr_devicehealth
> > pool 2 'idp' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins
> > pg_num 32 pgp_num 32 autoscale_mode on last_change 48 flags
> > hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> > max_osd 3
> > osd.0 up   in  weight 1 up_from 176 up_thru 182 down_at 172
> > last_clean_interval [170,171)
> >
> [v2:[2a00:da8:ffef:1431::]:6800/805023868,v1:[2a00:da8:ffef:1431::]:6801/805023868,v2:
> > 0.0.0.0:6802/805023868,v1:0.0.0.0:6803/805023868]
> >
> [v2:[2a00:da8:ffef:1431::]:6804/805023868,v1:[2a00:da8:ffef:1431::]:6805/805023868,v2:
> > 0.0.0.0:6806/805023868,v1:0.0.0.0:6807/805023868] exists,up
> > e8fd0ee2-ea63-4d02-8f36-219d36869078
> > osd.1 up   in  weight 1 up_from 136 up_thru 182 down_at 0
> > last_clean_interval [0,0)
> >
> [v2:[2a00:da8:ffef:1432::]:6800/2172723816,v1:[2a00:da8:ffef:1432::]:6801/2172723816,v2:
> > 0.0.0.0:6802/2172723816,v1:0.0.0.0:6803/2172723816]
> >
> [v2:[2a00:da8:ffef:1432::]:6804/2172723816,v1:[2a00:da8:ffef:1432::]:6805/2172723816,v2:
> > 0.0.0.0:6806/2172723816,v1:0.0.0.0:6807/2172723816] exists,up
> > 0b7b5628-9273-4757-85fb-9c16e8441895
> > osd.2 up   in  weight 1 up_from 182 up_thru 182 down_at 178
> > last_clean_interval [123,177)
> >
> [v2:[2a00:da8:ffef:1433::]:6800/887631330,v1:[2a00:da8:ffef:1433::]:6801/887631330,v2:
> > 0.0.0.0:6802/887631330,v1:0.0.0.0:6803/887631330]
> >
> [v2:[2a00:da8:ffef:1433::]:6804/887631330,v1:[2a00:da8:ffef:1433::]:6805/887631330,v2:
> > 0.0.0.0:6806/887631330,v1:0.0.0.0:6807/887631330] exists,up
> > 21f8d0d5-6a3f-4f78-96c8-8ec4e4f78a01
>
> Hi Kamil,
>
> The issue is bogus 0.0.0.0 addresses.  This came up before, see [1] and
> later messages from Stefan in the thread.  You would need to ensure that
> ms_bind_ipv4 is set to false and restart OSDs.
>
> [1]
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/Q6VYRJBPHQI63OQTBJG2N3BJD2KBEZM4/
>
> Thanks,
>
> Ilya
>
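
For our own notes, the suggested change seems to boil down to the following
(to be double-checked before applying; the restart can also be done per host
with systemctl since this is a cephadm deployment):

  ceph config set global ms_bind_ipv4 false
  ceph config set global ms_bind_ipv6 true
  ceph orch restart osd.<your-osd-service-name>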


-- 
Kamil Madac 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-users Digest, Vol 107, Issue 20

2023-05-04 Thread ??????
help





[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-04 Thread Guillaume Abrioux
ceph-volume approved https://jenkins.ceph.com/job/ceph-volume-test/553/

On Wed, 3 May 2023 at 22:43, Guillaume Abrioux  wrote:

> The failure seen in ceph-volume tests isn't related.
> That being said, it needs to be fixed to have a better view of the current
> status.
>
> On Wed, 3 May 2023 at 21:00, Laura Flores  wrote:
>
>> upgrade/octopus-x (pacific) is approved. Went over failures with Adam
>> King and it was decided they are not release blockers.
>>
>> On Wed, May 3, 2023 at 1:53 PM Yuri Weinstein 
>> wrote:
>>
>>> upgrade/octopus-x (pacific) - Laura
>>> ceph-volume - Guillaume
>>>
>>> + 2 PRs are the remaining issues
>>>
>>> Josh FYI
>>>
>>> On Wed, May 3, 2023 at 11:50 AM Radoslaw Zarzynski 
>>> wrote:
>>> >
>>> > rados approved.
>>> >
>>> > Big thanks to Laura for helping with this!
>>> >
>>> > On Thu, Apr 27, 2023 at 11:21 PM Yuri Weinstein 
>>> wrote:
>>> > >
>>> > > Details of this release are summarized here:
>>> > >
>>> > > https://tracker.ceph.com/issues/59542#note-1
>>> > > Release Notes - TBD
>>> > >
>>> > > Seeking approvals for:
>>> > >
>>> > > smoke - Radek, Laura
>>> > > rados - Radek, Laura
>>> > >   rook - Sébastien Han
>>> > >   cephadm - Adam K
>>> > >   dashboard - Ernesto
>>> > >
>>> > > rgw - Casey
>>> > > rbd - Ilya
>>> > > krbd - Ilya
>>> > > fs - Venky, Patrick
>>> > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
>>> > > upgrade/pacific-p2p - Laura
>>> > > powercycle - Brad (SELinux denials)
>>> > > ceph-volume - Guillaume, Adam K
>>> > >
>>> > > Thx
>>> > > YuriW
>>> > > ___
>>> > > Dev mailing list -- d...@ceph.io
>>> > > To unsubscribe send an email to dev-le...@ceph.io
>>> >
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>
>
> --
>
> *Guillaume Abrioux*
> Senior Software Engineer
>


-- 

*Guillaume Abrioux*
Senior Software Engineer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Best practice for expanding Ceph cluster

2023-05-04 Thread huxia...@horebdata.cn
Dear Ceph folks,

I am writing to ask for advice on best practice for expanding a Ceph cluster. We 
are running an 8-node Ceph cluster with RGW and would like to add another 10 
nodes, each of which has 10x 12TB HDDs. The current 8 nodes hold ca. 400TB of 
user data.

I am wondering whether to add all 10 nodes in one shot and let the cluster 
rebalance, or to split the expansion into 5 steps, adding 2 nodes and 
rebalancing at each step. I do not know what the advantages or disadvantages 
of the one-shot scheme are compared to 5 batches of 2 nodes added step by step.
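
For illustration, a minimal sketch of the throttling knobs often used during
such an expansion, assuming a recent release with the centralized config
database; option names and values are examples to adapt rather than advice
from this thread:

  # limit backfill pressure before adding the new OSDs
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  # optionally hold off data movement until all new hosts are in
  ceph osd set norebalance
  # ... add the new nodes/OSDs ...
  ceph osd unset norebalance

Whichever scheme is chosen, the backfill settings above bound the recovery
load; what mainly differs between one shot and batches is how often the same
data gets remapped.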

Any suggestions, experience sharing or advice are highly appreciated. 

thanks a lot in advance,

Samuel



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-04 Thread Janek Bevendorff
After running the tool for 11 hours straight, it exited with the 
following exception:


Traceback (most recent call last):
  File "/home/webis/first-damage.py", line 156, in 
    traverse(f, ioctx)
  File "/home/webis/first-damage.py", line 84, in traverse
    for (dnk, val) in it:
  File "rados.pyx", line 1389, in rados.OmapIterator.__next__
  File "rados.pyx", line 318, in rados.decode_cstr
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: 
invalid start byte


Does that mean that the last inode listed in the output file is corrupt? 
Any way I can fix it?


The output file has 14 million lines. We have about 24.5 million objects 
in the metadata pool.
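
For what it's worth, the traceback comes from the omap key decoder in the
Python rados bindings hitting a dentry name that is not valid UTF-8, which may
simply be a file name that is not UTF-8 rather than actual damage. A rough
diagnostic sketch, assuming the python-rados bindings and that the omap
iterator can be resumed after the error; the pool name is a placeholder:

  import rados

  POOL = 'cephfs_metadata'  # placeholder: use your metadata pool name

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  ioctx = cluster.open_ioctx(POOL)
  try:
      # note: this walks every object's omap, which is slow on a large pool
      for obj in ioctx.list_objects():
          with rados.ReadOpCtx() as rctx:
              it, ret = ioctx.get_omap_vals(rctx, "", "", 100000)
              ioctx.operate_read_op(rctx, obj.key)
              keys = iter(it)
              while True:
                  try:
                      dnk, val = next(keys)
                  except StopIteration:
                      break
                  except UnicodeDecodeError:
                      # dentry key is not valid UTF-8; record the object so its
                      # raw keys can be inspected with:
                      #   rados -p <pool> listomapkeys <object>
                      print("non-UTF-8 omap key in object %s" % obj.key)
                      continue
  finally:
      ioctx.close()
      cluster.shutdown()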


Janek


On 03/05/2023 14:20, Patrick Donnelly wrote:

On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
 wrote:

Hi Patrick,


I'll try that tomorrow and let you know, thanks!

I was unable to reproduce the crash today. Even with
mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
correctly (though they took forever to rejoin with logs set to 20).

To me it looks like the issue has resolved itself overnight. I had run a
recursive scrub on the file system and another snapshot was taken, in
case any of those might have had an effect on this. It could also be the
case that the (supposedly) corrupt journal entry has simply been
committed now and hence doesn't trigger the assertion any more. Is there
any way I can verify this?

You can run:

https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

Just do:

python3 first-damage.py --memo run.1 <metadata pool>

No need to do any of the other steps if you just want a read-only check.


--

Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crash on FAILED ceph_assert(cur->is_auth())

2023-05-04 Thread Peter van Heusden
Hi Emmanuel

It was a while ago, but as I recall I evicted all clients and that allowed
me to restart the MDS servers. There was something clearly "broken" in how
at least one of the clients was interacting with the system.

Peter
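
A minimal sketch of that kind of eviction, assuming the standard MDS commands;
the MDS name and session id below are placeholders:

  # list client sessions on an active MDS and look for stuck/misbehaving ones
  ceph tell mds.<name> client ls
  # evict a specific client by session id
  ceph tell mds.<name> client evict id=12345
  # evicted clients end up on the OSD blocklist (blacklist on older releases)
  ceph osd blocklist ls

Note that a hard eviction blocklists the client, so it may need to be
remounted afterwards.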

On Thu, 4 May 2023 at 07:18, Emmanuel Jaep  wrote:

> Hi,
>
> did you finally figure out what happened?
> I do have the same behavior and we can't get the mds to start again...
>
> Thanks,
>
> Emmanuel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io