[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-13 Thread Casey Bodley
On Tue, Sep 13, 2022 at 4:03 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/57472#note-1
> Release Notes - https://github.com/ceph/ceph/pull/48072
>
> Seeking approvals for:
>
> rados - Neha, Travis, Ernesto, Adam
> rgw - Casey

rgw approved

> fs - Venky
> orch - Adam
> rbd - Ilya, Deepika
> krbd - missing packages, Adam Kr is looking into it
> upgrade/octopus-x - missing packages, Adam Kr is looking into it
> ceph-volume - Guillaume is looking into it
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - LRC upgrade pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>



[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi,
I just checked and all OSDs have it set to true.
It also does not seem to be a problem with the snaptrim operation.

We just had two occasions in the last 7 days where nearly all OSDs logged a lot
(around 3k times in 20 minutes) of these messages:
2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378 get_health_metrics
reporting 1 slow ops, oldest is osd_op(client.153241560.0:42288714 8.56
8:6a19e4ee:::rbd_data.4c64dc3662fb05.0c00:head [write
2162688~4096 in=4096b] snapc 9835e=[] ondisk+write+known_if_redirected
e786375)
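
For completeness: the individual slow ops on such an OSD can be inspected via
the admin socket on the host that carries it, with something like

  ceph daemon osd.9 dump_ops_in_flight
  ceph daemon osd.9 dump_historic_slow_ops

(osd.9 taken from the log line above; exact command availability may vary per
release.)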


On Tue, 13 Sept 2022 at 20:20, Wesley Dillingham <w...@wesdillingham.com> wrote:

> I haven't read through this entire thread so forgive me if already
> mentioned:
>
> What is the parameter "bluefs_buffered_io" set to on your OSDs? We once
> saw a terrible slowdown on our OSDs during snaptrim events and setting
> bluefs_buffered_io to true alleviated that issue. That was on a nautilus
> cluster.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
>
>
> On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens  wrote:
>
>> The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
>> this should be done fairly fast.
>> For now I will recreate every OSD in the cluster and check if this helps.
>>
>> Do you experience slow OPS (so the cluster shows a message like "cluster
>> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
>> daemons
>>
>> [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
>> have slow ops. (SLOW_OPS)")?
>>
>> I can also see a huge spike in the load of all hosts in our cluster for a
>> couple of minutes.
>>
>>
>> Am Di., 13. Sept. 2022 um 13:14 Uhr schrieb Frank Schilder > >:
>>
>> > Hi Boris.
>> >
>> > > 3. wait some time (took around 5-20 minutes)
>> >
>> > Sounds short. Might just have been the compaction that the OSDs do any
>> > ways on startup after upgrade. I don't know how to check for completed
>> > format conversion. What I see in your MON log is exactly what I have
>> seen
>> > with default snap trim settings until all OSDs were converted. Once an
>> OSD
>> > falls behind and slow ops start piling up, everything comes to a halt.
>> Your
>> > logs clearly show a sudden drop of IOP/s on snap trim start and I would
>> > guess this is the cause of the slowly growing OPS back log of the OSDs.
>> >
>> > If its not that, I don't know what else to look for.
>> >
>> > Best regards,
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > 
>> > From: Boris Behrens 
>> > Sent: 13 September 2022 12:58:19
>> > To: Frank Schilder
>> > Cc: ceph-users@ceph.io
>> > Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
>> > from nautilus to octopus
>> >
>> > Hi Frank,
>> > we converted the OSDs directly on the upgrade.
>> >
>> > 1. installing new ceph versions
>> > 2. restart all OSD daemons
>> > 3. wait some time (took around 5-20 minutes)
>> > 4. all OSDs were online again.
>> >
>> > So I would expect, that the OSDs are all upgraded correctly.
>> > I also checked when the trimming happens, and it does not seem to be an
>> > issue on it's own, as the trim happens all the time in various sizes.
>> >
>> > Am Di., 13. Sept. 2022 um 12:45 Uhr schrieb Frank Schilder <
>> fr...@dtu.dk
>> > >:
>> > Are you observing this here:
>> >
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > 
>> > From: Boris Behrens mailto:b...@kervyn.de>>
>> > Sent: 13 September 2022 11:43:20
>> > To: ceph-users@ceph.io
>> > Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
>> > nautilus to octopus
>> >
>> > Hi, I need you help really bad.
>> >
>> > we are currently experiencing a very bad cluster hangups that happen
>> > sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and
>> once
>> > 2022-09-12 in the evening)
>> > We use krbd without cephx for the qemu clients and when the OSDs are
>> > getting laggy, the krbd connection comes to a grinding halt, to a point
>> > that all IO is staling and we can't even unmap the rbd device.
>> >
>> > From the logs, it looks like that the cluster starts to snaptrim a lot a
>> > PGs, then PGs become laggy and then the cluster snowballs into laggy
>> OSDs.
>> > I have attached the monitor log and the osd log (from one OSD) around
>> the
>> > time where it happened.
>> >
>> > - is this a known issue?
>> > - what can I do to debug it further?
>> > - can I downgrade back to nautilus?
>> > - should I upgrade the PGs for the pool to 4096 or 8192?
>> >
>> > The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
>> > where the 8TB disks got ~120PGs and the 2TB 

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Wesley Dillingham
I haven't read through this entire thread so forgive me if already
mentioned:

What is the parameter "bluefs_buffered_io" set to on your OSDs? We once saw
a terrible slowdown on our OSDs during snaptrim events and setting
bluefs_buffered_io to true alleviated that issue. That was on a nautilus
cluster.
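
For reference, the current value can be checked per OSD with something along
the lines of

  ceph config get osd.0 bluefs_buffered_io
  ceph tell osd.0 config get bluefs_buffered_io

(osd.0 is just an example id) and changed for the whole osd class with e.g.
"ceph config set osd bluefs_buffered_io true"; exact syntax may differ per
release.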

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens  wrote:

> The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
> this should be done fairly fast.
> For now I will recreate every OSD in the cluster and check if this helps.
>
> Do you experience slow OPS (so the cluster shows a message like "cluster
> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
> daemons
>
> [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
> have slow ops. (SLOW_OPS)")?
>
> I can also see a huge spike in the load of all hosts in our cluster for a
> couple of minutes.
>
>
> Am Di., 13. Sept. 2022 um 13:14 Uhr schrieb Frank Schilder :
>
> > Hi Boris.
> >
> > > 3. wait some time (took around 5-20 minutes)
> >
> > Sounds short. Might just have been the compaction that the OSDs do any
> > ways on startup after upgrade. I don't know how to check for completed
> > format conversion. What I see in your MON log is exactly what I have seen
> > with default snap trim settings until all OSDs were converted. Once an
> OSD
> > falls behind and slow ops start piling up, everything comes to a halt.
> Your
> > logs clearly show a sudden drop of IOP/s on snap trim start and I would
> > guess this is the cause of the slowly growing OPS back log of the OSDs.
> >
> > If its not that, I don't know what else to look for.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Boris Behrens 
> > Sent: 13 September 2022 12:58:19
> > To: Frank Schilder
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
> > from nautilus to octopus
> >
> > Hi Frank,
> > we converted the OSDs directly on the upgrade.
> >
> > 1. installing new ceph versions
> > 2. restart all OSD daemons
> > 3. wait some time (took around 5-20 minutes)
> > 4. all OSDs were online again.
> >
> > So I would expect, that the OSDs are all upgraded correctly.
> > I also checked when the trimming happens, and it does not seem to be an
> > issue on it's own, as the trim happens all the time in various sizes.
> >
> > Am Di., 13. Sept. 2022 um 12:45 Uhr schrieb Frank Schilder  > >:
> > Are you observing this here:
> >
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Boris Behrens mailto:b...@kervyn.de>>
> > Sent: 13 September 2022 11:43:20
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> > nautilus to octopus
> >
> > Hi, I need you help really bad.
> >
> > we are currently experiencing a very bad cluster hangups that happen
> > sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
> > 2022-09-12 in the evening)
> > We use krbd without cephx for the qemu clients and when the OSDs are
> > getting laggy, the krbd connection comes to a grinding halt, to a point
> > that all IO is staling and we can't even unmap the rbd device.
> >
> > From the logs, it looks like that the cluster starts to snaptrim a lot a
> > PGs, then PGs become laggy and then the cluster snowballs into laggy
> OSDs.
> > I have attached the monitor log and the osd log (from one OSD) around the
> > time where it happened.
> >
> > - is this a known issue?
> > - what can I do to debug it further?
> > - can I downgrade back to nautilus?
> > - should I upgrade the PGs for the pool to 4096 or 8192?
> >
> > The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> > where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> > have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> > show anything for the timeframe.
> >
> > Cluster stats:
> >   cluster:
> > id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> > 25h)
> > mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> > ceph-rbd-mon6
> > osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
> >
> >   data:
> > pools:   4 pools, 2241 pgs
> > objects: 25.43M objects, 82 TiB
> > usage:   231 TiB used, 187 TiB / 417 TiB avail
> > pgs: 2241 active+clean
> >
> >   io:
> > client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s 

[ceph-users] Re: 16.2.10 Cephfs with CTDB, Samba running on Ubuntu

2022-09-13 Thread Marco Pizzolo
Thanks so much Bailey, Tim,

We've been pinned down for the past week and a half, but will look at
reviewing the configuration provided this week or, more likely, next week.

Thanks again.

Marco

On Fri, Sep 9, 2022 at 10:29 AM Bailey Allison 
wrote:

> Hi Tim,
>
> We've actually been having issues with ceph snapshots working on Ubuntu 20,
> as well as Rocky Linux 8 currently, using the vfs ceph_snapshots module at
> least. On Ubuntu the snapshots don't appear at all, and on Rocky Linux they
> do appear but they are all empty. It's one of those things we've gotten
> working before but haven't used in a while, so we need to relearn it,
> unfortunately.
> We are currently looking into it so if we figure something out I will reach
> back out and let you know.
>
> Regards,
>
> Bailey
>
> -Original Message-
> From: Tim Bishop 
> Sent: September 8, 2022 6:44 AM
> To: Bailey Allison 
> Cc: 'ceph-users' 
> Subject: Re: [ceph-users] Re: 16.2.10 Cephfs with CTDB, Samba running on
> Ubuntu
>
> Hi Bailey,
>
> You mention you got the Ceph snapshots as shadow copies working. I didn't
> have much luck with this myself, on Ubuntu 20.04+22.04 and Ceph Octopus
> (since upgraded to Pacific).
>
> Just wondering if you could share the config you used for that, and
> anything
> notable about your configuration such as OS+Samba versions.
>
> Thanks,
> Tim.
>
> On Tue, Sep 06, 2022 at 07:01:24PM -0300, Bailey Allison wrote:
> > Hey Marco,
> >
> > Though the Ceph version is Octopus we've had great experience setting
> > up Samba/CTDB on top of CephFS.
> >
> > Typically we'd collocate the MDS nodes as our CTDB nodes/samba
> > gateways and share out kernel CephFS mounts on the gateways via samba.
> >
> > Some nice features are the ceph snapshots can be exposed as samba
> > shadowcopies to samba clients, as well as excellent support for
> > Windows ACLs/domain member samba servers if you're in need of that.
> >
> > Overall I would say it works out quite well together, and is probably
> > the best method to get CephFS connected to any MacOS or Windows clients.
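
A minimal smb.conf sketch of the kind of setup described above (share name and
path are made up here, and exact options depend on the Samba version) would be
something like:

  [cephfs-share]
      path = /mnt/cephfs/share
      vfs objects = ceph_snapshots
      read only = no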
> >
> > -Original Message-
> > From: Marco Pizzolo 
> > Sent: September 6, 2022 2:45 PM
> > To: ceph-users 
> > Subject: [ceph-users] 16.2.10 Cephfs with CTDB, Samba running on
> > Ubuntu
> >
> > Hello Everyone,
> >
> > We are looking at clustering Samba with CTDB to have highly available
> > access to CephFS for clients.
> >
> > I wanted to see how others have implemented, and their experiences so
> far.
> >
> > Would welcome all feedback, and of course if you happen to have any
> > documentation on what you did so that we can test out the same way
> > that would be fantastic.
> >
> > Many thanks.
> > Marco
>
> --
> Tim Bishop
> http://www.bishnet.net/tim/
> PGP Key: 0x6C226B37FDF38D55
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
The cluster is SSD only with 2TB, 4TB and 8TB disks. I would expect that
this should be done fairly fast.
For now I will recreate every OSD in the cluster and check if this helps.

Do you experience slow OPS (so the cluster shows a message like "cluster
[WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
daemons
[osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
have slow ops. (SLOW_OPS)")?

I can also see a huge spike in the load of all hosts in our cluster for a
couple of minutes.
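
A rough way to correlate that spike with OSD activity, for reference, is
something like

  ceph osd perf

(per-OSD commit/apply latencies), plus plain iostat/top on the affected hosts
while it happens.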


On Tue, 13 Sept 2022 at 13:14, Frank Schilder wrote:

> Hi Boris.
>
> > 3. wait some time (took around 5-20 minutes)
>
> Sounds short. Might just have been the compaction that the OSDs do any
> ways on startup after upgrade. I don't know how to check for completed
> format conversion. What I see in your MON log is exactly what I have seen
> with default snap trim settings until all OSDs were converted. Once an OSD
> falls behind and slow ops start piling up, everything comes to a halt. Your
> logs clearly show a sudden drop of IOP/s on snap trim start and I would
> guess this is the cause of the slowly growing OPS back log of the OSDs.
>
> If its not that, I don't know what else to look for.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens 
> Sent: 13 September 2022 12:58:19
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
> from nautilus to octopus
>
> Hi Frank,
> we converted the OSDs directly on the upgrade.
>
> 1. installing new ceph versions
> 2. restart all OSD daemons
> 3. wait some time (took around 5-20 minutes)
> 4. all OSDs were online again.
>
> So I would expect, that the OSDs are all upgraded correctly.
> I also checked when the trimming happens, and it does not seem to be an
> issue on it's own, as the trim happens all the time in various sizes.
>
> Am Di., 13. Sept. 2022 um 12:45 Uhr schrieb Frank Schilder  >:
> Are you observing this here:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens mailto:b...@kervyn.de>>
> Sent: 13 September 2022 11:43:20
> To: ceph-users@ceph.io
> Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> nautilus to octopus
>
> Hi, I need you help really bad.
>
> we are currently experiencing a very bad cluster hangups that happen
> sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like that the cluster starts to snaptrim a lot a
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> ssd417 TiB  187 TiB  230 TiB   231 TiB  55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB  55.30
>
> --- POOLS ---
> POOL   ID  PGS   STORED   OBJECTS  USED %USED  MAX
> AVAIL
> isos764  455 GiB  117.92k  1.3 TiB   1.17 38
> TiB
> rbd 8  2048   76 TiB   24.65M  222 TiB  66.31 38
> TiB
> archive 9   128  2.4 TiB  669.59k  7.3 TiB   6.06 38
> TiB
> device_health_metrics  10 1   25 MiB  149   76 MiB  0 38
> TiB
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> 

[ceph-users] Re: Increasing number of unscrubbed PGs

2022-09-13 Thread Wesley Dillingham
What does "ceph pg ls scrubbing" show? Do you have PGs that have been stuck
in a scrubbing state for a long period of time (many hours, days, weeks, etc.)?
This will show in the "SINCE" column.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Sep 13, 2022 at 7:32 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:

> Hi Josh,
>
>
> thx for the link. I'm not sure whether this is the root cause, since we
> did not use the noscrub and nodeepscrub flags in the past. I've set them
> for a short period to test whether removing the flag triggers more
> backfilling. During that time no OSD were restarted etc.
>
>
> But the ticket mentioned repeering as a method for resolving the stuck
> OSDs. I've repeered some of the PGs, and the number of affected PG did
> not increase significantly anymore. On the other hand the number of
> running deep-scrubs also did not increase significantly. I'll keep an
> eye on the developement and hope for 16.2.11 being released soon..
>
>
> Best regards,
>
> Burkhard
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Marc


> 
> It might be possible that converting OSDs before setting require-osd-
> release=octopus leads to a broken state of the converted OSDs. I could
> not yet find a way out of this situation. We will soon perform a third
> upgrade test to test this hypothesis.
> 

So when upgrading, one should put this line in ceph.conf before restarting the
osd daemons?
require-osd-release=octopus

(I still need to upgrade from Nautilus)
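
For reference, require-osd-release is a cluster flag rather than a ceph.conf
option; it is set once with something like

  ceph osd require-osd-release octopus

The upstream upgrade notes normally run this after all OSDs have been
upgraded; the open question above is how that step relates to the on-disk
conversion.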


[ceph-users] Re: Increasing number of unscrubbed PGs

2022-09-13 Thread Burkhard Linke

Hi Josh,


Thanks for the link. I'm not sure whether this is the root cause, since we
did not use the noscrub and nodeepscrub flags in the past. I've set them
for a short period to test whether removing the flags triggers more
backfilling. During that time no OSDs were restarted etc.

But the ticket mentioned repeering as a method for resolving the stuck
OSDs. I've repeered some of the PGs, and the number of affected PGs did
not increase significantly anymore. On the other hand, the number of
running deep-scrubs did not increase significantly either. I'll keep an
eye on the development and hope for 16.2.11 to be released soon.
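
For anyone else hitting this: repeering a single PG is done with something
like

  ceph pg repeer <pgid>

with the PG id taken from e.g. "ceph pg ls scrubbing".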



Best regards,

Burkhard




[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi Frank,
we converted the OSDs directly on the upgrade.

1. installing new ceph versions
2. restart all OSD daemons
3. wait some time (took around 5-20 minutes)
4. all OSDs were online again.

So I would expect that the OSDs are all upgraded correctly.
I also checked when the trimming happens, and it does not seem to be an
issue on its own, as the trim happens all the time in various sizes.
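
For the record, that every OSD really runs the new release can be
double-checked with something like

  ceph versions
  ceph osd dump | grep require_osd_release

the first shows a per-daemon version summary, the second which release the
cluster currently requires.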

On Tue, 13 Sept 2022 at 12:45, Frank Schilder wrote:

> Are you observing this here:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Boris Behrens 
> Sent: 13 September 2022 11:43:20
> To: ceph-users@ceph.io
> Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> nautilus to octopus
>
> Hi, I need you help really bad.
>
> we are currently experiencing a very bad cluster hangups that happen
> sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like that the cluster starts to snaptrim a lot a
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> ssd417 TiB  187 TiB  230 TiB   231 TiB  55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB  55.30
>
> --- POOLS ---
> POOL   ID  PGS   STORED   OBJECTS  USED %USED  MAX
> AVAIL
> isos764  455 GiB  117.92k  1.3 TiB   1.17 38
> TiB
> rbd 8  2048   76 TiB   24.65M  222 TiB  66.31 38
> TiB
> archive 9   128  2.4 TiB  669.59k  7.3 TiB   6.06 38
> TiB
> device_health_metrics  10 1   25 MiB  149   76 MiB  0 38
> TiB
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
As an exception, the "UTF-8-Probleme" self-help group will meet in the
big hall this time.


[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
I checked the cluster for other snaptrim operations and they happen all
over the place, so to me it looks like they just happened to be running when
the issue occurred, but were not the driving factor.
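
The snaptrim activity itself can be watched with something like

  ceph pg ls snaptrim
  ceph pg ls snaptrim_wait

which list the PGs currently trimming or queued for trimming.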

On Tue, 13 Sept 2022 at 12:04, Boris Behrens wrote:

> Because someone mentioned that the attachments did not went through I
> created pastebin links:
>
> monlog: https://pastebin.com/jiNPUrtL
> osdlog: https://pastebin.com/dxqXgqDz
>
> Am Di., 13. Sept. 2022 um 11:43 Uhr schrieb Boris Behrens :
>
>> Hi, I need you help really bad.
>>
>> we are currently experiencing a very bad cluster hangups that happen
>> sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
>> 2022-09-12 in the evening)
>> We use krbd without cephx for the qemu clients and when the OSDs are
>> getting laggy, the krbd connection comes to a grinding halt, to a point
>> that all IO is staling and we can't even unmap the rbd device.
>>
>> From the logs, it looks like that the cluster starts to snaptrim a lot a
>> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
>> I have attached the monitor log and the osd log (from one OSD) around the
>> time where it happened.
>>
>> - is this a known issue?
>> - what can I do to debug it further?
>> - can I downgrade back to nautilus?
>> - should I upgrade the PGs for the pool to 4096 or 8192?
>>
>> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
>> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
>> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
>> show anything for the timeframe.
>>
>> Cluster stats:
>>   cluster:
>> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
>> health: HEALTH_OK
>>
>>   services:
>> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
>> 25h)
>> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
>> ceph-rbd-mon6
>> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>>
>>   data:
>> pools:   4 pools, 2241 pgs
>> objects: 25.43M objects, 82 TiB
>> usage:   231 TiB used, 187 TiB / 417 TiB avail
>> pgs: 2241 active+clean
>>
>>   io:
>> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
>> ssd417 TiB  187 TiB  230 TiB   231 TiB  55.30
>> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB  55.30
>>
>> --- POOLS ---
>> POOL   ID  PGS   STORED   OBJECTS  USED %USED  MAX
>> AVAIL
>> isos764  455 GiB  117.92k  1.3 TiB   1.17 38
>> TiB
>> rbd 8  2048   76 TiB   24.65M  222 TiB  66.31 38
>> TiB
>> archive 9   128  2.4 TiB  669.59k  7.3 TiB   6.06 38
>> TiB
>> device_health_metrics  10 1   25 MiB  149   76 MiB  0 38
>> TiB
>>
>>
>>
>> --
>> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
>> groüen Saal.
>>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
As an exception, the "UTF-8-Probleme" self-help group will meet in the
big hall this time.


[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Because someone mentioned that the attachments did not go through, I
created pastebin links:

monlog: https://pastebin.com/jiNPUrtL
osdlog: https://pastebin.com/dxqXgqDz

On Tue, 13 Sept 2022 at 11:43, Boris Behrens wrote:

> Hi, I need you help really bad.
>
> we are currently experiencing a very bad cluster hangups that happen
> sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
> 2022-09-12 in the evening)
> We use krbd without cephx for the qemu clients and when the OSDs are
> getting laggy, the krbd connection comes to a grinding halt, to a point
> that all IO is staling and we can't even unmap the rbd device.
>
> From the logs, it looks like that the cluster starts to snaptrim a lot a
> PGs, then PGs become laggy and then the cluster snowballs into laggy OSDs.
> I have attached the monitor log and the osd log (from one OSD) around the
> time where it happened.
>
> - is this a known issue?
> - what can I do to debug it further?
> - can I downgrade back to nautilus?
> - should I upgrade the PGs for the pool to 4096 or 8192?
>
> The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> show anything for the timeframe.
>
> Cluster stats:
>   cluster:
> id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> 25h)
> mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> ceph-rbd-mon6
> osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
>
>   data:
> pools:   4 pools, 2241 pgs
> objects: 25.43M objects, 82 TiB
> usage:   231 TiB used, 187 TiB / 417 TiB avail
> pgs: 2241 active+clean
>
>   io:
> client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr
>
> --- RAW STORAGE ---
> CLASS  SIZE AVAILUSED RAW USED  %RAW USED
> ssd417 TiB  187 TiB  230 TiB   231 TiB  55.30
> TOTAL  417 TiB  187 TiB  230 TiB   231 TiB  55.30
>
> --- POOLS ---
> POOL   ID  PGS   STORED   OBJECTS  USED %USED  MAX
> AVAIL
> isos764  455 GiB  117.92k  1.3 TiB   1.17 38
> TiB
> rbd 8  2048   76 TiB   24.65M  222 TiB  66.31 38
> TiB
> archive 9   128  2.4 TiB  669.59k  7.3 TiB   6.06 38
> TiB
> device_health_metrics  10 1   25 MiB  149   76 MiB  0 38
> TiB
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
As an exception, the "UTF-8-Probleme" self-help group will meet in the
big hall this time.


[ceph-users] laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi, I need your help really badly.

We are currently experiencing very bad cluster hangups that happen
sporadically (once on 2022-09-08 around midday, 48 hrs after the upgrade, and
once on 2022-09-12 in the evening).
We use krbd without cephx for the qemu clients, and when the OSDs are
getting laggy, the krbd connection comes to a grinding halt, to the point
that all IO is stalling and we can't even unmap the rbd device.

From the logs, it looks like the cluster starts to snaptrim a lot of
PGs, the PGs then become laggy, and the cluster snowballs into laggy OSDs.
I have attached the monitor log and the OSD log (from one OSD) around the
time it happened.

- is this a known issue?
- what can I do to debug it further?
- can I downgrade back to nautilus?
- should I increase pg_num for the pool to 4096 or 8192?
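
(For reference, that increase would be something along the lines of
"ceph osd pool set rbd pg_num 4096"; on octopus, pgp_num should then be
adjusted automatically in the background.)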

The cluster contains a mixture of 2, 4 and 8TB SSDs (no rotating disks)
where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
show anything for the timeframe.

Cluster stats:
  cluster:
id: 74313356-3b3d-43f3-bce6-9fb0e4591097
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
25h)
mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
ceph-rbd-mon6
osd: 149 osds: 149 up (since 6d), 149 in (since 7w)

  data:
pools:   4 pools, 2241 pgs
objects: 25.43M objects, 82 TiB
usage:   231 TiB used, 187 TiB / 417 TiB avail
pgs: 2241 active+clean

  io:
client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s wr

--- RAW STORAGE ---
CLASS  SIZE AVAILUSED RAW USED  %RAW USED
ssd417 TiB  187 TiB  230 TiB   231 TiB  55.30
TOTAL  417 TiB  187 TiB  230 TiB   231 TiB  55.30

--- POOLS ---
POOL   ID  PGS   STORED   OBJECTS  USED %USED  MAX AVAIL
isos764  455 GiB  117.92k  1.3 TiB   1.17 38 TiB
rbd 8  2048   76 TiB   24.65M  222 TiB  66.31 38 TiB
archive 9   128  2.4 TiB  669.59k  7.3 TiB   6.06 38 TiB
device_health_metrics  10 1   25 MiB  149   76 MiB  0 38 TiB



-- 
As an exception, the "UTF-8-Probleme" self-help group will meet in the
big hall this time.


[ceph-users] Re: OSD Crash in recovery: SST file contains data beyond the point of corruption.

2022-09-13 Thread Igor Fedotov

Hi Benjamin,

sorry for the confusion: this should be kSkipAnyCorruptedRecords, not
kSkipAnyCorruptedRecord.
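
In other words, using the command from the earlier mail, that would be:

  ceph config set osd.4 bluestore_rocksdb_options_annex "wal_recovery_mode=kSkipAnyCorruptedRecords"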



Thanks,

Igor

On 9/12/2022 11:26 PM, Benjamin Naber wrote:

Hi Igor,

looks like the setting won't work; the container now starts with a
different error message saying that the setting is an invalid argument.
Did I do something wrong by setting: ceph config set osd.4
bluestore_rocksdb_options_annex
"wal_recovery_mode=kSkipAnyCorruptedRecord" ?


debug 2022-09-12T20:20:45.044+ 8714e040  1 bluefs 
add_block_device bdev 1 path /var/lib/ceph/osd/ceph-4/block size 2.7 TiB

debug 2022-09-12T20:20:45.044+ 8714e040  1 bluefs mount
debug 2022-09-12T20:20:45.044+ 8714e040  1 bluefs _init_alloc 
shared, id 1, capacity 0x2baa100, block size 0x1
debug 2022-09-12T20:20:45.608+ 8714e040  1 bluefs mount 
shared_bdev_used = 0
debug 2022-09-12T20:20:45.608+ 8714e040  1 
bluestore(/var/lib/ceph/osd/ceph-4) _prepare_db_environment set 
db_paths to db,2850558889164 db.slow,2850558889164
debug 2022-09-12T20:20:45.608+ 8714e040 -1 rocksdb: Invalid 
argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+ 8714e040 -1 rocksdb: Invalid 
argument: No mapping for enum : wal_recovery_mode
debug 2022-09-12T20:20:45.608+ 8714e040  1 rocksdb: do_open 
load rocksdb options failed
debug 2022-09-12T20:20:45.608+ 8714e040 -1 
bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:

debug 2022-09-12T20:20:45.608+ 8714e040  1 bluefs umount
debug 2022-09-12T20:20:45.608+ 8714e040  1 bdev(0xec8e3c00 
/var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:45.836+ 8714e040  1 bdev(0xec8e2400 
/var/lib/ceph/osd/ceph-4/block) close
debug 2022-09-12T20:20:46.088+ 8714e040 -1 osd.4 0 OSD:init: 
unable to mount object store
debug 2022-09-12T20:20:46.088+ 8714e040 -1  ** ERROR: osd init 
failed: (5) Input/output error


Regards and many thanks for the help!

Ben
Am Montag, September 12, 2022 21:14 CEST, schrieb Igor Fedotov 
:

Hi Benjamin,

honestly the following advice is unlikely to help but you may want to
try to set bluestore_rocksdb_options_annex to one of the following 
options:


- wal_recovery_mode=kTolerateCorruptedTailRecords

- wal_recovery_mode=kSkipAnyCorruptedRecord


The indication that the setting is in effect would be the respective
value at the end of following log line:

debug 2022-09-12T17:37:05.574+ a8316040 4 rocksdb:
Options.wal_recovery_mode: 2


It should get 0 and 3 respectively.


Hope this helps,

Igor


On 9/12/2022 9:09 PM, Benjamin Naber wrote:
> Hi Everybody,
>
> I've been struggling for a couple of days now with a degraded ceph cluster.
> It's a simple 3-node cluster with 6 OSDs: 3 SSD-based, 3 HDD-based.
> A couple of days ago one of the nodes crashed. Because of the hard disk
> failure, I replaced the hard disk and the recovery process started
> without any issues.
> While the node was still recovering, the newly replaced OSD drive was
> switched to backfillfull. And this is where the pain started. I
> added another node, bought a hard drive and wiped the replacement OSD.
> The cluster then was a 4-node cluster with 3 OSDs for the
> SSD pool and 4 OSDs for the HDD pool.
> Then I started the recovery process from the beginning. At this point Ceph
> also started a reassignment of misplaced objects.
> Then a power failure hit one of the remaining nodes, and now
> I'm stuck with a degraded cluster and 49 pgs inactive, 3 pgs
> incomplete.
> The OSD container on the node that lost power didn't come up
> anymore because of a rocksdb error. Any advice on how to recover the
> corrupt rocksdb?

> Container Log and rocksdb error:
>
> https://pastebin.com/gvGJdubx
>
> Regards and thanks for your help!
>
> Ben
>
>
> --
> ___
> This e-mail, including any attached files, contains confidential and/or
> legally protected information. If you are not the intended recipient and
> have received this e-mail in error, you may neither use its contents nor
> open any attached files, nor make copies or pass on / distribute the
> contents. Please notify the sender and delete this e-mail and any
> attached files immediately.

> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx




--
___
Benjamin Naber • Holzstraße 7 • D-73650 Winterbach
Mobil: +49 (0) 152.34087809
E-Mail: 

[ceph-users] Re: Ceph on Windows (wnbd) rbd.exe keeps crashing

2022-09-13 Thread Lucian Petrut

Hi,

Thanks for bringing up this issue. The Pacific release doesn't support 
ipv6 but fortunately there's a backport that's likely going to merge 
soon: https://github.com/ceph/ceph/pull/47303.


There are a few other fixes that we'd like to include before releasing 
new Pacific and Quincy MSIs. I'm seeing increased demand for the ipv6 
fix, so we'll try to speed things up.


Regards,

Lucian

On 11.09.2022 09:18, Ilya Dryomov wrote:

On Sun, Sep 11, 2022 at 2:52 AM Angelo Hongens  wrote:

Does that windows driver even support ipv6?

Hi Angelo,

Adding Lucian who would know more, but there is a recent fix for IPv6
on Windows:

https://tracker.ceph.com/issues/53281

Thanks,

 Ilya


I remember I could not get the driver working as well on my ipv6 setup,
but no logs to help me troubleshoot the issue. I create an issue on
github somewhere, but no response, so I gave up.

Ah, here's my ticket. Might not be related to your issue, but I could
not help my suspicion it might be ipv6 related:
https://github.com/cloudbase/ceph-windows-installer/issues/27



On 09/09/2022 04:22, Stefan Kooman wrote:

Hi,

I try to get wnbd to work on a Windows 2019 virtual machine (Version
1809, OS Build 17763.2183). Unfortunately the process rbd.exe keeps
crashing (according to logs in event viewer).

I have tested with a linux VM in the same network and that just works.

In the ceph.conf I specified the following (besides mon host):

[global]
log to stderr = true
run dir = C:/ProgramData/ceph
crash dir = C:/ProgramData/ceph
debug client = 2

ms bind ipv4 = false
ms bind ipv6 = true

[client]
keyring = C:/ProgramData/ceph/keyring
admin socket = c:/ProgramData/ceph/$name.$pid.asok
debug client = 2

Note: The Ceph network is IPv6 only, and no IPv4 is involved.


I double checked that I can connect with the cluster from the VM.
Eventually I made a tcpdump and from that dump I can conclude that the
client keeps on trying to connect to the cluster (probably because the
rbd.exe process is restarting over and over) but never seems to manage
to actually connect to it. Although debug logging is defined in the
ceph.conf, the client does not write any log output.

Here an example of a crash report:


Version=1
EventType=APPCRASH
EventTime=133068416700266255
ReportType=2
Consent=1
UploadTime=133068512230149899
ReportStatus=4196
ReportIdentifier=1c521fd8-325f-494e-9b6b-e7a608d9f1b1
IntegratorReportIdentifier=1a2b76ca-248e-4636-9ed6-1cae6c332c0c
Wow64Host=34404
NsAppName=rbd.exe
AppSessionGuid=03c4-0001-0011-7c4e-cb1b05c1d801
TargetAppId=W:7c6b388ea9ba05b8df74c0e19907c78c0904!e32ad63d5bac11abc70d42f12a6c189e6b9edfdc!rbd.exe
TargetAppVer=1970//01//01:00:00:00!26b9f4!rbd.exe
BootId=4294967295
TargetAsId=2397
IsFatal=1
EtwNonCollectReason=1
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=rbd.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=
Sig[3].Name=Fault Module Name
Sig[3].Value=libceph-common.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=0.0.0.0
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=
Sig[6].Name=Exception Code
Sig[6].Value=4015
Sig[7].Name=Exception Offset
Sig[7].Value=0069a3bb
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.17763.2.0.0.272.7
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=74e0
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=74e0e499b0f720be12f39b443eb7059c
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=f912
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=f9121477e3a172a0e72323a2204f3558
UI[2]=C:\Program Files\Ceph\bin\rbd.exe
LoadedModule[0]=C:\Program Files\Ceph\bin\rbd.exe
LoadedModule[1]=C:\Windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\Windows\System32\KERNEL32.DLL
LoadedModule[3]=C:\Windows\System32\KERNELBASE.dll
LoadedModule[4]=C:\Windows\System32\msvcrt.dll
LoadedModule[5]=C:\Windows\System32\WS2_32.dll
LoadedModule[6]=C:\Windows\System32\RPCRT4.dll
LoadedModule[7]=C:\Program Files\Ceph\bin\libboost_program_options.dll
LoadedModule[8]=C:\Program Files\Ceph\bin\libwinpthread-1.dll
LoadedModule[9]=C:\Program Files\Ceph\bin\libgcc_s_seh-1.dll
LoadedModule[10]=C:\Program Files\Ceph\bin\libstdc++-6.dll
LoadedModule[11]=C:\Program Files\Ceph\bin\libceph-common.dll
LoadedModule[12]=C:\Windows\System32\ADVAPI32.dll
LoadedModule[13]=C:\Windows\System32\sechost.dll
LoadedModule[14]=C:\Windows\System32\bcrypt.dll
LoadedModule[15]=C:\Program Files\Ceph\bin\librados.dll
LoadedModule[16]=C:\Program Files\Ceph\bin\librbd.dll
LoadedModule[17]=C:\Program Files\Ceph\bin\libboost_random.dll
LoadedModule[18]=C:\Program Files\Ceph\bin\libboost_thread_pthread.dll
LoadedModule[19]=C:\Program Files\Ceph\bin\libcrypto-1_1-x64.dll
LoadedModule[20]=C:\Windows\System32\USER32.dll
LoadedModule[21]=C:\Windows\System32\win32u.dll
LoadedModule[22]=C:\Windows\System32\GDI32.dll

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-13 Thread Jan Kasprzak
Hello,

Stefan Kooman wrote:
: Hi,
: 
: On 9/9/22 10:53, Frank Schilder wrote:
: >Is there a chance you might have seen this 
https://tracker.ceph.com/issues/49231 ?
: >
: >Do you have network monitoring with packet reports? It is possible though 
that you have observed something new.
: >
: >Your cluster comes from pre-luminous times? The issue with dropping support 
for level-db was discussed in the user list some time ago. There were 
instructions how to upgrade the mon store, which should happen before starting 
a ceph upgrade. Seems like the info didn't make it into the upgrade 
instructions.
: 
: It's stated in the Quincy release notes [1]:
: 
: "LevelDB support has been removed. WITH_LEVELDB is no longer a
: supported build option. Users should migrate their monitors and OSDs
: to RocksDB before upgrading to Quincy."
: 
: And here as a Note [2]:
: 
: I'm not sure if cephadm has a check for this before attempting an
: upgrade to Quincy though.

OK, this is my fault. Actually, using my cluster with defaults mostly,
I didn't even know whether I was using rocksdb or leveldb :-)
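
(For reference, one quick way to check is the kv_backend file in the mon data
directory, e.g. something like

  cat /var/lib/ceph/mon/ceph-$(hostname)/kv_backend

which should print either rocksdb or leveldb; the exact path depends on the
deployment.)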

But then, the problem I reported here - a newly rebuilt mon failing
to add itself to the cluster and only succeeding after four hours, after
setting mon_sync_max_payload_size 4096 (not sure whether this made any
difference) - is probably not related to the upgrade to Quincy, as at that
time I had all mons downgraded back to Octopus.
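
For anyone wanting to try the same, that option can be applied with e.g.

  ceph config set mon mon_sync_max_payload_size 4096

or injected at runtime with "ceph tell mon.* injectargs
'--mon_sync_max_payload_size 4096'".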

Nevertheless, I have finished upgrading to Quincy now.

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall