[ceph-users] Re: Crush map & rule

2023-11-08 Thread Albert Shih
On 08/11/2023 at 19:29:19+0100, David C. wrote:
Hi David. 

> 
> What would be the number of replicas (in total and on each row) and their
> distribution on the tree ?

Well, “inside” a row that would be 3 in replica mode.

Between rows... well, two ;-)
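
Something like the following is what I have in mind after reading the docs
(completely untested, rule names are made up, bucket names taken from the
tree quoted below, and the k=2/m=1 profile is just an example that fits 3
hosts):

  # 3 replicas over hosts, restricted to row "primary"
  ceph osd crush rule create-replicated rep_row_primary primary host ssd

  # erasure coding restricted to row "primary", failure domain host
  ceph osd erasure-code-profile set ec_row_primary k=2 m=1 \
      crush-root=primary crush-failure-domain=host crush-device-class=ssd
  ceph osd crush rule create-erasure ec_rule_primary ec_row_primary

  # 2 copies, one per row, as a hand-written rule in the decompiled crush map
  rule rep_across_rows {
      id 10
      type replicated
      step take City
      step choose firstn 0 type row
      step chooseleaf firstn 1 type host
      step emit
  }

but I'm not sure that's the right way to do it.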

Besides understanding how to write a rule a little more complex than the
example in the official documentation, there is another purpose: to have a
procedure for changing the hardware.

For example, if «row primary» only contains old bare-metal servers and I add
some new servers to the cluster, I want to copy everything from “row primary”
to “row secondary”.

Regards

> 
> 
> On Wed 8 Nov 2023 at 18:45, Albert Shih  wrote:
> 
> Hi everyone,
> 
> I'm totally newbie with ceph, so sorry if I'm asking some stupid question.
> 
> I'm trying to understand how the crush map & rule work, my goal is to have
> two groups of 3 servers, so I'm using “row” bucket
> 
> ID   CLASS  WEIGHT    TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
>  -1         59.38367  root default
> -15         59.38367      zone City
> -17         29.69183          row primary
>  -3          9.89728              host server1
>   0    ssd   3.49309                  osd.0         up   1.0  1.0
>   1    ssd   1.74660                  osd.1         up   1.0  1.0
>   2    ssd   1.74660                  osd.2         up   1.0  1.0
>   3    ssd   2.91100                  osd.3         up   1.0  1.0
>  -5          9.89728              host server2
>   4    ssd   1.74660                  osd.4         up   1.0  1.0
>   5    ssd   1.74660                  osd.5         up   1.0  1.0
>   6    ssd   2.91100                  osd.6         up   1.0  1.0
>   7    ssd   3.49309                  osd.7         up   1.0  1.0
>  -7          9.89728              host server3
>   8    ssd   3.49309                  osd.8         up   1.0  1.0
>   9    ssd   1.74660                  osd.9         up   1.0  1.0
>  10    ssd   2.91100                  osd.10        up   1.0  1.0
>  11    ssd   1.74660                  osd.11        up   1.0  1.0
> -19         29.69183          row secondary
>  -9          9.89728              host server4
>  12    ssd   1.74660                  osd.12        up   1.0  1.0
>  13    ssd   1.74660                  osd.13        up   1.0  1.0
>  14    ssd   3.49309                  osd.14        up   1.0  1.0
>  15    ssd   2.91100                  osd.15        up   1.0  1.0
> -11          9.89728              host server5
>  16    ssd   1.74660                  osd.16        up   1.0  1.0
>  17    ssd   1.74660                  osd.17        up   1.0  1.0
>  18    ssd   3.49309                  osd.18        up   1.0  1.0
>  19    ssd   2.91100                  osd.19        up   1.0  1.0
> -13          9.89728              host server6
>  20    ssd   1.74660                  osd.20        up   1.0  1.0
>  21    ssd   1.74660                  osd.21        up   1.0  1.0
>  22    ssd   2.91100                  osd.22        up   1.0  1.0
> 
> and I want to create a some rules, first I like to have
> 
>   a rule «replica» (over host) inside the «row» primary
>   a rule «erasure» (over host)  inside the «row» primary
> 
> but also two crush rule between primary/secondary, meaning I like to have 
> a
> replica (with only 1 copy of course) of pool from “row” primary to
> secondary.
> 
> How can I achieve that ?
> 
> Regards
> 
> 
> 
> --
> Albert SHIH 嶺 
> mer. 08 nov. 2023 18:37:54 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
-- 
Albert SHIH 嶺 
Observatoire de Paris
France
Heure locale/Local time:
jeu. 09 nov. 2023 08:39:41 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Dashboard - Community News Sticker [Feedback]

2023-11-08 Thread Dominique Ramaekers
Hi,

In my opinion... please don't. In the worst case, maybe only messages concerning
critical updates (security, stability issues).

For two reasons:
1) as low as the impact may be, server resources are precious...
2) my time is also precious. If I log in to the GUI, it's with the intention to
do some work. Knowing myself, I will be distracted by miscellaneous messages.
If I want to get up to speed with events, updates, etc., I'll turn to the
different channels already available. I think mailing lists, newsletters and
social media channels are maybe more appropriate.

BTW: thanks a lot for the user input on these decisions!

Greetings,

Dominique.

> -Original message-
> From: Nizamudeen A 
> Sent: Thursday, 9 November 2023 7:36
> To: ceph-users ; dev 
> CC: ceph-dashboard-core 
> Subject: [ceph-users] Ceph Dashboard - Community News Sticker
> [Feedback]
>
> Hello,
>
> We wanted to get some feedback on one of the features that we are
> planning to bring in for upcoming releases.
>
> On the Ceph GUI, we thought it could be interesting to show information
> regarding the community events, ceph release information (Release notes
> and
> changelogs) and maybe even notify about new blog post releases and also
> inform regarding the community group meetings. There would be options to
> subscribe to the events that you want to get notified.
>
> Before proceeding with its implementation, we thought it'd be good to get
> some community feedback around it. So please let us know what you think
> (the goods and the bads).
>
> Regards,
> --
>
> Nizamudeen A
>
> Software Engineer
>
> Red Hat 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email
> to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Dashboard - Community News Sticker [Feedback]

2023-11-08 Thread Chris Palmer

My vote would be "no":

 * This is an operational high-criticality system. Not the right place
   to have distracting other stuff or to bloat the dashboard.
 * Our ceph systems deliberately don't have direct internet connectivity.
 * There is plenty of useful operational information that could fill
   the screen real-estate.
 * Effort could be much better spent on real bugs.

Sorry!

Chris


On 09/11/2023 06:35, Nizamudeen A wrote:

Hello,

We wanted to get some feedback on one of the features that we are planning
to bring in for upcoming releases.

On the Ceph GUI, we thought it could be interesting to show information
regarding the community events, ceph release information (Release notes and
changelogs) and maybe even notify about new blog post releases and also
inform regarding the community group meetings. There would be options to
subscribe to the events that you want to get notified.

Before proceeding with its implementation, we thought it'd be good to get
some community feedback around it. So please let us know what you think
(the goods and the bads).

Regards,

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed with Grafana password

2023-11-08 Thread Eugen Block

Hi,
you mean you forgot your password? You can remove the service with
'ceph orch rm grafana', then re-apply your grafana.yaml containing the
initial password. Note that this would remove all of the Grafana
configs, custom dashboards etc.; you would have to reconfigure them.
So before doing that you should verify that this is actually what
you're looking for. Not sure what this has to do with Loki though.
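
Roughly like this (untested sketch, adjust the spec to your environment;
only initial_admin_password really matters here):

  ceph orch rm grafana

  # grafana.yaml
  service_type: grafana
  placement:
    count: 1
  spec:
    initial_admin_password: <new password>

  ceph orch apply -i grafana.yaml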


Eugen

Quoting Sake Ceph :


I configured a password for Grafana because I want to use Loki. I
used the spec parameter initial_admin_password and this works fine for a
staging environment, where I never tried to used Grafana with a password
for Loki. 
    
   Using the username admin with the configured password gives a
credentials error on environment where I tried to use Grafana with Loki in
the past (with 17.2.6 of Ceph/cephadm). I changed the password in the past
within Grafana, but how can I overwrite this now? Or is there a way to
cleanup all Grafana files? 
    
   Best regards, 
   Sake



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Dashboard - Community News Sticker [Feedback]

2023-11-08 Thread Dmitry Melekhov

On 09.11.2023 10:35, Nizamudeen A wrote:

Hello,

We wanted to get some feedback on one of the features that we are planning
to bring in for upcoming releases.

On the Ceph GUI, we thought it could be interesting to show information
regarding the community events, ceph release information (Release notes and
changelogs) and maybe even notify about new blog post releases and also
inform regarding the community group meetings. There would be options to
subscribe to the events that you want to get notified.

Before proceeding with its implementation, we thought it'd be good to get
some community feedback around it. So please let us know what you think
(the goods and the bads).

Regards,


Hello!

I think this feature will require internet access from the servers; my
servers have no internet access, so for me this feature would be useless.



Thank you!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Dashboard - Community News Sticker [Feedback]

2023-11-08 Thread Nizamudeen A
Hello,

We wanted to get some feedback on one of the features that we are planning
to bring in for upcoming releases.

On the Ceph GUI, we thought it could be interesting to show information
regarding the community events, ceph release information (Release notes and
changelogs) and maybe even notify about new blog post releases and also
inform regarding the community group meetings. There would be options to
subscribe to the events that you want to get notified.

Before proceeding with its implementation, we thought it'd be good to get
some community feedback around it. So please let us know what you think
(the goods and the bads).

Regards,
-- 

Nizamudeen A

Software Engineer

Red Hat 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Venky Shankar
On Thu, Nov 9, 2023 at 3:53 AM Laura Flores  wrote:

> @Venky Shankar  and @Patrick Donnelly
> , I reviewed the smoke suite results and identified
> a new bug:
>
> https://tracker.ceph.com/issues/63488 - smoke test fails from "NameError:
> name 'DEBUGFS_META_DIR' is not defined"
>
> Can you take a look?
>

Thanks for reporting the failure, Laura. On it now.


>
> On Wed, Nov 8, 2023 at 12:32 PM Adam King  wrote:
>
>> >
>> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything
>> for
>> > this?
>> >
>>
>> Yes, but not an actual code change in the main ceph repo. I'm looking into
>> a ceph-container change to alter the ganesha version in the container as a
>> solution.
>>
>> On Wed, Nov 8, 2023 at 11:10 AM Yuri Weinstein 
>> wrote:
>>
>> > We merged 3 PRs and rebuilt "reef-release" (Build 2)
>> >
>> > Seeking approvals/reviews for:
>> >
>> > smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
>> > (see Build 2)
>> > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > rgw - Casey reapprove on Build 2
>> > fs - Venky, approve on Build 2
>> > orch - Adam King
>> > upgrade/quincy-x (reef) - Laura PTL
>> > powercycle - Brad (known issues)
>> >
>> > We need to close
>> > https://tracker.ceph.com/issues/63391
>> > (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
>> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything
>> for
>> > this?
>> >
>> > On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen 
>> wrote:
>> > >
>> > > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered
>> > this issue after the release of 17.2.7, and don't want to hit the same
>> > blocker in 18.2.1 where some types of OSDs are failing to be created in
>> new
>> > clusters, or failing to start in upgraded clusters.
>> > > https://tracker.ceph.com/issues/63391
>> > >
>> > > Thanks!
>> > > Travis
>> > >
>> > > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar 
>> > wrote:
>> > >>
>> > >> Hi Yuri,
>> > >>
>> > >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein 
>> > wrote:
>> > >> >
>> > >> > 3 PRs above mentioned were merged and I am returning some tests:
>> > >> >
>> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>> > >> >
>> > >> > Still seeing approvals.
>> > >> > smoke - Laura, Radek, Prashant, Venky in progress
>> > >> > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > >> > rgw - Casey in progress
>> > >> > fs - Venky
>> > >>
>> > >> There's a failure in the fs suite
>> > >>
>> > >>
>> >
>> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
>> > >>
>> > >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
>> > >> (#cephfs on ceph slack) to have a look. WIll update as soon as
>> > >> possible.
>> > >>
>> > >> > orch - Adam King
>> > >> > rbd - Ilya approved
>> > >> > krbd - Ilya approved
>> > >> > upgrade/quincy-x (reef) - Laura PTL
>> > >> > powercycle - Brad
>> > >> > perf-basic - in progress
>> > >> >
>> > >> >
>> > >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley 
>> > wrote:
>> > >> > >
>> > >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein <
>> ywein...@redhat.com>
>> > wrote:
>> > >> > > >
>> > >> > > > Details of this release are summarized here:
>> > >> > > >
>> > >> > > > https://tracker.ceph.com/issues/63443#note-1
>> > >> > > >
>> > >> > > > Seeking approvals/reviews for:
>> > >> > > >
>> > >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE
>> > failures)
>> > >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > >> > > > rgw - Casey
>> > >> > >
>> > >> > > rgw results are approved.
>> https://github.com/ceph/ceph/pull/54371
>> > >> > > merged to reef but is needed on reef-release
>> > >> > >
>> > >> > > > fs - Venky
>> > >> > > > orch - Adam King
>> > >> > > > rbd - Ilya
>> > >> > > > krbd - Ilya
>> > >> > > > upgrade/quincy-x (reef) - Laura PTL
>> > >> > > > powercycle - Brad
>> > >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
>> > >> > > >
>> > >> > > > Please reply to this email with approval and/or trackers of
>> known
>> > >> > > > issues/PRs to address them.
>> > >> > > >
>> > >> > > > TIA
>> > >> > > > YuriW
>> > >> > > > ___
>> > >> > > > ceph-users mailing list -- ceph-users@ceph.io
>> > >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >> > > >
>> > >> > >
>> > >> > ___
>> > >> > ceph-users mailing list -- ceph-users@ceph.io
>> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Cheers,
>> > >> Venky
>> > >> ___
>> > >> Dev mailing list -- d...@ceph.io
>> > >> To unsubscribe send an email to dev-le...@ceph.io
>> > ___
>> > Dev mailing list -- d...@ceph.io
>> > To unsubscribe send an email to dev-le...@ceph.io
>> >
>> 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Laura Flores
@Venky Shankar  and @Patrick Donnelly
, I reviewed the smoke suite results and identified a
new bug:

https://tracker.ceph.com/issues/63488 - smoke test fails from "NameError:
name 'DEBUGFS_META_DIR' is not defined"

Can you take a look?

On Wed, Nov 8, 2023 at 12:32 PM Adam King  wrote:

> >
> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything
> for
> > this?
> >
>
> Yes, but not an actual code change in the main ceph repo. I'm looking into
> a ceph-container change to alter the ganesha version in the container as a
> solution.
>
> On Wed, Nov 8, 2023 at 11:10 AM Yuri Weinstein 
> wrote:
>
> > We merged 3 PRs and rebuilt "reef-release" (Build 2)
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
> > (see Build 2)
> > rados - Neha, Radek, Travis, Ernesto, Adam King
> > rgw - Casey reapprove on Build 2
> > fs - Venky, approve on Build 2
> > orch - Adam King
> > upgrade/quincy-x (reef) - Laura PTL
> > powercycle - Brad (known issues)
> >
> > We need to close
> > https://tracker.ceph.com/issues/63391
> > (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything
> for
> > this?
> >
> > On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen 
> wrote:
> > >
> > > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered
> > this issue after the release of 17.2.7, and don't want to hit the same
> > blocker in 18.2.1 where some types of OSDs are failing to be created in
> new
> > clusters, or failing to start in upgraded clusters.
> > > https://tracker.ceph.com/issues/63391
> > >
> > > Thanks!
> > > Travis
> > >
> > > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar 
> > wrote:
> > >>
> > >> Hi Yuri,
> > >>
> > >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein 
> > wrote:
> > >> >
> > >> > 3 PRs above mentioned were merged and I am returning some tests:
> > >> >
> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> > >> >
> > >> > Still seeing approvals.
> > >> > smoke - Laura, Radek, Prashant, Venky in progress
> > >> > rados - Neha, Radek, Travis, Ernesto, Adam King
> > >> > rgw - Casey in progress
> > >> > fs - Venky
> > >>
> > >> There's a failure in the fs suite
> > >>
> > >>
> >
> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
> > >>
> > >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> > >> (#cephfs on ceph slack) to have a look. WIll update as soon as
> > >> possible.
> > >>
> > >> > orch - Adam King
> > >> > rbd - Ilya approved
> > >> > krbd - Ilya approved
> > >> > upgrade/quincy-x (reef) - Laura PTL
> > >> > powercycle - Brad
> > >> > perf-basic - in progress
> > >> >
> > >> >
> > >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley 
> > wrote:
> > >> > >
> > >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein <
> ywein...@redhat.com>
> > wrote:
> > >> > > >
> > >> > > > Details of this release are summarized here:
> > >> > > >
> > >> > > > https://tracker.ceph.com/issues/63443#note-1
> > >> > > >
> > >> > > > Seeking approvals/reviews for:
> > >> > > >
> > >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE
> > failures)
> > >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > >> > > > rgw - Casey
> > >> > >
> > >> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> > >> > > merged to reef but is needed on reef-release
> > >> > >
> > >> > > > fs - Venky
> > >> > > > orch - Adam King
> > >> > > > rbd - Ilya
> > >> > > > krbd - Ilya
> > >> > > > upgrade/quincy-x (reef) - Laura PTL
> > >> > > > powercycle - Brad
> > >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> > >> > > >
> > >> > > > Please reply to this email with approval and/or trackers of
> known
> > >> > > > issues/PRs to address them.
> > >> > > >
> > >> > > > TIA
> > >> > > > YuriW
> > >> > > > ___
> > >> > > > ceph-users mailing list -- ceph-users@ceph.io
> > >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >> > > >
> > >> > >
> > >> > ___
> > >> > ceph-users mailing list -- ceph-users@ceph.io
> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> > >>
> > >>
> > >> --
> > >> Cheers,
> > >> Venky
> > >> ___
> > >> Dev mailing list -- d...@ceph.io
> > >> To unsubscribe send an email to dev-le...@ceph.io
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804

[ceph-users] Re: HDD cache

2023-11-08 Thread Peter



This server is a Dell R730 with an HBA330 card; the HDDs are configured in
write-through mode.

From: David C. 
Sent: Wednesday, November 8, 2023 10:14
To: Peter 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] HDD cache

Without (raid/jbod) controller ?

On Wed 8 Nov 2023 at 18:36, Peter  wrote:
Hi All,

I note that HDD cluster commit delay improves after i turn off HDD cache. 
However, i also note that not all HDDs are able to turn off the cache. special 
I found that two HDD with same model number, one can turn off, anther doesn't. 
i guess i have my system config or something different setting with two HDDs.
Below is my command to turn off the HDD cache.

root@lahost008:~# sdparm --set WCE=0 /dev/sdb
 /dev/sdb: ATA ST1NE000-3AP EN01
root@lahost008:~# cat /sys/block/sdb/queue/write_cache
 write through

I also tried sdparm to run "sdparm --set WCE=0 /dev/sdb" I got the same result.

Anyone experienced this can advise?

Thanks a lot


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Adam King
>
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>

Yes, but not an actual code change in the main ceph repo. I'm looking into
a ceph-container change to alter the ganesha version in the container as a
solution.

On Wed, Nov 8, 2023 at 11:10 AM Yuri Weinstein  wrote:

> We merged 3 PRs and rebuilt "reef-release" (Build 2)
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
> (see Build 2)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey reapprove on Build 2
> fs - Venky, approve on Build 2
> orch - Adam King
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad (known issues)
>
> We need to close
> https://tracker.ceph.com/issues/63391
> (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>
> On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen  wrote:
> >
> > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered
> this issue after the release of 17.2.7, and don't want to hit the same
> blocker in 18.2.1 where some types of OSDs are failing to be created in new
> clusters, or failing to start in upgraded clusters.
> > https://tracker.ceph.com/issues/63391
> >
> > Thanks!
> > Travis
> >
> > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar 
> wrote:
> >>
> >> Hi Yuri,
> >>
> >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein 
> wrote:
> >> >
> >> > 3 PRs above mentioned were merged and I am returning some tests:
> >> >
> https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> >> >
> >> > Still seeing approvals.
> >> > smoke - Laura, Radek, Prashant, Venky in progress
> >> > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > rgw - Casey in progress
> >> > fs - Venky
> >>
> >> There's a failure in the fs suite
> >>
> >>
> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
> >>
> >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> >> (#cephfs on ceph slack) to have a look. WIll update as soon as
> >> possible.
> >>
> >> > orch - Adam King
> >> > rbd - Ilya approved
> >> > krbd - Ilya approved
> >> > upgrade/quincy-x (reef) - Laura PTL
> >> > powercycle - Brad
> >> > perf-basic - in progress
> >> >
> >> >
> >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley 
> wrote:
> >> > >
> >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein 
> wrote:
> >> > > >
> >> > > > Details of this release are summarized here:
> >> > > >
> >> > > > https://tracker.ceph.com/issues/63443#note-1
> >> > > >
> >> > > > Seeking approvals/reviews for:
> >> > > >
> >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE
> failures)
> >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > > > rgw - Casey
> >> > >
> >> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> >> > > merged to reef but is needed on reef-release
> >> > >
> >> > > > fs - Venky
> >> > > > orch - Adam King
> >> > > > rbd - Ilya
> >> > > > krbd - Ilya
> >> > > > upgrade/quincy-x (reef) - Laura PTL
> >> > > > powercycle - Brad
> >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> >> > > >
> >> > > > Please reply to this email with approval and/or trackers of known
> >> > > > issues/PRs to address them.
> >> > > >
> >> > > > TIA
> >> > > > YuriW
> >> > > > ___
> >> > > > ceph-users mailing list -- ceph-users@ceph.io
> >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> > > >
> >> > >
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Crush map & rule

2023-11-08 Thread David C.
Hi Albert,

What would be the number of replicas (in total and on each row) and their
distribution on the tree ?


On Wed 8 Nov 2023 at 18:45, Albert Shih  wrote:

> Hi everyone,
>
> I'm totally newbie with ceph, so sorry if I'm asking some stupid question.
>
> I'm trying to understand how the crush map & rule work, my goal is to have
> two groups of 3 servers, so I'm using “row” bucket
>
> ID   CLASS  WEIGHT    TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
>  -1         59.38367  root default
> -15         59.38367      zone City
> -17         29.69183          row primary
>  -3          9.89728              host server1
>   0    ssd   3.49309                  osd.0         up   1.0  1.0
>   1    ssd   1.74660                  osd.1         up   1.0  1.0
>   2    ssd   1.74660                  osd.2         up   1.0  1.0
>   3    ssd   2.91100                  osd.3         up   1.0  1.0
>  -5          9.89728              host server2
>   4    ssd   1.74660                  osd.4         up   1.0  1.0
>   5    ssd   1.74660                  osd.5         up   1.0  1.0
>   6    ssd   2.91100                  osd.6         up   1.0  1.0
>   7    ssd   3.49309                  osd.7         up   1.0  1.0
>  -7          9.89728              host server3
>   8    ssd   3.49309                  osd.8         up   1.0  1.0
>   9    ssd   1.74660                  osd.9         up   1.0  1.0
>  10    ssd   2.91100                  osd.10        up   1.0  1.0
>  11    ssd   1.74660                  osd.11        up   1.0  1.0
> -19         29.69183          row secondary
>  -9          9.89728              host server4
>  12    ssd   1.74660                  osd.12        up   1.0  1.0
>  13    ssd   1.74660                  osd.13        up   1.0  1.0
>  14    ssd   3.49309                  osd.14        up   1.0  1.0
>  15    ssd   2.91100                  osd.15        up   1.0  1.0
> -11          9.89728              host server5
>  16    ssd   1.74660                  osd.16        up   1.0  1.0
>  17    ssd   1.74660                  osd.17        up   1.0  1.0
>  18    ssd   3.49309                  osd.18        up   1.0  1.0
>  19    ssd   2.91100                  osd.19        up   1.0  1.0
> -13          9.89728              host server6
>  20    ssd   1.74660                  osd.20        up   1.0  1.0
>  21    ssd   1.74660                  osd.21        up   1.0  1.0
>  22    ssd   2.91100                  osd.22        up   1.0  1.0
>
> and I want to create a some rules, first I like to have
>
>   a rule «replica» (over host) inside the «row» primary
>   a rule «erasure» (over host)  inside the «row» primary
>
> but also two crush rule between primary/secondary, meaning I like to have a
> replica (with only 1 copy of course) of pool from “row” primary to
> secondary.
>
> How can I achieve that ?
>
> Regards
>
>
>
> --
> Albert SHIH 嶺 
> mer. 08 nov. 2023 18:37:54 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: HDD cache

2023-11-08 Thread David C.
Without (raid/jbod) controller ?

On Wed 8 Nov 2023 at 18:36, Peter  wrote:

> Hi All,
>
> I note that HDD cluster commit delay improves after i turn off HDD cache.
> However, i also note that not all HDDs are able to turn off the cache.
> special I found that two HDD with same model number, one can turn off,
> anther doesn't. i guess i have my system config or something different
> setting with two HDDs.
> Below is my command to turn off the HDD cache.
>
> root@lahost008:~# sdparm --set WCE=0 /dev/sdb
>  /dev/sdb: ATA ST1NE000-3AP EN01
> root@lahost008:~# cat /sys/block/sdb/queue/write_cache
>  write through
>
> I also tried sdparm to run "sdparm --set WCE=0 /dev/sdb" I got the same
> result.
>
> Anyone experienced this can advise?
>
> Thanks a lot
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Crush map & rule

2023-11-08 Thread Albert Shih
Hi everyone, 

I'm a total newbie with Ceph, so sorry if I'm asking a stupid question.

I'm trying to understand how the CRUSH map & rules work. My goal is to have
two groups of 3 servers, so I'm using the “row” bucket:

ID   CLASS  WEIGHT    TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         59.38367  root default
-15         59.38367      zone City
-17         29.69183          row primary
 -3          9.89728              host server1
  0    ssd   3.49309                  osd.0         up   1.0  1.0
  1    ssd   1.74660                  osd.1         up   1.0  1.0
  2    ssd   1.74660                  osd.2         up   1.0  1.0
  3    ssd   2.91100                  osd.3         up   1.0  1.0
 -5          9.89728              host server2
  4    ssd   1.74660                  osd.4         up   1.0  1.0
  5    ssd   1.74660                  osd.5         up   1.0  1.0
  6    ssd   2.91100                  osd.6         up   1.0  1.0
  7    ssd   3.49309                  osd.7         up   1.0  1.0
 -7          9.89728              host server3
  8    ssd   3.49309                  osd.8         up   1.0  1.0
  9    ssd   1.74660                  osd.9         up   1.0  1.0
 10    ssd   2.91100                  osd.10        up   1.0  1.0
 11    ssd   1.74660                  osd.11        up   1.0  1.0
-19         29.69183          row secondary
 -9          9.89728              host server4
 12    ssd   1.74660                  osd.12        up   1.0  1.0
 13    ssd   1.74660                  osd.13        up   1.0  1.0
 14    ssd   3.49309                  osd.14        up   1.0  1.0
 15    ssd   2.91100                  osd.15        up   1.0  1.0
-11          9.89728              host server5
 16    ssd   1.74660                  osd.16        up   1.0  1.0
 17    ssd   1.74660                  osd.17        up   1.0  1.0
 18    ssd   3.49309                  osd.18        up   1.0  1.0
 19    ssd   2.91100                  osd.19        up   1.0  1.0
-13          9.89728              host server6
 20    ssd   1.74660                  osd.20        up   1.0  1.0
 21    ssd   1.74660                  osd.21        up   1.0  1.0
 22    ssd   2.91100                  osd.22        up   1.0  1.0

and I want to create some rules; first I'd like to have

  a rule «replica» (over host) inside the «row» primary
  a rule «erasure» (over host) inside the «row» primary

but also two CRUSH rules between primary/secondary, meaning I'd like to have a
replica (with only 1 copy, of course) of a pool from “row” primary on “row”
secondary.

How can I achieve that ?

Regards
  


-- 
Albert SHIH 嶺 
mer. 08 nov. 2023 18:37:54 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] HDD cache

2023-11-08 Thread Peter
Hi All,

I note that the HDD cluster's commit delay improves after I turn off the HDD cache.
However, I also note that not all HDDs are able to turn off the cache. In particular,
I found two HDDs with the same model number where one can turn it off and the other
can't; I guess my system config or some setting differs between the two HDDs.
Below is my command to turn off the HDD cache.

root@lahost008:~# sdparm --set WCE=0 /dev/sdb
 /dev/sdb: ATA ST1NE000-3AP EN01
root@lahost008:~# cat /sys/block/sdb/queue/write_cache
 write through

I also tried sdparm again, running "sdparm --set WCE=0 /dev/sdb", and got the same result.
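
(For completeness, other knobs that might be relevant; I have not verified
these on the drives above:

  hdparm -W 0 /dev/sdb      # toggle the drive write cache at the ATA level
  hdparm -W /dev/sdb        # read back the current setting
  sdparm --set WCE=0 --save /dev/sdb     # try to persist the change across power cycles
  echo "write through" > /sys/block/sdb/queue/write_cache   # only changes the kernel's flush policy, not the drive itself
)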

Anyone experienced this can advise?

Thanks a lot


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about PG mgr/balancer/crush_compat_metrics

2023-11-08 Thread Bryan Song
Sorry for not making it clear: we are using upmap. I just saw this in the
code and was wondering about the usage.

For the OSDs, we do not have any OSD weight < 1.00 until one OSD reaches
the 85% nearfull ratio. Before I reweight that OSD, our
mgr/balancer/upmap_max_deviation is set to 5 and the PG distribution is
within about +/- 5 PGs on each OSD. I also checked the OSD usage and found
it varies from about 50% to 70% while the average usage is 60%. Is this
distribution OK? We have also enabled compression with snappy; will the
compression affect the OSD usage?
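
(For reference, this is what we are checking and considering on our side;
the value 1 below is only an example:

  ceph balancer status
  ceph config get mgr mgr/balancer/upmap_max_deviation    # currently 5 for us
  ceph config set mgr mgr/balancer/upmap_max_deviation 1  # lower value should give a tighter PG spread
  ceph osd df tree                                        # per-OSD %USE and PGS columns
)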

On Wed, Nov 8, 2023 at 7:24 AM  wrote:

> Hello,
>
> We are using a Ceph Pacific (16.2.10) cluster and enabled the balancer
> module, but the usage of some OSDs keeps growing and reached up to
> mon_osd_nearfull_ratio, which we use 85% by default, and we think the
> balancer module should do some balancer work.
>
> So I checked our balancer configuration and found that our
> "crush_compat_metrics" is set to "pgs,objects,bytes", and this three values
> are used in src.pybind.mgr.balancer.module.Module.calc_eval. However, when
> doing the actual balance task, only the first key is used to do the auto
> balance, in src.pybind.mgr.balancer.module.Module.do_crush_compat:
> metrics = self.get_module_option('crush_compat_metrics').split(',')
> key = metrics[0] # balancing using the first score metric
>
> My concern is, any reason why we calculate the balancing using the three
> items but only do the balance using the first one?
>
> Thanks.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Thanks & best regards...

Bryan (Longchao Song)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Weekly Meeting Minutes 2023-11-08

2023-11-08 Thread Patrick Donnelly
Hello all,

Here are the minutes from today's meeting.

   - New time for CDM APAC to increase participation


   - 9.30 - 11.30 pm PT seems like the most popular based on
  https://doodle.com/meeting/participate/id/aM9XGZ3a/vote


   - One more week for more feedback; please ask more APAC folks to suggest
  their preferred times.


   - [Ernesto] Revamp Ansible/Ceph-Ansible for non-containerized users?


   - open nebula / proxmox


   - solicit maintainers for ceph-ansible on the ML


   - 18.2.1


   - yuri: approval email sent out a few days ago; waiting on some approvals


   - Blocker:


   - https://tracker.ceph.com/issues/63391


   - lab upgrades (Laura will help Yuri coordinate)


   - Next Pacific release being worked on in background by Yuri.


   - https://pad.ceph.com/p/pacific_16.2.15


   - Try v16.2.15 milestone to help prune PRs


   - https://github.com/ceph/ceph/milestone/17


   - [Nizam] Ceph News Ticker - Ceph Dashboard


   - Notify when new release is available (display changelogs)


   - Display important ceph events


   - CVEs, critical bug fixes


   - Maybe newly added blog posts or information regarding the upcoming
  group meetings?


   - User + Dev meeting next week


   - Topics include migration between EC profiles and challenges related to
  RGW zone replication


   - Casey can attend end of meeting


   - open nebula folks planning to do webinar; looking for speakers


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph storage pool error

2023-11-08 Thread Robert Sander

Hi,

On 11/7/23 12:35, necoe0...@gmail.com wrote:

Ceph 3 clusters are running and the 3rd cluster gave an error, it is currently 
offline. I want to get all the remaining data in 2 clusters. Instead of fixing 
ceph, I just want to save the data. How can I access this data and connect to 
the pool? Can you help me? Clusters 1 and 2 are working. I want to view my data 
from them and then transfer them to another place. How can I do this? I have 
never used Ceph before.


Please send the output of:

ceph -s
ceph health detail
ceph osd df tree

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Nizamudeen A
Dashboard approved; the test failure is a known Cypress issue which is not a
blocker.

Regards,
Nizam

On Wed, Nov 8, 2023, 21:41 Yuri Weinstein  wrote:

> We merged 3 PRs and rebuilt "reef-release" (Build 2)
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
> (see Build 2)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey reapprove on Build 2
> fs - Venky, approve on Build 2
> orch - Adam King
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad (known issues)
>
> We need to close
> https://tracker.ceph.com/issues/63391
> (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>
> On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen  wrote:
> >
> > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered
> this issue after the release of 17.2.7, and don't want to hit the same
> blocker in 18.2.1 where some types of OSDs are failing to be created in new
> clusters, or failing to start in upgraded clusters.
> > https://tracker.ceph.com/issues/63391
> >
> > Thanks!
> > Travis
> >
> > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar 
> wrote:
> >>
> >> Hi Yuri,
> >>
> >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein 
> wrote:
> >> >
> >> > 3 PRs above mentioned were merged and I am returning some tests:
> >> >
> https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> >> >
> >> > Still seeing approvals.
> >> > smoke - Laura, Radek, Prashant, Venky in progress
> >> > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > rgw - Casey in progress
> >> > fs - Venky
> >>
> >> There's a failure in the fs suite
> >>
> >>
> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
> >>
> >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> >> (#cephfs on ceph slack) to have a look. WIll update as soon as
> >> possible.
> >>
> >> > orch - Adam King
> >> > rbd - Ilya approved
> >> > krbd - Ilya approved
> >> > upgrade/quincy-x (reef) - Laura PTL
> >> > powercycle - Brad
> >> > perf-basic - in progress
> >> >
> >> >
> >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley 
> wrote:
> >> > >
> >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein 
> wrote:
> >> > > >
> >> > > > Details of this release are summarized here:
> >> > > >
> >> > > > https://tracker.ceph.com/issues/63443#note-1
> >> > > >
> >> > > > Seeking approvals/reviews for:
> >> > > >
> >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE
> failures)
> >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > > > rgw - Casey
> >> > >
> >> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> >> > > merged to reef but is needed on reef-release
> >> > >
> >> > > > fs - Venky
> >> > > > orch - Adam King
> >> > > > rbd - Ilya
> >> > > > krbd - Ilya
> >> > > > upgrade/quincy-x (reef) - Laura PTL
> >> > > > powercycle - Brad
> >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> >> > > >
> >> > > > Please reply to this email with approval and/or trackers of known
> >> > > > issues/PRs to address them.
> >> > > >
> >> > > > TIA
> >> > > > YuriW
> >> > > > ___
> >> > > > ceph-users mailing list -- ceph-users@ceph.io
> >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> > > >
> >> > >
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] one cephfs volume becomes very slow

2023-11-08 Thread Ben
Dear cephers,

we have a CephFS volume that is mounted by many clients with concurrent
read/write access. From time to time, maybe when concurrency goes as high as
100 clients, access becomes too slow to be useful at all.
The cluster has multiple active MDS daemons. All disks are HDD.
Any ideas to improve this?
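
(In case it helps, I can also share output from, for example:

  ceph fs status
  ceph tell mds.* dump_ops_in_flight    # ops in flight on each active MDS
)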

here is one of the MDS logs during the slow period; the others are similar:

{"log":"debug 2023-11-08T07:26:00.114+ 7f190b014700  0
log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest
blocked for \u003e 5.662970
secs\n","stream":"stderr","time":"2023-11-08T07:26:00.121996282Z"}

{"log":"debug 2023-11-08T07:26:00.114+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.662970 seconds old,
received at 2023-11-08T07:25:54.458863+:
peer_request:client.12917739:8654334 currently
dispatched\n","stream":"stderr","time":"2023-11-08T07:26:00.122016551Z"}

{"log":"debug 2023-11-08T07:29:54.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest
blocked for \u003e 11.900602
secs\n","stream":"stderr","time":"2023-11-08T07:29:54.124567293Z"}

{"log":"debug 2023-11-08T07:29:54.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 11.900601 seconds old,
received at 2023-11-08T07:29:42.223813+:
client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
2023-11-08T07:29:42.219416+ caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:29:54.124589613Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : 5 slow requests, 5 included below; oldest
blocked for \u003e 17.900670
secs\n","stream":"stderr","time":"2023-11-08T07:30:00.124691442Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 17.900670 seconds old,
received at 2023-11-08T07:29:42.223813+:
client_request(client.27494331:18564666 getattr pAsLsXsFs #0x70001830366
2023-11-08T07:29:42.219416+ caller_uid=0, caller_gid=0{}) currently
failed to rdlock,
waiting\n","stream":"stderr","time":"2023-11-08T07:30:00.124726772Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649942 seconds old,
received at 2023-11-08T07:29:53.474541+: client_request(mds.1:305661
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently failed to
acquire_locks\n","stream":"stderr","time":"2023-11-08T07:30:00.124731626Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649864 seconds old,
received at 2023-11-08T07:29:53.474619+: client_request(mds.1:305662
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124734415Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.649719 seconds old,
received at 2023-11-08T07:29:53.474764+:
client_request(client.27497255:25173 getattr pAsLsXsFs #0x800019f1da7
2023-11-08T07:29:53.473182+ caller_uid=0, caller_gid=0{}) currently
requesting remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124736973Z"}

{"log":"debug 2023-11-08T07:30:00.118+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 6.648454 seconds old,
received at 2023-11-08T07:29:53.476029+: client_request(mds.1:305663
rename #0x70001851b32/91e670f9004ddb237a353b2a9ddc063208f5
#0x649/800019f1da7 caller_uid=0, caller_gid=0{}) currently requesting
remote
authpins\n","stream":"stderr","time":"2023-11-08T07:30:00.124739607Z"}

{"log":"debug 2023-11-08T07:43:30.127+ 7f190b014700  0
log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest
blocked for \u003e 5.206645
secs\n","stream":"stderr","time":"2023-11-08T07:43:30.133682292Z"}

{"log":"debug 2023-11-08T07:43:30.127+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.206644 seconds old,
received at 2023-11-08T07:43:24.926862+:
client_request(client.27430891:5371608 mkdir #0x700018317cd/13
2023-11-08T07:43:24.924423+ caller_uid=0, caller_gid=0{}) currently
submit entry:
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133708161Z"}

{"log":"debug 2023-11-08T07:43:30.127+ 7f190b014700  0
log_channel(cluster) log [WRN] : slow request 5.206209 seconds old,
received at 2023-11-08T07:43:24.927297+:
client_request(client.27430891:5371609 create
#0x2000d8a535a/2d69440b170ada3dd3670b5f3e2ebe4d75eeb274ff2a8238a23162e95601
2023-11-08T07:43:24.924423+ caller_uid=0, caller_gid=0{}) currently
submit entry:
journal_and_reply\n","stream":"stderr","time":"2023-11-08T07:43:30.133711838Z"}

{"log":"debug 2023-11-08T07:50:18.134+ 7f190b014700  0
log_channel(cluster) 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Yuri Weinstein
We merged 3 PRs and rebuilt "reef-release" (Build 2)

Seeking approvals/reviews for:

smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
(see Build 2)
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey reapprove on Build 2
fs - Venky, approve on Build 2
orch - Adam King
upgrade/quincy-x (reef) - Laura PTL
powercycle - Brad (known issues)

We need to close
https://tracker.ceph.com/issues/63391
(https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
https://tracker.ceph.com/issues/63151 - Adam King do we need anything for this?

On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen  wrote:
>
> Yuri, we need to add this issue as a blocker for 18.2.1. We discovered this 
> issue after the release of 17.2.7, and don't want to hit the same blocker in 
> 18.2.1 where some types of OSDs are failing to be created in new clusters, or 
> failing to start in upgraded clusters.
> https://tracker.ceph.com/issues/63391
>
> Thanks!
> Travis
>
> On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar  wrote:
>>
>> Hi Yuri,
>>
>> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein  wrote:
>> >
>> > 3 PRs above mentioned were merged and I am returning some tests:
>> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>> >
>> > Still seeing approvals.
>> > smoke - Laura, Radek, Prashant, Venky in progress
>> > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > rgw - Casey in progress
>> > fs - Venky
>>
>> There's a failure in the fs suite
>>
>> 
>> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
>>
>> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
>> (#cephfs on ceph slack) to have a look. WIll update as soon as
>> possible.
>>
>> > orch - Adam King
>> > rbd - Ilya approved
>> > krbd - Ilya approved
>> > upgrade/quincy-x (reef) - Laura PTL
>> > powercycle - Brad
>> > perf-basic - in progress
>> >
>> >
>> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley  wrote:
>> > >
>> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  
>> > > wrote:
>> > > >
>> > > > Details of this release are summarized here:
>> > > >
>> > > > https://tracker.ceph.com/issues/63443#note-1
>> > > >
>> > > > Seeking approvals/reviews for:
>> > > >
>> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
>> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
>> > > > rgw - Casey
>> > >
>> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
>> > > merged to reef but is needed on reef-release
>> > >
>> > > > fs - Venky
>> > > > orch - Adam King
>> > > > rbd - Ilya
>> > > > krbd - Ilya
>> > > > upgrade/quincy-x (reef) - Laura PTL
>> > > > powercycle - Brad
>> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
>> > > >
>> > > > Please reply to this email with approval and/or trackers of known
>> > > > issues/PRs to address them.
>> > > >
>> > > > TIA
>> > > > YuriW
>> > > > ___
>> > > > ceph-users mailing list -- ceph-users@ceph.io
>> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > > >
>> > >
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> --
>> Cheers,
>> Venky
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-08 Thread Zakhar Kirpichenko
Take hints from this: "544 pgs not deep-scrubbed in time". Your OSDs are
unable to scrub their data in time, likely because they cannot cope with
the client + scrubbing I/O. I.e. there's too much data on too few and too
slow spindles.

You can play with osd_deep_scrub_interval and increase the deep scrub interval
from the default 604800 seconds (1 week) to 1209600 (2 weeks) or more. It
may also be a good idea to manually force scrubbing of some PGs to spread
scrubbing time more evenly over the selected period.
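
Something along these lines (numbers are only an example):

  ceph config set osd osd_deep_scrub_interval 1209600   # 2 weeks instead of the default 1 week
  ceph pg deep-scrub <pgid>                             # manually kick off a deep scrub of a chosen PG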

But in general this is not a balanced setup and little can be done to
alleviate the lack of spindle performance.

/Z

On Wed, 8 Nov 2023 at 17:22,  wrote:

> Hi Eugen
>  Please find the details below
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
> nodeep-scrub flag(s) set
> 544 pgs not deep-scrubbed in time
>
> services:
> mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
> mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
> mds: 3 up:standby
> osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
> flags nodeep-scrub
>
> data:
> pools: 2 pools, 544 pgs
> objects: 10.14M objects, 39 TiB
> usage: 116 TiB used, 63 TiB / 179 TiB avail
> pgs: 544 active+clean
>
> io:
> client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
>
> Ceph Versions:
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus
> (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> Ceph tell osd.XX bench (out of 36 OSDs, only 8 OSDs give a high IOPS value
> of 250+; of those, 4 OSDs are from HP 3PAR and 4 OSDs from Dell EMC. We
> are using only 4 OSDs from HP 3PAR and they have been working fine without any
> latency or IOPS issues from the beginning, but the remaining 32 OSDs are
> from Dell EMC, of which 4 OSDs are much better than the remaining 28 OSDs)
>
> https://pastebin.com/CixaQmBi
>
> Please help me identify whether the issue is with the Dell EMC storage, Ceph
> configuration parameter tuning, or overload in the cloud setup.
>
>
>
> On November 1, 2023 at 9:48 PM Eugen Block  wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prab...@cdac.in:
> >
> > > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > > allocated to a single Ceph Cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in Openstack with Ceph as a
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs lead us to problem with SAS , Network congestion
> > > and Ceph configuration( as all default values were used). We updated
> > > the Network from 1Gbps to 10Gbps for public and cluster networking.
> > > There was no change.
> > > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs
> > > reported very low IOPS of 30 to 50 while the remaining showed 300+
> > > IOPS.
> > > We gradually started reducing the load on the ceph cluster and now
> > > the volumes count is 650. Now the slow operations has gradually
> > > reduced but I am aware that this is not the solution.
> > > Ceph configuration is updated with increasing the
> > > osd_journal_size to 10 GB,
> > > osd_max_backfills = 1
> > > osd_recovery_max_active = 1
> > > osd_recovery_op_priority = 1
> > > bluestore_cache_trim_max_skip_pinned=1
> > >
> > > After one month, now we faced another issue with Mgr daemon stopped
> > > in all 3 quorums and 16 OSDs went down. From the
> > > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as
> > > its a production setup
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: list cephfs dirfrags

2023-11-08 Thread Ben
Hi,
this directory is very busy:
ceph tell mds.* dirfrag ls
/volumes/csi/csi-vol-3a69d51a-f3cd-11ed-b738-964ec15fdba7/

while running it, all MDS daemons output:
 [
{
"value": 0,
"bits": 0,
"str": "0/0"
}
]
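
(Once I know where the busy dirfrags live, I assume the pinning itself would
be something like the following, with the client-side mount point and subdir
as placeholders:

  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/volumes/csi/csi-vol-3a69d51a-f3cd-11ed-b738-964ec15fdba7/<subdir>
)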

Thank you,
Ben

Patrick Donnelly  于2023年11月8日周三 21:58写道:

>
> On Mon, Nov 6, 2023 at 4:56 AM Ben  wrote:
>
>> Hi,
>> I used this but all returns "directory inode not in cache"
>> ceph tell mds.* dirfrag ls path
>>
>> I would like to pin some subdirs to a rank after dynamic subtree
>> partitioning. Before that, I need to know where are they exactly
>>
>
> If the dirfrag is not in cache on any rank then the dirfrag is "nowhere".
> It's only pinned to a rank if it's in cache.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Help needed with Grafana password

2023-11-08 Thread Sake Ceph
I configured a password for Grafana because I want to use Loki. I used the spec
parameter initial_admin_password and this works fine for a staging environment,
where I never tried to use Grafana with a password for Loki.

Using the username admin with the configured password gives a credentials error
on the environment where I tried to use Grafana with Loki in the past (with 17.2.6
of Ceph/cephadm). I changed the password within Grafana back then, but how
can I overwrite this now? Or is there a way to clean up all Grafana files?
 
Best regards, 
Sake
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Jayanth Reddy
Hello Casey,

Thank you so much, the steps you provided worked. I'll follow up on the
tracker to provide further information.

Regards,
Jayanth

On Wed, Nov 8, 2023 at 8:41 PM Jayanth Reddy 
wrote:

> Hello Casey,
>
> Thank you so much for the response. I'm applying these right now and let
> you know the results.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 8:15 PM Casey Bodley  wrote:
>
>> i've opened https://tracker.ceph.com/issues/63485 to allow
>> admin/system users to override policy parsing errors like this. i'm
>> not sure yet where this parsing regression was introduced. in reef,
>> https://github.com/ceph/ceph/pull/49395 added better error messages
>> here, along with a rgw_policy_reject_invalid_principals option to be
>> strict about principal names
>>
>>
>> to remove a bucket policy that fails to parse with "Error reading IAM
>> Policy", you can follow these steps:
>>
>> 1. find the bucket's instance id using the 'bucket stats' command
>>
>> $ radosgw-admin bucket stats --bucket {bucketname} | grep id
>>
>> 2. use the rados tool to remove the bucket policy attribute
>> (user.rgw.iam-policy) from the bucket instance metadata object
>>
>> $ rados -p default.rgw.meta -N root rmxattr
>> .bucket.meta.{bucketname}:{bucketid} user.rgw.iam-policy
>>
>> 3. radosgws may be caching the existing bucket metadata and xattrs, so
>> you'd either need to restart them or clear their metadata caches
>>
>> $ ceph daemon client.rgw.xyz cache zap
>>
>> On Wed, Nov 8, 2023 at 9:06 AM Jayanth Reddy 
>> wrote:
>> >
>> > Hello Wesley,
>> > Thank you for the response. I tried the same but ended up with 403.
>> >
>> > Regards,
>> > Jayanth
>> >
>> > On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham 
>> wrote:
>> >>
>> >> Jaynath:
>> >>
>> >> Just to be clear with the "--admin" user's key's you have attempted to
>> delete the bucket policy using the following method:
>> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
>> >>
>> >> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to
>> interact with the affected bucket in any way other than "aws s3api
>> delete-bucket-policy"
>> >>
>> >> Respectfully,
>> >>
>> >> Wes Dillingham
>> >> w...@wesdillingham.com
>> >> LinkedIn
>> >>
>> >>
>> >> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy <
>> jayanthreddy5...@gmail.com> wrote:
>> >>>
>> >>> Hello Casey,
>> >>>
>> >>> We're totally stuck at this point and none of the options seem to
>> work. Please let us know if there is something in metadata or index to
>> remove those applied bucket policies. We downgraded to v17.2.6 and
>> encountering the same.
>> >>>
>> >>> Regards,
>> >>> Jayanth
>> >>>
>> >>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy <
>> jayanthreddy5...@gmail.com> wrote:
>> 
>>  Hello Casey,
>> 
>>  And on further inspection, we identified that there were bucket
>> policies set from the initial days; we were in v16.2.12.
>>  We upgraded the cluster to v17.2.7 two days ago and it seems obvious
>> that the IAM error logs are generated the next minute rgw daemon upgraded
>> from v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>> 
>>  I'm thinking to downgrade back to v17.2.6 and earlier, please let me
>> know if this is a good option for now.
>> 
>>  Thanks,
>>  Jayanth
>>  
>>  From: Jayanth Reddy 
>>  Sent: Tuesday, November 7, 2023 11:59:38 PM
>>  To: Casey Bodley 
>>  Cc: Wesley Dillingham ; ceph-users <
>> ceph-users@ceph.io>; Adam Emerson 
>>  Subject: Re: [ceph-users] Re: owner locked out of bucket via bucket
>> policy
>> 
>>  Hello Casey,
>> 
>>  Thank you for the quick response. I see
>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
>> let me know.
>> 
>>  Regards
>>  Jayanth
>> 
>>  On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley 
>> wrote:
>> 
>>  On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>   wrote:
>>  >
>>  > Hello Wesley and Casey,
>>  >
>>  > We've ended up with the same issue and here it appears that even
>> the user with "--admin" isn't able to do anything. We're now unable to
>> figure out if it is due to bucket policies, ACLs or IAM of some sort. I'm
>> seeing these IAM errors in the logs
>>  >
>>  > ```
>>  >
>>  > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
>> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
>> Handler error.
>>  >
>>  > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
>> to Handler error.
>> 
>>  it's failing to parse the bucket policy document, but the error
>>  message doesn't say what's wrong with it
>> 
>>  disabling rgw_policy_reject_invalid_principals might help if it's
>>  failing on the Principal
>> 
>>  > Nov  7 


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Siddhit Renake
Hello Casey,

Our production buckets are impacted by this issue. We have downgraded the Ceph 
version from 17.2.7 to 17.2.6 but we are still getting the "bucket policy 
parsing" error while accessing the buckets. rgw_policy_reject_invalid_principals 
is not present in 17.2.6 as a configurable parameter. Would appreciate a 
response from your end.

Nov  8 16:39:03 [1485064]: req 4696096351995892977 0.0s s3:get_obj 
Error reading IAM Policy: Terminate parsing due to Handler error.
Nov  8 16:39:03 [1485064]: req 8949648957608194335 0.0s s3:get_obj 
Error reading IAM Policy: Terminate parsing due to Handler error.
Nov  8 16:39:03 [1485064]: req 3856551010860810445 0.00348s s3:get_obj 
Error reading IAM Policy: Terminate parsing due to Handler error.
Nov  8 16:39:03 [1485064]: req 18116384331500039920 0.0s s3:get_obj 
Error reading IAM Policy: Terminate parsing due to Handler error.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Radosgw object stat olh object attrs what does it mean.

2023-11-08 Thread Selcuk Gultekin
I'd like to understand the values under the 'attrs' of an object in the 
following JSON data structure, and how to evaluate the health of such objects.

I have a sample JSON output; can you comment on the object's state here?

{ "name": "$image.name", "size": 0, "tag": "", "attrs": { "user.rgw.manifest": 
"", "user.rgw.olh.idtag": "$tag.uuid", "user.rgw.olh.info": "\u0001\u0001�", 
"user.rgw.olh.ver": "4" } }

What is the purpose of these fields?

"user.rgw.manifest" "user.rgw.olh.idtag" "user.rgw.olh.info" "user.rgw.olh.ver"

What does the empty value "", signify in the context of the object? 
How does the absence of value in this field affect the object's health?
How is the content of this field generated? (For example, what does the "$tag" 
value represent?) 
What is the function of this field? 
What information does the content of this field carry about the object's 
status? 
What does the content of this field signify? (For instance, what does "4" 
represent?) Does this field represent the object's version? 
What are the distinguishing features that set this object apart from previous 
versions?
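
For context, the output above comes from commands like the following (bucket, 
object and pool names are placeholders; the exact rados name of the head object 
is an assumption on my side):

$ radosgw-admin object stat --bucket=<bucket> --object=<key>
$ rados -p <zone>.rgw.buckets.data listxattr <bucket-marker>_<key>   # raw xattrs on the head object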
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph storage pool error

2023-11-08 Thread necoe0147
We have 3 Ceph clusters running and the 3rd cluster gave an error; it is 
currently offline. I want to get all the remaining data from the 2 working 
clusters. Instead of fixing Ceph, I just want to save the data. How can I 
access this data and connect to the pool? Can you help me? Clusters 1 and 2 are 
working; I want to view my data on them and then transfer it somewhere else. 
How can I do this? I have never used Ceph before.
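
For illustration of what I mean: if the pools hold RBD images (I am not sure 
what they hold), I guess viewing and copying the data from a working cluster 
would look roughly like this:

$ ceph df                                        # which pools exist and how much data they hold
$ rbd -p <pool> ls                               # list images in a pool, if it is an RBD pool
$ rbd export <pool>/<image> /backup/<image>.img  # copy one image out to a local file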
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Memory footprint of increased PG number

2023-11-08 Thread Nicola Mori
Dear Ceph user,

I'm wondering how much an increase in the number of PGs would impact the memory 
usage of the OSD daemons. In my cluster I currently have 512 PGs and I would 
like to increase that to 1024 to mitigate some disk occupancy issues, but since 
some machines have little memory (down to 24 GB for 16 OSDs) I fear this could 
kill my cluster. Is it possible to estimate the relative increase in OSD memory 
footprint when doubling the number of PGs (hopefully not linear scaling)? Or is 
there a way to experiment without crashing everything?
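
For what it's worth, this is how I plan to watch memory while experimenting (a 
sketch; osd.0 and <pool> are placeholders):

$ ceph config get osd osd_memory_target    # current per-OSD memory target
$ ceph daemon osd.0 dump_mempools          # run on the OSD host; per-category usage (osd_pglog, buffer_anon, ...)
$ ceph osd pool set <pool> pg_num 640      # step pg_num up gradually instead of jumping straight to 1024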
Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Question about PG mgr/balancer/crush_compat_metrics

2023-11-08 Thread bryansoong21
Hello,

We are using a Ceph Pacific (16.2.10) cluster with the balancer module enabled, 
but the usage of some OSDs keeps growing and has reached 
mon_osd_nearfull_ratio, which we keep at the default of 85%, and we think the 
balancer module should be doing some balancing work.

So I checked our balancer configuration and found that our 
"crush_compat_metrics" is set to "pgs,objects,bytes", and these three values 
are used in src.pybind.mgr.balancer.module.Module.calc_eval. However, when 
doing the actual balancing, only the first key is used, in 
src.pybind.mgr.balancer.module.Module.do_crush_compat:
metrics = self.get_module_option('crush_compat_metrics').split(',')
key = metrics[0] # balancing using the first score metric

My concern is: is there any reason why the score is calculated using all three 
metrics, but the actual balancing only uses the first one?
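
For context, these are the commands I am using to look at it (the metric in the 
last command is just an example):

$ ceph balancer status
$ ceph balancer eval                                            # current cluster score
$ ceph config set mgr mgr/balancer/crush_compat_metrics bytes   # e.g. balance on bytes only
$ ceph balancer eval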

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD reported Slow operations

2023-11-08 Thread prabhav
Hi Eugen
 Please find the details below

root@meghdootctr1:/var/log/ceph# ceph -s
cluster:
id: c59da971-57d1-43bd-b2b7-865d392412a5
health: HEALTH_WARN
nodeep-scrub flag(s) set
544 pgs not deep-scrubbed in time

services:
mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
mds: 3 up:standby
osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
flags nodeep-scrub

data:
pools: 2 pools, 544 pgs
objects: 10.14M objects, 39 TiB
usage: 116 TiB used, 63 TiB / 179 TiB avail
pgs: 544 active+clean

io:
client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr


Ceph Versions:
root@meghdootctr1:/var/log/ceph# ceph --version
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus 
(stable)

Ceph df -h
https://pastebin.com/1ffucyJg

Ceph OSD performance dump
https://pastebin.com/1R6YQksE

ceph tell osd.XX bench  (Out of 36 OSDs only 8 give a high IOPS value of 250+. 
Of those, 4 OSDs are from HP 3PAR and 4 from DELL EMC. We use only 4 OSDs from 
HP 3PAR and they have worked fine without any latency or IOPS issues from the 
beginning, while the remaining 32 OSDs are from DELL EMC, of which only 4 
perform much better than the other 28.)

https://pastebin.com/CixaQmBi

Please help me identify whether the issue is with the DELL EMC storage, Ceph 
configuration parameter tuning, or overload in the cloud setup.
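
In case it is useful, this is roughly how I am comparing the disks right now 
(osd.12 and the device name are placeholders):

$ ceph osd perf               # commit/apply latency per OSD, to spot the slow ones
$ ceph tell osd.12 bench      # repeated per OSD, as in the pastebin above
$ iostat -x 5                 # run on the OSD host to watch device %util and await
$ ceph osd unset nodeep-scrub # to re-enable deep scrubs later, once latency is under control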



On November 1, 2023 at 9:48 PM Eugen Block  wrote:
> Hi,
>
> for starters please add more cluster details like 'ceph status', 'ceph
> versions', 'ceph osd df tree'. Increasing the to 10G was the right
> thing to do, you don't get far with 1G with real cluster load. How are
> the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> How is the disk utilization?
>
> Regards,
> Eugen
>
> Zitat von prab...@cdac.in:
>
> > In a production setup of 36 OSDs( SAS disks) totalling 180 TB
> > allocated to a single Ceph Cluster with 3 monitors and 3 managers.
> > There were 830 volumes and VMs created in Openstack with Ceph as a
> > backend. On Sep 21, users reported slowness in accessing the VMs.
> > Analysing the logs lead us to problem with SAS , Network congestion
> > and Ceph configuration( as all default values were used). We updated
> > the Network from 1Gbps to 10Gbps for public and cluster networking.
> > There was no change.
> > The ceph benchmark performance showed that 28 OSDs out of 36 OSDs
> > reported very low IOPS of 30 to 50 while the remaining showed 300+
> > IOPS.
> > We gradually started reducing the load on the ceph cluster and now
> > the volumes count is 650. Now the slow operations has gradually
> > reduced but I am aware that this is not the solution.
> > Ceph configuration is updated with increasing the
> > osd_journal_size to 10 GB,
> > osd_max_backfills = 1
> > osd_recovery_max_active = 1
> > osd_recovery_op_priority = 1
> > bluestore_cache_trim_max_skip_pinned=1
> >
> > After one month, now we faced another issue with Mgr daemon stopped
> > in all 3 quorums and 16 OSDs went down. From the
> > ceph-mon,ceph-mgr.log could not get the reason. Please guide me as
> > its a production setup
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Jayanth Reddy
Hello Casey,

Thank you so much for the response. I'm applying these right now and let
you know the results.

Regards,
Jayanth

On Wed, Nov 8, 2023 at 8:15 PM Casey Bodley  wrote:

> i've opened https://tracker.ceph.com/issues/63485 to allow
> admin/system users to override policy parsing errors like this. i'm
> not sure yet where this parsing regression was introduced. in reef,
> https://github.com/ceph/ceph/pull/49395 added better error messages
> here, along with a rgw_policy_reject_invalid_principals option to be
> strict about principal names
>
>
> to remove a bucket policy that fails to parse with "Error reading IAM
> Policy", you can follow these steps:
>
> 1. find the bucket's instance id using the 'bucket stats' command
>
> $ radosgw-admin bucket stats --bucket {bucketname} | grep id
>
> 2. use the rados tool to remove the bucket policy attribute
> (user.rgw.iam-policy) from the bucket instance metadata object
>
> $ rados -p default.rgw.meta -N root rmxattr
> .bucket.meta.{bucketname}:{bucketid} user.rgw.iam-policy
>
> 3. radosgws may be caching the existing bucket metadata and xattrs, so
> you'd either need to restart them or clear their metadata caches
>
> $ ceph daemon client.rgw.xyz cache zap
>
> On Wed, Nov 8, 2023 at 9:06 AM Jayanth Reddy 
> wrote:
> >
> > Hello Wesley,
> > Thank you for the response. I tried the same but ended up with 403.
> >
> > Regards,
> > Jayanth
> >
> > On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham 
> wrote:
> >>
> >> Jaynath:
> >>
> >> Just to be clear with the "--admin" user's key's you have attempted to
> delete the bucket policy using the following method:
> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
> >>
> >> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to
> interact with the affected bucket in any way other than "aws s3api
> delete-bucket-policy"
> >>
> >> Respectfully,
> >>
> >> Wes Dillingham
> >> w...@wesdillingham.com
> >> LinkedIn
> >>
> >>
> >> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy <
> jayanthreddy5...@gmail.com> wrote:
> >>>
> >>> Hello Casey,
> >>>
> >>> We're totally stuck at this point and none of the options seem to
> work. Please let us know if there is something in metadata or index to
> remove those applied bucket policies. We downgraded to v17.2.6 and
> encountering the same.
> >>>
> >>> Regards,
> >>> Jayanth
> >>>
> >>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy <
> jayanthreddy5...@gmail.com> wrote:
> 
>  Hello Casey,
> 
>  And on further inspection, we identified that there were bucket
> policies set from the initial days; we were in v16.2.12.
>  We upgraded the cluster to v17.2.7 two days ago and it seems obvious
> that the IAM error logs are generated the next minute rgw daemon upgraded
> from v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
> 
>  I'm thinking to downgrade back to v17.2.6 and earlier, please let me
> know if this is a good option for now.
> 
>  Thanks,
>  Jayanth
>  
>  From: Jayanth Reddy 
>  Sent: Tuesday, November 7, 2023 11:59:38 PM
>  To: Casey Bodley 
>  Cc: Wesley Dillingham ; ceph-users <
> ceph-users@ceph.io>; Adam Emerson 
>  Subject: Re: [ceph-users] Re: owner locked out of bucket via bucket
> policy
> 
>  Hello Casey,
> 
>  Thank you for the quick response. I see
> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
> let me know.
> 
>  Regards
>  Jayanth
> 
>  On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley 
> wrote:
> 
>  On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>   wrote:
>  >
>  > Hello Wesley and Casey,
>  >
>  > We've ended up with the same issue and here it appears that even
> the user with "--admin" isn't able to do anything. We're now unable to
> figure out if it is due to bucket policies, ACLs or IAM of some sort. I'm
> seeing these IAM errors in the logs
>  >
>  > ```
>  >
>  > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
> Handler error.
>  >
>  > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
> to Handler error.
> 
>  it's failing to parse the bucket policy document, but the error
>  message doesn't say what's wrong with it
> 
>  disabling rgw_policy_reject_invalid_principals might help if it's
>  failing on the Principal
> 
>  > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
> 0.0s s3:list_bucket init_permissions on
> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
>  > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req
> 13293029267332025583 0.0s op->ERRORHANDLER: err_no=-13
> new_err_no=-13
>  >
>  > ```
> 

[ceph-users] Re: Permanent KeyError: 'TYPE' ->17.2.7: return self.blkid_api['TYPE'] == 'part'

2023-11-08 Thread Sascha Lucas

Hi,

On Tue, 7 Nov 2023, Harry G Coin wrote:

These repeat for every host, only after upgrading from prev release Quincy to 
17.2.7.   As a result, the cluster is always warned, never indicates healthy.


I'm hitting this error, too.

"/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 482, in 
is_partition

/usr/bin/docker: stderr return self.blkid_api['TYPE'] == 'part'
/usr/bin/docker: stderr KeyError: 'TYPE'


Variable names indicate usage of BLKID(8). It seems that `blkid` usually 
returns TYPE="something", but I have devices without TYPE:


/dev/mapper/data-4d323729--8fec--42c6--a1da--bacdea89fb37.disk0_data: 
PTUUID="c2901603-fae8-45cb-86fe-13d02e6b6dc6" PTTYPE="gpt"
/dev/mapper/data-8d485122--d8ca--4e11--85bb--3f795a4e31e9.disk0_data: PTUUID="2bc7a15e" 
PTTYPE="dos"
/dev/drbd3: PTUUID="2bc7a15e" PTTYPE="dos"

Maybe this indicates why the key is missing?

Please tell me if there is anything I can do to find the root cause.
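
In case it helps to compare, this is how one of the devices without TYPE can be 
probed (device path taken from the output above):

$ blkid /dev/drbd3                        # default output, no TYPE= for this device
$ blkid -p -o export /dev/drbd3           # low-level probe of what blkid can detect
$ lsblk -o NAME,TYPE,FSTYPE,PTTYPE /dev/drbd3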

Thanks, Sascha.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seagate Exos power settings - any experiences at your sites?

2023-11-08 Thread Danny Webb
We've had some issues with Exos drives dropping out of our sas controllers (LSI 
SAS3008 PCI-Express Fusion-MPT SAS-3) intermittently which we believe is due to 
this.  Upgrading the drive firmware largely solved it for us so we never ended 
up messing about with the power settings.
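
For anyone wanting to check their own drives first, the firmware revision can 
be read with e.g. (sdX is a placeholder):

$ smartctl -i /dev/sdX | grep -i -e 'device model' -e 'firmware version'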

From: Alex Gorbachev 
Sent: 07 November 2023 15:06
To: ceph-users 
Subject: [ceph-users] Seagate Exos power settings - any experiences at your 
sites?

CAUTION: This email originates from outside THG

We have been seeing some odd behavior with scrubbing (very slow) and OSD
warnings on a couple of new clusters. A bit of research turned up this:

https://www.reddit.com/r/truenas/comments/p1ebnf/seagate_exos_load_cyclingidling_info_solution/

We've installed the tool from https://github.com/Seagate/openSeaChest and
disabled EPC power features similar to:

openSeaChest_PowerControl --scan|grep ST|awk '{print $2}'|xargs -I {}
openSeaChest_PowerControl -d {} --EPCfeature disable

Things seem to be better now on those two clusters. Has anyone seen
anything similar? This would seem to be a huge issue if all defaults on
Exos are wrong (stop-and-go on all Ceph/ZFS workloads).
--
Best regards,
Alex Gorbachev
--
Intelligent Systems Services Inc.
http://www.iss-integration.com
https://www.linkedin.com/in/alex-gorbachev-iss/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Casey Bodley
i've opened https://tracker.ceph.com/issues/63485 to allow
admin/system users to override policy parsing errors like this. i'm
not sure yet where this parsing regression was introduced. in reef,
https://github.com/ceph/ceph/pull/49395 added better error messages
here, along with a rgw_policy_reject_invalid_principals option to be
strict about principal names


to remove a bucket policy that fails to parse with "Error reading IAM
Policy", you can follow these steps:

1. find the bucket's instance id using the 'bucket stats' command

$ radosgw-admin bucket stats --bucket {bucketname} | grep id

2. use the rados tool to remove the bucket policy attribute
(user.rgw.iam-policy) from the bucket instance metadata object

$ rados -p default.rgw.meta -N root rmxattr
.bucket.meta.{bucketname}:{bucketid} user.rgw.iam-policy

3. radosgws may be caching the existing bucket metadata and xattrs, so
you'd either need to restart them or clear their metadata caches

$ ceph daemon client.rgw.xyz cache zap

On Wed, Nov 8, 2023 at 9:06 AM Jayanth Reddy  wrote:
>
> Hello Wesley,
> Thank you for the response. I tried the same but ended up with 403.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham  
> wrote:
>>
>> Jaynath:
>>
>> Just to be clear with the "--admin" user's key's you have attempted to 
>> delete the bucket policy using the following method: 
>> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
>>
>> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to 
>> interact with the affected bucket in any way other than "aws s3api 
>> delete-bucket-policy"
>>
>> Respectfully,
>>
>> Wes Dillingham
>> w...@wesdillingham.com
>> LinkedIn
>>
>>
>> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy  
>> wrote:
>>>
>>> Hello Casey,
>>>
>>> We're totally stuck at this point and none of the options seem to work. 
>>> Please let us know if there is something in metadata or index to remove 
>>> those applied bucket policies. We downgraded to v17.2.6 and encountering 
>>> the same.
>>>
>>> Regards,
>>> Jayanth
>>>
>>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy  
>>> wrote:

 Hello Casey,

 And on further inspection, we identified that there were bucket policies 
 set from the initial days; we were in v16.2.12.
 We upgraded the cluster to v17.2.7 two days ago and it seems obvious that 
 the IAM error logs are generated the next minute rgw daemon upgraded from 
 v16.2.12 to v17.2.7. Looks like there is some issue with parsing.

 I'm thinking to downgrade back to v17.2.6 and earlier, please let me know 
 if this is a good option for now.

 Thanks,
 Jayanth
 
 From: Jayanth Reddy 
 Sent: Tuesday, November 7, 2023 11:59:38 PM
 To: Casey Bodley 
 Cc: Wesley Dillingham ; ceph-users 
 ; Adam Emerson 
 Subject: Re: [ceph-users] Re: owner locked out of bucket via bucket policy

 Hello Casey,

 Thank you for the quick response. I see 
 `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please 
 let me know.

 Regards
 Jayanth

 On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:

 On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
  wrote:
 >
 > Hello Wesley and Casey,
 >
 > We've ended up with the same issue and here it appears that even the 
 > user with "--admin" isn't able to do anything. We're now unable to 
 > figure out if it is due to bucket policies, ACLs or IAM of some sort. 
 > I'm seeing these IAM errors in the logs
 >
 > ```
 >
 > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851 
 > 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due 
 > to Handler error.
 >
 > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
 > 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing 
 > due to Handler error.

 it's failing to parse the bucket policy document, but the error
 message doesn't say what's wrong with it

 disabling rgw_policy_reject_invalid_principals might help if it's
 failing on the Principal

 > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
 > 0.0s s3:list_bucket init_permissions on 
 > :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, 
 > ret=-13
 > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583 
 > 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
 >
 > ```
 >
 > Please help what's wrong here. We're in Ceph v17.2.7.
 >
 > Regards,
 > Jayanth
 >
 > On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham 
 >  wrote:
 >>
 >> Thank you, this has worked to remove the policy.
 >>
 >> Respectfully,
 >>
 >> *Wes Dillingham*
 >> w...@wesdillingham.com
 >> 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Travis Nielsen
Yuri, we need to add this issue as a blocker for 18.2.1. We discovered this
issue after the release of 17.2.7, and don't want to hit the same blocker
in 18.2.1 where some types of OSDs are failing to be created in new
clusters, or failing to start in upgraded clusters.
https://tracker.ceph.com/issues/63391

Thanks!
Travis

On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar  wrote:

> Hi Yuri,
>
> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein  wrote:
> >
> > 3 PRs above mentioned were merged and I am returning some tests:
> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> >
> > Still seeing approvals.
> > smoke - Laura, Radek, Prashant, Venky in progress
> > rados - Neha, Radek, Travis, Ernesto, Adam King
> > rgw - Casey in progress
> > fs - Venky
>
> There's a failure in the fs suite
>
>
> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
>
> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> (#cephfs on ceph slack) to have a look. WIll update as soon as
> possible.
>
> > orch - Adam King
> > rbd - Ilya approved
> > krbd - Ilya approved
> > upgrade/quincy-x (reef) - Laura PTL
> > powercycle - Brad
> > perf-basic - in progress
> >
> >
> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley  wrote:
> > >
> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein 
> wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/63443#note-1
> > > >
> > > > Seeking approvals/reviews for:
> > > >
> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > > rgw - Casey
> > >
> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> > > merged to reef but is needed on reef-release
> > >
> > > > fs - Venky
> > > > orch - Adam King
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > > upgrade/quincy-x (reef) - Laura PTL
> > > > powercycle - Brad
> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> > > >
> > > > Please reply to this email with approval and/or trackers of known
> > > > issues/PRs to address them.
> > > >
> > > > TIA
> > > > YuriW
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Cheers,
> Venky
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Jayanth Reddy
Hello Wesley,
Thank you for the response. I tried the same but ended up with 403.

Regards,
Jayanth

On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham 
wrote:

> Jaynath:
>
> Just to be clear with the "--admin" user's key's you have attempted to
> delete the bucket policy using the following method:
> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
>
> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to
> interact with the affected bucket in any way other than "aws s3api
> delete-bucket-policy"
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
>
>
> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy 
> wrote:
>
>> Hello Casey,
>>
>> We're totally stuck at this point and none of the options seem to work.
>> Please let us know if there is something in metadata or index to remove
>> those applied bucket policies. We downgraded to v17.2.6 and encountering
>> the same.
>>
>> Regards,
>> Jayanth
>>
>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy 
>> wrote:
>>
>>> Hello Casey,
>>>
>>> And on further inspection, we identified that there were bucket policies
>>> set from the initial days; we were in v16.2.12.
>>> We upgraded the cluster to v17.2.7 two days ago and it seems obvious
>>> that the IAM error logs are generated the next minute rgw daemon upgraded
>>> from v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>>>
>>> I'm thinking to downgrade back to v17.2.6 and earlier, please let me
>>> know if this is a good option for now.
>>>
>>> Thanks,
>>> Jayanth
>>> --
>>> *From:* Jayanth Reddy 
>>> *Sent:* Tuesday, November 7, 2023 11:59:38 PM
>>> *To:* Casey Bodley 
>>> *Cc:* Wesley Dillingham ; ceph-users <
>>> ceph-users@ceph.io>; Adam Emerson 
>>> *Subject:* Re: [ceph-users] Re: owner locked out of bucket via bucket
>>> policy
>>>
>>> Hello Casey,
>>>
>>> Thank you for the quick response. I see
>>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
>>> let me know.
>>>
>>> Regards
>>> Jayanth
>>>
>>> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>>>
>>> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>>  wrote:
>>> >
>>> > Hello Wesley and Casey,
>>> >
>>> > We've ended up with the same issue and here it appears that even the
>>> user with "--admin" isn't able to do anything. We're now unable to figure
>>> out if it is due to bucket policies, ACLs or IAM of some sort. I'm seeing
>>> these IAM errors in the logs
>>> >
>>> > ```
>>> >
>>> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
>>> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
>>> Handler error.
>>> >
>>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>>> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
>>> to Handler error.
>>>
>>> it's failing to parse the bucket policy document, but the error
>>> message doesn't say what's wrong with it
>>>
>>> disabling rgw_policy_reject_invalid_principals might help if it's
>>> failing on the Principal
>>>
>>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>>> 0.0s s3:list_bucket init_permissions on
>>> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
>>> > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req
>>> 13293029267332025583 0.0s op->ERRORHANDLER: err_no=-13
>>> new_err_no=-13
>>> >
>>> > ```
>>> >
>>> > Please help what's wrong here. We're in Ceph v17.2.7.
>>> >
>>> > Regards,
>>> > Jayanth
>>> >
>>> > On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham <
>>> w...@wesdillingham.com> wrote:
>>> >>
>>> >> Thank you, this has worked to remove the policy.
>>> >>
>>> >> Respectfully,
>>> >>
>>> >> *Wes Dillingham*
>>> >> w...@wesdillingham.com
>>> >> LinkedIn 
>>> >>
>>> >>
>>> >> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley 
>>> wrote:
>>> >>
>>> >> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham <
>>> w...@wesdillingham.com>
>>> >> > wrote:
>>> >> > >
>>> >> > > Thank you, I am not sure (inherited cluster). I presume such an
>>> admin
>>> >> > user created after-the-fact would work?
>>> >> >
>>> >> > yes
>>> >> >
>>> >> > > Is there a good way to discover an admin user other than iterate
>>> over
>>> >> > all users and retrieve user information? (I presume radosgw-admin
>>> user info
>>> >> > --uid=" would illustrate such administrative access?
>>> >> >
>>> >> > not sure there's an easy way to search existing users, but you could
>>> >> > create a temporary admin user for this repair
>>> >> >
>>> >> > >
>>> >> > > Respectfully,
>>> >> > >
>>> >> > > Wes Dillingham
>>> >> > > w...@wesdillingham.com
>>> >> > > LinkedIn
>>> >> > >
>>> >> > >
>>> >> > > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley 
>>> wrote:
>>> >> > >>
>>> >> > >> if you have an administrative user (created with --admin), you
>>> should
>>> >> > >> be able 

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Wesley Dillingham
Jayanth:

Just to be clear with the "--admin" user's key's you have attempted to
delete the bucket policy using the following method:
https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html

This is what worked for me (on a 16.2.14 cluster). I didn't attempt to
interact with the affected bucket in any way other than "aws s3api
delete-bucket-policy"

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy 
wrote:

> Hello Casey,
>
> We're totally stuck at this point and none of the options seem to work.
> Please let us know if there is something in metadata or index to remove
> those applied bucket policies. We downgraded to v17.2.6 and encountering
> the same.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy 
> wrote:
>
>> Hello Casey,
>>
>> And on further inspection, we identified that there were bucket policies
>> set from the initial days; we were in v16.2.12.
>> We upgraded the cluster to v17.2.7 two days ago and it seems obvious that
>> the IAM error logs are generated the next minute rgw daemon upgraded from
>> v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>>
>> I'm thinking to downgrade back to v17.2.6 and earlier, please let me know
>> if this is a good option for now.
>>
>> Thanks,
>> Jayanth
>> --
>> *From:* Jayanth Reddy 
>> *Sent:* Tuesday, November 7, 2023 11:59:38 PM
>> *To:* Casey Bodley 
>> *Cc:* Wesley Dillingham ; ceph-users <
>> ceph-users@ceph.io>; Adam Emerson 
>> *Subject:* Re: [ceph-users] Re: owner locked out of bucket via bucket
>> policy
>>
>> Hello Casey,
>>
>> Thank you for the quick response. I see
>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
>> let me know.
>>
>> Regards
>> Jayanth
>>
>> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>>
>> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>  wrote:
>> >
>> > Hello Wesley and Casey,
>> >
>> > We've ended up with the same issue and here it appears that even the
>> user with "--admin" isn't able to do anything. We're now unable to figure
>> out if it is due to bucket policies, ACLs or IAM of some sort. I'm seeing
>> these IAM errors in the logs
>> >
>> > ```
>> >
>> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
>> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
>> Handler error.
>> >
>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
>> to Handler error.
>>
>> it's failing to parse the bucket policy document, but the error
>> message doesn't say what's wrong with it
>>
>> disabling rgw_policy_reject_invalid_principals might help if it's
>> failing on the Principal
>>
>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s s3:list_bucket init_permissions on
>> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
>> > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
>> >
>> > ```
>> >
>> > Please help what's wrong here. We're in Ceph v17.2.7.
>> >
>> > Regards,
>> > Jayanth
>> >
>> > On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham <
>> w...@wesdillingham.com> wrote:
>> >>
>> >> Thank you, this has worked to remove the policy.
>> >>
>> >> Respectfully,
>> >>
>> >> *Wes Dillingham*
>> >> w...@wesdillingham.com
>> >> LinkedIn 
>> >>
>> >>
>> >> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley 
>> wrote:
>> >>
>> >> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham <
>> w...@wesdillingham.com>
>> >> > wrote:
>> >> > >
>> >> > > Thank you, I am not sure (inherited cluster). I presume such an
>> admin
>> >> > user created after-the-fact would work?
>> >> >
>> >> > yes
>> >> >
>> >> > > Is there a good way to discover an admin user other than iterate
>> over
>> >> > all users and retrieve user information? (I presume radosgw-admin
>> user info
>> >> > --uid=" would illustrate such administrative access?
>> >> >
>> >> > not sure there's an easy way to search existing users, but you could
>> >> > create a temporary admin user for this repair
>> >> >
>> >> > >
>> >> > > Respectfully,
>> >> > >
>> >> > > Wes Dillingham
>> >> > > w...@wesdillingham.com
>> >> > > LinkedIn
>> >> > >
>> >> > >
>> >> > > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley 
>> wrote:
>> >> > >>
>> >> > >> if you have an administrative user (created with --admin), you
>> should
>> >> > >> be able to use its credentials with awscli to delete or overwrite
>> this
>> >> > >> bucket policy
>> >> > >>
>> >> > >> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham <
>> >> > w...@wesdillingham.com> wrote:
>> >> > >> >
>> >> > >> > I have a bucket which got injected with bucket policy which
>> locks the
>> >> > >> > bucket even 

[ceph-users] Re: list cephfs dirfrags

2023-11-08 Thread Patrick Donnelly
On Mon, Nov 6, 2023 at 4:56 AM Ben  wrote:

> Hi,
> I used this but all returns "directory inode not in cache"
> ceph tell mds.* dirfrag ls path
>
> I would like to pin some subdirs to a rank after dynamic subtree
> partitioning. Before that, I need to know where are they exactly
>

If the dirfrag is not in cache on any rank then the dirfrag is "nowhere".
It's only pinned to a rank if it's in cache.
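
For reference, once a directory is in cache, pinning is normally done through 
an xattr on a client mount (a sketch; the path and rank are placeholders):

$ ls /mnt/cephfs/some/subdir > /dev/null                  # touch the dir so the MDS caches it
$ setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/subdir   # pin the subtree to rank 1
$ ceph tell mds.* dirfrag ls /some/subdir                 # should now report the dirfrags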

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Jayanth Reddy
Hello Casey,

We're totally stuck at this point and none of the options seem to work.
Please let us know if there is something in metadata or index to remove
those applied bucket policies. We downgraded to v17.2.6 and encountering
the same.

Regards,
Jayanth

On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy 
wrote:

> Hello Casey,
>
> And on further inspection, we identified that there were bucket policies
> set from the initial days; we were in v16.2.12.
> We upgraded the cluster to v17.2.7 two days ago and it seems obvious that
> the IAM error logs are generated the next minute rgw daemon upgraded from
> v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>
> I'm thinking to downgrade back to v17.2.6 and earlier, please let me know
> if this is a good option for now.
>
> Thanks,
> Jayanth
> --
> *From:* Jayanth Reddy 
> *Sent:* Tuesday, November 7, 2023 11:59:38 PM
> *To:* Casey Bodley 
> *Cc:* Wesley Dillingham ; ceph-users <
> ceph-users@ceph.io>; Adam Emerson 
> *Subject:* Re: [ceph-users] Re: owner locked out of bucket via bucket
> policy
>
> Hello Casey,
>
> Thank you for the quick response. I see
> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
> let me know.
>
> Regards
> Jayanth
>
> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>
> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>  wrote:
> >
> > Hello Wesley and Casey,
> >
> > We've ended up with the same issue and here it appears that even the
> user with "--admin" isn't able to do anything. We're now unable to figure
> out if it is due to bucket policies, ACLs or IAM of some sort. I'm seeing
> these IAM errors in the logs
> >
> > ```
> >
> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
> Handler error.
> >
> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
> to Handler error.
>
> it's failing to parse the bucket policy document, but the error
> message doesn't say what's wrong with it
>
> disabling rgw_policy_reject_invalid_principals might help if it's
> failing on the Principal
>
> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
> 0.0s s3:list_bucket init_permissions on
> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
> > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583
> 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
> >
> > ```
> >
> > Please help what's wrong here. We're in Ceph v17.2.7.
> >
> > Regards,
> > Jayanth
> >
> > On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham 
> wrote:
> >>
> >> Thank you, this has worked to remove the policy.
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> w...@wesdillingham.com
> >> LinkedIn 
> >>
> >>
> >> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley 
> wrote:
> >>
> >> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham <
> w...@wesdillingham.com>
> >> > wrote:
> >> > >
> >> > > Thank you, I am not sure (inherited cluster). I presume such an
> admin
> >> > user created after-the-fact would work?
> >> >
> >> > yes
> >> >
> >> > > Is there a good way to discover an admin user other than iterate
> over
> >> > all users and retrieve user information? (I presume radosgw-admin
> user info
> >> > --uid=" would illustrate such administrative access?
> >> >
> >> > not sure there's an easy way to search existing users, but you could
> >> > create a temporary admin user for this repair
> >> >
> >> > >
> >> > > Respectfully,
> >> > >
> >> > > Wes Dillingham
> >> > > w...@wesdillingham.com
> >> > > LinkedIn
> >> > >
> >> > >
> >> > > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley 
> wrote:
> >> > >>
> >> > >> if you have an administrative user (created with --admin), you
> should
> >> > >> be able to use its credentials with awscli to delete or overwrite
> this
> >> > >> bucket policy
> >> > >>
> >> > >> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham <
> >> > w...@wesdillingham.com> wrote:
> >> > >> >
> >> > >> > I have a bucket which got injected with bucket policy which
> locks the
> >> > >> > bucket even to the bucket owner. The bucket now cannot be
> accessed
> >> > (even
> >> > >> > get its info or delete bucket policy does not work) I have
> looked in
> >> > the
> >> > >> > radosgw-admin command for a way to delete a bucket policy but do
> not
> >> > see
> >> > >> > anything. I presume I will need to somehow remove the bucket
> policy
> >> > from
> >> > >> > however it is stored in the bucket metadata / omap etc. If
> anyone can
> >> > point
> >> > >> > me in the right direction on that I would appreciate it. Thanks
> >> > >> >
> >> > >> > Respectfully,
> >> > >> >
> >> > >> > *Wes Dillingham*
> >> > >> > w...@wesdillingham.com
> >> > >> > LinkedIn 
> >> > >> > 

[ceph-users] Re: 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread David C.
so the next step is to place the pools on the right rule:

ceph osd pool set db-pool  crush_rule fc-r02-ssd
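
You can verify afterwards that the pool picked up the rule with:

ceph osd pool get db-pool crush_rule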


Le mer. 8 nov. 2023 à 12:04, Denny Fuchs  a écrit :

> hi,
>
> I've forget to write the command, I've used:
>
> =
> ceph osd crush move fc-r02-ceph-osd-01 root=default
> ceph osd crush move fc-r02-ceph-osd-01 root=default
> ...
> =
>
> and I've found also this param:
>
> ===
> root@fc-r02-ceph-osd-01:[~]: ceph osd crush tree --show-shadow
> ID   CLASS  WEIGHTTYPE NAME
> -39   nvme   1.81938  root default~nvme
> -30   nvme 0  host fc-r02-ceph-osd-01~nvme
> -31   nvme   0.36388  host fc-r02-ceph-osd-02~nvme
>   36   nvme   0.36388  osd.36
> -32   nvme   0.36388  host fc-r02-ceph-osd-03~nvme
>   40   nvme   0.36388  osd.40
> -33   nvme   0.36388  host fc-r02-ceph-osd-04~nvme
>   37   nvme   0.36388  osd.37
> -34   nvme   0.36388  host fc-r02-ceph-osd-05~nvme
>   38   nvme   0.36388  osd.38
> -35   nvme   0.36388  host fc-r02-ceph-osd-06~nvme
>   39   nvme   0.36388  osd.39
> -38   nvme 0  root ssds~nvme
> -37   nvme 0  datacenter fc-ssds~nvme
> -36   nvme 0  rack r02-ssds~nvme
> -29   nvme 0  root sata~nvme
> -28   nvme 0  datacenter fc-sata~nvme
> -27   nvme 0  rack r02-sata~nvme
> -24ssd 0  root ssds~ssd
> -23ssd 0  datacenter fc-ssds~ssd
> -21ssd 0  rack r02-ssds~ssd
> -22ssd 0  root sata~ssd
> -19ssd 0  datacenter fc-sata~ssd
> -20ssd 0  rack r02-sata~ssd
> -140  root sata
> -180  datacenter fc-sata
> -160  rack r02-sata
> -130  root ssds
> -170  datacenter fc-ssds
> -150  rack r02-ssds
>   -4ssd  22.17122  root default~ssd
>   -7ssd   4.00145  host fc-r02-ceph-osd-01~ssd
>0ssd   0.45470  osd.0
>1ssd   0.45470  osd.1
>2ssd   0.45470  osd.2
>3ssd   0.45470  osd.3
>4ssd   0.45470  osd.4
>5ssd   0.45470  osd.5
>   41ssd   0.36388  osd.41
>   42ssd   0.45470  osd.42
>   48ssd   0.45470  osd.48
>   -3ssd   3.61948  host fc-r02-ceph-osd-02~ssd
>6ssd   0.45470  osd.6
>7ssd   0.45470  osd.7
>8ssd   0.45470  osd.8
>9ssd   0.45470  osd.9
>   10ssd   0.43660  osd.10
>   29ssd   0.45470  osd.29
>   43ssd   0.45470  osd.43
>   49ssd   0.45470  osd.49
>   -8ssd   3.63757  host fc-r02-ceph-osd-03~ssd
>   11ssd   0.45470  osd.11
>   12ssd   0.45470  osd.12
>   13ssd   0.45470  osd.13
>   14ssd   0.45470  osd.14
>   15ssd   0.45470  osd.15
>   16ssd   0.45470  osd.16
>   44ssd   0.45470  osd.44
>   50ssd   0.45470  osd.50
> -10ssd   3.63757  host fc-r02-ceph-osd-04~ssd
>   30ssd   0.45470  osd.30
>   31ssd   0.45470  osd.31
>   32ssd   0.45470  osd.32
>   33ssd   0.45470  osd.33
>   34ssd   0.45470  osd.34
>   35ssd   0.45470  osd.35
>   45ssd   0.45470  osd.45
>   51ssd   0.45470  osd.51
> -12ssd   3.63757  host fc-r02-ceph-osd-05~ssd
>   17ssd   0.45470  osd.17
>   18ssd   0.45470  osd.18
>   19ssd   0.45470  osd.19
>   20ssd   0.45470  osd.20
>   21ssd   0.45470  osd.21
>   22ssd   0.45470  osd.22
>   46ssd   0.45470  osd.46
>   52ssd   0.45470  osd.52
> -26ssd   3.63757  host fc-r02-ceph-osd-06~ssd
>   23ssd   0.45470  osd.23
>   24ssd   0.45470  osd.24
>   25ssd   0.45470  osd.25
>   26ssd   0.45470  osd.26
>   27ssd   0.45470  osd.27
>   28ssd   0.45470  osd.28
>   47ssd   0.45470  osd.47
>   53ssd   0.45470  osd.53
>   -1 23.99060  root default
>   -6  4.00145  host fc-r02-ceph-osd-01
>0ssd   0.45470  osd.0
>1ssd   0.45470  osd.1
>2ssd   0.45470  osd.2
>3ssd   0.45470  osd.3
>4ssd   0.45470  osd.4
>5ssd   0.45470  osd.5
>   41ssd   0.36388  osd.41
>   42ssd   0.45470  osd.42
>   48ssd   0.45470  osd.48
>   -2  3.98335  host fc-r02-ceph-osd-02
>   36   nvme   0.36388  osd.36
>6ssd   0.45470  osd.6
>7ssd   0.45470  osd.7
>8ssd   0.45470  osd.8
>9ssd   0.45470  osd.9
>   10ssd   0.43660  osd.10
>   29ssd   0.45470 

[ceph-users] Re: 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread Denny Fuchs

Hi,

I also overlooked this:

==
root@fc-r02-ceph-osd-01:[~]: ceph -s
  cluster:
id: cfca8c93-f3be-4b86-b9cb-8da095ca2c26
health: HEALTH_OK

  services:
mon: 5 daemons, quorum 
fc-r02-ceph-osd-01,fc-r02-ceph-osd-02,fc-r02-ceph-osd-03,fc-r02-ceph-osd-05,fc-r02-ceph-osd-06 
(age 2w)
mgr: fc-r02-ceph-osd-06(active, since 2w), standbys: 
fc-r02-ceph-osd-02, fc-r02-ceph-osd-03, fc-r02-ceph-osd-01, 
fc-r02-ceph-osd-05, fc-r02-ceph-osd-04

osd: 54 osds: 54 up (since 2w), 54 in (since 2w); 2176 remapped pgs

  data:
pools:   3 pools, 2177 pgs
objects: 1.14M objects, 4.3 TiB
usage:   13 TiB used, 11 TiB / 23 TiB avail
pgs: 5684410/3410682 objects misplaced (166.665%)
 2176 active+clean+remapped
 1active+clean

  io:
client:   1.8 MiB/s rd, 13 MiB/s wr, 40 op/s rd, 702 op/s wr
==

pretty bad:

pgs: 5684410/3410682 objects misplaced (166.665%)

I did not remove any bucket, just executed the "ceph osd crush move" 
command ...


cu denny
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread Denny Fuchs

hi,

I forgot to include the command I used:

=
ceph osd crush move fc-r02-ceph-osd-01 root=default
ceph osd crush move fc-r02-ceph-osd-01 root=default
...
=

and I also found this param:

===
root@fc-r02-ceph-osd-01:[~]: ceph osd crush tree --show-shadow
ID   CLASS  WEIGHTTYPE NAME
-39   nvme   1.81938  root default~nvme
-30   nvme 0  host fc-r02-ceph-osd-01~nvme
-31   nvme   0.36388  host fc-r02-ceph-osd-02~nvme
 36   nvme   0.36388  osd.36
-32   nvme   0.36388  host fc-r02-ceph-osd-03~nvme
 40   nvme   0.36388  osd.40
-33   nvme   0.36388  host fc-r02-ceph-osd-04~nvme
 37   nvme   0.36388  osd.37
-34   nvme   0.36388  host fc-r02-ceph-osd-05~nvme
 38   nvme   0.36388  osd.38
-35   nvme   0.36388  host fc-r02-ceph-osd-06~nvme
 39   nvme   0.36388  osd.39
-38   nvme 0  root ssds~nvme
-37   nvme 0  datacenter fc-ssds~nvme
-36   nvme 0  rack r02-ssds~nvme
-29   nvme 0  root sata~nvme
-28   nvme 0  datacenter fc-sata~nvme
-27   nvme 0  rack r02-sata~nvme
-24ssd 0  root ssds~ssd
-23ssd 0  datacenter fc-ssds~ssd
-21ssd 0  rack r02-ssds~ssd
-22ssd 0  root sata~ssd
-19ssd 0  datacenter fc-sata~ssd
-20ssd 0  rack r02-sata~ssd
-140  root sata
-180  datacenter fc-sata
-160  rack r02-sata
-130  root ssds
-170  datacenter fc-ssds
-150  rack r02-ssds
 -4ssd  22.17122  root default~ssd
 -7ssd   4.00145  host fc-r02-ceph-osd-01~ssd
  0ssd   0.45470  osd.0
  1ssd   0.45470  osd.1
  2ssd   0.45470  osd.2
  3ssd   0.45470  osd.3
  4ssd   0.45470  osd.4
  5ssd   0.45470  osd.5
 41ssd   0.36388  osd.41
 42ssd   0.45470  osd.42
 48ssd   0.45470  osd.48
 -3ssd   3.61948  host fc-r02-ceph-osd-02~ssd
  6ssd   0.45470  osd.6
  7ssd   0.45470  osd.7
  8ssd   0.45470  osd.8
  9ssd   0.45470  osd.9
 10ssd   0.43660  osd.10
 29ssd   0.45470  osd.29
 43ssd   0.45470  osd.43
 49ssd   0.45470  osd.49
 -8ssd   3.63757  host fc-r02-ceph-osd-03~ssd
 11ssd   0.45470  osd.11
 12ssd   0.45470  osd.12
 13ssd   0.45470  osd.13
 14ssd   0.45470  osd.14
 15ssd   0.45470  osd.15
 16ssd   0.45470  osd.16
 44ssd   0.45470  osd.44
 50ssd   0.45470  osd.50
-10ssd   3.63757  host fc-r02-ceph-osd-04~ssd
 30ssd   0.45470  osd.30
 31ssd   0.45470  osd.31
 32ssd   0.45470  osd.32
 33ssd   0.45470  osd.33
 34ssd   0.45470  osd.34
 35ssd   0.45470  osd.35
 45ssd   0.45470  osd.45
 51ssd   0.45470  osd.51
-12ssd   3.63757  host fc-r02-ceph-osd-05~ssd
 17ssd   0.45470  osd.17
 18ssd   0.45470  osd.18
 19ssd   0.45470  osd.19
 20ssd   0.45470  osd.20
 21ssd   0.45470  osd.21
 22ssd   0.45470  osd.22
 46ssd   0.45470  osd.46
 52ssd   0.45470  osd.52
-26ssd   3.63757  host fc-r02-ceph-osd-06~ssd
 23ssd   0.45470  osd.23
 24ssd   0.45470  osd.24
 25ssd   0.45470  osd.25
 26ssd   0.45470  osd.26
 27ssd   0.45470  osd.27
 28ssd   0.45470  osd.28
 47ssd   0.45470  osd.47
 53ssd   0.45470  osd.53
 -1 23.99060  root default
 -6  4.00145  host fc-r02-ceph-osd-01
  0ssd   0.45470  osd.0
  1ssd   0.45470  osd.1
  2ssd   0.45470  osd.2
  3ssd   0.45470  osd.3
  4ssd   0.45470  osd.4
  5ssd   0.45470  osd.5
 41ssd   0.36388  osd.41
 42ssd   0.45470  osd.42
 48ssd   0.45470  osd.48
 -2  3.98335  host fc-r02-ceph-osd-02
 36   nvme   0.36388  osd.36
  6ssd   0.45470  osd.6
  7ssd   0.45470  osd.7
  8ssd   0.45470  osd.8
  9ssd   0.45470  osd.9
 10ssd   0.43660  osd.10
 29ssd   0.45470  osd.29
 43ssd   0.45470  osd.43
 49ssd   0.45470  osd.49
 -5  4.00145  host fc-r02-ceph-osd-03
 40   nvme   0.36388  osd.40
 11ssd   0.45470  osd.11
 12ssd   0.45470  osd.12
 13ssd   0.45470  osd.13
 14ssd   0.45470  osd.14
 15ssd   0.45470  osd.15
 16ssd   0.45470  osd.16
 44ssd   0.45470  osd.44
 50ssd   0.45470  osd.50
 -9

[ceph-users] Re: 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread David C.
I've probably answered too quickly if the migration is complete and there
are no incidents.

Are the PGs active+clean?


Cordialement,

*David CASIER*





Le mer. 8 nov. 2023 à 11:50, David C.  a écrit :

> Hi,
>
> It seems to me that before removing buckets from the crushmap, it is
> necessary to do the migration first.
> I think you should restore the initial crushmap by adding the default root
> next to it and only then do the migration.
> There should be some backfill (probably a lot).
> 
>
> Cordialement,
>
> *David CASIER*
>
> 
>
>
>
> Le mer. 8 nov. 2023 à 11:27, Denny Fuchs  a écrit :
>
>> Hello,
>>
>> we upgraded to Quincy and tried to remove an obsolete part:
>>
>> In the beginning of Ceph, there where no device classes and we created
>> rules, to split them into hdd and ssd on one of our datacenters.
>>
>>
>> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
>>
>> So we had different "roots" for SSD and HDD. Two weeks ago .. we tried
>> to move the hosts to the root=default and checked .. what happens ..
>> nothing .. all was fine and working. But we did not checked the "ceph
>> df":
>>
>> ==
>> root@fc-r02-ceph-osd-01:[~]: ceph osd df tree
>> ID   CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META
>>   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
>> -140 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  root sata
>> -180 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  datacenter fc-sata
>> -160 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  rack r02-sata
>> -130 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  root ssds
>> -170 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  datacenter fc-ssds
>> -150 -  0 B  0 B  0 B  0 B
>> 0 B  0 B  0 0-  rack r02-ssds
>>   -1 23.99060 -   23 TiB   13 TiB   12 TiB  6.0 GiB32
>> GiB   11 TiB  54.17  1.00-  root default
>>   -6  4.00145 -  3.9 TiB  2.1 TiB  2.1 TiB  2.1 MiB   7.2
>> GiB  1.7 TiB  54.87  1.01-  host fc-r02-ceph-osd-01
>>0ssd   0.45470   1.0  447 GiB  236 GiB  235 GiB  236 KiB   794
>> MiB  211 GiB  52.80  0.97  119  up  osd.0
>>1ssd   0.45470   1.0  447 GiB  222 GiB  221 GiB  239 KiB   808
>> MiB  225 GiB  49.67  0.92  108  up  osd.1
>>2ssd   0.45470   1.0  447 GiB  245 GiB  244 GiB  254 KiB   819
>> MiB  202 GiB  54.85  1.01  118  up  osd.2
>>3ssd   0.45470   1.0  447 GiB  276 GiB  276 GiB  288 KiB   903
>> MiB  171 GiB  61.83  1.14  135  up  osd.3
>>4ssd   0.45470   1.0  447 GiB  268 GiB  267 GiB  272 KiB   913
>> MiB  180 GiB  59.85  1.10  132  up  osd.4
>>5ssd   0.45470   1.0  447 GiB  204 GiB  203 GiB  181 KiB   684
>> MiB  243 GiB  45.56  0.84  108  up  osd.5
>>   41ssd   0.36388   1.0  373 GiB  211 GiB  210 GiB  207 KiB   818
>> MiB  161 GiB  56.69  1.05  104  up  osd.41
>>   42ssd   0.45470   1.0  447 GiB  220 GiB  219 GiB  214 KiB   791
>> MiB  227 GiB  49.26  0.91  107  up  osd.42
>>   48ssd   0.45470   1.0  447 GiB  284 GiB  284 GiB  281 KiB   864
>> MiB  163 GiB  63.62  1.17  139  up  osd.48
>>   -2  3.98335 -  3.9 TiB  2.1 TiB  2.1 TiB  1.0 GiB   5.0
>> GiB  1.7 TiB  54.82  1.01-  host fc-r02-ceph-osd-02
>>   36   nvme   0.36388   1.0  373 GiB  239 GiB  238 GiB  163 MiB   460
>> MiB  134 GiB  64.10  1.18  127  up  osd.36
>>6ssd   0.45470   1.0  447 GiB  247 GiB  246 GiB  114 MiB   585
>> MiB  200 GiB  55.20  1.02  121  up  osd.6
>>7ssd   0.45470   1.0  447 GiB  260 GiB  259 GiB  158 MiB   590
>> MiB  187 GiB  58.19  1.07  126  up  osd.7
>>8ssd   0.45470   1.0  447 GiB  196 GiB  195 GiB  165 MiB   471
>> MiB  251 GiB  43.85  0.81  101  up  osd.8
>>9ssd   0.45470   1.0  447 GiB  203 GiB  202 GiB  168 MiB   407
>> MiB  244 GiB  45.34  0.84  104  up  osd.9
>>   10ssd   0.43660   1.0  447 GiB  284 GiB  283 GiB  287 KiB   777
>> MiB  163 GiB  63.49  1.17  142  up  osd.10
>>   29ssd   0.45470   1.0  447 GiB  241 GiB  240 GiB  147 MiB   492
>> MiB  206 GiB  53.93  1.00  124  up  osd.29
>>   43ssd   0.45470   1.0  447 GiB  257 GiB  256 GiB 

[ceph-users] Re: 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread David C.
Hi,

It seems to me that before removing buckets from the crushmap, it is
necessary to do the migration first.
I think you should restore the initial crushmap by adding the default root
next to it and only then do the migration.
There should be some backfill (probably a lot).
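
If it helps, the manual round-trip to re-add the old roots looks roughly like 
this (a sketch; edit crushmap.txt by hand before recompiling):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# recreate the ssds/sata roots in crushmap.txt and move the hosts back under them
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin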


Cordialement,

*David CASIER*




*Ligne directe: +33(0) 9 72 61 98 29*




Le mer. 8 nov. 2023 à 11:27, Denny Fuchs  a écrit :

> Hello,
>
> we upgraded to Quincy and tried to remove an obsolete part:
>
> In the beginning of Ceph, there where no device classes and we created
> rules, to split them into hdd and ssd on one of our datacenters.
>
>
> https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
>
> So we had different "roots" for SSD and HDD. Two weeks ago .. we tried
> to move the hosts to the root=default and checked .. what happens ..
> nothing .. all was fine and working. But we did not checked the "ceph
> df":
>
> ==
> root@fc-r02-ceph-osd-01:[~]: ceph osd df tree
> ID   CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META
>   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
> -140 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  root sata
> -180 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  datacenter fc-sata
> -160 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  rack r02-sata
> -130 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  root ssds
> -170 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  datacenter fc-ssds
> -150 -  0 B  0 B  0 B  0 B
> 0 B  0 B  0 0-  rack r02-ssds
>   -1 23.99060 -   23 TiB   13 TiB   12 TiB  6.0 GiB32
> GiB   11 TiB  54.17  1.00-  root default
>   -6  4.00145 -  3.9 TiB  2.1 TiB  2.1 TiB  2.1 MiB   7.2
> GiB  1.7 TiB  54.87  1.01-  host fc-r02-ceph-osd-01
>0ssd   0.45470   1.0  447 GiB  236 GiB  235 GiB  236 KiB   794
> MiB  211 GiB  52.80  0.97  119  up  osd.0
>1ssd   0.45470   1.0  447 GiB  222 GiB  221 GiB  239 KiB   808
> MiB  225 GiB  49.67  0.92  108  up  osd.1
>2ssd   0.45470   1.0  447 GiB  245 GiB  244 GiB  254 KiB   819
> MiB  202 GiB  54.85  1.01  118  up  osd.2
>3ssd   0.45470   1.0  447 GiB  276 GiB  276 GiB  288 KiB   903
> MiB  171 GiB  61.83  1.14  135  up  osd.3
>4ssd   0.45470   1.0  447 GiB  268 GiB  267 GiB  272 KiB   913
> MiB  180 GiB  59.85  1.10  132  up  osd.4
>5ssd   0.45470   1.0  447 GiB  204 GiB  203 GiB  181 KiB   684
> MiB  243 GiB  45.56  0.84  108  up  osd.5
>   41ssd   0.36388   1.0  373 GiB  211 GiB  210 GiB  207 KiB   818
> MiB  161 GiB  56.69  1.05  104  up  osd.41
>   42ssd   0.45470   1.0  447 GiB  220 GiB  219 GiB  214 KiB   791
> MiB  227 GiB  49.26  0.91  107  up  osd.42
>   48ssd   0.45470   1.0  447 GiB  284 GiB  284 GiB  281 KiB   864
> MiB  163 GiB  63.62  1.17  139  up  osd.48
>   -2  3.98335 -  3.9 TiB  2.1 TiB  2.1 TiB  1.0 GiB   5.0
> GiB  1.7 TiB  54.82  1.01-  host fc-r02-ceph-osd-02
>   36   nvme   0.36388   1.0  373 GiB  239 GiB  238 GiB  163 MiB   460
> MiB  134 GiB  64.10  1.18  127  up  osd.36
>6ssd   0.45470   1.0  447 GiB  247 GiB  246 GiB  114 MiB   585
> MiB  200 GiB  55.20  1.02  121  up  osd.6
>7ssd   0.45470   1.0  447 GiB  260 GiB  259 GiB  158 MiB   590
> MiB  187 GiB  58.19  1.07  126  up  osd.7
>8ssd   0.45470   1.0  447 GiB  196 GiB  195 GiB  165 MiB   471
> MiB  251 GiB  43.85  0.81  101  up  osd.8
>9ssd   0.45470   1.0  447 GiB  203 GiB  202 GiB  168 MiB   407
> MiB  244 GiB  45.34  0.84  104  up  osd.9
>   10ssd   0.43660   1.0  447 GiB  284 GiB  283 GiB  287 KiB   777
> MiB  163 GiB  63.49  1.17  142  up  osd.10
>   29ssd   0.45470   1.0  447 GiB  241 GiB  240 GiB  147 MiB   492
> MiB  206 GiB  53.93  1.00  124  up  osd.29
>   43ssd   0.45470   1.0  447 GiB  257 GiB  256 GiB  151 MiB   509
> MiB  190 GiB  57.48  1.06  131  up  osd.43
>   49ssd   0.45470   1.0  447 GiB  239 GiB  238 GiB  242 KiB   820
> MiB  209 GiB  53.35  0.98  123  up  osd.49
>   -5  4.00145 -  3.9 TiB  2.1 TiB  2.1 TiB  1.3 GiB   4.9
> GiB  1.7 TiB  55.41  1.02-  host fc-r02-ceph-osd-03
>   40   nvme   0.36388  

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Venky Shankar
Hi Yuri,

On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein  wrote:
>
> 3 PRs above mentioned were merged and I am returning some tests:
> https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>
> Still seeking approvals.
> smoke - Laura, Radek, Prashant, Venky in progress
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey in progress
> fs - Venky

There's a failure in the fs suite


https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/

Seems to be related to nfs-ganesha. I've reached out to Frank Filz
(#cephfs on ceph slack) to have a look. Will update as soon as
possible.

> orch - Adam King
> rbd - Ilya approved
> krbd - Ilya approved
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad
> perf-basic - in progress
>
>
> On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley  wrote:
> >
> > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63443#note-1
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLED failures)
> > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > rgw - Casey
> >
> > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> > merged to reef but is needed on reef-release
> >
> > > fs - Venky
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade/quincy-x (reef) - Laura PTL
> > > powercycle - Brad
> > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLED failures)
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > TIA
> > > YuriW
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 100.00 Usage for ssd-pool (maybe after: ceph osd crush move .. root=default)

2023-11-08 Thread Denny Fuchs

Hello,

we upgraded to Quincy and tried to remove an obsolete part:

In the beginning of Ceph, there were no device classes, so we created
rules to split the OSDs into hdd and ssd in one of our datacenters.


https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

So we had different "roots" for SSD and HDD. Two weeks ago we tried
to move the hosts to root=default and checked what happened ..
nothing, all was fine and working. But we did not check the "ceph
df":


==
root@fc-r02-ceph-osd-01:[~]: ceph osd df tree
ID   CLASS  WEIGHTREWEIGHT  SIZE RAW USE  DATA OMAP META 
 AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
-140 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  root sata
-180 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  datacenter fc-sata
-160 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  rack r02-sata
-130 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  root ssds
-170 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  datacenter fc-ssds
-150 -  0 B  0 B  0 B  0 B   
0 B  0 B  0 0-  rack r02-ssds
 -1 23.99060 -   23 TiB   13 TiB   12 TiB  6.0 GiB32 
GiB   11 TiB  54.17  1.00-  root default
 -6  4.00145 -  3.9 TiB  2.1 TiB  2.1 TiB  2.1 MiB   7.2 
GiB  1.7 TiB  54.87  1.01-  host fc-r02-ceph-osd-01
  0ssd   0.45470   1.0  447 GiB  236 GiB  235 GiB  236 KiB   794 
MiB  211 GiB  52.80  0.97  119  up  osd.0
  1ssd   0.45470   1.0  447 GiB  222 GiB  221 GiB  239 KiB   808 
MiB  225 GiB  49.67  0.92  108  up  osd.1
  2ssd   0.45470   1.0  447 GiB  245 GiB  244 GiB  254 KiB   819 
MiB  202 GiB  54.85  1.01  118  up  osd.2
  3ssd   0.45470   1.0  447 GiB  276 GiB  276 GiB  288 KiB   903 
MiB  171 GiB  61.83  1.14  135  up  osd.3
  4ssd   0.45470   1.0  447 GiB  268 GiB  267 GiB  272 KiB   913 
MiB  180 GiB  59.85  1.10  132  up  osd.4
  5ssd   0.45470   1.0  447 GiB  204 GiB  203 GiB  181 KiB   684 
MiB  243 GiB  45.56  0.84  108  up  osd.5
 41ssd   0.36388   1.0  373 GiB  211 GiB  210 GiB  207 KiB   818 
MiB  161 GiB  56.69  1.05  104  up  osd.41
 42ssd   0.45470   1.0  447 GiB  220 GiB  219 GiB  214 KiB   791 
MiB  227 GiB  49.26  0.91  107  up  osd.42
 48ssd   0.45470   1.0  447 GiB  284 GiB  284 GiB  281 KiB   864 
MiB  163 GiB  63.62  1.17  139  up  osd.48
 -2  3.98335 -  3.9 TiB  2.1 TiB  2.1 TiB  1.0 GiB   5.0 
GiB  1.7 TiB  54.82  1.01-  host fc-r02-ceph-osd-02
 36   nvme   0.36388   1.0  373 GiB  239 GiB  238 GiB  163 MiB   460 
MiB  134 GiB  64.10  1.18  127  up  osd.36
  6ssd   0.45470   1.0  447 GiB  247 GiB  246 GiB  114 MiB   585 
MiB  200 GiB  55.20  1.02  121  up  osd.6
  7ssd   0.45470   1.0  447 GiB  260 GiB  259 GiB  158 MiB   590 
MiB  187 GiB  58.19  1.07  126  up  osd.7
  8ssd   0.45470   1.0  447 GiB  196 GiB  195 GiB  165 MiB   471 
MiB  251 GiB  43.85  0.81  101  up  osd.8
  9ssd   0.45470   1.0  447 GiB  203 GiB  202 GiB  168 MiB   407 
MiB  244 GiB  45.34  0.84  104  up  osd.9
 10ssd   0.43660   1.0  447 GiB  284 GiB  283 GiB  287 KiB   777 
MiB  163 GiB  63.49  1.17  142  up  osd.10
 29ssd   0.45470   1.0  447 GiB  241 GiB  240 GiB  147 MiB   492 
MiB  206 GiB  53.93  1.00  124  up  osd.29
 43ssd   0.45470   1.0  447 GiB  257 GiB  256 GiB  151 MiB   509 
MiB  190 GiB  57.48  1.06  131  up  osd.43
 49ssd   0.45470   1.0  447 GiB  239 GiB  238 GiB  242 KiB   820 
MiB  209 GiB  53.35  0.98  123  up  osd.49
 -5  4.00145 -  3.9 TiB  2.1 TiB  2.1 TiB  1.3 GiB   4.9 
GiB  1.7 TiB  55.41  1.02-  host fc-r02-ceph-osd-03
 40   nvme   0.36388   1.0  373 GiB  236 GiB  235 GiB  156 MiB   469 
MiB  137 GiB  63.26  1.17  119  up  osd.40
 11ssd   0.45470   1.0  447 GiB  244 GiB  244 GiB  187 MiB   602 
MiB  203 GiB  54.68  1.01  123  up  osd.11
 12ssd   0.45470   1.0  447 GiB  235 GiB  235 GiB  163 MiB   496 
MiB  212 GiB  52.65  0.97  122  up  osd.12
 13ssd   0.45470   1.0  447 GiB  236 GiB  235 GiB  114 MiB   594 
MiB  211 GiB  52.79  0.97  124  up  osd.13
 14ssd   0.45470   1.0  447 GiB  259 GiB  258 GiB  145 MiB   475 
MiB  188 GiB  57.87  1.07  126  up