[ceph-users] Re: Planning: Ceph User Survey 2020

2021-01-27 Thread Mike Perez
Hey Alexandre,

Sorry for the late reply here. I believe Anthony can give you a response on why 
we chose a matrix rating scale type instead of rank.

—

— Mike Perez (thingee)
On Wed, Nov 25, 2020 at 8:27 AM Alexandre Marangone  
wrote:
> Hi Mike,
>
> For some of the multiple answer questions like "Which resources do you
> check when you need help?" could these be ranked answers instead? It
> would allow us to see which resources are most useful to the community
>
> On Tue, Nov 24, 2020 at 10:06 AM Mike Perez  wrote:
> >
> > Hi everyone,
> >
> > The Ceph User Survey 2020 is being planned by our working group. Please
> > review the draft survey pdf, and let's discuss any changes. You may also
> > join us in the next meeting on November 25th at 12pm PT
> >
> > https://tracker.ceph.com/projects/ceph/wiki/User_Survey_Working_Group
> >
> > https://tracker.ceph.com/attachments/download/5260/Ceph%20User%20Survey%202020.pdf
> >
> > We're aiming to have something ready by mid-December.
> >
> > --
> >
> > Mike Perez
> >
> > he/him
> >
> > Ceph / Rook / RDO / Gluster Community Architect
> >
> > Open-Source Program Office (OSPO)
> >
> >
> > M: +1-951-572-2633
> >
> > 494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
> > @Thingee
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread George Yil
May I ask whether enabling pool compression would help with space amplification going forward?
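
For reference, a minimal sketch of what enabling it would look like (the pool
name is a placeholder; compression applies only to data written after it is
enabled):

$ ceph osd pool set mypool compression_mode aggressive
$ ceph osd pool set mypool compression_algorithm snappy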

> George Yil wrote (27 Jan 2021 18:57):
> 
> Thank you. This helps a lot.
> 
>> Josh Baergen wrote (27 Jan 2021 17:08):
>> 
>> On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
>>> May I ask if it can be changed dynamically, and whether any disadvantages
>>> should be expected?
>> 
>> Unless there's some magic I'm unaware of, there is no way to
>> dynamically change this. Each OSD must be recreated with the new
>> min_alloc_size setting. In production systems this can be quite the
>> chore, since the safest way to accomplish this is to drain the OSD
>> (set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
>> then repopulate it. With automation this can run in the background.
>> Given how much room you have currently you may be able to do this
>> host-at-a-time by storing a host's data on the other hosts in a given
>> rack (though I don't remember what your CRUSH tree looks like so maybe
>> you can't do this and maintain host independence).
>> 
>> The downside is potentially more tracking metadata at the OSD level,
>> though I understand that Nautilus has made improvements here. I'm not
>> up to speed on the latest state in this area, though.
>> 
>> Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-01-27 Thread Stefan Kooman

On 1/27/21 3:51 PM, Konstantin Shalygin wrote:

Martin, also before restart - issue cache drop command to active mds


Don't do this if you have a large cache. It will make your MDS 
unresponsive, and it will be replaced by a standby if one is available. There 
is a PR to fix this: https://github.com/ceph/ceph/pull/36823


I killed a 170 GB MDS this way. YMMV.
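
A hedged way to gauge how large the cache actually is before deciding, assuming
admin-socket access on the host of the active MDS (the daemon name is a
placeholder):

$ ceph daemon mds.<name> cache status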

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-27 Thread Richard Bade
Thanks Dan and Anthony, your suggestions have pointed me in the right
direction. Looking back through the logs to when the first error was
detected, I found this:

ceph-osd: 2021-01-24 01:04:55.905 7f0c17821700 -1 log_channel(cluster)
log [ERR] : 17.7ffs0 scrub : stat mismatch, got 112867/112868 objects,
0/0 clones, 112867/112868 dirty, 0/0 omap, 0/0 pinned, 0/0
hit_set_archive, 0/0 whiteouts, 473372381184/473376575488 bytes, 0/0
manifest objects, 0/0 hit_set_archive bytes.

As Anthony suggested, the error is not in the RADOS objects but in the stats.
I assume that a repair will fix this up?
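
For reference, a hedged sketch of what that repair would look like, using the
PG id from the health output above:

$ ceph pg repair 17.7ff

and a deep-scrub afterwards to confirm the stats are consistent again:

$ ceph pg deep-scrub 17.7ff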

Thanks again everyone.
Rich

On Thu, 28 Jan 2021 at 03:59, Dan van der Ster  wrote:
>
> Usually the ceph.log prints the reason for the inconsistency when it
> is first detected by scrubbing.
>
> -- dan
>
> On Wed, Jan 27, 2021 at 12:41 AM Richard Bade  wrote:
> >
> > Hi Everyone,
> > I have also seen this "inconsistent" state with an empty list when you run
> > list-inconsistent-obj
> >
> > $ sudo ceph health detail
> > HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
> > pgs not deep-scrubbed in time
> > OSD_SCRUB_ERRORS 1 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> > pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
> > PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
> > pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811
> >
> > $ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
> > {
> > "epoch": 183807,
> > "inconsistents": []
> > }
> >
> > Usually these are caused by read errors on the disks, but I've checked
> > all OSD hosts that are part of this PG and there are no SMART or dmesg
> > errors.
> >
> > Rich
> >
> > --
> > >
> > > Date: Sun, 17 Jan 2021 14:00:01 +0330
> > > From: Seena Fallah 
> > > Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> > > objects
> > > To: "Alexander E. Patrakov" 
> > > Cc: ceph-users 
> > > Message-ID:
> > > 
> > > 
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > It's for a long time ago and I don't have the `ceph health detail` output!
> > >
> > > On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
> > > wrote:
> > >
> > > > For a start, please post the "ceph health detail" output.
> > > >
> > > > Sat, 19 Dec 2020 at 23:48, Seena Fallah :
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm facing something strange! One of the PGs in my pool got 
> > > > > inconsistent
> > > > > and when I run `rados list-inconsistent-obj $PG_ID 
> > > > > --format=json-pretty`
> > > > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph
> > > > or..?
> > > > >
> > > > > Thanks.
> > > > > ___
> > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > > >
> > > > --
> > > > Alexander E. Patrakov
> > > > CV: http://u.pc.cd/wT8otalK
> > > >
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread Josh Baergen
On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
> May I ask if it can be changed dynamically, and whether any disadvantages
> should be expected?

Unless there's some magic I'm unaware of, there is no way to
dynamically change this. Each OSD must be recreated with the new
min_alloc_size setting. In production systems this can be quite the
chore, since the safest way to accomplish this is to drain the OSD
(set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
then repopulate it. With automation this can run in the background.
Given how much room you have currently you may be able to do this
host-at-a-time by storing a host's data on the other hosts in a given
rack (though I don't remember what your CRUSH tree looks like so maybe
you can't do this and maintain host independence).
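
For anyone following along, a rough, hedged sketch of that drain-and-recreate
cycle for one OSD (the OSD id, device path, and min_alloc_size value are
placeholders, and the exact ceph-volume invocation depends on how the OSDs
were deployed):

$ ceph osd out 12                         # drain; wait for backfill to finish
$ ceph osd safe-to-destroy osd.12         # confirm it can be removed without data loss
$ ceph osd destroy 12 --yes-i-really-mean-it
$ ceph-volume lvm zap /dev/sdX --destroy
$ ceph config set osd bluestore_min_alloc_size_hdd 4096   # only affects newly created OSDs
$ ceph-volume lvm create --data /dev/sdX --osd-id 12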

The downside is potentially more tracking metadata at the OSD level,
though I understand that Nautilus has made improvements here. I'm not
up to speed on the latest state in this area, though.

Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw not working - upgraded from mimic to octopus

2021-01-27 Thread Youzhong Yang
Is anyone running Octopus (v15)? Could you please share your experience with
radosgw-admin performance?

A simple 'radosgw-admin user list' took 11 minutes; with a v13.2.4
radosgw-admin binary it finishes in a few seconds.

This sounds like a performance regression to me. I've already filed a bug
report (https://tracker.ceph.com/issues/48983), but there has been no feedback so far.
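
If it helps move the tracker issue along, a hedged way to capture more detail
from the slow command (standard Ceph debug options; the log path is arbitrary):

$ radosgw-admin user list --debug-rgw=20 --debug-ms=1 2> /tmp/radosgw-admin.debug.log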

On Mon, Jan 25, 2021 at 10:06 AM Youzhong Yang  wrote:

> I upgraded our ceph cluster (6 bare metal nodes, 3 rgw VMs) from v13.2.4
> to v15.2.8. The mon, mgr, mds and osd daemons were all upgraded
> successfully, everything looked good.
>
> After the radosgw daemons were upgraded, they refused to work; the log messages are
> at the end of this e-mail.
>
> Here are the things I tried:
>
> 1. I moved aside the pools for the rgw service, started from scratch
> (creating realm, zonegroup, zone, users), but when I tried to run
> 'radosgw-admin user create ...', it appeared to be stuck and never
> returned, other command like 'radosgw-admin period update --commit' also
> got stuck.
>
> 2. I rolled back radosgw to the old version v13.2.4, then everything works
> great again.
>
> What am I missing here? Is there anything extra that needs to be done for
> rgw after upgrading from mimic to octopus?
>
> Please kindly help. Thanks.
>
> -
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 deferred set uid:gid to
> 64045:64045 (ceph:ceph)
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 ceph version 15.2.8
> (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process
> radosgw, pid 898
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 framework: civetweb
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 framework conf key: port,
> val: 80
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 framework conf key:
> num_threads, val: 1024
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  0 framework conf key:
> request_timeout_ms, val: 5
> 2021-01-24T09:24:10.192-0500 7f638f79f9c0  1 radosgw_Main not setting numa
> affinity
> 2021-01-24T09:29:10.195-0500 7f638cbcd700 -1 Initialization timeout,
> failed to initialize
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 deferred set uid:gid to
> 64045:64045 (ceph:ceph)
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 ceph version 15.2.8
> (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process
> radosgw, pid 1541
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 framework: civetweb
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 framework conf key: port,
> val: 80
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 framework conf key:
> num_threads, val: 1024
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  0 framework conf key:
> request_timeout_ms, val: 5
> 2021-01-24T09:29:10.367-0500 7f4c213ba9c0  1 radosgw_Main not setting numa
> affinity
> 2021-01-24T09:29:25.883-0500 7f4c213ba9c0  1 robust_notify: If at first
> you don't succeed: (110) Connection timed out
> 2021-01-24T09:29:25.883-0500 7f4c213ba9c0  0 ERROR: failed to distribute
> cache for coredumps.rgw.log:meta.history
> 2021-01-24T09:32:27.754-0500 7fcdac2bf9c0  0 deferred set uid:gid to
> 64045:64045 (ceph:ceph)
> 2021-01-24T09:32:27.754-0500 7fcdac2bf9c0  0 ceph version 15.2.8
> (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process
> radosgw, pid 978
> 2021-01-24T09:32:27.758-0500 7fcdac2bf9c0  0 framework: civetweb
> 2021-01-24T09:32:27.758-0500 7fcdac2bf9c0  0 framework conf key: port,
> val: 80
> 2021-01-24T09:32:27.758-0500 7fcdac2bf9c0  0 framework conf key:
> num_threads, val: 1024
> 2021-01-24T09:32:27.758-0500 7fcdac2bf9c0  0 framework conf key:
> request_timeout_ms, val: 5
> 2021-01-24T09:32:27.758-0500 7fcdac2bf9c0  1 radosgw_Main not setting numa
> affinity
> 2021-01-24T09:32:44.719-0500 7fcdac2bf9c0  1 robust_notify: If at first
> you don't succeed: (110) Connection timed out
> 2021-01-24T09:32:44.719-0500 7fcdac2bf9c0  0 ERROR: failed to distribute
> cache for coredumps.rgw.log:meta.history
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Balancing with upmap

2021-01-27 Thread Francois Legrand

Nope !

Le 27/01/2021 à 17:40, Anthony D'Atri a écrit :

Do you have any override reweights set to values less than 1.0?

The REWEIGHT column when you run `ceph osd df`


On Jan 27, 2021, at 8:15 AM, Francois Legrand  wrote:

Hi all,
I have a cluster with 116 disks (24 new 16 TB disks added in December and the 
rest 8 TB) running Nautilus 14.2.16.
I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster does not seem well balanced: the number of PGs on the 8 TB disks 
varies from 26 to 52, and their utilization from 35 to 69%.
The recent 16 TB disks are more homogeneous, with 48 to 61 PGs and usage between 
30 and 43%.
Last week, I realized that some OSDs were maybe not using upmap because I ran 
ceph osd crush weight-set ls and got (compat) as the result.
Thus I ran ceph osd crush weight-set rm-compat, which triggered some 
rebalancing. There has been no more recovery for 2 days now, but the cluster is still 
unbalanced.
As far as I understand, upmap is supposed to reach an equal number of PGs on 
all the disks (weighted by their capacity, I guess).
Thus I would expect more or less 30 PGs on the 8 TB disks and 60 on the 16 TB ones, 
and around 50% usage on all, which is not the case (by far).
The problem is that this impacts the free space available in the pools (264 TiB 
while there is more than 578 TiB free in the cluster), because free space seems to 
be based on the space available before the first OSD becomes full.
Is this normal? Did I miss something? What could I do?

F.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Bucket notification troubleshooting

2021-01-27 Thread Yuval Lifshitz
On Wed, Jan 27, 2021 at 5:34 PM Schoonjans, Tom (RFI,RAL,-) <
tom.schoonj...@rfi.ac.uk> wrote:

> Looks like there’s already a ticket open for AMQP SSL support:
> https://tracker.ceph.com/issues/42902 (you opened it ;-))
>
> I will give it a try myself if I have some time, but don’t hold your breath
> with lockdown and home schooling. Also I am not much of a C++ coder.
>
> I need to go over the logs with Tom Byrne to see why it is not working
> properly. And perhaps I will be able to come up with a fix then.
>
> However this is what I have run into so far today:
>
> 1. After configuring a bucket with a topic using the non-SSL port, I tried
> a couple of uploads to this bucket. They all hung, which seemed like
> something was very wrong, so I Ctrl-C’ed every time. After some time I
> figured out from the RabbitMQ admin UI that Ceph was indeed connecting to
> it, and the connections remained, so I killed them from the UI.
>

Sending the notification to the rabbitmq server is synchronous with the
upload to the bucket, so if the server is slow or not acking the
notification, the upload request will hang. Note that the upload itself is
done first, but the reply to the client does not happen until the rabbitmq
server acks.

It would be great if you could share the radosgw logs.
Maybe the issue is related to the user/password method we use? We use:
AMQP_SASL_METHOD_PLAIN

One possible workaround would be to set "amqp-ack-level" to "none"; in this
case the radosgw does not wait for an ack.

In "pacific" you could use "persistent topics", where the notifications are
sent asynchronously to the endpoint.
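
While debugging, it may also be worth double-checking what the radosgw side
thinks is configured; a hedged sketch (the topic name is a placeholder):

$ radosgw-admin topic list
$ radosgw-admin topic get --topic mytopic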

2. I then wrote a python script with Pika to consume the events, hoping
> that would stop the blocking. I had some minor success with this. Usually
> the first three or four uploaded files would generate events that I could
> consume with my script.
>

The radosgw is waiting for an ack from the broker, not the end consumer, so
this should not have mattered...
Did you actually see any notifications delivered to the consumer?


> However, the rest would block forever. I repeated this a couple of times
> but always with the same result. I noticed that after I stopped uploading,
> removed the bucket and the topic, the connection from Ceph in the RabbitMQ
> UI remained. I killed it but it came back seconds later from another port
> on the Ceph cluster. I ended up playing whack-a-mole with this until no
> more connections would be established from Ceph to RabbitMQ. I probably
> killed 100 or so of them.
>

Once you remove the bucket, no new notifications can be sent. If you
create the bucket again you may see notifications again (this is fixed in
"pacific").
Either way, even if the connection to the rabbitmq server is still open,
no new notifications should be sent over it. Just having the connection
open should not be an issue, but it would be nice to fix that as well:
https://tracker.ceph.com/issues/49033

3. After this I couldn’t get any events sent anymore. There is no more
> blocking when uploading, files get written but nothing else happens. No
> connections are made anymore from Ceph to RabbitMQ.
>
> Hope this helps…
>

yes, this is very helpful!


> Best,
>
> Tom
>
>
>
>
> Dr Tom Schoonjans
>
> Research Software Engineer - HPC and Cloud
>
> Rosalind Franklin Institute
> Harwell Science & Innovation Campus
> Didcot
> Oxfordshire
> OX11 0FA
> United Kingdom
>
> https://www.rfi.ac.uk
>
> The Rosalind Franklin Institute is a registered charity in England and
> Wales, No. 1179810 Company Limited by Guarantee Registered in England
> and Wales, No.11266143. Funded by UK Research and Innovation through
> the Engineering and Physical Sciences Research Council.
>
> On 27 Jan 2021, at 13:04, Yuval Lifshitz  wrote:
>
>
>
> On Wed, Jan 27, 2021 at 11:33 AM Schoonjans, Tom (RFI,RAL,-) <
> tom.schoonj...@rfi.ac.uk> wrote:
>
>> Hi Yuval,
>>
>>
>> Switching to non-SSL connections to RabbitMQ allowed us to get things
>> working, although currently it’s not very reliable.
>>
>
> can you please add more about that? what reliability issues did you see?
>
>
>> I will open a new ticket over this if we can’t fix things ourselves.
>>
>>
> This would be great. We have SSL support for the kafka and http endpoints, so
> if you decide to give it a try you can look at them as examples.
> And let me know if you have questions or need help.
>
>
>
>> I will open an issue on the tracker as soon as my account request has
>> been approved :-)
>>
>> Best,
>>
>> Tom
>>
>>
>>
>>
>>
>> Dr Tom Schoonjans
>>
>> Research Software Engineer - HPC and Cloud
>>
>> Rosalind Franklin Institute
>> Harwell Science & Innovation Campus
>> Didcot
>> Oxfordshire
>> OX11 0FA
>> United Kingdom
>>
>> https://www.rfi.ac.uk
>>
>> The Rosalind Franklin Institute is a registered charity in England and
>> Wales, No. 1179810 Company Limited by Guarantee Registered in England
>> and Wales, No.11266143. Funded by UK Research and Innovation through
>> the Engineering and Physical Sciences 

[ceph-users] Balancing with upmap

2021-01-27 Thread Francois Legrand

Hi all,
I have a cluster with 116 disks (24 new 16 TB disks added in December 
and the rest 8 TB) running Nautilus 14.2.16.

I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster does not seem well balanced: the number of PGs on the 8 TB 
disks varies from 26 to 52, and their utilization from 35 to 69%.
The recent 16 TB disks are more homogeneous, with 48 to 61 PGs and usage 
between 30 and 43%.
Last week, I realized that some OSDs were maybe not using upmap because I 
ran ceph osd crush weight-set ls and got (compat) as the result.
Thus I ran ceph osd crush weight-set rm-compat, which triggered some 
rebalancing. There has been no more recovery for 2 days now, but the cluster 
is still unbalanced.
As far as I understand, upmap is supposed to reach an equal number of 
PGs on all the disks (weighted by their capacity, I guess).
Thus I would expect more or less 30 PGs on the 8 TB disks and 60 on the 
16 TB ones, and around 50% usage on all, which is not the case (by far).
The problem is that this impacts the free space available in the pools 
(264 TiB while there is more than 578 TiB free in the cluster), because free 
space seems to be based on the space available before the first OSD 
becomes full.

Is this normal? Did I miss something? What could I do?
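
As a hedged aside for anyone reading: the upmap balancer's target spread is
controlled by the balancer module's upmap_max_deviation option, and these are
the standard commands to inspect and tighten it (verify the option name against
your Nautilus release):

$ ceph balancer status
$ ceph osd df tree          # compare the PGS column across OSDs
$ ceph config set mgr mgr/balancer/upmap_max_deviation 1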

F.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread George Yil
Thank you. This helps a lot.

> Josh Baergen wrote (27 Jan 2021 17:08):
> 
> On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
>> May I ask if it can be changed dynamically, and whether any disadvantages
>> should be expected?
> 
> Unless there's some magic I'm unaware of, there is no way to
> dynamically change this. Each OSD must be recreated with the new
> min_alloc_size setting. In production systems this can be quite the
> chore, since the safest way to accomplish this is to drain the OSD
> (set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
> then repopulate it. With automation this can run in the background.
> Given how much room you have currently you may be able to do this
> host-at-a-time by storing a host's data on the other hosts in a given
> rack (though I don't remember what your CRUSH tree looks like so maybe
> you can't do this and maintain host independence).
> 
> The downside is potentially more tracking metadata at the OSD level,
> though I understand that Nautilus has made improvements here. I'm not
> up to speed on the latest state in this area, though.
> 
> Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-27 Thread Dan van der Ster
Usually the ceph.log prints the reason for the inconsistency when it
is first detected by scrubbing.
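
A hedged example of digging that out on a mon host (default cluster log
location; substitute the PG id being investigated):

$ grep '17\.7ff' /var/log/ceph/ceph.log | grep -i scrub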

-- dan

On Wed, Jan 27, 2021 at 12:41 AM Richard Bade  wrote:
>
> Hi Everyone,
> I have also seen this "inconsistent" state with an empty list when you run
> list-inconsistent-obj
>
> $ sudo ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
> pgs not deep-scrubbed in time
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
> PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
> pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811
>
> $ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
> {
> "epoch": 183807,
> "inconsistents": []
> }
>
> Usually these are caused by read errors on the disks, but I've checked
> all OSD hosts that are part of this PG and there are no SMART or dmesg
> errors.
>
> Rich
>
> --
> >
> > Date: Sun, 17 Jan 2021 14:00:01 +0330
> > From: Seena Fallah 
> > Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> > objects
> > To: "Alexander E. Patrakov" 
> > Cc: ceph-users 
> > Message-ID:
> > 
> > Content-Type: text/plain; charset="UTF-8"
> >
> > It's for a long time ago and I don't have the `ceph health detail` output!
> >
> > On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
> > wrote:
> >
> > > For a start, please post the "ceph health detail" output.
> > >
> > > Sat, 19 Dec 2020 at 23:48, Seena Fallah :
> > > >
> > > > Hi,
> > > >
> > > > I'm facing something strange! One of the PGs in my pool got inconsistent
> > > > and when I run `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph
> > > or..?
> > > >
> > > > Thanks.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > >
> > > --
> > > Alexander E. Patrakov
> > > CV: http://u.pc.cd/wT8otalK
> > >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-01-27 Thread Konstantin Shalygin
Martin, also before restart - issue cache drop command to active mds


k

Sent from my iPhone

> On 27 Jan 2021, at 11:58, Dan van der Ster  wrote:
> 
> In our experience failovers are largely transparent if the mds has:
> 
>mds session blacklist on timeout = false
>mds session blacklist on evict = false
> 
> And clients have
> 
>client reconnect stale = true
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: "ceph orch restart mgr" command creates mgr restart loop

2021-01-27 Thread Jens Hyllegaard (Soft Design A/S)
Hi Chris

Having also recently started exploring Ceph, I too happened upon this problem.
I found that terminating the command with Ctrl-C seemed to stop the looping, 
which btw also happens on all other mgr instances in the cluster.
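
For anyone else hitting this, a couple of hedged commands to watch the mgr
daemons while the loop is happening (inspection only; I have not verified any
fix beyond Ctrl-C):

$ ceph orch ps --daemon-type mgr
$ ceph mgr stat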

Regards

Jens

-Original Message-
From: Chris Read  
Sent: 11. januar 2021 21:54
To: ceph-users@ceph.io
Subject: [ceph-users] "ceph orch restart mgr" command creates mgr restart loop

Greetings all...

I'm busy testing out Ceph and have hit this troublesome bug while following the 
steps outlined here:

https://docs.ceph.com/en/octopus/cephadm/monitoring/#configuring-ssl-tls-for-grafana

When I issue the "ceph orch restart mgr" command, it appears the command is not 
cleared from a message queue somewhere (I'm still very unclear on many ceph 
specifics), and so each time the mgr process returns from restart it picks up 
the message again and keeps restarting itself forever (so far it's been stuck 
in this state for 45 minutes).

Watching the logs we see this going on:

$ ceph log last cephadm -w

root@ceph-poc-000:~# ceph log last cephadm -w
  cluster:
id: d23bc326-543a-11eb-bfe0-b324db228b6c
health: HEALTH_OK

  services:
mon: 5 daemons, quorum
ceph-poc-000,ceph-poc-003,ceph-poc-004,ceph-poc-002,ceph-poc-001 (age 2h)
mgr: ceph-poc-000.himivo(active, since 4s), standbys:
ceph-poc-001.unjulx
osd: 10 osds: 10 up (since 2h), 10 in (since 2h)

  data:
pools:   1 pools, 1 pgs
objects: 0 objects, 0 B
usage:   10 GiB used, 5.4 TiB / 5.5 TiB avail
pgs: 1 active+clean


2021-01-11T20:46:32.976606+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:46:32.980749+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:46:33.061519+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:46:39.156420+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:46:39.160618+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:46:39.242603+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:46:45.299953+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:46:45.304006+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:46:45.733495+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:46:51.871903+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:46:51.877107+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:46:51.976190+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:46:58.000720+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:46:58.006843+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:46:58.097163+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:47:04.188630+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:47:04.193501+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:47:04.285509+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available
2021-01-11T20:47:10.348099+ mon.ceph-poc-000 [INF] Active manager daemon 
ceph-poc-000.himivo restarted
2021-01-11T20:47:10.352340+ mon.ceph-poc-000 [INF] Activating manager 
daemon ceph-poc-000.himivo
2021-01-11T20:47:10.752243+ mon.ceph-poc-000 [INF] Manager daemon 
ceph-poc-000.himivo is now available

And in the logs for the mgr instance itself we see it keep replaying the 
message over and over:

$ docker logs -f
ceph-d23bc326-543a-11eb-bfe0-b324db228b6c-mgr.ceph-poc-000.himivo
debug 2021-01-11T20:47:31.390+ 7f48b0d0d200  0 set uid:gid to 167:167
(ceph:ceph)
debug 2021-01-11T20:47:31.390+ 7f48b0d0d200  0 ceph version 15.2.8
(bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process ceph-mgr, 
pid 1 debug 2021-01-11T20:47:31.390+ 7f48b0d0d200  0 pidfile_write: ignore 
empty --pid-file debug 2021-01-11T20:47:31.414+ 7f48b0d0d200  1 mgr[py] 
Loading python module 'alerts'
debug 2021-01-11T20:47:31.486+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'balancer'
debug 2021-01-11T20:47:31.542+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'cephadm'
debug 2021-01-11T20:47:31.742+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'crash'
debug 2021-01-11T20:47:31.798+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'dashboard'
debug 2021-01-11T20:47:32.258+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'devicehealth'
debug 2021-01-11T20:47:32.306+ 7f48b0d0d200  1 mgr[py] Loading python 
module 'diskprediction_local'
debug 

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-27 Thread Adam Boyhan
Doing some more testing. 

I can demote the rbd image on the primary, promote on the secondary and the 
image looks great. I can map it, mount it, and it looks just like it should. 

However, the rbd snapshots are still unusable on the secondary even when 
promoted. I went as far as taking a 2nd snapshot on the rbd before 
demoting/promoting and that 2nd snapshot still won't work either. Both 
snapshots look and work great on the primary. 

If I request a resync from the secondary the rbd snapshots start working just 
like the primary, but only if I request a resync. 

I get the same exact results whether I am using a 15.2.8 or 16.1 client. 
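
For reference, the demote/promote/resync steps above correspond to these
commands (pool/image names are placeholders):

$ rbd mirror image demote mypool/myimage    # on the primary
$ rbd mirror image promote mypool/myimage   # on the secondary
$ rbd mirror image resync mypool/myimage    # on the secondary, to request a resync
$ rbd mirror image status mypool/myimage    # check replication progress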




From: "adamb"  
To: "dillaman"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Tuesday, January 26, 2021 1:51:13 PM 
Subject: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Did some testing with clients running 16.1. I set up two different clients, each 
one dedicated to its respective cluster. Running Proxmox, I compiled the 
latest Pacific 16.1 build. 

root@Ccspacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

root@Bunkpacificclient:/cephbuild/ceph/build/bin# ./ceph -v 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 
ceph version 16.1.0-8-g5f17c37f78 (5f17c37f78a331b7a4bf793890f9d324c64183e5) 
pacific (rc) 

Unfortunately, I am hitting exactly the same issues using a Pacific client. 

Would this confirm that it's something specific to 15.2.8 on the OSD/mon nodes? 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Friday, January 22, 2021 3:44:26 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Fri, Jan 22, 2021 at 3:29 PM Adam Boyhan  wrote: 
> 
> I will have to do some looking into how that is done on Proxmox, but most 
> definitely. 

Thanks, appreciate it. 

>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Friday, January 22, 2021 3:02:23 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Any chance you can attempt to repeat the process on the latest master 
> or pacific branch clients (no need to upgrade the MONs/OSDs)? 
> 
> On Fri, Jan 22, 2021 at 2:32 PM Adam Boyhan  wrote: 
> > 
> > The steps are pretty straight forward. 
> > 
> > - Create rbd image of 500G on the primary 
> > - Enable rbd-mirror snapshot on the image 
> > - Map the image on the primary 
> > - Format the block device with ext4 
> > - Mount it and write out 200-300G worth of data (I am using rsync with some 
> > local real data we have) 
> > - Unmap the image from the primary 
> > - Create rbd snapshot 
> > - Create rbd mirror snapshot 
> > - Wait for copy process to complete 
> > - Clone the rbd snapshot on secondary 
> > - Map the image on secondary 
> > - Try to mount on secondary 
> > 
> > Just as a reference. All of my nodes are the same. 
> > 
> > root@Bunkcephtest1:~# ceph --version 
> > ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus 
> > (stable) 
> > 
> > root@Bunkcephtest1:~# dpkg -l | grep rbd-mirror 
> > ii rbd-mirror 15.2.8-pve2 amd64 Ceph daemon for mirroring RBD images 
> > 
> > This is pretty straight forward, I don't know what I could be missing here. 
> > 
> > 
> >  
> > From: "Jason Dillaman"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Friday, January 22, 2021 2:11:36 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Any chance you could write a small reproducer test script? I can't 
> > repeat what you are seeing and we do have test cases that really 
> > hammer random IO on primary images, create snapshots, rinse-and-repeat 
> > and they haven't turned up anything yet. 
> > 
> > Thanks! 
> > 
> > On Fri, Jan 22, 2021 at 1:50 PM Adam Boyhan  wrote: 
> > > 
> > > I have been doing a lot of testing. 
> > > 
> > > The size of the RBD image doesn't have any effect. 
> > > 
> > > I run into the issue once I actually write data to the rbd. The more data 
> > > I write out, the larger the chance of reproducing the issue. 
> > > 
> > > I seem to hit the issue of missing the filesystem altogether the most, 
> > > but I have also had a few instances where some of the data was simply 
> > > missing. 
> > > 
> > > I monitor the mirror status on the remote cluster until the snapshot is 
> > > 100% copied and also make sure all the IO is done. My setup has no issue 
> > > maxing out my 10G interconnect during replication, so its pretty obvious 
> > > once its done. 
> > > 
> > > The only way I have found to resolve the issue is to call a mirror resync 
> > > on the secondary array. 
> > > 
> > > I can then map the rbd on the primary, write more data to it, snap it 
> > > again, and I am back in the same 

[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-01-27 Thread Dan van der Ster
Hi,

In our experience failovers are largely transparent if the mds has:

mds session blacklist on timeout = false
mds session blacklist on evict = false

And clients have

client reconnect stale = true
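
As a hedged aside, the same settings expressed as ceph config commands (option
names as of Nautilus/Octopus; they may be named differently in later releases):

$ ceph config set mds mds_session_blacklist_on_timeout false
$ ceph config set mds mds_session_blacklist_on_evict false
$ ceph config set client client_reconnect_stale true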

Cheers, Dan

On Wed, Jan 27, 2021 at 9:09 AM Martin Hronek
 wrote:
>
> Hello fellow CEPH-users,
> currently we are updating our CEPH(14.2.16) and making changes to some
> config settings.
>
> TLDR: is there a way to make a graceful MDS active node shutdown without
> losing the caps, open files and client connections? Something like
> handover active state, promote standby to active, ...?
>
>
> Sadly we run into some difficulties when restarting MDS Nodes. While we
> had two active nodes and one standby we initially though that this would
> have a nice handover when restarting the active rank ... sadly we saw
> how the node was going through the states:
> replay-reconnect-rejoin-active as nicely visualized here
> https://docs.ceph.com/en/latest/cephfs/mds-states/
>
> This left some nodes going into timeouts until the standby node has gone
> into the active state again, most probably since the cephfs has already
> some 600k folders and 3M files and from the client side it took more
> than 30s.
>
> So before the next MDS restart the FS config was changed to one active and one
> standby-replay node, the idea was that since the MDS replay nodes
> follows the active one the handover would be smoother. The active state
> was reached faster, but we still noticed some hiccups on the clients
> while the new active MDS was waiting for clients to reconnect(state
> up:reconnect) after the failover.
>
> The next idea was to do a manual node promotion, graceful shutdown or
> something similar - where the open caps and sessions would be handed
> over ... but I did not find any hint in the docs regarding this
> functionality.
> But, this should somehow be possible (imho), since when adding a second
> active mds node (max_mds 2) and then removing it again (max_mds 1) the
> rank 1 node goes to stopping-state and hands over all clients/caps to
> rank 0 without interruptions for the clients.
>
> Therefore my question: how can one gracefully shut down an active rank 0
> mds node or promote a standby node to the active state without losing
> open files/caps or client sessions?
>
> Thanks in advance,
> M
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CEPHFS - MDS gracefull handover of rank 0

2021-01-27 Thread Martin Hronek

Hello fellow CEPH-users,
currently we are updating our Ceph cluster (14.2.16) and making changes to some 
config settings.


TLDR: is there a way to do a graceful shutdown of an active MDS node without 
losing the caps, open files and client connections? Something like handing 
over the active state, promoting a standby to active, ...?



Sadly we ran into some difficulties when restarting MDS nodes. While we 
had two active nodes and one standby, we initially thought that this would 
give a nice handover when restarting the active rank ... sadly we saw 
the node going through the states replay-reconnect-rejoin-active, as nicely 
visualized here: 
https://docs.ceph.com/en/latest/cephfs/mds-states/


This left some nodes going into timeouts until the standby node had reached 
the active state again, most probably because the CephFS already has 
some 600k folders and 3M files, and from the client side the failover took 
more than 30s.


So before the next MDS restart, the FS config was changed to one active and 
one standby-replay node, the idea being that since the standby-replay node 
follows the active one, the handover would be smoother. The active state 
was reached faster, but we still noticed some hiccups on the clients 
while the new active MDS was waiting for clients to reconnect (state 
up:reconnect) after the failover.


The next idea was to do a manual node promotion, graceful shutdown or 
something similar - where the open caps and sessions would be handed 
over ... but I did not find any hint in the docs regarding this 
functionality.
But this should somehow be possible (imho), since when adding a second 
active MDS node (max_mds 2) and then removing it again (max_mds 1), the 
rank 1 node goes to the stopping state and hands over all clients/caps to 
rank 0 without interruption for the clients.
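
For context, the max_mds change described above is, assuming a filesystem
named "cephfs":

$ ceph fs set cephfs max_mds 2   # rank 1 becomes active
$ ceph fs set cephfs max_mds 1   # rank 1 goes to stopping and hands its clients/caps to rank 0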


Therefore my question: how can one gracefully shut down an active rank 0 
MDS node, or promote a standby node to the active state, without losing 
open files/caps or client sessions?


Thanks in advance,
M
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io