[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread David Orman
It is RGW, but the index is on a different pool. Not seeing any key/s being 
reported in recovery. We've definitely had OSDs flap multiple times.
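For reference, this is roughly how I've been checking whether any omap keys are actually moving (field names from memory; the JSON layout differs a bit between releases, so adjust the jq path as needed):

ceph pg dump --format json 2>/dev/null | jq -r '.pg_map.pg_stats[] | select(.stat_sum.num_keys_recovered > 0) | [.pgid, .stat_sum.num_keys_recovered, .stat_sum.num_objects_recovered] | @tsv'

Nothing shows up with a non-zero key count; only the object counters increase.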

David

On Wed, Apr 24, 2024, at 16:48, Anthony D'Atri wrote:
> Do you see *keys* aka omap traffic?  Especially if you have RGW set up?
>
>> On Apr 24, 2024, at 15:37, David Orman  wrote:
>> 
>> Did you ever figure out what was happening here?
>> 
>> David
>> 
>> On Mon, May 29, 2023, at 07:16, Hector Martin wrote:
>>> On 29/05/2023 20.55, Anthony D'Atri wrote:
>>>> Check the uptime for the OSDs in question
>>> 
>>> I restarted all my OSDs within the past 10 days or so. Maybe OSD
>>> restarts are somehow breaking these stats?
>>> 
>>>> 
>>>>> On May 29, 2023, at 6:44 AM, Hector Martin  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I'm watching a cluster finish a bunch of backfilling, and I noticed that
>>>>> quite often PGs end up with zero misplaced objects, even though they are
>>>>> still backfilling.
>>>>> 
>>>>> Right now the cluster is down to 6 backfilling PGs:
>>>>> 
>>>>> data:
>>>>>   volumes: 1/1 healthy
>>>>>   pools:   6 pools, 268 pgs
>>>>>   objects: 18.79M objects, 29 TiB
>>>>>   usage:   49 TiB used, 25 TiB / 75 TiB avail
>>>>>   pgs: 262 active+clean
>>>>>6   active+remapped+backfilling
>>>>> 
>>>>> But there are no misplaced objects, and the misplaced column in `ceph pg
>>>>> dump` is zero for all PGs.
>>>>> 
>>>>> If I do a `ceph pg dump_json`, I can see `num_objects_recovered`
>>>>> increasing for these PGs... but the misplaced count is still 0.
>>>>> 
>>>>> Is there something else that would cause recoveries/backfills other than
>>>>> misplaced objects? Or perhaps there is a bug somewhere causing the
>>>>> misplaced object count to be misreported as 0 sometimes?
>>>>> 
>>>>> # ceph -v
>>>>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
>>>>> (stable)
>>>>> 
>>>>> - Hector
>>>>> ___
>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>> 
>>>> 
>>> 
>>> - Hector
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recoveries without any misplaced objects?

2024-04-24 Thread David Orman
Did you ever figure out what was happening here?

David

On Mon, May 29, 2023, at 07:16, Hector Martin wrote:
> On 29/05/2023 20.55, Anthony D'Atri wrote:
>> Check the uptime for the OSDs in question
>
> I restarted all my OSDs within the past 10 days or so. Maybe OSD
> restarts are somehow breaking these stats?
>
>> 
>>> On May 29, 2023, at 6:44 AM, Hector Martin  wrote:
>>>
>>> Hi,
>>>
>>> I'm watching a cluster finish a bunch of backfilling, and I noticed that
>>> quite often PGs end up with zero misplaced objects, even though they are
>>> still backfilling.
>>>
>>> Right now the cluster is down to 6 backfilling PGs:
>>>
>>>  data:
>>>volumes: 1/1 healthy
>>>pools:   6 pools, 268 pgs
>>>objects: 18.79M objects, 29 TiB
>>>usage:   49 TiB used, 25 TiB / 75 TiB avail
>>>pgs: 262 active+clean
>>> 6   active+remapped+backfilling
>>>
>>> But there are no misplaced objects, and the misplaced column in `ceph pg
>>> dump` is zero for all PGs.
>>>
>>> If I do a `ceph pg dump_json`, I can see `num_objects_recovered`
>>> increasing for these PGs... but the misplaced count is still 0.
>>>
>>> Is there something else that would cause recoveries/backfills other than
>>> misplaced objects? Or perhaps there is a bug somewhere causing the
>>> misplaced object count to be misreported as 0 sometimes?
>>>
>>> # ceph -v
>>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
>>> (stable)
>>>
>>> - Hector
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
>
> - Hector
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread David Orman
I would suggest considering EC vs. replication for index data, and the latency 
implications. There's more to weigh than just the NVMe vs. rotational question, 
especially if you're using wider EC profiles like 8+3. It would be worth testing 
for your particular workload.

Also make sure to factor in storage utilization if you expect to see 
versioning/object lock in use. This can be the source of a significant amount 
of additional consumption that isn't planned for initially.
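If it helps, a quick way to sanity-check where the index actually lives and how it's protected (pool names below assume the default zone; adjust for yours):

ceph osd pool ls detail | grep buckets.index
ceph osd pool get default.rgw.buckets.index crush_rule
ceph osd pool get default.rgw.buckets.index size

and then compare index-heavy operations (listings, multipart completion, etc.) under your expected load before settling on a layout.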

On Mon, Apr 8, 2024, at 01:42, Daniel Parkes wrote:
> Hi Lukasz,
>
> RGW uses omap objects for the index pool; omaps are stored in the RocksDB
> database of each OSD, not in the actual index pool. So by putting the DB/WAL
> on an NVMe as you mentioned, you are already placing the index on
> a non-rotational drive; you don't need to do anything else.
>
> You just need to size your DB/WAL partition accordingly. For RGW/object
> storage, a good starting point for the DB/WAL sizing is 4%.
>
> Example of Omap entries in the index pool using 0 bytes, as they are stored
> in Rocksdb:
>
> # rados -p default.rgw.buckets.index listomapkeys
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> file1
> file2
> file4
> file10
>
> rados df -p default.rgw.buckets.index
> POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD  WR_OPS      WR  USED COMPR  UNDER COMPR
> default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B
>
> # rados -p default.rgw.buckets.index stat
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> mtime 2022-12-20T07:32:11.00-0500, size 0
>
>
> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:
>
>> Hi!
>>
>> I'm working on a POC cluster setup dedicated to backup app writing objects
>> via s3 (large objects, up to 1TB transferred via multipart upload process).
>>
>> Initial setup is 18 storage nodes (12 HDDs + 1 NVMe card for DB/WAL) + EC
>> pool.  Plan is to use cephadm.
>>
>> I'd like to follow good practice and put the RGW index pool on a
>> non-rotational drive. The question is how to do it:
>>
>>- replace a few HDDs (1 per node) with an SSD (how many? 4-6-8?)
>>- reserve space on the NVMe drive on each node, create an LV-based OSD, and let
>>the RGW index use the same NVMe drive as the DB/WAL
>>
>> Thoughts?
>>
>> --
>> Lukasz
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-07 Thread David Orman
That tracker's last update indicates it's slated for inclusion.

On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote:
> Hi, 
>
> Please consider not leaving this behind: 
> https://github.com/ceph/ceph/pull/55109
>
> It's a serious bug, which potentially affects whole-node stability if 
> the affected mgr is colocated with OSDs. The bug has been known for quite a 
> while and really shouldn't be left unfixed. 
>
> /Z
>
> On Thu, 1 Feb 2024 at 18:45, Nizamudeen A  wrote:
>> Thanks Laura,
>> 
>> Raised a PR for  https://tracker.ceph.com/issues/57386
>> https://github.com/ceph/ceph/pull/55415
>> 
>> 
>> On Thu, Feb 1, 2024 at 5:15 AM Laura Flores  wrote:
>> 
>> > I reviewed the rados suite. @Adam King , @Nizamudeen A
>> >  would appreciate a look from you, as there are some
>> > orchestrator and dashboard trackers that came up.
>> >
>> > pacific-release, 16.2.15
>> >
>> > Failures:
>> > 1. https://tracker.ceph.com/issues/62225
>> > 2. https://tracker.ceph.com/issues/64278
>> > 3. https://tracker.ceph.com/issues/58659
>> > 4. https://tracker.ceph.com/issues/58658
>> > 5. https://tracker.ceph.com/issues/64280 -- new tracker, worth a look
>> > from Orch
>> > 6. https://tracker.ceph.com/issues/63577
>> > 7. https://tracker.ceph.com/issues/63894
>> > 8. https://tracker.ceph.com/issues/64126
>> > 9. https://tracker.ceph.com/issues/63887
>> > 10. https://tracker.ceph.com/issues/61602
>> > 11. https://tracker.ceph.com/issues/54071
>> > 12. https://tracker.ceph.com/issues/57386
>> > 13. https://tracker.ceph.com/issues/64281
>> > 14. https://tracker.ceph.com/issues/49287
>> >
>> > Details:
>> > 1. pacific upgrade test fails on 'ceph versions | jq -e' command -
>> > Ceph - RADOS
>> > 2. Unable to update caps for client.iscsi.iscsi.a - Ceph - Orchestrator
>> > 3. mds_upgrade_sequence: failure when deploying node-exporter - Ceph -
>> > Orchestrator
>> > 4. mds_upgrade_sequence: Error: initializing source
>> > docker://prom/alertmanager:v0.20.0 - Ceph - Orchestrator
>> > 5. mgr-nfs-upgrade test times out from failed cephadm daemons - Ceph -
>> > Orchestrator
>> > 6. cephadm: docker.io/library/haproxy: toomanyrequests: You have
>> > reached your pull rate limit. You may increase the limit by authenticating
>> > and upgrading: https://www.docker.com/increase-rate-limit - Ceph -
>> > Orchestrator
>> > 7. qa: cephadm failed with an error code 1, alertmanager container not
>> > found. - Ceph - Orchestrator
>> > 8. ceph-iscsi build was retriggered and now missing
>> > package_manager_version attribute - Ceph
>> > 9. Starting alertmanager fails from missing container - Ceph -
>> > Orchestrator
>> > 10. pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do
>> > not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
>> > 11. rados/cephadm/osds: Invalid command: missing required parameter
>> > hostname() - Ceph - Orchestrator
>> > 12. cephadm/test_dashboard_e2e.sh: Expected to find content: '/^foo$/'
>> > within the selector: 'cd-modal .badge' but never did - Ceph - Mgr -
>> > Dashboard
>> > 13. Failed to download key at
>> > http://download.ceph.com/keys/autobuild.asc: Request failed: error [Errno 101] Network is unreachable - Infrastructure
>> > 14. podman: setting cgroup config for procHooks process caused: Unit
>> > libpod-$hash.scope not found - Ceph - Orchestrator
>> >
>> > On Wed, Jan 31, 2024 at 1:41 PM Casey Bodley  wrote:
>> >
>> >> On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein 
>> >> wrote:
>> >> >
>> >> > Details of this release are summarized here:
>> >> >
>> >> > https://tracker.ceph.com/issues/64151#note-1
>> >> >
>> >> > Seeking approvals/reviews for:
>> >> >
>> >> > rados - Radek, Laura, Travis, Ernesto, Adam King
>> >> > rgw - Casey
>> >>
>> >> rgw approved, thanks
>> >>
>> >> > fs - Venky
>> >> > rbd - Ilya
>> >> > krbd - in progress
>> >> >
>> >> > upgrade/nautilus-x (pacific) - Casey PTL (regweed tests failed)
>> >> > upgrade/octopus-x (pacific) - Casey PTL (regweed tests failed)
>> >> >
>> >> > upgrade/pacific-x (quincy) - in progress
>> >> > upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
>> >> >
>> >> > ceph-volume - Guillaume
>> >> >
>> >> > TIA
>> >> > YuriW
>> >> > ___
>> >> > ceph-users mailing list -- ceph-users@ceph.io
>> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >> >
>> >> ___
>> >> Dev mailing list -- d...@ceph.io
>> >> To unsubscribe send an email to dev-le...@ceph.io
>> >>
>> >
>> >
>> > --
>> >
>> > Laura Flores
>> >
>> > She/Her/Hers
>> >
>> > Software Engineer, Ceph Storage 
>> >
>> > Chicago, IL
>> >
>> > lflo...@ibm.com | lflo...@redhat.com 
>> > M: +17087388804
>> >
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to 

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-05 Thread David Orman
Hi,

Just looking back through PyO3 issues, it would appear this functionality was 
never supported:

https://github.com/PyO3/pyo3/issues/3451
https://github.com/PyO3/pyo3/issues/576

It appears that attempts to use this functionality (which does not 
work/exist) simply weren't blocked before, and now they are. I see a few 
PRs in associated projects (such as bcrypt) where they attempted to roll back 
(example):

https://github.com/pyca/bcrypt/pull/714

This will restore functionality to the way it was before (not sure if the same 
exists for other libraries) but they are basically stop-gaps until proper 
support exists in PyO3, which may or may not happen in the near future. It 
sounds like the rollbacks are being considered if the upstream issue isn't 
resolved in some undefined timeline.

I just wanted to add this information to further the discussion, as I know it 
does not resolve your immediate problem(s). It sounds like we need to discuss 
the reliance on PyO3: the functionality we need was never actually implemented, 
was only permitted in error, and has no clear target date for resolution (it 
sounds like a large upstream effort). I don't pretend to know the complexities 
of an alternative implementation, but it seems worth at least a cursory 
investigation, since according to the PyO3 issues above, current behavior 
(prior to the blocking change) may be somewhat undefined even when it doesn't 
throw errors.
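
For anyone trying to gauge exposure in the meantime, a rough way to see which PyO3-backed Python packages the mgr environment is pulling in on a Debian-based host (package names here are assumptions; adjust for containerized deployments):

dpkg -l | grep -E 'python3-(bcrypt|cryptography|nacl)'
python3 -c 'import bcrypt, cryptography; print(bcrypt.__version__, cryptography.__version__)'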

David 

On Fri, Feb 2, 2024, at 10:20, Chris Palmer wrote:
> Hi Matthew
>
> AFAIK the upgrade from quincy/deb11 to reef/deb12 is not possible:
>
>   * The packaging problem you can work around, and a fix is pending
>   * You have to upgrade both the OS and Ceph in one step
>   * The MGR will not run under deb12 due to the PyO3 lack of support for
> subinterpreters.
>
> If you do attempt an upgrade, you will end up stuck with a partially 
> upgraded cluster. The MONs will be on deb12/reef and cannot be 
> downgraded, and the MGR will be stuck on deb11/quincy. We have a test 
> cluster in that state with no way forward or back.
>
> I fear the MGR problem will spread as time goes on and PyO3 updates 
> occur. And it's not good that it can silently corrupt in the existing 
> apparently-working installations.
>
> No-one has picked up issue 64213 that I raised yet.
>
> I'm tempted to raise another issue for QA: the Debian 12 package cannot 
> have been tested, as it just won't work either as an upgrade or as a new 
> install.
>
> Regards, Chris
>
>
> On 02/02/2024 14:40, Matthew Darwin wrote:
>> Chris,
>>
>> Thanks for all the investigations you are doing here. We're on 
>> quincy/debian11.  Is there any working path at this point to 
>> reef/debian12?  Ideally I want to go in two steps.  Upgrade ceph first 
>> or upgrade debian first, then do the upgrade to the other one. Most of 
>> our infra is already upgraded to debian 12, except ceph.
>>
>> On 2024-01-29 07:27, Chris Palmer wrote:
>>> I have logged this as https://tracker.ceph.com/issues/64213
>>>
>>> On 16/01/2024 14:18, DERUMIER, Alexandre wrote:
 Hi,

>> ImportError: PyO3 modules may only be initialized once per
>> interpreter
>> process
>>
>> and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
>> modules may only be initialized once per interpreter process
 We have the same problem on proxmox8 (based on debian12) with ceph
 quincy or reef.

 It seem to be related to python version on debian12

 (we have no fix for this currently)



>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RFI: Prometheus, Etc, Services - Optimum Number To Run

2024-01-21 Thread David Orman
The "right" way to do this is to not run your metrics system on the cluster you 
want to monitor. Use the provided metrics via the exporter and ingest them 
using your own system (ours is Mimir/Loki/Grafana + related alerting), so if 
you have failures of nodes/etc you still have access to, at a minimum, your 
metrics/log data and alerting. Using the built-in services is a great stop-gap, 
but in my opinion, should not be relied on for production operation of Ceph 
clusters (or any software, for that matter.) Spin up some VMs if that's what 
you have available to you and manage your LGTM (or other choice) externally.
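
As a rough sketch of what I mean (the mgr prometheus module listens on port 9283 by default; substitute your own hosts and whatever scrape mechanism your TSDB uses):

ceph mgr module enable prometheus
curl -s http://<mgr-host>:9283/metrics | head

Point your external Prometheus/Mimir at that endpoint (plus node_exporter on each host) rather than relying on the monitoring stack hosted on the cluster itself.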

Cheers,
David

On Fri, Jan 19, 2024, at 23:42, duluxoz wrote:
> Hi All,
>
> In regards to the monitoring services on a Ceph Cluster (ie Prometheus, 
> Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc) how many 
> instances should/can we run for fault tolerance purposes? I can't seem 
> to recall that advice being in the doco anywhere (but of course, I 
> probably missed it).
>
> I'm concerned about HA on those services - will they continue to run if 
> the Ceph Node they're on fails?
>
> At the moment we're running only 1 instance of each in the cluster, but 
> several Ceph Nodes are capable of running each - ie/eg 3 nodes 
> configured but only count:1.
>
> This is on the latest version of Reef using cephadm (if it makes a 
> huge difference :-) ).
>
> So any advice, etc, would be greatly appreciated, including if we should 
> be running any services not mentioned (not Mgr, Mon, OSD, or iSCSI, 
> obviously :-) )
>
> Cheers
>
> Dulux-Oz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CLT Meeting Minutes 2024-01-03

2024-01-03 Thread David Orman
Happy 2024!

Today's CLT meeting covered the following:

 1. 2024 brings a focus on the performance of Crimson (some information here:
    https://docs.ceph.com/en/reef/dev/crimson/crimson/ )
    1. Status is available here: https://github.com/ceph/ceph.io/pull/635
    2. There will be a new Crimson performance weekly meeting, led by Matan Breizman
       1. This does not replace the existing performance weekly, and is focused on Crimson
       2. An email will follow with more details about this meeting
 2. Ceph Quarterly will be published on/around the 14th of January, 2024.
    1. See https://ceph.io/en/community/cq/ for previous issues of CQ
 3. A development freeze on Squid is tentatively scheduled for January 31, 2024
 4. Upcoming releases
    1. 16.2.15 is next (the last Pacific release)
       1. Anticipated by the end of January
    2. 17.2.8 will follow (Quincy)
    3. 18.2.2 will follow this (Reef)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RadosGW public HA traffic - best practices?

2023-11-17 Thread David Orman
I apologize, I somehow missed that you cannot do BGP. I don't know of a better 
solution for you if that's the case. You'll just want to make sure to do 
graceful shutdowns of haproxy when you need to do maintenance work, to avoid 
severing active connections. At some point, though, timeouts will likely 
happen, so the impact won't be zero, but it also won't be catastrophic.
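
Something along these lines for the graceful part (backend/server names and the admin socket path are placeholders; verify against your haproxy version):

# drain one backend server so in-flight requests finish and no new ones land on it
echo "set server rgw_backend/rgw1 state drain" | socat stdio /var/run/haproxy.sock
# reload haproxy itself gracefully; old workers finish their connections before exiting
systemctl reload haproxy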

David

On Fri, Nov 17, 2023, at 10:09, David Orman wrote:
> Use BGP/ECMP with something like exabgp on the haproxy servers.
>
> David
>
> On Fri, Nov 17, 2023, at 04:09, Boris Behrens wrote:
>> Hi,
>> I am looking for some experience on how people make their RGW public.
>>
>> Currently we use the following:
>> 3 IP addresses that get distributed via keepalived between three HAproxy
>> instances, which then balance to three RGWs.
>> The caveat is that keepalived is a PITA to get working when distributing a set
>> of IP addresses, and it doesn't scale very well (up and down).
>> The upside is that it is really stable and customers nearly never have an
>> availability problem. And we have 3 IPs that make some sort of LB. It
>> serves up to 24Gbit in peak times, when all those backup jobs are running
>> at night.
>>
>> But today I thought: what will happen if I just ditch keepalived and
>> configure those addresses statically on the haproxy hosts?
>> How bad will the impact to a customer be if I reboot one haproxy? Is there an
>> easier, more scalable way if I want to spread the load even further without
>> having an ingress HW LB (what I don't have)?
>>
>> I have a lot of hosts that would be able to host some POD with a haproxy
>> and a RGW as container together, or even host the RGW alone in a container.
>> It would just need to bridge two networks.
>> But I currently do not have a way to use BGP to have one IP address split
>> between a set of RGW instances.
>>
>> So long story short:
>> What are your easy setups to serve public RGW traffic with some sort of HA
>> and LB (without using a big HW LB that is capable of 100GBit traffic)?
>> And have you experienced problems when you do not shift around IP addresses.
>>
>> Cheers
>>  Boris
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RadosGW public HA traffic - best practices?

2023-11-17 Thread David Orman
Use BGP/ECMP with something like exabgp on the haproxy servers.
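
Very roughly, something like this on each haproxy node (all ASNs and addresses are made up, and exabgp config syntax differs a bit between 3.x and 4.x, so treat this purely as a sketch):

neighbor 192.0.2.1 {
    router-id 192.0.2.11;
    local-address 192.0.2.11;
    local-as 65001;
    peer-as 65000;

    static {
        route 203.0.113.10/32 next-hop self;
    }
}

Each node announces the same /32 service address, the upstream router ECMPs across whichever nodes are announcing it, and a health check withdraws the route when haproxy on that node is unhealthy.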

David

On Fri, Nov 17, 2023, at 04:09, Boris Behrens wrote:
> Hi,
> I am looking for some experience on how people make their RGW public.
>
> Currently we use the following:
> 3 IP addresses that get distributed via keepalived between three HAproxy
> instances, which then balance to three RGWs.
> The caveat is that keepalived is a PITA to get working when distributing a set
> of IP addresses, and it doesn't scale very well (up and down).
> The upside is that it is really stable and customers nearly never have an
> availability problem. And we have 3 IPs that make some sort of LB. It
> serves up to 24Gbit in peak times, when all those backup jobs are running
> at night.
>
> But today I thought: what will happen if I just ditch keepalived and
> configure those addresses statically on the haproxy hosts?
> How bad will the impact to a customer be if I reboot one haproxy? Is there an
> easier, more scalable way if I want to spread the load even further without
> having an ingress HW LB (what I don't have)?
>
> I have a lot of hosts that would be able to host some POD with a haproxy
> and a RGW as container together, or even host the RGW alone in a container.
> It would just need to bridge two networks.
> But I currently do not have a way to use BGP to have one IP address split
> between a set of RGW instances.
>
> So long story short:
> What are your easy setups to serve public RGW traffic with some sort of HA
> and LB (without using a big HW LB that is capable of 100GBit traffic)?
> And have you experienced problems when you do not shift around IP addresses.
>
> Cheers
>  Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-08 Thread David Orman
I would suggest updating: https://tracker.ceph.com/issues/59580

We did notice it with 16.2.13, as well, after upgrading from .10, so likely 
in-between those two releases.

David

On Fri, Sep 8, 2023, at 04:00, Loïc Tortay wrote:
> On 07/09/2023 21:33, Mark Nelson wrote:
>> Hi Rok,
>> 
>> We're still trying to catch what's causing the memory growth, so it's hard 
>> to guess at which releases are affected.  We know it's happening 
>> intermittently on a live Pacific cluster at least.  If you have the 
>> ability to catch it while it's happening, there are several 
>> approaches/tools that might aid in diagnosing it. Container deployments 
>> are a bit tougher to get debugging tools working in though which afaik 
>> has slowed down existing attempts at diagnosing the issue.
>> 
> Hello,
> We have a cluster recently upgraded from Octopus to Pacific 16.2.13 
> where the active MGR was OOM-killed a few times.
>
> We have another cluster that was recently upgraded from 16.2.11 to 
> 16.2.14 and the issue also started to appear (very soon) on that cluster.
> We didn't have the issue before, during the months running 16.2.11.
>
> In short: the issue seems to be due to a change in 16.2.12 or 16.2.13.
>
>
> Loïc.
> -- 
> |   Loïc Tortay  - IN2P3 Computing Centre  |
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR Memory Leak in Restful

2023-09-08 Thread David Orman
Hi,

I do not believe this is actively being worked on, but there is a tracker open; 
if you can submit an update, it may help attract attention and get a fix developed: 
https://tracker.ceph.com/issues/59580

David

On Fri, Sep 8, 2023, at 03:29, Chris Palmer wrote:
> I first posted this on 17 April but did not get any response (although 
> IIRC a number of other posts referred to it).
> Seeing as MGR OOM is being discussed at the moment I am re-posting.
> These clusters are not containerized.
>
> Is this being tracked/fixed or not?
>
> Thanks, Chris
>
> ---
>
> We've hit a memory leak in the Manager Restful interface, in versions 
> 17.2.5 & 17.2.6. On our main production cluster the active MGR grew to 
> about 60G until the oom_reaper killed it, causing a successful failover 
> and restart of the failed one. We can then see that the problem is 
> recurring, actually on all 3 of our clusters.
>
> We've traced this to when we enabled full Ceph monitoring by Zabbix last 
> week. The leak is about 20GB per day, and seems to be proportional to 
> the number of PGs. For some time we just had the default settings, and 
> no memory leak, but had not got around to finding why many of the Zabbix 
> items were showing as Access Denied. We traced this to the MGR's MON 
> CAPS which were "mon 'profile mgr'".
>
> The MON logs showed recurring:
>
> log_channel(audit) log [DBG] : from='mgr.284576436 
> 192.168.xxx.xxx:0/2356365' entity='mgr.host1' cmd=[{"format": "json", 
> "prefix": "pg dump"}]:  access denied
>
>
> Changing the MGR CAPS to "mon 'allow *'" and restarting the MGR 
> immediately allowed that to work, and all the follow-on REST calls worked.
>
> log_channel(audit) log [DBG] : from='mgr.283590200 
> 192.168.xxx.xxx:0/1779' entity='mgr.host1' cmd=[{"format": "json", 
> "prefix": "pg dump"}]: dispatch
>
>
> However it has also caused the memory leak to start.
>
> We've reverted the CAPS and are back to how we were.
>
> Two questions:
> 1) No matter what the REST consumer is doing, the MGR should not 
> accumulate memory, especially as we can see that the REST TCP 
> connections have wrapped up. Is there anything more we can do to 
> diagnose this?
> 2) Setting "allow *" worked, but is there are better setting just to 
> allow the "pg dump" call (in addition to profile mgr)?
>
> Thanks, Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs spam log with scrub starts

2023-08-31 Thread David Orman
https://github.com/ceph/ceph/pull/48070 may be relevant.

I think this may have gone out in 16.2.11. I would tend to agree; personally, 
this feels quite noisy at default logging levels for production clusters.

David

On Thu, Aug 31, 2023, at 11:17, Zakhar Kirpichenko wrote:
> This is happening to our 16.2.14 cluster as well. I'm not sure whether this
> was happening before the upgrade to 16.2.14.
>
> /Z
>
> On Thu, 31 Aug 2023, 17:49 Adrien Georget, 
> wrote:
>
>> Hello,
>>
>> On our 16.2.14 CephFS cluster, all OSDs are spamming logs with messages
>> like "log_channel(cluster) log [DBG] : xxx scrub starts".
>> All OSDs are concerned, for different PGs. Cluster is healthy without
>> any recovery ops.
>>
>> For a single PG, we can have hundreds of scrub starts msg in less than
>> an hour. With 720 OSDs (8k PG, EC8+2), it can lead to millions of
>> messages by hour...
>> For example with PG 3.1d57 or 3.1988:
>>
>> Aug 31 16:02:09
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:09.453+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:11
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:11.446+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:12
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:12.428+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:13
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:13.456+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:14
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:14.431+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:15
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:15.475+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:21
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:21.516+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:23
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:23.555+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 scrub starts
>> Aug 31 16:02:24
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-58[1310188]: debug
>> 2023-08-31T14:02:24.510+ 7fdab1ec4700  0 log_channel(cluster) log
>> [DBG] : 3.1d57 deep-scrub starts
>>
>> Aug 31 16:02:10
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:10.384+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 deep-scrub starts
>> Aug 31 16:02:11
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:11.377+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 scrub starts
>> Aug 31 16:02:13
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:13.383+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 scrub starts
>> Aug 31 16:02:15
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:15.383+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 deep-scrub starts
>> Aug 31 16:02:17
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:17.336+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 scrub starts
>> Aug 31 16:02:19
>> ceph-86cd8a68-7649-11ed-b2be-5cba2c7fdb30-osd-276[1325507]: debug
>> 2023-08-31T14:02:19.328+ 7f0606ce3700  0 log_channel(cluster) log
>> [DBG] : 3.1988 scrub starts
>>
>> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED MISPLACED  UNFOUND
>> BYTES OMAP_BYTES*  OMAP_KEYS*  LOG DISK_LOG  STATE
>> STATE_STAMP  VERSION  REPORTED
>> UP UP_PRIMARY
>> ACTING ACTING_PRIMARY
>> LAST_SCRUB   SCRUB_STAMP  LAST_DEEP_SCRUB
>> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
>> 3.1d57 52757   0 0 00
>> 1675960266480   0   1799 1799
>> active+clean 2023-08-31T14:27:24.025755+   236010'4532653
>> 236011:8745383  [58,421,335,9,59,199,390,481,425,480] 58
>> [58,421,335,9,59,199,390,481,425,480]  58 231791'4531915
>> 2023-08-29T22:41:12.266874+ 229377'4526369
>> 2023-08-26T04:30:42.894505+ 0
>> 3.1988 52867   0 0 00
>> 1686038728080   0   1811 1811
>> active+clean 2023-08-31T14:32:13.361420+   236018'4241611
>> 236018:9815753[276,342,345,299,210,349,85,481,446,46] 276
>> [276,342,345,299,210,349,85,481,446,46] 

[ceph-users] Re: Another Pacific point release?

2023-07-17 Thread David Orman
I'm hoping to see at least one more, if not more than that, but I have no 
crystal ball. I definitely support this idea, and strongly suggest it's given 
some thought. There have been a lot of delays/missed releases due to all of the 
lab issues, and it's significantly impacted the release cadence for quite some 
time.

We've got a fair number of patches we intend for backport to Pacific that 
address core functionality issues impacting our customer workloads, but haven't 
been able to get released due to all of the infrastructure problems.

David

On Mon, Jul 17, 2023, at 05:27, Konstantin Shalygin wrote:
> Hi,
>
>> On 17 Jul 2023, at 12:53, Ponnuvel Palaniyappan  wrote:
>> 
>> The typical EOL date (2023-06-01) has already passed for Pacific. Just
>> wondering if there's going to be another Pacific point release (16.2.14) in
>> the pipeline.
>
> Good point! At least, for possibility upgrade RBD clusters from 
> Nautilus to Pacific, seems this release should get this backport [1]
>
> Also, it will be good to see an update of information on distributions 
> (ABC QA grades) [2]
>
> Thanks,
>
> [1] https://tracker.ceph.com/issues/59538
> [2] https://docs.ceph.com/en/quincy/start/os-recommendations/#platforms
>
> k
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow recovery on Quincy

2023-05-22 Thread David Orman
Someone who's got data regarding this should file a bug report; it sounds like 
a quick fix for the defaults if this holds true.
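
For anyone who wants to reproduce the numbers below before a fix lands anywhere official (test cluster first; the option is reportedly gone in later releases):

ceph config get osd osd_mclock_cost_per_byte_usec_hdd
ceph config set osd osd_mclock_cost_per_byte_usec_hdd 0.1

and compare recovery/backfill throughput and client latency before and after.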

On Sat, May 20, 2023, at 00:59, Hector Martin wrote:
> On 17/05/2023 03.07, 胡 玮文 wrote:
>> Hi Sake,
>> 
>> We are experiencing the same. I set “osd_mclock_cost_per_byte_usec_hdd” to 
>> 0.1 (default is 2.6) and get about 15 times the backfill speed, without 
>> significantly affecting client IO. This parameter seems to be calculated wrongly; 
>> from the description, 5e-3 should be a reasonable value for an HDD (corresponding to 
>> 200MB/s). I noticed this default was originally 5.2, then changed to 2.6 to 
>> increase the recovery speed. So I suspect the original author just converted 
>> the unit wrongly: he may have wanted 5.2e-3 but wrote 5.2 in the code.
>> 
>> But all this may be not important in the next version. I see the relevant 
>> code is rewritten, and this parameter is now removed.
>> 
>> high_recovery_ops profile works very poorly for us. It increase the average 
>> latency of client IO from 50ms to about 1s.
>> 
>> Weiwen Hu
>> 
>
> Thank you for this, that parameter indeed seems completely wrong
> (assuming it means what it says on the tin). After changing that, my
> Quincy cluster is now recovering at a much more reasonable speed.
>
> - Hector
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph pg stuck - missing on 1 osd how to proceed

2023-04-18 Thread David Orman
You may want to consider disabling deep scrubs and scrubs while waiting for the 
backfill operation to complete.
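
For example (cluster-wide flags; remember to clear them once the backfill completes):

ceph osd set noscrub
ceph osd set nodeep-scrub
# ...wait for backfill to finish...
ceph osd unset noscrub
ceph osd unset nodeep-scrub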

On Tue, Apr 18, 2023, at 01:46, Eugen Block wrote:
> I didn't mean you should split your PGs now, that won't help because  
> there is already backfilling going on. I would revert the pg_num  
> changes (since nothing actually happened yet there's no big risk) and  
> wait for the backfill to finish. You don't seem to have inactive PGs  
> so it shouldn't be an issue as long as nothing else breaks down. Do  
> you see progress of the backfilling? Do the numbers of misplaced  
> objects change?
>
> Zitat von xadhoo...@gmail.com:
>
>> Thanks, I tried to change the pg and pgp numbers to a higher value but 
>> the pg count did not increase.
>> data:
>> pools:   8 pools, 1085 pgs
>> objects: 242.28M objects, 177 TiB
>> usage:   553 TiB used, 521 TiB / 1.0 PiB avail
>> pgs: 635281/726849381 objects degraded (0.087%)
>>  91498351/726849381 objects misplaced (12.588%)
>>  773 active+clean
>>  288 active+remapped+backfilling
>>  11  active+clean+scrubbing+deep
>>  10  active+clean+scrubbing
>>  3   active+undersized+degraded+remapped+backfilling
>>
>>
>> still have those 3 pg in stuck
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread David Orman
I've seen what appears to be the same post on Reddit previously, and attempted 
to assist. My suspicion is that "stop" was passed to ceph orch upgrade in an 
attempt to stop it, but with the --image flag preceding it, which set the 
image to "stop". I asked the user to do an actual upgrade stop, then re-attempt 
while specifying a different image, and the user indicated the "stop" image pull 
attempts continued. That part didn't seem right, so I suggested a bug report.

https://www.reddit.com/r/ceph/comments/11g3rze/anyone_having_pull_issues_with_ceph_images/

@OP - are you the same poster as the above, or do you just have the same 
problem? If there's multiple users with this, it would indicate something 
larger than just a misplaced option/flag/command. If it is you - could you link 
to the bug report?

Just to make sure, you've issued:

"ceph orch upgrade stop"

Then performed another "ceph orch upgrade start" specifying a --ceph-version or 
--image?

I'll also echo Adam's request for a "ceph config dump |grep image". It sounds 
like it's still set to "stop", but I'd have expected the above to initiate an 
upgrade to the correct image. If not, the bug report would be helpful to 
continue so it could be fixed.
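
In other words, something along these lines (the image below is just an example; use whichever version you're actually targeting):

ceph orch upgrade stop
ceph config dump | grep image
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.5
ceph orch upgrade status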

David

On Mon, Mar 6, 2023, at 15:02, Adam King wrote:
> Can I see the output of `ceph orch upgrade status` and `ceph config dump |
> grep image`? The "Pulling container image stop" implies somehow (as Eugen
> pointed out) that cephadm thinks the image to pull is named "stop" which
> means it is likely set as either the image to upgrade to or as one of the
> config options.
>
> On Sat, Mar 4, 2023 at 2:06 AM  wrote:
>
>> I initially ran the upgrade fine but it failed at around 40/100 on an OSD,
>> so after waiting for a long time I thought I'd try restarting it and then
>> restarting the upgrade.
>> I am stuck with the debug error below. I have tested docker pull from
>> other servers and they don't fail for the ceph images, but on ceph it does.
>> If I even try to redeploy, or add or remove mon daemons for example, it comes
>> up with the same error related to the images.
>>
>> The error that ceph is giving me is:
>> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm :
>> args = []
>> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image
>> stop --no-container-init pull
>> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> which python3
>> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> /usr/bin/python3
>> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
>> --image stop --no-container-init pull
>> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
>> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling
>> container image stop...
>> Non-zero exit code 1 from /usr/bin/docker pull stop
>> /usr/bin/docker: stdout Using default tag: latest
>> /usr/bin/docker: stderr Error response from daemon: pull access denied for
>> stop, repository does not exist or may require 'docker login': denied:
>> requested access to the resource is denied
>> ERROR: Failed command: /usr/bin/docker pull stop
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Undo "radosgw-admin bi purge"

2023-02-22 Thread David Orman
If it's a test cluster, you could try:

root@ceph01:/# radosgw-admin bucket check -h |grep -A1 check-objects
   --check-objects   bucket check: rebuilds bucket index according to
 actual objects state


On Wed, Feb 22, 2023, at 02:22, Robert Sander wrote:
> On 21.02.23 22:52, Richard Bade wrote:
>
>> A colleague and I ran into this a few weeks ago. The way we managed to
>> get access back to delete the bucket properly (using radosgw-admin
>> bucket rm) was to reshard the bucket.
>
>> This created a new bucket index and therefore it was then possible to delete 
>> it.
>> If you are looking to get access back to the objects, then as Eric
>> said there's no way to get those indexes back but the objects will
>> still be there in the pool.
>
> Thanks for the answers so far.
>
> The issue we faced was a corrupt bucket index object.
>
> We thought about strategies to repair that but found none.
>
> I tried different things on a test cluster in a test bucket, one of them 
> was "bi purge". And then I thought: Why is there such an operation when 
> there is no way to get the index back and a working bucket?
>
> Resharding after a "bi purge" seems to work, but as a result the bucket 
> is empty when listing via S3. A bucket remove is successful but leaves 
> all the RADOS objects in the index and data pools.
>
> Why is there no operation to rebuild the index for a bucket based on the 
> existing RADOS objects in the data pool?
>
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-31 Thread David Orman
What does your OSD service specification look like? Did your DB/WAL device 
show as having free space prior to the OSD creation?
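
If you're not sure, these should show what cephadm thinks it's working with (service names will differ on your cluster):

ceph orch ls --service-type osd --export
ceph orch device ls --wide

The exported spec should show whether db_devices still points at the NVMe, and the device listing whether cephadm believes there is free space left on it.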

On Tue, Jan 31, 2023, at 04:01, mailing-lists wrote:
> OK, the OSD is filled again. In and Up, but it is not using the nvme 
> WAL/DB anymore.
>
> And it looks like the LVM group of the old OSD is still on the NVMe 
> drive. I come to this conclusion because the two NVMe drives still have 9 LVM 
> groups each: 18 groups, but only 17 OSDs are using the NVMe (shown in the 
> dashboard).
>
>
> Do you have a hint on how to fix this?
>
>
>
> Best
>
> Ken
>
>
>
> On 30.01.23 16:50, mailing-lists wrote:
>> oph wait,
>>
>> i might have been too impatient:
>>
>>
>> 1/30/23 4:43:07 PM[INF]Deploying daemon osd.232 on ceph-a1-06
>>
>> 1/30/23 4:42:26 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:42:26 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:42:19 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:01 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:41:00 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims for drivegroup 
>> dashboard-admin-1661788934732 -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>> 1/30/23 4:39:34 PM[INF]Found osd claims -> {'ceph-a1-06': ['232']}
>>
>>
>>
>> Although, it doesnt show the NVME as wal/db yet, but i will let it 
>> proceed to a clear state until i do anything further.
>>
>>
>> On 30.01.23 16:42, mailing-lists wrote:
>>> root@ceph-a2-01:/# ceph osd destroy 232 --yes-i-really-mean-it
>>> destroyed osd.232
>>>
>>>
>>> OSD 232 shows now as destroyed and out in the dashboard.
>>>
>>>
>>> root@ceph-a1-06:/# ceph-volume lvm zap /dev/sdm
>>> --> Zapping: /dev/sdm
>>> --> --destroy was not specified, but zapping a whole device will 
>>> remove the partition table
>>> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdm bs=1M count=10 
>>> conv=fsync
>>>  stderr: 10+0 records in
>>> 10+0 records out
>>>  stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
>>> --> Zapping successful for: 
>>>
>>>
>>> root@ceph-a2-01:/# ceph orch device ls
>>>
>>> ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 21m ago *locked*
>>>
>>>
>>> It shows locked and is not automatically added now, which is good I 
>>> think? Otherwise it would probably be a new osd 307.
>>>
>>>
>>> root@ceph-a2-01:/# ceph orch osd rm status
>>> No OSD remove/replace operations reported
>>>
>>> root@ceph-a2-01:/# ceph orch osd rm 232 --replace
>>> Unable to find OSDs: ['232']
>>>
>>>
>>> Unfortunately it is still not replacing.
>>>
>>>
>>> It is so weird, i tried this procedure exactly in my virtual ceph 
>>> environment and it just worked. The real scenario is acting up now. -.-
>>>
>>>
>>> Do you have more hints for me?
>>>
>>> Thank you for your help so far!
>>>
>>>
>>> Best
>>>
>>> Ken
>>>
>>>
>>> On 30.01.23 15:46, David Orman wrote:
>>>> The 'down' status is why it's not being replaced, vs. destroyed, 
>>>> which would allow the replacement. I'm not sure why --replace led 
>>>> to that scenario, but you will probably need to mark it destroyed 
>>>> for it to be replaced.
>>>>
>>>> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
>>>>  
>>>> has instructions on the non-orch way of doing that. You only need 1/2.
>>>>
>>>> You should look through your logs to see what happened that the OSD 
>>>> was marked down and not destroyed. Obviously, make sure you 
>>>> understand ramifications before running any commands. :)
>>>>
>>>> David
>>>>
>>>> On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
>>>>> # ceph orch osd rm status
>>>

[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-30 Thread David Orman
The 'down' status is why it's not being replaced, vs. destroyed, which would 
allow the replacement. I'm not sure why --replace led to that scenario, but 
you will probably need to mark it destroyed for it to be replaced.

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#replacing-an-osd
 has instructions on the non-orch way of doing that. You only need 1/2.
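
That is, something along the lines of (double-check the OSD ID first, obviously):

ceph osd destroy 232 --yes-i-really-mean-it

after which the orchestrator should be able to treat that ID as replaceable again.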

You should look through your logs to see what happened that the OSD was marked 
down and not destroyed. Obviously, make sure you understand ramifications 
before running any commands. :)

David

On Mon, Jan 30, 2023, at 04:24, mailing-lists wrote:
> # ceph orch osd rm status
> No OSD remove/replace operations reported
> # ceph orch osd rm 232 --replace
> Unable to find OSDs: ['232']
>
> It is not finding 232 anymore. It is still shown as down and out in the 
> Ceph-Dashboard.
>
>
>      pgs: 3236 active+clean
>
>
> This is the new disk shown as locked (because unzapped at the moment).
>
> # ceph orch device ls
>
> ceph-a1-06  /dev/sdm  hdd   TOSHIBA_X_X 16.0T 9m ago 
> locked
>
>
> Best
>
> Ken
>
>
> On 29.01.23 18:19, David Orman wrote:
>> What does "ceph orch osd rm status" show before you try the zap? Is 
>> your cluster still backfilling to the other OSDs for the PGs that were 
>> on the failed disk?
>>
>> David
>>
>> On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
>>> Dear Ceph-Users,
>>>
>>> i am struggling to replace a disk. My ceph-cluster is not replacing the
>>> old OSD even though I did:
>>>
>>> ceph orch osd rm 232 --replace
>>>
>>> The OSD 232 is still shown in the osd list, but the new hdd will be
>>> placed as a new OSD. This wouldn't bother me much if the new OSD were also
>>> placed on the BlueStore DB / NVMe, but it isn't.
>>>
>>>
>>> My steps:
>>>
>>> "ceph orch osd rm 232 --replace"
>>>
>>> remove the failed hdd.
>>>
>>> add the new one.
>>>
>>> Convert the disk within the servers bios, so that the node can have
>>> direct access on it.
>>>
>>> It shows up as /dev/sdt,
>>>
>>> enter maintenance mode
>>>
>>> reboot server
>>>
>>> drive is now /dev/sdm (which the old drive had)
>>>
>>> "ceph orch device zap node-x /dev/sdm"
>>>
>>> A new OSD is placed on the cluster.
>>>
>>>
>>> Can you give me a hint, where did I take a wrong turn? Why is the disk
>>> not being used as OSD 232?
>>>
>>>
>>> Best
>>>
>>> Ken
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-29 Thread David Orman
What does "ceph orch osd rm status" show before you try the zap? Is your 
cluster still backfilling to the other OSDs for the PGs that were on the failed 
disk?

David

On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
> Dear Ceph-Users,
>
> i am struggling to replace a disk. My ceph-cluster is not replacing the 
> old OSD even though I did:
>
> ceph orch osd rm 232 --replace
>
> The OSD 232 is still shown in the osd list, but the new hdd will be 
> placed as a new OSD. This wouldn't bother me much if the new OSD were also 
> placed on the BlueStore DB / NVMe, but it isn't.
>
>
> My steps:
>
> "ceph orch osd rm 232 --replace"
>
> remove the failed hdd.
>
> add the new one.
>
> Convert the disk within the servers bios, so that the node can have 
> direct access on it.
>
> It shows up as /dev/sdt,
>
> enter maintenance mode
>
> reboot server
>
> drive is now /dev/sdm (which the old drive had)
>
> "ceph orch device zap node-x /dev/sdm "
>
> A new OSD is placed on the cluster.
>
>
> Can you give me a hint, where did I take a wrong turn? Why is the disk 
> not being used as OSD 232?
>
>
> Best
>
> Ken
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Current min_alloc_size of OSD?

2023-01-13 Thread David Orman
I think this would be valuable to have easily accessible at runtime; perhaps 
submit a tracker report (and a patch if possible)?

David

On Fri, Jan 13, 2023, at 08:14, Robert Sander wrote:
> Hi,
> 
> Am 13.01.23 um 14:35 schrieb Konstantin Shalygin:
> 
> > ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0/ get S 
> > min_alloc_size
> 
> This only works when the OSD is not running.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> http://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Zwangsangaben lt. §35a GmbHG:
> HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein -- Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ERR] OSD_SCRUB_ERRORS: 2 scrub errors

2023-01-09 Thread David Orman
We ship all of this to our centralized monitoring system (and a lot more) and 
have dashboards/proactive monitoring/alerting with 100PiB+ of Ceph. If you're 
running Ceph in production, I believe host-level monitoring is critical, above 
and beyond Ceph level. Things like inlet/outlet temperature, hardware state of 
various components, and various other details are probably best served by 
monitoring external to Ceph itself.

I did a quick glance and didn't see this data (OSD errors re: reads/writes) 
exposed in the Pacific version of Ceph's Prometheus-style exporter, but I may 
have overlooked it. This would be nice to have, as well, if it does not exist.

We collect drive counters at the host level, and alert at levels prior to 
general impact. Even a failing drive can cause latency spikes which are 
frustrating, before it starts returning errors (correctable errors) - the OSD 
will not see these other than longer latency on operations. Seeing a change in 
the smart counters either at a high rate or above thresholds you define is most 
certainly something I would suggest ensuring is covered in whatever host-level 
monitoring you're already performing for production usage.
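
As a simplified example of the kind of per-drive checks our host-level agents run (device name is a placeholder; we alert on deltas and thresholds, not on the overall health flag):

smartctl -x /dev/sdX | grep -iE 'error|reallocat|pending'
dmesg -T | grep -iE 'i/o error|medium error'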

David

On Mon, Jan 9, 2023, at 17:46, Erik Lindahl wrote:
> Hi,
> 
> Good points; however, given that ceph already collects all these statistics, 
> isn't there any way to set (?) reasonable thresholds and actually have ceph 
> detect the number of read errors and suggest that a given drive should be 
> replaced?
> 
> It seems a bit strange that we all should have to wait for a PG read error, 
> then log into each node to check the number of read errors for each device 
> and keep track of this?  Of course it's possible to write scripts for 
> everything, but there must be numerous Ceph sites with hundreds of OSD nodes, 
> so I'm a bit surprised this isn't more automated...
> 
> Cheers,
> 
> Erik
> 
> --
> Erik Lindahl 
> On 10 Jan 2023 at 00:09 +0100, Anthony D'Atri , wrote:
> >
> >
> > > On Jan 9, 2023, at 17:46, David Orman  wrote:
> > >
> > > It's important to note we do not suggest using the SMART "OK" indicator 
> > > as the drive being valid. We monitor correctable/uncorrectable error 
> > > counts, as you can see a dramatic rise when the drives start to fail. 
> > > 'OK' will be reported for SMART health long after the drive is throwing 
> > > many uncorrectable errors and needs replacement. You have to look at the 
> > > actual counters, themselves.
> >
> > I strongly agree, especially given personal experience with SSD firmware 
> > design flaws.
> >
> > Also, examining UDMA / CRC error rates led to the discovery that certain 
> > aftermarket drive carriers had lower tolerances than those from the chassis 
> > vendor, resulting in drives that were silently slow. Reseating in most 
> > cases restored performance.
> >
> > — aad
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ERR] OSD_SCRUB_ERRORS: 2 scrub errors

2023-01-09 Thread David Orman
It's important to note we do not suggest using the SMART "OK" indicator as the 
drive being valid. We monitor correctable/uncorrectable error counts, as you 
can see a dramatic rise when the drives start to fail. 'OK' will be reported 
for SMART health long after the drive is throwing many uncorrectable errors and 
needs replacement. You have to look at the actual counters, themselves.

That said, you will generally see these uncorrectable errors in the kernel 
output from dmesg, as well.

On Mon, Jan 9, 2023, at 16:38, Erik Lindahl wrote:
> Hi,
> 
> We too kept seeing this until a few months ago in a cluster with ~400 HDDs, 
>> while all the drive SMART statistics were always A-OK. Since we use erasure 
> coding each PG involves up to 10 HDDs.
> 
> It took us a while to realize we shouldn't expect scrub errors on healthy 
> drives, but eventually we decided to track it down, and found documentation 
> suggesting to use
> 
>  rados list-inconsistent-obj   --format=json-pretty
> 
> ... before you repair the PG. If you look into that (long) output, you are 
> likely going to find a "read_error" for a specific OSD. Then we started to 
> make a note of the HDD that saw the error.
> 
> This helped us identify two HDDs that had multiple read errors within a few 
> weeks, even though their SMART data was still perfectly fine. Now that 
> *might* just be bad luck, but we have enough drives that we don't care, so we 
> just replaced them, and since then I've only had a single drive report an 
> error.
> 
> One conclusion (in our case) is that it could be a drive that likely would 
> have failed sooner or later, even though it hadn't yet reached a threshold 
> for SMART to worry, or the alternative might be that it's a drive that just 
> has more frequent read errors, but it's technically within the allowed 
> variation. Assuming you have configured your cluster with reasonable 
> redundancy you shouldn't run any risk of data losses, but for us we figured 
> it's worth replacing a few outlier drives to sleep better.
> 
> Cheers,
> 
> Erik
> 
> --
> Erik Lindahl 
> On 9 Jan 2023 at 23:06 +0100, David Orman , wrote:
> > "dmesg" on all the linux hosts and look for signs of failing drives. Look 
> > at smart data, your HBAs/disk controllers, OOB management logs, and so 
> > forth. If you're seeing scrub errors, it's probably a bad disk backing an 
> > OSD or OSDs.
> >
> > Is there a common OSD in the PGs you've run the repairs on?
> >
> > On Mon, Jan 9, 2023, at 03:37, Kuhring, Mathias wrote:
> > > Hey all,
> > >
> > > I'd like to pick up on this topic, since we also see regular scrub
> > > errors recently.
> > > Roughly one per week for around six weeks now.
> > > It's always a different PG and the repair command always helps after a
> > > while.
> > > But the regular re-occurrence seems a bit unsettling.
> > > How best to troubleshoot this?
> > >
> > > We are currently on ceph version 17.2.1
> > > (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
> > >
> > > Best Wishes,
> > > Mathias
> > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ERR] OSD_SCRUB_ERRORS: 2 scrub errors

2023-01-09 Thread David Orman
"dmesg" on all the linux hosts and look for signs of failing drives. Look at 
smart data, your HBAs/disk controllers, OOB management logs, and so forth. If 
you're seeing scrub errors, it's probably a bad disk backing an OSD or OSDs.

Is there a common OSD in the PGs you've run the repairs on?
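
For example (the PG IDs below are placeholders), the acting set of each
repaired PG can be listed and compared to spot a repeat offender:

  ceph pg map 7.1a
  ceph pg map 7.2b
  # or dump the up/acting sets for every PG at once
  ceph pg dump pgs_brief

If the same OSD keeps appearing across the inconsistent PGs, that's the disk
to scrutinize first.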

On Mon, Jan 9, 2023, at 03:37, Kuhring, Mathias wrote:
> Hey all,
> 
> I'd like to pick up on this topic, since we also see regular scrub 
> errors recently.
> Roughly one per week for around six weeks now.
> It's always a different PG and the repair command always helps after a 
> while.
> But the regular re-occurrence seems a bit unsettling.
> How best to troubleshoot this?
> 
> We are currently on ceph version 17.2.1 
> (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
> 
> Best Wishes,
> Mathias
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting - 2022/01/04

2023-01-04 Thread David Orman
Today's CLT meeting had the following topics of discussion:

 * Docs questions
   * crushtool options could use additional documentation
 * This is being addressed
   * sticky header on documentation pages obscuring titles when anchor links 
are used
 * There will be a follow-up email soliciting community feedback
   * RGW multisite documentation
 * This is being updated
 * Pacific release status
   * RC4 built
 * Sepia test lab issues are being worked on, which block testing
 * Re-attempt being made
   * Build issues
 * centos8 builds failing with java-related issue
   * Also being addressed
 * Build/test issues
   * Fallout from last year's issues, still a work-in-progress
   * Is there any community interest in being more involved in discussions 
regarding this infrastructure, from an advisory position, with a potential path 
in the future towards more hands-on involvement?
 * Please let us know!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CLT meeting summary 2022-09-21

2022-09-22 Thread David Orman
This was a short meeting, and in summary:

 * Testing of upgrades for 17.2.4 in Gibba commenced and slowness during 
upgrade has been investigated.
   * Workaround available; not a release blocker
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Wide variation in osd_mclock_max_capacity_iops_hdd

2022-09-06 Thread David Orman
Yes. Rotational drives can generally do 100-200 IOPS (some outliers, of
course). Do you have all forms of caching disabled on your storage
controllers/disks?
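
As a sketch (device and OSD IDs are placeholders; controller-level cache is
handled through vendor tooling instead), you can check the on-drive write
cache and re-run the benchmark to see whether the number comes back down to
something realistic:

  # check / disable the volatile write cache on a SATA/SAS drive
  hdparm -W /dev/sdX
  hdparm -W 0 /dev/sdX

  # re-run the osd bench that mclock uses for capacity measurement
  # (arguments as in the mclock benchmarking docs linked below)
  ceph tell osd.0 cache drop
  ceph tell osd.0 bench 12288000 4096 4194304 100

With caching genuinely off, a spinner should land back in the low hundreds
of IOPS, which is what osd_mclock_max_capacity_iops_hdd ought to reflect.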

On Tue, Sep 6, 2022 at 11:32 AM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:

> Setting osd_mclock_force_run_benchmark_on_init to true and
> restarting OSDs fixed the problem of high variability.
>
> However, osd_mclock_max_capacity_iops_hdd is now over 2000.
> That's way too much for spinning disks, isn't it?
>
> Vlad
>
>
>
> On 9/1/22 03:54, Sridhar Seshasayee wrote:
> > Hello Vladimir,
> >
> > I have noticed that our osd_mclock_max_capacity_iops_hdd
> > varies widely for OSDs on identical drives in identical
> > machines (from ~600 to ~2800).
> >
> > The IOPS shouldn't vary widely if the drives are of similar
> > age and running
> > the same workloads. The osd_mclock_max_capacity_iops_hdd is
> > determined
> > using Ceph's osd bench during osd boot-up. From our testing
> > on HDDs, the
> > tool shows fairly consistent numbers (+/- a few 10s of IOPS)
> > for a given HDD.
> > For more details please see:
> >
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#osd-capacity-determination-automated
> <
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#osd-capacity-determination-automated
> >
> >
> > In your case, it would be good to take another look at the
> > subset of HDDs
> > showing degraded/lower IOPS performance and check if there
> > are any
> > underlying issues. You could run the osd bench against the
> > affected osds as
> > described in the link below or your preferred tool to get a
> > better understanding:
> >
> >
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#benchmarking-test-steps-using-osd-bench
> <
> https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#benchmarking-test-steps-using-osd-bench
> >
> >
> > Is such great variation a problem? What effect on
> > performance does this have?
> >
> > The mClock profiles use the IOPS capacity of each osd to
> > allocate IOPS
> > resources to ops like client, recovery, scrubs and so on.
> > Therefore, a lower IOPS
> > capacity will have an impact on these ops and therefore it
> > would make sense to
> > check the health of the HDDs that are showing lower than
> > expected IOPS numbers.
> > -Sridhar
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm old spec Feature `crush_device_class` is not supported

2022-08-04 Thread David Orman
https://github.com/ceph/ceph/pull/46480 - you can see the backports/dates
there.

Perhaps it isn't in the version you're running?
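
For example, a quick way to confirm what the cluster is actually running
(daemon versions can lag what's installed on disk):

  ceph versions
  ceph orch upgrade status

If the running release predates the backport in that PR, the
crush_device_class key will simply be rejected as unsupported.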

On Thu, Aug 4, 2022 at 7:51 AM Kenneth Waegeman 
wrote:

> Hi all,
>
> I’m trying to deploy this spec:
>
> spec:
>   data_devices:
> model: Dell Ent NVMe AGN MU U.2 6.4TB
> rotational: 0
>   encrypted: true
>   osds_per_device: 4
>   crush_device_class: nvme
> placement:
>   host_pattern: 'ceph30[1-3]'
> service_id: nvme_22_drive_group
> service_type: osd
>
>
> But it fails:
>
> ceph orch apply -i /etc/ceph/orch_osd.yaml --dry-run
> Error EINVAL: Failed to validate OSD spec "nvme_22_drive_group": Feature
> `crush_device_class` is not supported
>
> It’s in the docs
> https://docs.ceph.com/en/quincy/cephadm/services/osd/#ceph.deployment.drive_group.DriveGroupSpec.crush_device_class,
> and it’s also even in the docs of Pacific. I’m running Quincy 17.2.0
>
> Is this option missing somehow?
>
> Thanks!!
>
> Kenneth
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck deep-scrubbing for weeks - 16.2.9

2022-07-15 Thread David Orman
Apologies, backport link should be: https://github.com/ceph/ceph/pull/46845

On Fri, Jul 15, 2022 at 9:14 PM David Orman  wrote:

> I think you may have hit the same bug we encountered. Cory submitted a
> fix, see if it fits what you've encountered:
>
> https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
> https://github.com/ceph/ceph/pull/46877 )
> https://tracker.ceph.com/issues/54172
>
> On Fri, Jul 15, 2022 at 8:52 AM Wesley Dillingham 
> wrote:
>
>> We have two clusters one 14.2.22 -> 16.2.7 -> 16.2.9
>>
>> Another 16.2.7 -> 16.2.9
>>
>> Both with a multi disk (spinner block / ssd block.db) and both CephFS
>> around 600 OSDs each with combo of rep-3 and 8+3 EC data pools. Examples
>> of
>> stuck scrubbing PGs from all of the pools.
>>
>> They have generally been behind on scrubbing which we attributed to simply
>> being large disks (10TB) with a heavy write load and the OSDs just having
>> trouble keeping up. On closer inspection it appears we have many PGs that
>> have been lodged in a deep scrubbing state on one cluster for 2 weeks and
>> another for 7 weeks. Wondering if others have been experiencing anything
>> similar. The only example of PGs being stuck scrubbing I have seen in the
>> past has been related to snaptrim PG state but we arent doing anything
>> with
>> snapshots in these new clusters.
>>
>> Granted my cluster has been warning me with "pgs not deep-scrubbed in
>> time"
>> and its on me for not looking more closely into why. Perhaps a separate
>> warning of "PG Stuck Scrubbing for greater than 24 hours" or similar might
>> be helpful to an operator.
>>
>> In any case I was able to get scrubs proceeding again by restarting the
>> primary OSD daemon in the PGs which were stuck. Will monitor closely for
>> additional stuck scrubs.
>>
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck deep-scrubbing for weeks - 16.2.9

2022-07-15 Thread David Orman
I think you may have hit the same bug we encountered. Cory submitted a fix,
see if it fits what you've encountered:

https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
https://github.com/ceph/ceph/pull/46877 )
https://tracker.ceph.com/issues/54172
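
For anyone hitting the same thing, a rough way to spot long-running scrubs
and kick the primary (PG/OSD IDs below are placeholders) is:

  # list PGs currently in a scrubbing state
  ceph pg dump pgs | grep -i scrubbing

  # find the primary for a stuck PG, then restart that OSD
  ceph pg map 11.3f
  ceph orch daemon restart osd.42   # or systemctl, for non-cephadm clusters

Restarting the primary is the same workaround Wes describes below; the fix
linked above removes the need for it.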

On Fri, Jul 15, 2022 at 8:52 AM Wesley Dillingham 
wrote:

> We have two clusters one 14.2.22 -> 16.2.7 -> 16.2.9
>
> Another 16.2.7 -> 16.2.9
>
> Both with a multi disk (spinner block / ssd block.db) and both CephFS
> around 600 OSDs each with combo of rep-3 and 8+3 EC data pools. Examples of
> stuck scrubbing PGs from all of the pools.
>
> They have generally been behind on scrubbing which we attributed to simply
> being large disks (10TB) with a heavy write load and the OSDs just having
> trouble keeping up. On closer inspection it appears we have many PGs that
> have been lodged in a deep scrubbing state on one cluster for 2 weeks and
> another for 7 weeks. Wondering if others have been experiencing anything
> similar. The only example of PGs being stuck scrubbing I have seen in the
> past has been related to snaptrim PG state but we arent doing anything with
> snapshots in these new clusters.
>
> Granted my cluster has been warning me with "pgs not deep-scrubbed in time"
> and its on me for not looking more closely into why. Perhaps a separate
> warning of "PG Stuck Scrubbing for greater than 24 hours" or similar might
> be helpful to an operator.
>
> In any case I was able to get scrubs proceeding again by restarting the
> primary OSD daemon in the PGs which were stuck. Will monitor closely for
> additional stuck scrubs.
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread David Orman
Does it make sense to apply the 'quick' fix for the next Pacific release, to
minimize the impact on users until the improved iteration can be implemented?

On Tue, Jul 12, 2022 at 6:16 AM Igor Fedotov  wrote:

> Hi Dan,
>
> I can confirm this is a regression introduced by
> https://github.com/ceph/ceph/pull/42725.
>
> Indeed strict comparison is a key point in your specific case but
> generally  it looks like this piece of code needs more redesign to
> better handle fragmented allocations (and issue deferred write for every
> short enough fragment independently).
>
> So I'm looking for a way to improve that at the moment. Will fallback to
> trivial comparison fix if I fail to do find better solution.
>
> Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
> prefer not to raise it that high as 128K to avoid too many writes being
> deferred (and hence DB overburden).
>
> IMO setting the parameter to 64K+1 should be fine.
>
>
> Thanks,
>
> Igor
>
> On 7/7/2022 12:43 AM, Dan van der Ster wrote:
> > Hi Igor and others,
> >
> > (apologies for html, but i want to share a plot ;) )
> >
> > We're upgrading clusters to v16.2.9 from v15.2.16, and our simple
> > "rados bench -p test 10 write -b 4096 -t 1" latency probe showed
> > something is very wrong with deferred writes in pacific.
> > Here is an example cluster, upgraded today:
> >
> > image.png
> >
> > The OSDs are 12TB HDDs, formatted in nautilus with the default
> > bluestore_min_alloc_size_hdd = 64kB, and each have a large flash
> block.db.
> >
> > I found that the performance issue is because 4kB writes are no longer
> > deferred from those pre-pacific hdds to flash in pacific with the
> > default config !!!
> > Here are example bench writes from both releases:
> > https://pastebin.com/raw/m0yL1H9Z
> >
> > I worked out that the issue is fixed if I set
> > bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific
> > default. Note the default was 32k in octopus).
> >
> > I think this is related to the fixes in
> > https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
> > _do_alloc_write is comparing the prealloc size 0x1 with
> > bluestore_prefer_deferred_size_hdd (0x1) and the "strictly less
> > than" condition prevents deferred writes from ever happening.
> >
> > So I think this would impact anyone upgrading clusters with hdd/ssd
> > mixed osds ... surely we must not be the only clusters impacted by this?!
> >
> > Should we increase the default bluestore_prefer_deferred_size_hdd up
> > to 128kB or is there in fact a bug here?
> >
> > Best Regards,
> >
> > Dan
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting Minutes (2022-07-06)

2022-07-06 Thread David Orman
Here are the main topics of discussion during the CLT meeting today:

   - make-check/API tests
   - Ignoring the doc/ directory would skip an expensive git checkout
  operation and save time
   - Stale PRs
  - Currently an issue with stalebot which is being investigated
   - Cephalocon
  - Discuss options moving forward
  - No decisions finalized, discussion ongoing
  - Quincy blog posts are being planned, for example:
   https://github.com/ceph/ceph.io/pull/408
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Set device-class via service specification file

2022-06-27 Thread David Orman
Hi Robert,

We had the same question and ended up creating a PR for this:
https://github.com/ceph/ceph/pull/46480 - there are backports, as well, so
I'd expect it will be in the next release or two.
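
With that change available, a spec along these lines should work (a sketch
only, mirroring the layout in your message below; hosts, paths, and the
service_id are placeholders):

---
service_type: osd
service_id: osd_fast_with_class
placement:
  hosts:
    - fsn1-ceph-01
spec:
  data_devices:
    paths:
      - /dev/vdb
  crush_device_class: ssd

Until the backport lands in the release you're running, the orchestrator
will likely reject the crush_device_class key as unsupported.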

David

On Mon, Jun 27, 2022 at 8:07 AM Robert Reihs  wrote:

> Hi,
> We are setting up a test cluster with cephadm. We would like to
> set different device classes for the osd's . Is there a possibility to set
> this via the service specification yaml file. This is the configuration for
> the osd service:
> 
> ---
> service_type: osd
> service_id: osd_mon_disk_layout_fast
> placement:
>   hosts:
> - fsn1-ceph-01
> - fsn1-ceph-02
> - fsn1-ceph-03
> spec:
>   data_devices:
> paths:
>   - /dev/vdb
>   encrypted: true
>   journal_devices:
> paths:
>   - /dev/vdc
>   db_devices:
> paths:
>   - /dev/vdc
>
> We would use this then in the crush rule. Or is there another way to set
> this up?
> Thanks
> Best
> Robert
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs getting OOM-killed right after startup

2022-06-10 Thread David Orman
Are you thinking it might be a permutation of:
https://tracker.ceph.com/issues/53729 ? There are some posts in it on how to
check for the issue; comments #53 and #65 had a few potential ways to check.

On Fri, Jun 10, 2022 at 5:32 AM Marius Leustean 
wrote:

> Did you check the mempools?
>
> ceph daemon osd.X dump_mempools
>
> This will tell you how much memory is consumed by different components of
> the OSD.
> Finger in the air, your ram might be consumed by the pg_log.
>
> If osd_pglog from the dump_mempools output is big, then you can lower the
> values of the related configuration options:
>
> osd_min_pg_log_entries = 100 (default is 250)
> osd_max_pg_log_entries = 500 (default is 1)
> osd_target_pg_log_entries_per_osd = 3 (default is 30)
>
> Those are just examples. You can adjust it based on your current memory
> consumption and the available amount of RAM.
>
> On Fri, Jun 10, 2022 at 1:21 PM Eugen Block  wrote:
>
> > Hi,
> >
> > could you share more cluster details and what your workload is?
> >
> > ceph -s
> > ceph osd df tree
> > ceph orch ls
> > ceph osd pools ls detail
> >
> > How big are your PGs?
> >
> > Zitat von Mara Sophie Grosch :
> >
> > > Hi,
> > >
> > > good catch with the way too low memory target, I wanted to configure 1
> > > GiB not 1 MiB. I'm aware it's low, but removed anyway for testing - it
> > > sadly didn't change anything.
> > >
> > > I customize the config mostly for dealing problems I have, something in
> > > my setup makes the OSDs eat lots of memory in normal operation, just
> > > gradually increasing .. I'd send metrics if my monitoring was up again
> > > ^^' That maybe being some form of cache was the reason for that config
> > > line (it does not seem to be cache).
> > >
> > > I have found a way to deal with my cluster, using ceph-objectstore-tool
> > > to export-remove all PGs from the OSDs, getting them online and happy
> > > again and then importing a few PGs at a time in one OSD and let it
> > > backfill to the others.
> > >
> > > The problem of eating very much memory on startup manifests with some
> of
> > > the PGs only, but for those it goes up to ~50GiB. Problematic: when it
> > > has multiple of those PGs in its storage, it's handling those in
> > > parallel - needing even more memory of course, until it finally gets
> > > OOM-killed.
> > >
> > > So.. seems I can get my cluster running again, only limited by my
> > > internet upload now. Any hints why it eats a lot of memory in normal
> > > operation would still be appreciated.
> > >
> > > Best, Mara
> > >
> > >
> > > Am Wed, Jun 08, 2022 at 09:05:52AM + schrieb Eugen Block:
> > >> It's even worse, you only give them 1MB, not GB.
> > >>
> > >> Zitat von Eugen Block :
> > >>
> > >>> Hi,
> > >>>
> > >>> is there any reason you use custom configs? Most of the defaults
> > >>> work well. But you only give your OSDs 1 GB of memory, that is way
> > >>> too low except for an idle cluster without much data. I recommend
> > >>> to remove the line
> > >>>
> > >>>   osd_memory_target = 1048576
> > >>>
> > >>> and let ceph handle it. I didn't install Quincy yet but in a
> > >>> healthy cluster the OSDs will take around 3 GB of memory, maybe 4,
> > >>> so you should be good with your setup.
> > >>>
> > >>> Regards,
> > >>> Eugen
> > >>>
> > >>> Zitat von Mara Sophie Grosch :
> > >>>
> >  Hi,
> > 
> >  I have a currently-down ceph cluster
> >  * v17.2.0 / quay.io/v17.2.0-20220420
> >  * 3 nodes, 4 OSDs
> >  * around 1TiB used/3TiB total
> >  * probably enough resources
> >  - two of those nodes have 64GiB memory, the third has 16GiB
> >  - one of the 64GiB nodes runs two OSDs, as it's a physical node with
> >   2 NVMe drives
> >  * provisioned via Rook and running in my Kubernetes cluster
> > 
> >  After some upgrades yesterday (system packages on the nodes) and
> today
> >  (Kubernetes to latest version), I wanted to reboot my nodes. The
> drain
> >  of the first node put a lot of stress on the other OSDs, making them
> > go
> >  OOM - but I think that probably is a bug already, as at least one of
> >  those nodes has enough resources (64GiB memory, physical machine,
> > ~40GiB
> >  surely free - but don't have metrics rn as everything is down).
> > 
> >  I'm now seeing all OSDs going into OOM right on startup, from what
> it
> >  looks like everything is fine until right after `load_pgs` - as soon
> > as
> >  it activates some PGs, memory usage increases _a lot_ (from ~4-5GiB
> >  RES before to .. 60GiB, though that depends on the free memory on
> the
> >  node).
> > 
> >  Because of this, I cannot get any of them online again and need
> advice
> >  what to do and what info might be useful. Logs of one of those OSDs
> > are
> >  here[1] (captured via kubectl logs, so something right from start
> > might
> >  be missing - happy to dig deeper if you need more) and my changed
> >  ceph.conf entries are 

[ceph-users] Re: OpenStack Swift on top of CephFS

2022-06-09 Thread David Orman
I agree with this; just because you can doesn't mean you should. It will
likely be significantly less painful to upgrade the infrastructure to
support doing this the more-correct way, vs. trying to layer swift on top
of cephfs. I say this having a lot of personal experience with Swift at
extremely large scales.

On Thu, Jun 9, 2022 at 7:34 AM Etienne Menguy 
wrote:

> > but why not CephFS?
> You don't want to offer distributed storage on top of distributed storage.
> You can't compare rgw and openstack swift, swift also takes care of data
> storage ( the openstack swift proxy is "similar" to rgw ).
>
> For sure you could use 'tricks' like a single replica on swift or ceph,
> but don't expect great performance. It sounds like a terrible idea.
> Also, it's probably easier to update your infrastructure rather than
> deploy/learn/maintain openstack swift.
>
> Étienne
>
> > -Original Message-
> > From: Kees Meijs | Nefos 
> > Sent: jeudi 9 juin 2022 13:43
> > To: Etienne Menguy 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] OpenStack Swift on top of CephFS
> >
> > Hi,
> >
> > Well, there's a Ceph implementation in production already with a lot of
> > storage. Local storage is small and limited.
> >
> > Customers ask for Swift in addition to the OpenStack environment, so it
> > makes sense to combine both with regard to Swift.
> >
> > Obviously it's best to use Keystone integration with Ceph RGW and
> integrate
> > on that level, but both Ceph and OpenStack implementations aren't new
> > enough to do that.
> >
> > So, I was wondering if someone tried to use CephFS as a backend for
> Swift.
> > An alternative would be RBD with a filesystem on top but why not CephFS?
> >
> > Regards,
> > Kees
> >
> > On 09-06-2022 13:23, Etienne Menguy wrote:
> > > Hi,
> > >
> > > You should probably explain your need, why do you want to use cephfs
> > rather than local storage?
> > >
> > > Étienne
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow delete speed through the s3 API

2022-06-03 Thread David Orman
Is your client using the DeleteObjects call to delete up to 1,000 objects per
request? https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
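
If it isn't, batching the deletes client-side makes a large difference. As a
rough example with the AWS CLI (endpoint, bucket, and keys are placeholders):

  aws --endpoint-url https://rgw.example.com s3api delete-objects \
    --bucket mybucket \
    --delete '{"Objects":[{"Key":"backup/obj1"},{"Key":"backup/obj2"}],"Quiet":true}'

Each request can carry up to 1,000 keys, versus one round trip per object
with plain DeleteObject.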

On Fri, Jun 3, 2022 at 9:35 AM J-P Methot 
wrote:

> Read/writes are super fast. It's only deletes that are incredibly slow,
> both through the s3 api and radosgw commands. It doesn't have a lot of
> index shards with 11 and it basically has one bucket with huge objects.
> We're trying to store backups in there, essentially. New backups write
> themselves about 100 times faster than old backups can be deleted.
>
> On 6/2/22 19:15, Wesley Dillingham wrote:
> > Is it just your deletes which are slow or writes and read as well?
> >
> > On Thu, Jun 2, 2022, 4:09 PM J-P Methot 
> > wrote:
> >
> > I'm following up on this as we upgraded to Pacific 16.2.9 and deletes
> > are still incredibly slow. The pool rgw is using is a fairly small
> > erasure coding pool set at 8 + 3. Is there anyone who's having the
> > same
> > issue?
> >
> > On 5/16/22 15:23, J-P Methot wrote:
> > > Hi,
> > >
> > > First of all, a quick google search shows me that questions
> > about the
> > > s3 API slow object deletion speed have been asked before and are
> > well
> > > documented. My issue is slightly different, because I am getting
> > > abysmal speeds of 11 objects/second on a full SSD ceph running
> > Octopus
> > > with about a hundred OSDs. This is much lower than the Redhat
> > reported
> > > limit of 1000 objects/second.
> > >
> > > I've seen elsewhere that it was a Rocksdb limitation and that it
> > would
> > > be fixed in Pacific, but the Pacific release logs do not show me
> > > anything that suggest that. Furthermore, I have limited control
> > over
> > > the s3client deleting the files as it's a 3rd-party open source
> > > automatic backup program.
> > >
> > > Could updating to Pacific fix this issue? Is there any
> > configuration
> > > change I could do to speed up object deletion?
> > >
> > --
> > Jean-Philippe Méthot
> > Senior Openstack system administrator
> > Administrateur système Openstack sénior
> > PlanetHoster inc.
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with DB on shared NVMe

2022-05-25 Thread David Orman
In your example, you can log in to the server in question with the OSD, and
run "ceph-volume lvm zap --osd-id  --destroy" and it will purge the
DB/WAL LV. You don't need to reapply your OSD spec; the orchestrator will
detect the available space on the NVMe and redeploy that OSD.
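
As a concrete sketch (the OSD ID here is hypothetical), assuming osd.12 was
the one removed with --replace:

  ceph-volume lvm zap --osd-id 12 --destroy
  # prompt the orchestrator to rescan devices and recreate the OSD
  ceph orch device ls --refresh

ceph-volume locates and destroys the DB/WAL LV that belonged to that OSD ID
without touching the LVs of the other OSDs sharing the NVMe.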

On Wed, May 25, 2022 at 3:37 PM Edward R Huyer  wrote:

> Ok, I'm not sure if I'm missing something or if this is a gap in ceph orch
> functionality, or what:
>
> On a given host all the OSDs share a single large NVMe drive for DB/WAL
> storage and were set up using a simple ceph orch spec file.  I'm replacing
> some of the OSDs.  After they've been removed with the dashboard equivalent
> of "ceph orch osd rm # --replace" and a new drive has been swapped in, how
> do I get the OSD recreated using the chunk of NVMe for DB/WAL storage?
> Because the NVMe has data and is still in use by other OSDs, the
> orchestrator doesn't seem to recognize it as a valid storage location, so
> it won't create the OSDs when I do "ceph orch apply -i osdspec.yml".
>
> Thoughts?
>
> -
> Edward Huyer
> Golisano College of Computing and Information Sciences
> Rochester Institute of Technology
> Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migration Nautilus to Pacifi : Very high latencies (EC profile)

2022-05-17 Thread David Orman
We don't have any that wouldn't have the problem. That said, we've already
got a PR out for the 16.2.8 issue we encountered, so I would expect a
relatively quick update assuming no issues are found during testing.

On Tue, May 17, 2022 at 1:21 PM Wesley Dillingham 
wrote:

> What was the largest cluster that you upgraded that didn't exhibit the new
> issue in 16.2.8 ? Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, May 17, 2022 at 10:24 AM David Orman  wrote:
>
>> We had an issue with our original fix in 45963 which was resolved in
>> https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
>> handling for upgraded clusters. This is in the 16.2.8 release. I'm not
>> sure
>> if it will resolve your problem (or help mitigate it) but it would be
>> worth
>> trying.
>>
>> Head's up on 16.2.8 though, see the release thread, we ran into an issue
>> with it on our larger clusters: https://tracker.ceph.com/issues/55687
>>
>> On Tue, May 17, 2022 at 3:44 AM BEAUDICHON Hubert (Acoss) <
>> hubert.beaudic...@acoss.fr> wrote:
>>
>> > Hi Josh,
>> >
>> > I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in our
>> > team.
>> > So, yes, as part of the upgrade we've done the offline repair to split
>> the
>> > omap by pool.
>> > The quick fix is, as far as I know, still disable on the default
>> > properties.
>> >
>> > On the I/O and CPU load, between Nautilus and Pacific, we haven't seen a
>> > really big change, just an increase in disk latency and in the end, the
>> > "ceph read operation" metric drop from 20K to 5K or less.
>> >
>> > But yes, a lot of slow IOPs were emerging as time passed.
>> >
>> > At this time, we have completely out one of our data node, and recreate
>> > from scratch 5 of 8 OSD deamons (DB on SSD, data on spinning drive).
>> > The result seems very good at this moment (we're seeing better metrics
>> > than under Nautilus).
>> >
>> > Since recreation, I have change 3 parameters :
>> > bdev_async_discard => osd : true
>> > bdev_enable_discard => osd : true
>> > bdev_aio_max_queue_depth => osd: 8192
>> >
>> > The first two have been extremely helpful for our SSD Pool, even with
>> > enterprise grade SSD, the "trim" seems to have rejuvenate our pool.
>> > The last one was set in response of messages in the newly create OSD :
>> > "bdev(0x55588e220400 ) aio_submit retries XX"
>> > After changing it and restarting the OSD process, messages were gone,
>> and
>> > it seems to have a beneficial effect on our data node.
>> >
>> > I've seen that the 16.2.8 was out yesterday, but I'm a little confused
>> on :
>> > [Revert] bluestore: set upper and lower bounds on rocksdb omap iterators
>> > (pr#46092, Neha Ojha)
>> > bluestore: set upper and lower bounds on rocksdb omap iterators
>> (pr#45963,
>> > Cory Snyder)
>> >
>> > (theses two lines seems related to
>> https://tracker.ceph.com/issues/55324).
>> >
>> > One step forward, one step backward ?
>> >
>> > Hubert Beaudichon
>> >
>> >
>> > -Message d'origine-
>> > De : Josh Baergen 
>> > Envoyé : lundi 16 mai 2022 16:56
>> > À : stéphane chalansonnet 
>> > Cc : ceph-users@ceph.io
>> > Objet : [ceph-users] Re: Migration Nautilus to Pacifi : Very high
>> > latencies (EC profile)
>> >
>> > Hi Stéphane,
>> >
>> > On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet <
>> schal...@gmail.com>
>> > wrote:
>> > > After a successful update from Nautilus to Pacific on Centos8.5, we
>> > > observed some high latencies on our cluster.
>> >
>> > As a part of this upgrade, did you also migrate the OSDs to sharded
>> > rocksdb column families? This would have been done by setting
>> bluestore's
>> > "quick fix on mount" setting to true or by issuing a
>> "ceph-bluestore-tool
>> > repair" offline, perhaps in response to a BLUESTORE_NO_PER_POOL_OMAP
>> > warning post-upgrade.
>> >
>> > I ask because I'm wondering if you're hitting
>> > https://tracker.ceph.com/issues/55324, for which there is a fix coming
>> in
>> > 16.2.8. If you inspect the nodes and disks involved in your EC pool, are
>> > you seeing high read or write I/O? High CPU usage?
>> >
>> > Josh
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
>> > email to ceph-users-le...@ceph.io
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migration Nautilus to Pacifi : Very high latencies (EC profile)

2022-05-17 Thread David Orman
We had an issue with our original fix in 45963 which was resolved in
https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
handling for upgraded clusters. This is in the 16.2.8 release. I'm not sure
if it will resolve your problem (or help mitigate it) but it would be worth
trying.

Heads-up on 16.2.8, though: see the release thread; we ran into an issue
with it on our larger clusters: https://tracker.ceph.com/issues/55687

On Tue, May 17, 2022 at 3:44 AM BEAUDICHON Hubert (Acoss) <
hubert.beaudic...@acoss.fr> wrote:

> Hi Josh,
>
> I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in our
> team.
> So, yes, as part of the upgrade we've done the offline repair to split the
> omap by pool.
> The quick fix is, as far as I know, still disable on the default
> properties.
>
> On the I/O and CPU load, between Nautilus and Pacific, we haven't seen a
> really big change, just an increase in disk latency and in the end, the
> "ceph read operation" metric drop from 20K to 5K or less.
>
> But yes, a lot of slow IOPs were emerging as time passed.
>
> At this time, we have completely out one of our data node, and recreate
> from scratch 5 of 8 OSD deamons (DB on SSD, data on spinning drive).
> The result seems very good at this moment (we're seeing better metrics
> than under Nautilus).
>
> Since recreation, I have change 3 parameters :
> bdev_async_discard => osd : true
> bdev_enable_discard => osd : true
> bdev_aio_max_queue_depth => osd: 8192
>
> The first two have been extremely helpful for our SSD Pool, even with
> enterprise grade SSD, the "trim" seems to have rejuvenate our pool.
> The last one was set in response of messages in the newly create OSD :
> "bdev(0x55588e220400 ) aio_submit retries XX"
> After changing it and restarting the OSD process, messages were gone, and
> it seems to have a beneficial effect on our data node.
>
> I've seen that the 16.2.8 was out yesterday, but I'm a little confused on :
> [Revert] bluestore: set upper and lower bounds on rocksdb omap iterators
> (pr#46092, Neha Ojha)
> bluestore: set upper and lower bounds on rocksdb omap iterators (pr#45963,
> Cory Snyder)
>
> (theses two lines seems related to https://tracker.ceph.com/issues/55324).
>
> One step forward, one step backward ?
>
> Hubert Beaudichon
>
>
> -Message d'origine-
> De : Josh Baergen 
> Envoyé : lundi 16 mai 2022 16:56
> À : stéphane chalansonnet 
> Cc : ceph-users@ceph.io
> Objet : [ceph-users] Re: Migration Nautilus to Pacifi : Very high
> latencies (EC profile)
>
> Hi Stéphane,
>
> On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet 
> wrote:
> > After a successful update from Nautilus to Pacific on Centos8.5, we
> > observed some high latencies on our cluster.
>
> As a part of this upgrade, did you also migrate the OSDs to sharded
> rocksdb column families? This would have been done by setting bluestore's
> "quick fix on mount" setting to true or by issuing a "ceph-bluestore-tool
> repair" offline, perhaps in response to a BLUESTORE_NO_PER_POOL_OMAP
> warning post-upgrade.
>
> I ask because I'm wondering if you're hitting
> https://tracker.ceph.com/issues/55324, for which there is a fix coming in
> 16.2.8. If you inspect the nodes and disks involved in your EC pool, are
> you seeing high read or write I/O? High CPU usage?
>
> Josh
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommendations on books

2022-04-27 Thread David Orman
Hi,

I don't have any book suggestions, but in my experience, the best way to
learn is to set up a cluster and start intentionally breaking things, and
see how you can fix them. Perform upgrades, add load, etc.

I do suggest starting with Pacific (the upcoming 16.2.8 release would
likely be a good start) and deploying with cephadm to get a feel for the
current "standard" deployment method. (Lots of opinions on if this is
good/bad, you'll run into them all, but it will be a good learning
experience one way or another). Just spin up 3 VMs with a few virtual
disks, and do a deployment!

Once you get a cluster up and running, set up RGW, some RBD mounts, create
some RGW users/use an S3 client, and "use" the cluster a bit, things will
start to snap into focus. Then try breaking things! Power down a VM with a
hard power off. Remove a disk without shutting down the OSD if possible,
etc. You'll learn a lot working through that, and also come to see one of
Ceph's strongest features - durability. You'll also find some of the warts,
which are quite helpful to learn about.
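
As a starting point (IPs and hostnames below are placeholders, and this
assumes the cephadm package/binary is already on the first VM), a throwaway
lab cluster is only a handful of commands:

  cephadm bootstrap --mon-ip 192.168.122.11
  # copy the cluster's SSH key to the other VMs, then add them
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-vm2
  ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-vm3
  ceph orch host add ceph-vm2 192.168.122.12
  ceph orch host add ceph-vm3 192.168.122.13
  ceph orch apply osd --all-available-devices

The bootstrap prints the dashboard URL and admin credentials at the end,
which is a nice way to poke around while you break things.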

Hope that helps,
David

On Tue, Apr 26, 2022 at 10:17 PM Angelo Höngens  wrote:

> Hey guys and girls,
>
> Can you recommend some books to get started with ceph? I know the docs are
> probably a good source, but books, in my experience, do a better job of
> glueing it all together and painting the big picture. And I can take a book
> to places where reading docs on a laptop is inconvenient. I know Amazon has
> some books, but what do you think are the best books?
>
> I hope to read about the different deployment methods (cephadm? Docker?
> Native?), what pg’s and crush maps are, best practices in building
> clusters, ratios between osd, wal, db, etc, what they do and why, use cases
> for cephfs vs rdb vs s3, etc.
>
> Looking forward to your tips!
>
> Angelo.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: radosgw-admin bi list failing with Input/output error

2022-04-21 Thread David Orman
https://tracker.ceph.com/issues/51429 with
https://github.com/ceph/ceph/pull/45088 for Octopus.

We're also working on: https://tracker.ceph.com/issues/55324 which is
somewhat related in a sense.

On Thu, Apr 21, 2022 at 11:19 AM Guillaume Nobiron 
wrote:

> Yes, all the buckets in the reshard list are versioned (like most of our
> buckets by the way).
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin bi list failing with Input/output error

2022-04-21 Thread David Orman
Is this a versioned bucket?

On Thu, Apr 21, 2022 at 9:51 AM Guillaume Nobiron 
wrote:

> Hello,
>
> I have an issue on my ceph cluster (octopus 15.2.16) with several buckets
> raising a LARGE_OMAP_OBJECTS warning.
> I found the buckets in the resharding list but ceph fails to reshard them.
>
> The root cause seems to be on "bi list". When I run the following command
> on an impacted bucket, I get an input/output error:
> ===
> radosgw-admin bi list --bucket bucket-name >/dev/null
> ERROR: bi_list(): (5) Input/output error
> ===
> PS: I redirected to /dev/null because I still get a lot of output before it
> fails
>
> I have the issue since version 15.2.15.
>
> Anybody knows if my problem could be fixed by the following backport :
> https://tracker.ceph.com/issues/53856 ?
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Laggy OSDs

2022-03-29 Thread David Orman
We're definitely dealing with something that sounds similar, but hard to
state definitively without more detail. Do you have object lock/versioned
buckets in use (especially if one started being used around the time of the
slowdown)? Was this cluster always 16.2.7?

What is your pool configuration (EC k+m or replicated X setup), and do you
use the same pool for indexes and data? I'm assuming this is RGW usage via
the S3 API; let us know if this is not correct.
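
For reference, a couple of quick ways to check (endpoint, bucket, and pool
names below are placeholders):

  # with any S3 client, e.g. the AWS CLI pointed at the RGW endpoint
  aws --endpoint-url https://rgw.example.com s3api get-bucket-versioning \
    --bucket mybucket
  aws --endpoint-url https://rgw.example.com s3api \
    get-object-lock-configuration --bucket mybucket

  # replicated size / EC profile per pool, and which pools back the RGW zone
  ceph osd pool ls detail
  radosgw-admin zone get

That output should make it clear whether the index and data share a pool and
what the EC layout is.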

On Tue, Mar 29, 2022 at 4:13 PM Alex Closs  wrote:

> Hey folks,
>
> We have a 16.2.7 cephadm cluster that's had slow ops and several
> (constantly changing) laggy PGs. The set of OSDs with slow ops seems to
> change at random, among all 6 OSD hosts in the cluster. All drives are
> enterprise SATA SSDs, by either Intel or Micron. We're still not ruling out
> a network issue, but wanted to troubleshoot from the Ceph side in case
> something broke there.
>
> ceph -s:
>
>  health: HEALTH_WARN
>  3 slow ops, oldest one blocked for 246 sec, daemons
> [osd.124,osd.130,osd.141,osd.152,osd.27] have slow ops.
>
>  services:
>  mon: 5 daemons, quorum
> ceph-osd10,ceph-mon0,ceph-mon1,ceph-osd9,ceph-osd11 (age 28h)
>  mgr: ceph-mon0.sckxhj(active, since 25m), standbys: ceph-osd10.xmdwfh,
> ceph-mon1.iogajr
>  osd: 143 osds: 143 up (since 92m), 143 in (since 2w)
>  rgw: 3 daemons active (3 hosts, 1 zones)
>
>  data:
>  pools: 26 pools, 3936 pgs
>  objects: 33.14M objects, 144 TiB
>  usage: 338 TiB used, 162 TiB / 500 TiB avail
>  pgs: 3916 active+clean
>  19 active+clean+laggy
>  1 active+clean+scrubbing+deep
>
>  io:
>  client: 59 MiB/s rd, 98 MiB/s wr, 1.66k op/s rd, 1.68k op/s wr
>
> This is actually much faster than it's been for much of the past hour,
> it's been as low as 50 kb/s and dozens of iops in both directions (where
> the cluster typically does 300MB to a few gigs, and ~4k iops)
>
> The cluster has been on 16.2.7 since a few days after release without
> issue. The only recent change was an apt upgrade and reboot on the hosts
> (which was last Friday and didn't show signs of problems).
>
> Happy to provide logs, let me know what would be useful. Thanks for
> reading this wall :)
>
> -Alex
>
> MIT CSAIL
> he/they
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [RGW] Too much index objects and OMAP keys on them

2022-03-25 Thread David Orman
Hi Gilles,

Did you ever figure this out? Also, your rados ls output indicates that the
prod cluster has fewer objects in the index pool than the backup cluster,
or am I misreading this?

David

On Wed, Dec 1, 2021 at 4:32 AM Gilles Mocellin <
gilles.mocel...@nuagelibre.org> wrote:

> Hello,
>
> We see large omap objects warnings on the RGW bucket index pool.
> The objects OMAP keys are about objects in one identified big bucket.
>
> Context :
> =
> We use S3 storage for an application, with ~1,5 M objects.
>
> The production cluster is "replicated" with rclone cron jobs on another
> distant cluster.
>
> We have for the moment only one big bucket (23 shards), but we work on a
> multi-bucket solution.
> The problem is not here.
>
> One other important information : the bucket is versioned. We don't
> really have versions or deleted markers due to the way the application
> works. It's mainly a way for recovery as we don't have backups, due to
> the expected storage volume. Versioning + replication should solve most
> of the restoration use cases.
>
>
> First, we don't have large omap objects in the production cluster, only
> on the replicated / backup one.
>
> Differences between the two clusters :
> - production is a 5 nodes cluster with SSD for rocksdb+wal, 2To SCSI 10k
> in RAID0 + battery backed cache.
> - backup cluster is a 13 nodes cluster without SSD? only 8To HDD with
> direct HBA
>
> Both clusters use Erasure Coding for the RGW buckets data pool. (3+2 on
> the production one, 8+2 on the backup one).
>
> Firsts seen facts :
> ===
>
> Both cluster have the same number of S3 objects in the main bucket.
> I've seen that there is 10x more objects in the RGW buckets index pool
> in the prod cluster than in the backup cluster.
> On these objects, there is 4x more OMAP keys in the backup cluster.
>
> Example :
> With rados ls :
> - 311 objects in defaults.rgw.buckets.index (prod cluster)
> - 3157 objects in MRS4.rgw.buckets.index (backup cluster)
>
> In the backup cluster, we have 22 objects with more than 20 OMAP
> keys, that's why we have a Warning.
> Searching in the production cluster, I can see around 6 OMAP keys
> max on objects.
>
> Root Cause ?
> 
>
> It seems we have too much OMAP keys and even too much objects in the
> index pool of our backup cluster. But Why ? And how to remove the
> orphans ?
>
> I've already tried :
> - radosgw-admin bucket check --fix -check-objects (still running)
> - rgw-orphan-list (but was interrupted last night after 5 hours)
>
> As I understand, the last script will do the reverse of what I need :
> show objects that don't have indexes pointing on it ?
> The radosgw-admin bucket check will perhaps rebuild indexes, but will it
> remove unused ones ?
>
> Workaround ?
> 
>
> How can I get rid of the unused index objects and omap keys ?
> Of course, I can add more reshards, but I think it would be better to
> solve the root cause if I can.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm is stable or not in product?

2022-03-08 Thread David Orman
We use it without major issues, at this point. There are still flaws, but
there are flaws in almost any deployment and management system, and this is
not unique to cephadm. I agree with the general sentiment that you need to
have some knowledge about containers, however. I don't think that's
necessarily out of place in 2022, either. This most certainly has some
upsides that (for us) offset the downsides in additional
knowledge/complexity. We've significantly benefited from the ease of
containerized deployments when dealing with debugging OSDs with custom
containers without disturbing others on the same host, for example.

There's a huge thread with all the discussion concerning containers if you
search the archives; there's no point in repeating it again.

On Mon, Mar 7, 2022 at 10:17 PM norman.kern  wrote:

> Dear Ceph folks,
>
> Anyone is using cephadm in product(Version: Pacific)? I found several bugs
> on it and
> I really doubt it.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW automation encryption - still testing only?

2022-02-08 Thread David Orman
Totally understand; I'm not really a fan of service-managed encryption keys
as a general rule vs. client-managed. I just thought I'd probe about
capabilities considered stable before embarking on our own work. SSE-S3
would be a reasonable middle-ground. I appreciate the PR link, that's very
helpful.
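
For what it's worth, once SSE-S3 lands, the client-facing side should just be
the standard S3 call, e.g. with the AWS CLI (endpoint and bucket below are
placeholders):

  aws --endpoint-url https://rgw.example.com s3api put-bucket-encryption \
    --bucket mybucket \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

after which objects written without explicit SSE headers would be encrypted
with gateway-managed keys.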

On Tue, Feb 8, 2022 at 10:29 AM Casey Bodley  wrote:

> On Tue, Feb 8, 2022 at 11:11 AM Casey Bodley  wrote:
> >
> > hi David,
> >
> > that method of encryption based on rgw_crypt_default_encryption_key
> > will never be officially supported.
>
> to expand on why: rgw_crypt_default_encryption_key requires the key
> material to be stored insecurely in ceph's config, and cannot support
> key rotation
>
> > however, support for SSE-S3
> > encryption [1] is nearly complete in [2] (cc Marcus), and we hope to
> > include that in the quincy release - and if not, we'll backport it to
> > quincy in an early point release
> >
> > can SSE-S3 with PutBucketEncryption satisfy your use case?
> >
> > [1]
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html
> > [2] https://github.com/ceph/ceph/pull/44494
> >
> > On Tue, Feb 8, 2022 at 10:44 AM David Orman 
> wrote:
> > >
> > > Is RGW encryption for all objects at rest still testing only, and if
> not,
> > > which version is it considered stable in?:
> > >
> > >
> https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
> > >
> > > David
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW automation encryption - still testing only?

2022-02-08 Thread David Orman
Is RGW encryption for all objects at rest still testing only, and if not,
which version is it considered stable in?:

https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only

David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitoring ceph cluster

2022-01-26 Thread David Orman
What version of Ceph are you using? Newer versions deploy the dashboard and
prometheus mgr modules, which have some of this built in. It's a great start to
seeing what can be done using Prometheus and the built in exporter. Once
you learn this, if you decide you want something more robust, you can do an
external deployment of Prometheus (clusters), Alertmanager, Grafana, and
all the other tooling that might interest you for a more scalable solution
when dealing with more clusters. It's the perfect way to get your feet wet
and it showcases a lot of the interesting things you can do with this
solution!

https://docs.ceph.com/en/latest/mgr/dashboard/
https://docs.ceph.com/en/latest/mgr/prometheus/
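
For example, on a recent release the built-in stack is only a couple of
commands away (cephadm-deployed clusters usually enable these already):

  ceph mgr module enable dashboard
  ceph mgr module enable prometheus
  ceph dashboard create-self-signed-cert
  ceph mgr services   # prints the dashboard and prometheus endpoint URLs

The prometheus module exposes metrics on port 9283 by default, so an
external Prometheus can scrape it later if you outgrow the built-in setup.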

David

On Wed, Jan 26, 2022 at 1:42 AM Michel Niyoyita  wrote:

> Thank you for your email Szabo, these can be helpful , can you provide
> links then I start to work on it.
>
> Michel.
>
> On Tue, 25 Jan 2022, 18:51 Szabo, Istvan (Agoda), 
> wrote:
>
> > Which monitoring tool? Like prometheus or nagios style thing?
> > We use sensu for keepalive and ceph health reporting + prometheus with
> > grafana for metrics collection.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---
> > Agoda Services Co., Ltd.
> > e: istvan.sz...@agoda.com
> > ---
> >
> > On 2022. Jan 25., at 22:38, Michel Niyoyita  wrote:
> >
> >
> > Hello team,
> >
> > I would like to monitor my ceph cluster using one of the
> > monitoring tool, does someone has a help on that ?
> >
> > Michel
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ideas for Powersaving on archive Cluster ?

2022-01-12 Thread David Orman
If performance isn't as big a concern, most servers have firmware settings
that enable more aggressive power saving, at the cost of added latency,
reduced CPU performance, etc. On HPE these are accessible/configurable via
iLO, on Dell via iDRAC, and so on. They'd want to test how much of an impact
it makes on performance vs. power consumption at idle, to see whether it's
worth the tradeoffs.

On Wed, Jan 12, 2022 at 3:22 AM Christoph Adomeit <
christoph.adom...@gatworks.de> wrote:

> Hi,
>
> a customer has a ceph cluster which is used for archiving large amounts of
> video data.
>
> The cluster sometimes is not used for several days but if data is needed
> the cluster
> should be available within a few minutes.
>
> The cluster consists of 5 Servers and 180 physical seagate harddisks and
> wastes a lot of power
> for drives and cooling.
>
> Any ideas what can be done to reduce the power usage an heat output in
> this scenario ?
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm issues

2022-01-07 Thread David Orman
What are you trying to do that won't work? If you need resources from
outside the container, it doesn't sound like something you should need to
enter a shell inside the container to accomplish.
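
That said, if the goal is just to see a host path from inside the container, recent cephadm versions can bind-mount it into the shell; a sketch (the path is only an example, and if I recall correctly it shows up under /mnt by default):

cephadm shell --mount /path/on/host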

On Fri, Jan 7, 2022 at 1:49 PM François RONVAUX 
wrote:

> Thanks for the answer.
>
> I would want to get the ceph CLI to do some admin tasks.
>
> I can access to it with the command suggested at the end of the
> bootstrap process :
> You can access the Ceph CLI with:
>
> sudo /usr/sbin/cephadm shell --fsid dbd1f122-6fd1-11ec-b7dc-560003c792b4 -c
> /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
>
> But when I do that, I think I'am locked into a chroot that prevent me to
> access to some files I need to perform the admin tasks  :-(
>
>
>
> Le ven. 7 janv. 2022 à 19:50, John Mulligan 
> a écrit :
>
> > On Friday, January 7, 2022 11:56:53 AM EST François RONVAUX wrote:
> > > Hello,
> > >
> > >
> > > On a CentOS Stream9 , when I try to install the ceph packages from the
> > > pacific release, I got this error message :
> > >
> > > [root@ceph00 ~]# cephadm install ceph-common
> > > Installing packages ['ceph-common']...
> > > Non-zero exit code 1 from yum install -y ceph-common
> > > yum: stdout Ceph x86_64 161  B/s |
> > 162
> > >  B 00:01
> > > yum: stderr Errors during downloading metadata for repository 'Ceph':
> > > yum: stderr   - Status code: 404 for
> > > https://download.ceph.com/rpm-pacific/el9/x86_64/repodata/repomd.xml
> > (IP:
> > > 158.69.68.124)
> > > yum: stderr Error: Failed to download metadata for repo 'Ceph': Cannot
> > > download repomd.xml: Cannot download repodata/repomd.xml: All mirrors
> > were
> > > tried
> > > Traceback (most recent call last):
> > >   File "/usr/sbin/cephadm", line 8571, in 
> > > main()
> > >   File "/usr/sbin/cephadm", line 8559, in main
> > > r = ctx.func(ctx)
> > >   File "/usr/sbin/cephadm", line 6459, in command_install
> > > pkg.install(ctx.packages)
> > >   File "/usr/sbin/cephadm", line 6306, in install
> > > call_throws(self.ctx, [self.tool, 'install', '-y'] + ls)
> > >   File "/usr/sbin/cephadm", line 1467, in call_throws
> > > raise RuntimeError('Failed command: %s' % ' '.join(command))
> > > RuntimeError: Failed command: yum install -y ceph-common
> > >
> > > Any idea on how I can fix it ?
> > >
> >
> > AFAICT there's no RPMs being built for centos stream 9 (yet).  Depending
> > on
> > what you're trying to do you may need to wait for (or request) ceph to
> > support
> > building RPMs for centos stream 9.
> >
> >
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Repair/Rebalance slows down

2022-01-06 Thread David Orman
What does iostat show for the drive in question? What you're seeing is the
cluster rebalancing initially; then, at the end, it's probably that single
drive being filled. I'd expect 25-100MB/s to be the fill rate of the newly
added drive with backfills per OSD set to 2 or so (much more than that
doesn't help). Check the disk utilization for the newly added OSD at the
tail end, and you'll probably see it saturated on IOPS.
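
Roughly what I'd check (device name is just an example):

# watch utilization/IOPS of the new OSD's backing device on that host
iostat -x 5 /dev/sdX

# backfill concurrency, cluster-wide; ~2 per OSD is usually plenty
ceph config set osd osd_max_backfills 2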

On Thu, Jan 6, 2022 at 8:09 AM Ray Cunningham 
wrote:

> Hi Everyone!
>
> I have a 16 node, 640 OSD (5 to 1 SSD) bluestore cluster which is mainly
> used for RGW services. It has its own backend cluster network for IO
> separate from the customer network.
>
> Whenever we add or remove an OSD the rebalance or repair IO starts off
> very fast 4GB/s+ but it will continually slow down over a week and by then
> end it's moving at KB/s. So each 16TB OSD takes a week+ to repair or
> rebalance! I have not been able to identify any bottleneck or slow point,
> it just seems to be Ceph taking longer to do its thing.
>
> Are there any settings I can check or change to get the repair speed to
> maintain a high level to completion? If we could stay in the GB/s speed we
> should be able to repair in a couple days, not a week or more...
>
> Thank you,
> Ray
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.7 pacific QE validation status, RC1 available for testing

2021-12-03 Thread David Orman
We've been testing RC1 since release on our 504 OSD/21 host test cluster
with split db/wal, and have experienced no issues on upgrade or in operation
so far.

On Mon, Nov 29, 2021 at 11:23 AM Yuri Weinstein  wrote:

> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/53324
> Release Notes - https://github.com/ceph/ceph/pull/44131
>
> Seeking approvals for:
>
> rados - Neha
> rgw - Casey
> rbd - Ilya, Deepika
> krbd  Ilya, Deepika
> fs - Venky, Patrick
> upgrade/nautilus-x - Neha, Josh
> upgrade/pacific-p2p - Neha, Josh
>
> 
> We are also publishing a release candidate this time for users to try
> for testing only.
>
> The branch name is pacific-16.2.7_RC1
> (
> https://shaman.ceph.com/builds/ceph/pacific-16.2.7_RC1/fdc003bc12f1b2443c4596eeacb32cf62e806970/
> )
>
> ***Don’t use this RC on production clusters!***
>
> The idea of doing Release Candidates (RC) for point releases before
> doing final point releases was discussed in the first-ever Ceph User +
> Dev Monthly Meeting. Everybody thought that it was a good idea, to
> help identify bugs that do not get caught in integration testing.
>
> The goal is to give users time to test and give feedback on RC
> releases while our upstream long-running cluster also runs the same RC
> release during that time (period of one week). We will kick this
> process off with 16.2.7 and the pacific-16.2.7_RC1 release is now
> available for users to test.
>
> Please respond to this email to provide any feedback on issues found
> in this release.
>
> Thx
> YuriW
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal for a orch osd rm drain to take so long?

2021-12-02 Thread David Orman
Hi,

It would be good to have the full output. Does iostat show the backing
device performing I/O? Additionally, what does ceph -s show for cluster
state? Also, can you check the logs on that OSD, and see if anything looks
abnormal?
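
Concretely, something along these lines (osd.14 taken from your output):

# overall cluster state, blocked/slow ops, recovery progress
ceph -s

# is the backing device doing any I/O on that host?
iostat -x 5

# daemon logs for the draining OSD, run on its host (cephadm deployment)
cephadm logs --name osd.14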

David

On Thu, Dec 2, 2021 at 1:20 PM Zach Heise (SSCC)  wrote:

> Good morning David,
>
> Assuming you need/want to see the data about the other 31 OSDs, 14 is
> showing:
> ID  CLASS  WEIGHT   REWEIGHT  SIZE  RAW USE  DATA  OMAP  META  AVAIL  %USE  VAR  PGS  STATUS
> 14  hdd    2.72899  0         0 B   0 B      0 B   0 B   0 B   0 B    0     0    1    up
>
> Zach
>
> On 2021-12-01 5:20 PM, David Orman wrote:
>
> What's "ceph osd df" show?
>
> On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC) 
> wrote:
>
>> I wanted to swap out on existing OSD, preserve the number, and then
>> remove the HDD that had it (osd.14 in this case) and give the ID of 14 to a
>> new SSD that would be taking its place in the same node. First time ever
>> doing this, so not sure what to expect.
>>
>> I followed the instructions here
>> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>,
>> using the --replace flag.
>>
>> However, I'm a bit concerned that the operation is taking so long in my
>> test cluster. Out of 70TB in the cluster, only 40GB were in use. This is a
>> relatively large OSD in comparison to others in the cluster (2.7TB versus
>> 300GB for most other OSDs) and yet it's been 36 hours with the following
>> status:
>>
>> ceph04.ssc.wisc.edu> ceph orch osd rm status
>> OSD_ID  HOST STATE PG_COUNT  REPLACE  FORCE  
>> DRAIN_STARTED_AT
>> 14  ceph04.ssc.wisc.edu  draining  1 True True   2021-11-30 
>> 15:22:23.469150+00:00
>>
>>
>> Another note: I don't know why it has the "force = true" set; the command
>> that I ran was just Ceph orch osd rm 14 --replace, without specifying
>> --force. Hopefully not a big deal but still strange.
>>
>> At this point is there any way to tell if it's still actually doing
>> something, or perhaps it is hung? if it is hung, what would be the
>> 'recommended' way to proceed? I know that I could just manually eject the
>> HDD from the chassis and run the "ceph osd crush remove osd.14" command and
>> then manually delete the auth keys, etc, but the documentation seems to
>> state that this shouldn't be necessary if a ceph OSD replacement goes
>> properly.
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal for a orch osd rm drain to take so long?

2021-12-01 Thread David Orman
What's "ceph osd df" show?

On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC)  wrote:

> I wanted to swap out on existing OSD, preserve the number, and then remove
> the HDD that had it (osd.14 in this case) and give the ID of 14 to a new
> SSD that would be taking its place in the same node. First time ever doing
> this, so not sure what to expect.
>
> I followed the instructions here
> <https://docs.ceph.com/en/latest/cephadm/services/osd/#remove-an-osd>,
> using the --replace flag.
>
> However, I'm a bit concerned that the operation is taking so long in my
> test cluster. Out of 70TB in the cluster, only 40GB were in use. This is a
> relatively large OSD in comparison to others in the cluster (2.7TB versus
> 300GB for most other OSDs) and yet it's been 36 hours with the following
> status:
>
> ceph04.ssc.wisc.edu> ceph orch osd rm status
> OSD_ID  HOST STATE PG_COUNT  REPLACE  FORCE  
> DRAIN_STARTED_AT
> 14  ceph04.ssc.wisc.edu  draining  1 True True   2021-11-30 
> 15:22:23.469150+00:00
>
>
> Another note: I don't know why it has the "force = true" set; the command
> that I ran was just Ceph orch osd rm 14 --replace, without specifying
> --force. Hopefully not a big deal but still strange.
>
> At this point is there any way to tell if it's still actually doing
> something, or perhaps it is hung? if it is hung, what would be the
> 'recommended' way to proceed? I know that I could just manually eject the
> HDD from the chassis and run the "ceph osd crush remove osd.14" command and
> then manually delete the auth keys, etc, but the documentation seems to
> state that this shouldn't be necessary if a ceph OSD replacement goes
> properly.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pg autoscaling and device_health_metrics pool pg sizing

2021-11-02 Thread David Orman
I suggest continuing with manual PG sizing for now. With 16.2.6 we have
seen the autoscaler scale the device_health_metrics pool up to 16000+ PGs on
brand new clusters, which we know is incorrect. It's on our company backlog
to investigate, but far down the list. It's bitten us enough times in
the past that even with new 16.2.6 deployments, we disable it immediately
on cluster creation. On PGs per OSD, 100 does seem about optimal in our
experience; just bear in mind how many pools you have and what type they are
(EC vs. replicated). We generally end up around ~100 and our clusters are
fairly well balanced, without inordinate CPU usage/etc.
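
For reference, a sketch of what we do at creation time (pool names are whatever exists on your cluster):

# default for newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode off

# turn it off on pools that already exist
ceph osd pool set device_health_metrics pg_autoscale_mode off

# then size PGs manually per pool
ceph osd pool set <pool> pg_num 1024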

On Mon, Nov 1, 2021 at 11:31 AM Alex Petty  wrote:

> Hello,
>
> I’m evaluating Ceph as a storage option, using ceph version 16.2.6,
> Pacific stable installed using cephadm. I was hoping to use PG autoscaling
> to reduce ops efforts. I’m standing this up on a cluster with 96 OSDs
> across 9 hosts.
>
> The device_health_metrics pool was created by Ceph automatically once I
> started adding OSD  and created with 2048 PGs. This seems high, and put
> many PGs on each OSD. Documentation indicates that I should be targeting
> around 100 PGs per OSD, is that guideline out of date?
>
> Also, when I created a pool to test erasure coded with a 6+2 config for
> CephFS with PG autoscaling enabled, it was created with 1PG to start, and
> didn’t scale up even as I loaded test data onto it, giving the entire
> CephFS the write performance of 1 single disk as it was only writing to 1
> disk and backfilling to 7 others. Should I be manually setting default PGs
> at a sane level (512, 1024) or will autoscaling size this pool up? I have
> never seen any output from ceph osd pool autoscale-status when I am trying
> to see autoscaling information.
>
> I’d appreciate some guidance about configuring PGs on Pacific.
>
> Thanks,
>
> Alex
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Free space in ec-pool should I worry?

2021-11-01 Thread David Orman
The balancer does a pretty good job. It's the PG autoscaler that has bitten
us frequently enough that we always ensure it is disabled for all pools.

David

On Mon, Nov 1, 2021 at 2:08 PM Alexander Closs  wrote:

> I can add another 2 positive datapoints for the balancer, my personal and
> work clusters are both happily balancing.
>
> Good luck :)
> -Alex
>
> On 11/1/21, 3:05 PM, "Josh Baergen"  wrote:
>
> Well, those who have negative reviews are often the most vocal. :)
> We've had few, if any, problems with the balancer in our own use of
> it.
>
> Josh
>
> On Mon, Nov 1, 2021 at 12:58 PM Szabo, Istvan (Agoda)
>  wrote:
> >
> > Yeah, just follow the autoscaler at the moment, it suggested 128,
> might enable later the balancer, just scare a bit due to negative feedbacks
> about it.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---
> > Agoda Services Co., Ltd.
> > e: istvan.sz...@agoda.com
> > ---
> >
> > On 2021. Nov 1., at 19:29, Josh Baergen 
> wrote:
> >
> > Email received from the internet. If in doubt, don't click any link
> nor open any attachment !
> > 
> >
> > To expand on the comments below, "max avail" takes into account usage
> > imbalance between OSDs. There's a pretty significant imbalance in
> this
> > cluster and Ceph assumes that the imbalance will continue, and thus
> > indicates that there's not much room left in the pool. Rebalancing
> > that pool will make a big difference in terms of top-OSD fullness and
> > the "max avail" metric.
> >
> > Josh
> >
> > On Mon, Nov 1, 2021 at 12:25 PM Alexander Closs <
> acl...@csail.mit.edu> wrote:
> >
> >
> > Max available = free space actually usable now based on OSD usage,
> not including already-used space.
> >
> >
> > -Alex
> >
> > MIT CSAIL
> >
> >
> > On 11/1/21, 2:18 PM, "Szabo, Istvan (Agoda)" <
> istvan.sz...@agoda.com> wrote:
> >
> >
> >It says max available: 115TB and current use is 104TB, what I
> don’t understand where the max available come from because on the pool no
> object and no size limit is set:
> >
> >
> >quotas for pool 'sin.rgw.buckets.data':
> >
> >  max objects: N/A
> >
> >  max bytes  : N/A
> >
> >
> >Istvan Szabo
> >
> >Senior Infrastructure Engineer
> >
> >---
> >
> >Agoda Services Co., Ltd.
> >
> >e: istvan.sz...@agoda.com
> >
> >---
> >
> >
> >On 2021. Nov 1., at 18:48, Etienne Menguy <
> etienne.men...@croit.io> wrote:
> >
> >
> >sin.rgw.buckets.data24  128  104 TiB  104 TiB  0 B
> 1.30G  156 TiB  156 TiB  0 B  47.51115 TiB  N/AN/A
>  1.30G 0 B  0 B
> >
> >___
> >
> >ceph-users mailing list -- ceph-users@ceph.io
> >
> >To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > ___
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> >
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adopting "unmanaged" OSDs into OSD service specification

2021-10-13 Thread David Orman
That's the exact situation we've found too. We'll add it to our backlog to
investigate on the development side since it seems nobody else has run into
this issue before.

David

On Wed, Oct 13, 2021 at 4:24 AM Luis Domingues 
wrote:

> Hi,
>
> We have the same issue on our lab cluster. The only way I found to have
> the osds on the new specification was to drain, remove and re-add the host.
> The orchestrator was happy to recreate the osds under the good
> specification.
>
> But I do not think this is a good solution for production cluster. We are
> still looking for a more smooth way to do that.
>
> Luis Domingues
>
> ‐‐‐ Original Message ‐‐‐
>
> On Monday, October 4th, 2021 at 10:01 PM, David Orman <
> orma...@corenode.com> wrote:
>
> > We have an older cluster which has been iterated on many times. It's
> >
> > always been cephadm deployed, but I am certain the OSD specification
> >
> > used has changed over time. I believe at some point, it may have been
> >
> > 'rm'd.
> >
> > So here's our current state:
> >
> > root@ceph02:/# ceph orch ls osd --export
> >
> > service_type: osd
> >
> > service_id: osd_spec_foo
> >
> > service_name: osd.osd_spec_foo
> >
> > placement:
> >
> > label: osd
> >
> > spec:
> >
> > data_devices:
> >
> > rotational: 1
> >
> > db_devices:
> >
> > rotational: 0
> >
> > db_slots: 12
> >
> > filter_logic: AND
> >
> > objectstore: bluestore
> >
> 
> >
> > service_type: osd
> >
> > service_id: unmanaged
> >
> > service_name: osd.unmanaged
> >
> > placement: {}
> >
> > unmanaged: true
> >
> > spec:
> >
> > filter_logic: AND
> >
> > objectstore: bluestore
> >
> > root@ceph02:/# ceph orch ls
> >
> > NAME PORTS RUNNING REFRESHED AGE PLACEMENT
> >
> > crash 7/7 10m ago 14M *
> >
> > mgr 5/5 10m ago 7M label:mgr
> >
> > mon 5/5 10m ago 14M label:mon
> >
> > osd.osd_spec_foo 0/7 - 24m label:osd
> >
> > osd.unmanaged 167/167 10m ago - 
> >
> > The osd_spec_foo would match these devices normally, so we're curious
> >
> > how we can get these 'managed' under this service specification.
> >
> > What's the appropriate way in order to effectively 'adopt' these
> >
> > pre-existing OSDs into the service specification that we want them to
> >
> > be managed under?
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> >
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RFP for arm64 test nodes

2021-10-09 Thread David Orman
If there's intent to use this for performance comparisons between releases,
I would propose that you include rotational drive(s) as well. It will be
quite some time before everyone is running pure NVMe/SSD clusters, given the
storage costs associated with that type of workload, and this should be
reflected in the test clusters.

On Fri, Oct 8, 2021 at 6:25 PM Dan Mick  wrote:

> Ceph has been completely ported to build and run on ARM hardware
> (architecture arm64/aarch64), but we're unable to test it due to lack of
> hardware.  We propose to purchase a significant number of ARM servers
> (50+?) to install in our upstream Sepia test lab to use for upstream
> testing of Ceph, alongside the x86 hardware we already own.
>
> This message is to start a discussion of what the nature of that
> hardware should be, and an investigation as to what's available and how
> much it might cost.  The general idea is to build something arm64-based
> that is similar to the smithi/gibba nodes:
>
> https://wiki.sepia.ceph.com/doku.php?id=hardware:gibba
>
> Some suggested features:
>
> * base hardware/peripheral support for current releases of RHEL, CentOS,
> Ubuntu
> * 1 fast and largish (400GB+) NVME drive for OSDs (it will be
> partitioned into 4-5 subdrives for tests)
> * 1 large (1TB+) SSD/HDD for boot/system and logs (faster is better but
> not as crucial as for cluster storage)
> * Remote/headless management (IPMI?)
> * At least 1 10G network interface per host
> * Order of 64GB main memory per host
>
> Density is valuable to the lab; we have space but not an unlimited amount.
>
> Any suggestions on vendors or specific server configurations?
>
> Thanks!
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Adopting "unmanaged" OSDs into OSD service specification

2021-10-04 Thread David Orman
We have an older cluster which has been iterated on many times. It's
always been cephadm deployed, but I am certain the OSD specification
used has changed over time. I believe at some point, it may have been
'rm'd.

So here's our current state:

root@ceph02:/# ceph orch ls osd --export
service_type: osd
service_id: osd_spec_foo
service_name: osd.osd_spec_foo
placement:
  label: osd
spec:
  data_devices:
rotational: 1
  db_devices:
rotational: 0
  db_slots: 12
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
placement: {}
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore

root@ceph02:/# ceph orch ls
NAMEPORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash  7/7  10m ago14M  *
mgr5/5  10m ago7M   label:mgr
mon5/5  10m ago14M  label:mon
osd.osd_spec_foo 0/7  -  24m  label:osd
osd.unmanaged  167/167  10m ago-

The osd_spec_foo would match these devices normally, so we're curious
how we can get these 'managed' under this service specification.
What's the appropriate way to effectively 'adopt' these
pre-existing OSDs into the service specification that we want them to
be managed under?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread David Orman
It appears that when an updated container for 16.2.6 was pushed (the first
build shipped a remoto version with a bug), the old image was removed from
quay. We had to update our 16.2.6 clusters to the 'new' 16.2.6 build, and
just did the typical upgrade with the image specified. This should resolve
your issue, as well as fix the effects of the remoto bug:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/pull/63

Once you're upgraded, I would expect it to use the correct hash for
the host adds.
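
For reference, kicking that off is just the standard orchestrator upgrade (tag shown as an example; you can also pin the new digest explicitly):

ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6
ceph orch upgrade status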

On Wed, Sep 29, 2021 at 11:02 AM Andrew Gunnerson
 wrote:
>
> Hello all,
>
> I'm trying to troubleshoot a test cluster that is attempting to deploy an old
> quay.io/ceph/ceph@sha256: image that no longer exists when adding a new
> host.
>
> The cluster is running 16.2.6 and was deployed last week with:
>
> cephadm bootstrap --mon-ip $(facter -p ipaddress) --allow-fqdn-hostname 
> --ssh-user cephadm
> # Within "cephadm shell"
> ceph orch host add   _admin
> 
>
> This initial cluster worked fine and the mon/mgr/osd/crash/etc containers were
> all running the following image:
>
> 
> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>
> This week, we tried deploying 3 additional hosts using the same "ceph orch 
> host
> add" commands and cephadm seems to be attempting to deploy the same image, but
> it no longer exists on quay.io.
>
> The error shows up in the active mgr's logs as:
>
> Non-zero exit code 125 from /bin/podman run --rm --ipc=host 
> --stop-signal=SIGTERM --net=host --entrypoint stat --init -e 
> CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>  -e NODE_NAME= -e CEPH_USE_RANDOM_NONCE=1 
> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>  -c %u %g /var/lib/ceph
> stat: stderr Trying to pull 
> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c...
> stat: stderr Error: Error initializing source 
> docker://quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c:
>  Error reading manifest 
> sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c in 
> quay.io/ceph/ceph: manifest unknown: manifest unknown
>
> I suspect this is because of the container_image global config option:
>
> [ceph: root@ /]# ceph config-key get 
> config/global/container_image
> 
> quay.io/ceph/ceph@sha256:31ad0a2bd8182c948cace326251ce1561804d7de948f370c8c44d29a175cc67c
>
> My questions are:
>
> * Is it expected for the cluster to reference a (potentially nonexistent) 
> image
>   by sha256 hash versus (eg.) the :v16 or :v16.2.6 tags?
>
> * What's the best way to get back into a state where new hosts can be added
>   again? Is it sufficient to just update the container_image global config?
>
> Thank you!
> Andrew Gunnerson
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: prometheus - figure out which mgr (metrics endpoint) that is active

2021-09-28 Thread David Orman
We scrape all mgr endpoints as well, since we use external Prometheus
clusters; the query results will have {instance=<active mgr host>}. The
upstream dashboards don't have multi-cluster support, so we have to modify
them to work with our deployments, since we have multiple Ceph clusters
being polled by the same Prometheus clusters. We effectively add instance
regular expressions to all the queries on the dashboards, plus a variable
for the dashboard itself that gets the list of clusters via a label_values
call on one of the ceph exporter metrics, with a regular expression to parse
out the part after the hostname portion of the FQDN.

I don't think the current dashboards are intended for use outside the
internal Prometheus deployments, but we definitely intend (when time
permits) to try to submit patches that would work for both use cases,
since it's painful to continually update the dashboards on every release.
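
As a rough sketch of the scrape side (hostnames and the cluster label are placeholders), we just list every mgr as a target and let the standby mgrs return empty scrapes:

scrape_configs:
  - job_name: 'ceph-cluster-a'
    static_configs:
      - targets: ['mgr1.example.com:9283', 'mgr2.example.com:9283', 'mgr3.example.com:9283']
        labels:
          cluster: 'cluster-a'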

On Tue, Sep 28, 2021 at 12:45 PM Karsten Nielsen  wrote:
>
> Hi,
> I am running ceph 16.2.6 installed with cephadm.
> I have enabled prometheus to be able scrape metrics from an external
> promethus server.
> I have 3 nodes with mgr daeamon all reply to the query against
> node:9283/metrics 2 is returning a empty reply - the none active mgr's.
> Is there a node:9283/health or other path to query for the once that is
> not active ?
> I am asking as I am getting empty dashboards 2 of 3 times as there are
> no metrics when the wrong endpoint is getting scraped.
>
> Thanks,
> - Karsten
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Change max backfills

2021-09-24 Thread David Orman
With recent releases, 'ceph config' is probably a better option; do
keep in mind this sets things cluster-wide. If you're just wanting to
target specific daemons, then tell may be better for your use case.

# get current value
ceph config get osd osd_max_backfills

# set new value to 2, for example
ceph config set osd osd_max_backfills 2

# to go back to default
ceph config rm osd osd_max_backfills

Of course, I suggest you test in a non-production cluster first. :)

David

On Wed, Sep 22, 2021 at 8:34 AM Pascal Weißhaupt
 wrote:
>
> Hi,
>
>
>
> I recently upgraded from Ceph 15 to Ceph 16 and when I want to change the max 
> backfills via
>
>
>
> ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
>
>
>
> I get no output:
>
>
>
> root@pve01:~# ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
> osd.0: {}
> osd.1: {}
> osd.2: {}
> osd.3: {}
> osd.4: {}
> osd.5: {}
> osd.6: {}
> osd.7: {}
> osd.8: {}
> osd.9: {}
> osd.10: {}
> osd.11: {}
> osd.12: {}
> osd.13: {}
> osd.14: {}
> osd.15: {}
> osd.16: {}
> osd.17: {}
> osd.18: {}
> osd.19: {}
>
>
>
> If I remember correctly, with Ceph 15 I got something like "changed max 
> backfills to 1" or so.
>
>
>
> Is that command not supported anymore or is the empty output correct?
>
>
>
> Regards,
>
> Pascal
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c

^^ if people want to test and provide feedback for a potential merge
to EPEL8 stable.

David

On Wed, Sep 22, 2021 at 11:43 AM David Orman  wrote:
>
> I'm wondering if this was installed using pip/pypi before, and now
> switched to using EPEL? That would explain it - 1.2.1 may never have
> been pushed to EPEL.
>
> David
>
> On Wed, Sep 22, 2021 at 11:26 AM David Orman  wrote:
> >
> > We'd worked on pushing a change to fix
> > https://tracker.ceph.com/issues/50526 for a deadlock in remoto here:
> > https://github.com/alfredodeza/remoto/pull/63
> >
> > A new version, 1.2.1, was built to help with this. With the Ceph
> > release 16.2.6 (at least), we see 1.1.4 is again part of the
> > containers. Looking at EPEL8, all that is built now is 1.1.4. We're
> > not sure what happened, but would it be possible to get 1.2.1 pushed
> > to EPEL8 again, and figure out why it was removed? We'd then need a
> > rebuild of the 16.2.6 containers to 'fix' this bug.
> >
> > This is definitely a high urgency bug, as it impacts any deployments
> > with medium to large counts of OSDs or split db/wal devices, like many
> > modern deployments.
> >
> > https://koji.fedoraproject.org/koji/packageinfo?packageID=18747
> > https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/p/
> >
> > Respectfully,
> > David Orman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
I'm wondering if this was installed using pip/pypi before, and now
switched to using EPEL? That would explain it - 1.2.1 may never have
been pushed to EPEL.

David

On Wed, Sep 22, 2021 at 11:26 AM David Orman  wrote:
>
> We'd worked on pushing a change to fix
> https://tracker.ceph.com/issues/50526 for a deadlock in remoto here:
> https://github.com/alfredodeza/remoto/pull/63
>
> A new version, 1.2.1, was built to help with this. With the Ceph
> release 16.2.6 (at least), we see 1.1.4 is again part of the
> containers. Looking at EPEL8, all that is built now is 1.1.4. We're
> not sure what happened, but would it be possible to get 1.2.1 pushed
> to EPEL8 again, and figure out why it was removed? We'd then need a
> rebuild of the 16.2.6 containers to 'fix' this bug.
>
> This is definitely a high urgency bug, as it impacts any deployments
> with medium to large counts of OSDs or split db/wal devices, like many
> modern deployments.
>
> https://koji.fedoraproject.org/koji/packageinfo?packageID=18747
> https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/p/
>
> Respectfully,
> David Orman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
We'd worked on pushing a change to fix
https://tracker.ceph.com/issues/50526 for a deadlock in remoto here:
https://github.com/alfredodeza/remoto/pull/63

A new version, 1.2.1, was built to help with this. With the Ceph
release 16.2.6 (at least), we see 1.1.4 is again part of the
containers. Looking at EPEL8, all that is built now is 1.1.4. We're
not sure what happened, but would it be possible to get 1.2.1 pushed
to EPEL8 again, and figure out why it was removed? We'd then need a
rebuild of the 16.2.6 containers to 'fix' this bug.

This is definitely a high urgency bug, as it impacts any deployments
with medium to large counts of OSDs or split db/wal devices, like many
modern deployments.

https://koji.fedoraproject.org/koji/packageinfo?packageID=18747
https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/p/

Respectfully,
David Orman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?

On Mon, Sep 20, 2021 at 8:57 AM Sean  wrote:
>
>  I also ran into this with v16. In my case, trying to run a repair totally
> exhausted the RAM on the box, and was unable to complete.
>
> After removing/recreating the OSD, I did notice that it has a drastically
>  smaller OMAP size than the other OSDs. I don’t know if that actually means
> anything, but just wanted to mention it in case it does.
>
> ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP META
>   AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
> 14   hdd10.91409   1.0   11 TiB  3.3 TiB  3.2 TiB  4.6 MiB  5.4 GiB
>  7.7 TiB  29.81  1.02   34  uposd.14
> 16   hdd10.91409   1.0   11 TiB  3.3 TiB  3.3 TiB   20 KiB  9.4 GiB
>  7.6 TiB  30.03  1.03   35  uposd.16
>
> ~ Sean
>
>
> On Sep 20, 2021 at 8:27:39 AM, Paul Mezzanini  wrote:
>
> > I got the exact same error on one of my OSDs when upgrading to 16.  I
> > used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> > days of poking with no success.  I got mostly tool crashes like you are
> > seeing with no forward progress.
> >
> > I eventually just gave up, purged the OSD, did a smart long test on the
> > drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> > a week now after it finished refilling the drive.
> >
> >
> > On 9/19/21 10:47 AM, Andrej Filipcic wrote:
> >
> > 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
> >
> > [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> >
> > compaction error: Corruption: block checksum mismatch: expected
> >
> > 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
> >
> > 4032, Accumulated background error counts: 1
> >
> > 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
> >
> > error: Corruption: block checksum mismatch: expected 2427092066, got
> >
> > 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
> >
> > Rocksdb transaction:
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread David Orman
For clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading
from some other release?

On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini  wrote:
>
> I got the exact same error on one of my OSDs when upgrading to 16.  I
> used it as an exercise on trying to fix a corrupt rocksdb. A spent a few
> days of poking with no success.  I got mostly tool crashes like you are
> seeing with no forward progress.
>
> I eventually just gave up, purged the OSD, did a smart long test on the
> drive to be sure and then threw it back in the mix.  Been HEALTH OK for
> a week now after it finished refilling the drive.
>
>
> On 9/19/21 10:47 AM, Andrej Filipcic wrote:
> > 2021-09-19T15:47:13.610+0200 7f8bc1f0e700  2 rocksdb:
> > [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
> > compaction error: Corruption: block checksum mismatch: expected
> > 2427092066, got 4051549320  in db/251935.sst offset 18414386 size
> > 4032, Accumulated background error counts: 1
> > 2021-09-19T15:47:13.636+0200 7f8bbacf1700 -1 rocksdb: submit_common
> > error: Corruption: block checksum mismatch: expected 2427092066, got
> > 4051549320  in db/251935.sst offset 18414386 size 4032 code = 2
> > Rocksdb transaction:
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD based ec-code

2021-09-14 Thread David Orman
We don't allow usage to grow over the threshold at which losing the
servers would be impactful to the cluster. We keep usage low enough
(we remove two hosts of capacity from the overall cluster allocation
limit in our provisioning and management systems) to tolerate at least
2 failures while still accepting writes, and will expand clusters
before we reach that threshold. Three host failures can be survived
with no data loss, but may impact writes.
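
A rough way to sanity-check that headroom (the arithmetic is only an example for a 14-host cluster):

# raw usage and per-host capacity/utilization
ceph df
ceph osd df tree

# with 14 equal hosts, staying under roughly (14 - 2) / 14 of raw capacity
# (minus the usual nearfull margin) leaves room to lose two hosts and
# still backfill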

Hope this helps,
David

On Tue, Sep 14, 2021 at 9:03 AM Szabo, Istvan (Agoda)
 wrote:
>
> Yeah understand this point as well. So you keep 3 nodes as kind of 'spare' 
> for data rebalance. Do you use the space of them or you maximize the usage on 
> a 11 cluster level?
>
> Why I'm asking I have a cluster with 6 hosts on a 4:2 ec pool, I'm planning 
> to add 1 more node but as a spare only, so the space I'll not use  if host 
> dies al the data can be rebalanced. Not sure to be honest what is the correct 
> node number for a 4:2 ec code. At least I guess 7 on a host based ec.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> -------
>
> -Original Message-
> From: David Orman 
> Sent: Tuesday, September 14, 2021 8:55 PM
> To: Eugen Block 
> Cc: ceph-users 
> Subject: [ceph-users] Re: OSD based ec-code
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> Keep in mind performance, as well. Once you start getting into higher 'k' 
> values with EC, you've got a lot more drives involved that need to return 
> completions for operations, and on rotational drives this becomes especially 
> painful. We use 8+3 for a lot of our purposes, as it's a good balance of 
> efficiency, durability (number of complete host failures we can tolerate), 
> and enough performance. It's definitely significantly slower than something 
> like 4+2 or 3x replicated, though.
> It also means we don't deploy clusters below 14 hosts, so we can tolerate 
> multiple host failures _and still accept writes_. It never fails that you 
> have a host issue, and while working on that, another host dies. Same lessons 
> many learn with RAIDs with single drive redundancy - lose a drive, start a 
> rebuild, another drive fails and data gone. It's almost always the correct 
> response to err on the side of durability when it comes to these decisions, 
> unless the data is unimportant and maximum performance is required.
>
> On Tue, Sep 14, 2021 at 8:20 AM Eugen Block  wrote:
> >
> > Hi,
> >
> > consider yourself lucky that you haven't had a host failure. But I
> > would not draw the wrong conclusions here and change the
> > failure-domain based on luck.
> > In our production cluster we have an EC pool for archive purposes, it
> > all went well for quite some time and last Sunday one of the hosts
> > suddenly failed, we're still investigating the root cause. Our
> > failure-domain is host and I'm glad that we chose a suitable EC
> > profile for that, the cluster is healthy.
> >
> > > Also what is the "optimal" like 12:3 or ?
> >
> > You should evaluate that the other way around. What are your specific
> > requirements regarding resiliency (how many hosts can fail at the same
> > time without data loss)? How many hosts are available? Are you
> > planning to expand in the near future? Based on this evaluation you
> > can conclude a few options and choose the best for your requirements.
> >
> > Regards,
> > Eugen
> >
> >
> > Zitat von "Szabo, Istvan (Agoda)" :
> >
> > > Hi,
> > >
> > > What's your take on an osd based ec-code setup? I've never been
> > > brave enough to use OSD based crush rule because scared host failure
> > > but in the last 4 years we have never had any host issue so I'm
> > > thinking to change to there and use some more cost effective EC.
> > >
> > > Also what is the "optimal" like 12:3 or ?
> > >
> > > Thank you
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > > email to ceph-users-le...@ceph.io
> >
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
> ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD based ec-code

2021-09-14 Thread David Orman
Keep in mind performance, as well. Once you start getting into higher
'k' values with EC, you've got a lot more drives involved that need to
return completions for operations, and on rotational drives this
becomes especially painful. We use 8+3 for a lot of our purposes, as
it's a good balance of efficiency, durability (number of complete host
failures we can tolerate), and enough performance. It's definitely
significantly slower than something like 4+2 or 3x replicated, though.
It also means we don't deploy clusters below 14 hosts, so we can
tolerate multiple host failures _and still accept writes_. It never
fails that you have a host issue, and while working on that, another
host dies. It's the same lesson many learn with RAID arrays that have
single-drive redundancy: lose a drive, start a rebuild, another drive fails,
and the data is gone. It's almost always the correct response to err on the side
of durability when it comes to these decisions, unless the data is
unimportant and maximum performance is required.
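
For reference, an 8+3 profile with a host failure domain looks roughly like this (profile and pool names are placeholders; PG count depends on your cluster):

ceph osd erasure-code-profile set ec-8-3 k=8 m=3 crush-failure-domain=host
ceph osd pool create mypool-ec 1024 1024 erasure ec-8-3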

On Tue, Sep 14, 2021 at 8:20 AM Eugen Block  wrote:
>
> Hi,
>
> consider yourself lucky that you haven't had a host failure. But I
> would not draw the wrong conclusions here and change the
> failure-domain based on luck.
> In our production cluster we have an EC pool for archive purposes, it
> all went well for quite some time and last Sunday one of the hosts
> suddenly failed, we're still investigating the root cause. Our
> failure-domain is host and I'm glad that we chose a suitable EC
> profile for that, the cluster is healthy.
>
> > Also what is the "optimal" like 12:3 or ?
>
> You should evaluate that the other way around. What are your specific
> requirements regarding resiliency (how many hosts can fail at the same
> time without data loss)? How many hosts are available? Are you
> planning to expand in the near future? Based on this evaluation you
> can conclude a few options and choose the best for your requirements.
>
> Regards,
> Eugen
>
>
> Zitat von "Szabo, Istvan (Agoda)" :
>
> > Hi,
> >
> > What's your take on an osd based ec-code setup? I've never been
> > brave enough to use OSD based crush rule because scared host failure
> > but in the last 4 years we have never had any host issue so I'm
> > thinking to change to there and use some more cost effective EC.
> >
> > Also what is the "optimal" like 12:3 or ?
> >
> > Thank you
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-09 Thread David Orman
No problem, and it looks like they will. Glad it worked out for you!

David

On Thu, Sep 9, 2021 at 9:31 AM mabi  wrote:
>
> Thank you Eugen. Indeed the answer went to Spam :(
>
> So thanks to David for his workaround, it worked like a charm. Hopefully 
> these patches can make it into the next pacific release.
>
> ‐‐‐ Original Message ‐‐‐
>
> On Thursday, September 9th, 2021 at 2:33 PM, Eugen Block  
> wrote:
>
> > You must have missed the response to your thread, I suppose:
> >
> > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
> >
> > Zitat von mabi m...@protonmail.ch:
> >
> > > Hello,
> > >
> > > A few days later the ceph status progress bar is still stuck and the
> > >
> > > third mon is for some unknown reason still not deploying itself as
> > >
> > > can be seen from the "ceph orch ls" output below:
> > >
> > > ceph orch ls
> > >
> > > NAME PORTS RUNNING REFRESHED AGE PLACEMENT
> > >
> > > alertmanager ?:9093,9094 1/1 3m ago 5w count:1
> > >
> > > crash 7/7 3m ago 5w *
> > >
> > > grafana ?:3000 1/1 3m ago 5w count:1
> > >
> > > mgr 2/2 3m ago 4w count:2;label:mgr
> > >
> > > mon 2/3 3m ago 16h count:3;label:mon
> > >
> > > node-exporter ?:9100 7/7 3m ago 5w *
> > >
> > > osd 1/1 3m ago - 
> > >
> > > prometheus ?:9095 1/1 3m ago 5w count:1
> > >
> > > Is this a bug in cephadm? and is there a workaround?
> > >
> > > Thanks for any hints.
> > >
> > > ‐‐‐ Original Message ‐‐‐
> > >
> > > On Tuesday, September 7th, 2021 at 2:30 PM, mabi m...@protonmail.ch wrote:
> > >
> > > > Hello
> > > >
> > > > I have a test ceph octopus 16.2.5 cluster with cephadm out of 7
> > > >
> > > > nodes on Ubuntu 20.04 LTS bare metal. I just upgraded each node's
> > > >
> > > > kernel and performed a rolling reboot and now the ceph -s output is
> > > >
> > > > stuck somehow and the manager service is only deployed to two nodes
> > > >
> > > > instead of 3 nodes. Here would be the ceph -s output:
> > > >
> > > > cluster:
> > > >
> > > > id: fb48d256-f43d-11eb-9f74-7fd39d4b232a
> > > >
> > > > health: HEALTH_WARN
> > > >
> > > > OSD count 1 < osd_pool_default_size 3
> > > >
> > > > services:
> > > >
> > > > mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
> > > >
> > > > mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
> > > >
> > > > osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
> > > >
> > > > data:
> > > >
> > > > pools: 0 pools, 0 pgs
> > > >
> > > > objects: 0 objects, 0 B
> > > >
> > > > usage: 5.3 MiB used, 7.0 TiB / 7.0 TiB avail
> > > >
> > > > pgs:
> > > >
> > > > progress:
> > > >
> > > > Updating crash deployment (-1 -> 6) (0s)
> > > >
> > > >   []
> > > >
> > > >
> > > > Ignore the HEALTH_WARN with of the OSD count because I have not
> > > >
> > > > finished to deploy all 3 OSDs. But you can see that the progress
> > > >
> > > > bar is stuck and I have only 2 managers, the third manager does not
> > > >
> > > > seem to start as can be seen here:
> > > >
> > > > $ ceph orch ps|grep stopped
> > > >
> > > > mon.ceph1b ceph1b stopped 4m ago 4w - 2048M   
> > > > 
> > > >
> > > > It looks like the orchestrator is stuck and does not continue it's
> > > >
> > > > job. Any idea how I can get it unstuck?
> > > >
> > > > Best regards,
> > > >
> > > > Mabi
> > >
> > > ceph-users mailing list -- ceph-users@ceph.io
> > >
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> >
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Smarter DB disk replacement

2021-09-09 Thread David Orman
Exactly - we minimize the blast radius/data destruction by allocating
more, smaller devices for DB/WAL rather than fewer, larger ones. We
encountered this same issue on an earlier iteration of our hardware
design. With rotational drives and NVMes, we are now aiming for a 6:1
ratio based on our CRUSH rules/rotational disk sizing/NVMe
sizing/server sizing/EC setup/etc.

Make sure to use write-friendly NVMEs for DB/WAL and the failures
should be much fewer and further between.
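
In drive group terms, that ratio ends up looking something like the sketch below (service_id and label are placeholders, and it assumes 6 rotational data drives per NVMe; adjust db_slots and placement to your hardware):

service_type: osd
service_id: hdd_with_nvme_db
placement:
  label: "osd"
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 6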

On Thu, Sep 9, 2021 at 9:11 AM Janne Johansson  wrote:
>
> Den tors 9 sep. 2021 kl 16:09 skrev Michal Strnad :
> >  When the disk with DB died
> > it will cause inaccessibility of all depended OSDs (six or eight in our
> > environment),
> > How do you do it in your environment?
>
> Have two ssds for 8 OSDs, so only half go away when one ssd dies.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-08 Thread David Orman
I forgot to mention that the progress bar not updating is a separate bug; you
can fail the mgr (ceph mgr fail ceph1a.guidwn in your example) to
resolve that. On the monitor side, I assume you deployed using labels?
If so - just remove the label from the host where the monitor did not
start, let it fully undeploy, then re-add the label, and it will
redeploy.
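
Concretely, that's just (ceph1b being the host with the stopped mon in your output):

ceph orch host label rm ceph1b mon
# wait for the old mon daemon to be removed ('ceph orch ps' should no longer show it)
ceph orch host label add ceph1b mon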

On Wed, Sep 8, 2021 at 7:03 AM David Orman  wrote:
>
> This sounds a lot like: https://tracker.ceph.com/issues/51027 which is
> fixed in https://github.com/ceph/ceph/pull/42690
>
> David
>
> On Tue, Sep 7, 2021 at 7:31 AM mabi  wrote:
> >
> > Hello
> >
> > I have a test ceph octopus 16.2.5 cluster with cephadm out of 7 nodes on 
> > Ubuntu 20.04 LTS bare metal. I just upgraded each node's kernel and 
> > performed a rolling reboot and now the ceph -s output is stuck somehow and 
> > the manager service is only deployed to two nodes instead of 3 nodes. Here 
> > would be the ceph -s output:
> >
> >   cluster:
> > id: fb48d256-f43d-11eb-9f74-7fd39d4b232a
> > health: HEALTH_WARN
> > OSD count 1 < osd_pool_default_size 3
> >
> >   services:
> > mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
> > mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
> > osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
> >
> >   data:
> > pools:   0 pools, 0 pgs
> > objects: 0 objects, 0 B
> > usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
> > pgs:
> >
> >   progress:
> > Updating crash deployment (-1 -> 6) (0s)
> >   []
> >
> > Ignore the HEALTH_WARN with of the OSD count because I have not finished to 
> > deploy all 3 OSDs. But you can see that the progress bar is stuck and I 
> > have only 2 managers, the third manager does not seem to start as can be 
> > seen here:
> >
> > $ ceph orch ps|grep stopped
> > mon.ceph1bceph1b   stopped   4m ago   4w
> > -2048M 
> >
> > It looks like the orchestrator is stuck and does not continue it's job. Any 
> > idea how I can get it unstuck?
> >
> > Best regards,
> > Mabi
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying

2021-09-08 Thread David Orman
This sounds a lot like: https://tracker.ceph.com/issues/51027 which is
fixed in https://github.com/ceph/ceph/pull/42690

David

On Tue, Sep 7, 2021 at 7:31 AM mabi  wrote:
>
> Hello
>
> I have a test ceph octopus 16.2.5 cluster with cephadm out of 7 nodes on 
> Ubuntu 20.04 LTS bare metal. I just upgraded each node's kernel and performed 
> a rolling reboot and now the ceph -s output is stuck somehow and the manager 
> service is only deployed to two nodes instead of 3 nodes. Here would be the 
> ceph -s output:
>
>   cluster:
> id: fb48d256-f43d-11eb-9f74-7fd39d4b232a
> health: HEALTH_WARN
> OSD count 1 < osd_pool_default_size 3
>
>   services:
> mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
> mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
> osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
>
>   data:
> pools:   0 pools, 0 pgs
> objects: 0 objects, 0 B
> usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
> pgs:
>
>   progress:
> Updating crash deployment (-1 -> 6) (0s)
>   []
>
> Ignore the HEALTH_WARN with of the OSD count because I have not finished to 
> deploy all 3 OSDs. But you can see that the progress bar is stuck and I have 
> only 2 managers, the third manager does not seem to start as can be seen here:
>
> $ ceph orch ps|grep stopped
> mon.ceph1bceph1b   stopped   4m ago   4w  
>   -2048M 
>
> It looks like the orchestrator is stuck and does not continue it's job. Any 
> idea how I can get it unstuck?
>
> Best regards,
> Mabi
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm cannot aquire lock

2021-09-02 Thread David Orman
It may be this:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

Which we resolved with: https://github.com/alfredodeza/remoto/pull/63

What version of ceph are you running, and is it impacted by the above?

David

On Thu, Sep 2, 2021 at 9:53 AM fcid  wrote:
>
> Hi Sebastian,
>
> Following your sugestion, I've found this process:
>
> /usr/bin/python3
> /var/lib/ceph//cephadm.f77d9d71514a634758d4ad41ab6eef36d25386c99d8b365310ad41f9b74d5ce6
> --image
> ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a
> ceph-volume --fsid  -- lvm list --format json
>
> That process have been running for more than 12 hours, so I killed it
> and then cephadm could aquire lock. Shortly after the process starts
> again and I can see that it is running on all the nodes (we have 3
> nodes). I tried executing the same sentence in all the nodes, from the
> command line, and it works fine, here is the output
> https://pastebin.com/v58Nyxdx.
>
> What can be causing this process to be stuck when it is launched by the
> orchestrator, since launching it from the command line works fine?
>
> Thank you, kind regards.
>
> On 02/09/2021 05:19, Sebastian Wagner wrote:
> >
> > Am 31.08.21 um 04:05 schrieb fcid:
> >> Hi ceph community,
> >>
> >> I'm having some trouble trying to delete an OSD.
> >>
> >> I've been using cephadm in one of our clusters and it's works fine,
> >> but lately, after an OSD failure, I cannot delete it using the
> >> orchestrator. Since the orchestrator is not working (for some unknown
> >> reason) I tried to manually delete the OSD using the following command:
> >>
> >> ceph purge osd  --yes-i-really-mean-it
> >>
> >> This command removed the OSD from the crush map, but then the warning
> >> CEPHADM_FAILED_DEAMON appeared. So the next step is delete de daemon
> >> in the server that use to host the failed OSD. The command I used
> >> here was the following:
> >>
> >> cephadm rm-daemon --name osd. --fsid 
> >>
> >> But this command does not work because, accoding to the log, cephadm
> >> cannot aquire lock:
> >>
> >> 2021-08-30 21:50:09,712 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >> 2021-08-30 21:50:09,762 DEBUG Acquiring lock 139899822730784 on
> >> /run/cephadm/$FSID.lock
> >> 2021-08-30 21:50:09,763 DEBUG Lock 139899822730784 not acquired on
> >> /run/cephadm/$FSID.lock, waiting 0.05 seconds ...
> >>
> >> The file /run/cephadm/$FSID.lock does exist. Can I safely remove it?
> >> What should I check before doing such task.
> >
> > Yes, in case you're sure that no other cephadm process (i.e. call
> > `ps`) is stuck.
> >
> >>
> >> I'll really appreciate any hint you can give relating this matter.
> >>
> >> Thanks! regards.
> >>
> >
> --
> AltaVoz 
> Fernando Cid
> Ingeniero de Operaciones
> www.altavoz.net 
> Ubicación AltaVoz
> Viña del Mar: 2 Poniente 355 of 53
>  | +56 32 276 8060
> 
> Santiago: Antonio Bellet 292 of 701
>  | +56 2 2585 4264
> 
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Missing OSD in SSD after disk failure

2021-08-30 Thread David Orman
I may have misread your original email, for which I apologize. If you
run 'ceph orch device ls', does the NVMe in question show as available?
On the host with the failed OSD, if you run lvs/lsblk, do you still see the
old DB on the NVMe? I'm not sure if the replacement process you
followed will work. Here's what we do for OSD pre-failure as well as
failures on nodes with an NVMe backing the OSD's DB/WAL:

In cephadm shell, on host with drive to replace (in this example,
let's say 391 on a node called ceph15):

# capture "db device" and raw device associated with OSD (just for safety)
ceph-volume lvm list | less

# drain drive if possible, do this when planning replacement,
otherwise do once failure has occurred
ceph orch osd rm 391 --replace

# Once drained (or if failure occurred) (we don't use the orch version
yet because we've had issues with it)
ceph-volume lvm zap --osd-id 391 --destroy

# refresh devices
ceph orch device ls --refresh

# monitor ceph for replacement
ceph -W cephadm

# once daemon has been deployed "2021-03-25T18:03:16.742483+
mgr.ceph02.duoetc [INF] Deploying daemon osd.391 on ceph15", watch for
rebalance to complete
ceph -s

# consider increasing max_backfills if it's just a single drive replacement:
ceph config set osd osd_max_backfills 10

# if you do, after backfilling is complete (validate with 'ceph -s'):
ceph config rm osd osd_max_backfills

The lvm zap cleans up the db/wal LV, which allows for the replacement
drive to rebuild with db/wal on the NVME.
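If you want to sanity-check that the zap actually removed the old DB/WAL LV before the replacement deploys, a quick look like this should do it (sketch; run on the host that held the OSD):

# the zapped OSD should no longer show up here, and its db LV should be gone from the NVME VG
ceph-volume lvm list | less
lvs
lsblk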

Hope this helps,
David

On Fri, Aug 27, 2021 at 7:21 PM Eric Fahnle  wrote:
>
> Hi David! Very much appreciated your response.
>
> I'm not sure that is the problem. I tried with the following (without 
> using "rotational"):
>
> ...(snip)...
> data_devices:
>size: "15G:"
> db_devices:
>size: ":15G"
> filter_logic: AND
> placement:
>   label: "osdj2"
> service_id: test_db_device
> service_type: osd
> ...(snip)...
>
> Without success. Also tried without the "filter_logic: AND" in the yaml file 
> and the result was the same.
>
> Best regards,
> Eric
>
>
> -Original Message-
> From: David Orman [mailto:orma...@corenode.com]
> Sent: 27 August 2021 14:56
> To: Eric Fahnle
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Missing OSD in SSD after disk failure
>
> This was a bug in some versions of ceph, which has been fixed:
>
> https://tracker.ceph.com/issues/49014
> https://github.com/ceph/ceph/pull/39083
>
> You'll want to upgrade Ceph to resolve this behavior, or you can use size or 
> something else to filter if that is not possible.
>
> David
>
> On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle  wrote:
> >
> > Hi everyone!
> > I've got a doubt, I tried searching for it in this list, but didn't find an 
> > answer.
> >
> > I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD disk. The 
> > deployment was done with "ceph orch apply deploy-osd.yaml", in which the 
> > file "deploy-osd.yaml" contained the following:
> > ---
> > service_type: osd
> > service_id: default_drive_group
> > placement:
> >   label: "osd"
> > data_devices:
> >   rotational: 1
> > db_devices:
> >   rotational: 0
> >
> > After the deployment, each HDD had an OSD and the NVMe shared the 4 OSDs, 
> > plus the DB.
> >
> > A few days ago, an HDD broke and got replaced. Ceph detected the new disk 
> > and created a new OSD for the HDD but didn't use the NVMe. Now the NVMe in 
> > that server has 3 OSDs running but didn't add the new one. I couldn't find 
> > out how to re-create the OSD with the exact configuration it had before. 
> > The only "way" I found was to delete all 4 OSDs and create everything from 
> > scratch (I didn't actually do it, as I hope there is a better way).
> >
> > Has anyone had this issue before? I'd be glad if someone pointed me in the 
> > right direction.
> >
> > Currently running:
> > Version
> > 15.2.8
> > octopus (stable)
> >
> > Thank you in advance and best regards, Eric
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Missing OSD in SSD after disk failure

2021-08-27 Thread David Orman
This was a bug in some versions of ceph, which has been fixed:

https://tracker.ceph.com/issues/49014
https://github.com/ceph/ceph/pull/39083

You'll want to upgrade Ceph to resolve this behavior, or you can use
size or something else to filter if that is not possible.
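For example, a size-based drive group might look roughly like this (the sizes are placeholders; pick values that separate your HDDs from the NVMe):

service_type: osd
service_id: default_drive_group
placement:
  label: "osd"
data_devices:
  size: "1TB:"    # devices at or above this size become data devices
db_devices:
  size: ":1TB"    # smaller devices are used for DB/WAL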

David

On Thu, Aug 19, 2021 at 9:12 AM Eric Fahnle  wrote:
>
> Hi everyone!
> I've got a doubt, I tried searching for it in this list, but didn't find an 
> answer.
>
> I've got 4 OSD servers. Each server has 4 HDDs and 1 NVMe SSD disk. The 
> deployment was done with "ceph orch apply deploy-osd.yaml", in which the file 
> "deploy-osd.yaml" contained the following:
> ---
> service_type: osd
> service_id: default_drive_group
> placement:
>   label: "osd"
> data_devices:
>   rotational: 1
> db_devices:
>   rotational: 0
>
> After the deployment, each HDD had an OSD and the NVMe shared the 4 OSDs, 
> plus the DB.
>
> A few days ago, an HDD broke and got replaced. Ceph detected the new disk and 
> created a new OSD for the HDD but didn't use the NVMe. Now the NVMe in that 
> server has 3 OSDs running but didn't add the new one. I couldn't find out how 
> to re-create the OSD with the exact configuration it had before. The only 
> "way" I found was to delete all 4 OSDs and create everything from scratch (I 
> didn't actually do it, as I hope there is a better way).
>
> Has anyone had this issue before? I'd be glad if someone pointed me in the 
> right direction.
>
> Currently running:
> Version
> 15.2.8
> octopus (stable)
>
> Thank you in advance and best regards,
> Eric
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-12 Thread David Orman
https://github.com/ceph/ceph/pull/42690 looks like it might be a fix,
but it's pending review.
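Until that lands, the workaround reported in this thread is to redeploy the affected mon (sketch, using node03 from the example below):

ceph orch daemon rm mon.node03 --force
ceph orch daemon add mon node03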

On Thu, Aug 12, 2021 at 7:46 AM André Gemünd
 wrote:
>
> We're seeing the same here with v16.2.5 on CentOS 8.3
>
> Do you know of any progress?
>
> Best Greetings
> André
>
> - Am 9. Aug 2021 um 18:15 schrieb David Orman orma...@corenode.com:
>
> > Hi,
> >
> > We are seeing very similar behavior on 16.2.5, and also have noticed
> > that an undeploy/deploy cycle fixes things. Before we go rummaging
> > through the source code trying to determine the root cause, has
> > anybody else figured this out? It seems odd that a repeatable issue
> > (I've seen other mailing list posts about this same issue) impacting
> > 16.2.4/16.2.5, at least, on reboots hasn't been addressed yet, so
> > wanted to check.
> >
> > Here's one of the other thread titles that appears related:
> > "[ceph-users] mons assigned via orch label 'committing suicide' upon
> > reboot."
> >
> > Respectfully,
> > David
> >
> >
> > On Sun, May 23, 2021 at 3:40 AM Adrian Nicolae
> >  wrote:
> >>
> >> Hi guys,
> >>
> >> I'm testing Ceph Pacific 16.2.4 in my lab before deciding if I will put
> >> it in production on a 1PB+ storage cluster with rgw-only access.
> >>
> >> I noticed a weird issue with my mons :
> >>
> >> - if I reboot a mon host, the ceph-mon container is not starting after
> >> reboot
> >>
> >> - I can see with 'ceph orch ps' the following output :
> >>
> >> mon.node01   node01   running (20h)   4m ago   20h   16.2.4  8d91d370c2b8  0a2e86af94b2
> >> mon.node02   node02   running (115m)  12s ago  115m  16.2.4  8d91d370c2b8  51f4885a1b06
> >> mon.node03   node03   stopped         4m ago   19h
> >>
> >> (where node03 is the host which was rebooted).
> >>
> >> - I tried to start the mon container manually on node03 with '/bin/bash
> >> /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run'
> >> and I've got the following output :
> >>
> >> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> >> mon.node03@-1(???).osd e408 crush map has features 3314933069573799936,
> >> adjusting msgr requires
> >> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> >> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> >> adjusting msgr requires
> >> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> >> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> >> adjusting msgr requires
> >> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> >> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> >> adjusting msgr requires
> >> cluster 2021-05-23T08:07:12.189243+ mgr.node01.ksitls (mgr.14164)
> >> 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417 active+clean; 33 KiB
> >> data, 605 MiB used, 651 GiB / 652 GiB avail; 9.6 KiB/s rd, 0 B/s wr, 15 
> >> op/s
> >> debug 2021-05-23T08:24:25.196+ 7f9a9e358700  1
> >> mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format 0 -> 3
> >> debug 2021-05-23T08:24:25.208+ 7f9a88176700  1 heartbeat_map
> >> reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out
> >> after 0.0s
> >> debug 2021-05-23T08:24:25.208+ 7f9a9e358700  0
> >> mon.node03@-1(probing) e5  my rank is now 1 (was -1)
> >> debug 2021-05-23T08:24:25.212+ 7f9a87975700  0 mon.node03@1(probing)
> >> e6  removed from monmap, suicide.
> >>
> >> root@node03:/home/adrian# systemctl status
> >> ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
> >> ● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service - Ceph
> >> mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
> >>   Loaded: loaded
> >> (/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service;
> >> enabled; vendor preset: enabled)
> >>   Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
> >>  Process: 1176 ExecStart=/bin/bash
> >> /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run
> >> (code=exited, status=0/SUCCESS)
> >>  Process: 1855 ExecStop=/usr/bin/docker stop
> >> ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited,
> >> status=1/F

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-10 Thread David Orman
Just adding our feedback - this is affecting us as well. We reboot
periodically to test durability of the clusters we run, and this is
fairly impactful. I could see power loss/other scenarios in which this
could end quite poorly for those with less than perfect redundancy in
DCs across multiple racks/PDUs/etc. I see
https://github.com/ceph/ceph/pull/42690 has been submitted, but I'd
definitely make an argument for it being a 'very high' priority, so it
hopefully gets a review in time for 16.2.6. :)

David

On Tue, Aug 10, 2021 at 4:36 AM Sebastian Wagner  wrote:
>
> Good morning Robert,
>
> Am 10.08.21 um 09:53 schrieb Robert Sander:
> > Hi,
> >
> > Am 09.08.21 um 20:44 schrieb Adam King:
> >
> >> This issue looks the same as https://tracker.ceph.com/issues/51027
> >> which is
> >> being worked on. Essentially, it seems that hosts that were being
> >> rebooted
> >> were temporarily marked as offline and cephadm had an issue where it
> >> would
> >> try to remove all daemons (outside of osds I believe) from offline
> >> hosts.
> >
> > Sorry for maybe being rude but how on earth does one come up with the
> > idea to automatically remove components from a cluster where just one
> > node is currently rebooting without any operator interference?
>
> Obviously no one :-). We already have over 750 tests for the cephadm
> scheduler and I can foresee that we'll get some additional ones for this
> case as well.
>
> Kind regards,
>
> Sebastian
>
>
> >
> > Regards
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-09 Thread David Orman
Hi,

We are seeing very similar behavior on 16.2.5, and also have noticed
that an undeploy/deploy cycle fixes things. Before we go rummaging
through the source code trying to determine the root cause, has
anybody else figured this out? It seems odd that a repeatable issue
(I've seen other mailing list posts about this same issue) impacting
16.2.4/16.2.5, at least, on reboots hasn't been addressed yet, so
wanted to check.

Here's one of the other thread titles that appears related:
"[ceph-users] mons assigned via orch label 'committing suicide' upon
reboot."

Respectfully,
David


On Sun, May 23, 2021 at 3:40 AM Adrian Nicolae
 wrote:
>
> Hi guys,
>
> I'm testing Ceph Pacific 16.2.4 in my lab before deciding if I will put
> it in production on a 1PB+ storage cluster with rgw-only access.
>
> I noticed a weird issue with my mons :
>
> - if I reboot a mon host, the ceph-mon container is not starting after
> reboot
>
> - I can see with 'ceph orch ps' the following output :
>
> mon.node01   node01   running (20h)   4m ago   20h   16.2.4  8d91d370c2b8  0a2e86af94b2
> mon.node02   node02   running (115m)  12s ago  115m  16.2.4  8d91d370c2b8  51f4885a1b06
> mon.node03   node03   stopped         4m ago   19h
>
> (where node03 is the host which was rebooted).
>
> - I tried to start the mon container manually on node03 with '/bin/bash
> /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run'
> and I've got the following output :
>
> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> mon.node03@-1(???).osd e408 crush map has features 3314933069573799936,
> adjusting msgr requires
> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> adjusting msgr requires
> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> adjusting msgr requires
> debug 2021-05-23T08:24:25.192+ 7f9a9e358700  0
> mon.node03@-1(???).osd e408 crush map has features 43262930805112,
> adjusting msgr requires
> cluster 2021-05-23T08:07:12.189243+ mgr.node01.ksitls (mgr.14164)
> 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417 active+clean; 33 KiB
> data, 605 MiB used, 651 GiB / 652 GiB avail; 9.6 KiB/s rd, 0 B/s wr, 15 op/s
> debug 2021-05-23T08:24:25.196+ 7f9a9e358700  1
> mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format 0 -> 3
> debug 2021-05-23T08:24:25.208+ 7f9a88176700  1 heartbeat_map
> reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out
> after 0.0s
> debug 2021-05-23T08:24:25.208+ 7f9a9e358700  0
> mon.node03@-1(probing) e5  my rank is now 1 (was -1)
> debug 2021-05-23T08:24:25.212+ 7f9a87975700  0 mon.node03@1(probing)
> e6  removed from monmap, suicide.
>
> root@node03:/home/adrian# systemctl status
> ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
> ● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service - Ceph
> mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
>   Loaded: loaded
> (/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service;
> enabled; vendor preset: enabled)
>   Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
>  Process: 1176 ExecStart=/bin/bash
> /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run
> (code=exited, status=0/SUCCESS)
>  Process: 1855 ExecStop=/usr/bin/docker stop
> ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited,
> status=1/FAILURE)
>  Process: 1861 ExecStopPost=/bin/bash
> /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.poststop
> (code=exited, status=0/SUCCESS)
> Main PID: 1176 (code=exited, status=0/SUCCESS)
>
> The only fix I could find was to redeploy the mon with :
>
> ceph orch daemon rm  mon.node03 --force
> ceph orch daemon add mon node03
>
> However, even if it's working after redeploy, it's not giving me a lot
> of trust to use it in a production environment having an issue like
> that.  I could reproduce it with 2 different mons so it's not just an
> exception.
>
> My setup is based on Ubuntu 20.04 and docker instead of podman :
>
> root@node01:~# docker -v
> Docker version 20.10.6, build 370c289
>
> Do you know a workaround for this issue or is this a known bug ? I
> noticed that there are some other complaints with the same behaviour in
> Octopus as well and the solution at that time was to delete the
> /var/lib/ceph/mon folder .
>
>
> Thanks.
>
>
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Having issues to start more than 24 OSDs per host

2021-06-22 Thread David Orman
https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

If you're brave (YMMV, test first non-prod), we pushed an image with
the issue we encountered fixed as per above here:
https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 that
you can use to install with.
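For an existing cluster, switching to the patched image is roughly a one-liner (the tag is an example; check the repo for the current one):

ceph orch upgrade start --image docker.io/ormandj/ceph:v16.2.4-mgrfix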

I'm not sure when the next release is due out (I'm a little confused
why a breaking install/upgrade issue like this has been allowed to
sit), but it should include this fix, as well as others.

David


On Tue, Jun 22, 2021 at 1:16 AM  wrote:
>
> Hello
>
> We tried to use cephadm with Podman to start 44 OSDs per host, which
> consistently stops after adding 24 OSDs per host.
> We looked into the cephadm.log on the problematic host and saw that the
> command `cephadm ceph-volume lvm list --format json` got stuck.
> We saw that the output of the command wasn't complete. Therefore, we tried using
> compacted JSON and could increase the number to 36 OSDs per host.
>
> If you need more information just ask.
>
>
> Podman version: 3.2.1
> Ceph version: 16.2.4
> OS version: Suse Leap 15.3
>
> Greetings,
> Jan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Managers dieing?

2021-06-17 Thread David Orman
Hi Peter,

We fixed this bug: https://tracker.ceph.com/issues/47738 recently
here: 
https://github.com/ceph/ceph/commit/b4316d257e928b3789b818054927c2e98bb3c0d6
which should hopefully be in the next release(s).

David

On Thu, Jun 17, 2021 at 12:13 PM Peter Childs  wrote:
>
> Found the issue in the end. I'd managed to kill the autoscaling feature by
> playing with pgp_num and pg_num, and it was getting confusing. I fixed it in
> the end by reducing pg_num on some of my test pools, and the manager woke up
> and started working again.
>
> It was not clear what I'd done to kill it, but once I'd figured out
> what was crashing, it was possible to work out what was going to help it.
>
> So I've just learnt: don't play with pgp_num and pg_num, and let the
> autoscaling feature just work. Setting the target size or ratio is probably
> better.
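For reference, the autoscaler-friendly way to do that is something like the following (pool name and ratio are placeholders):

# let the autoscaler manage pg_num for a pool
ceph osd pool set mypool pg_autoscale_mode on
# or give it a hint about the pool's expected share of the cluster
ceph osd pool set mypool target_size_ratio 0.2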
>
> I like Ceph; it's very different to Spectrum Scale, which I've used for years,
> but for now it's different tools to resolve different issues.
>
> Must get around to doing something with what I've learnt so far.
>
> Peter
>
> On Thu, 17 Jun 2021 at 17:53, Eugen Block  wrote:
>
> > Hi,
> >
> > don't give up on Ceph. ;-)
> >
> > Did you try any of the steps from the troubleshooting section [1] to
> > gather some events and logs? Could you share them, and maybe also some
> > more details about that cluster? Did you enable any non-default mgr
> > modules? There have been a couple reports related to mgr modules.
> >
> > Regards
> > Eugen
> >
> > [1] https://docs.ceph.com/en/latest/cephadm/troubleshooting/
> >
> >
> > Zitat von Peter Childs :
> >
> > > Lets try to stop this message turning into a mass moaning session about
> > > Ceph and try and get this newbie able to use it.
> > >
> > > I've got a Ceph Octopus cluster, its relatively new and deployed using
> > > cephadm.
> > >
> > > It was working fine, but now the managers start up run for about 30
> > seconds
> > > and then die, until systemctl gives up and I have to reset-fail them to
> > get
> > > them to try again, when they fail.
> > >
> > > How do I work out why and get them working again?
> > >
> > > I've got 21 nodes and was looking to take it up to 32 over the next few
> > > weeks, but that is going to be difficult if the managers are not working.
> > >
> > > I did try Pacific and I'm happy to upgrade but that failed to deploy more
> > > than 6 osd's and I gave up and went back to Octopus.
> > >
> > > I'm about to give up on Ceph because it looks like it's really really
> > > "fragile" and debugging what's going wrong is really difficult.
> > >
> > > I guess I could give up on cephadm and go with a different provisioning
> > > method but I'm not sure where to start on that.
> > >
> > > Thanks in advance.
> > >
> > > Peter.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-06-01 Thread David Orman
ormandj/ceph:v16.2.4-mgrfix <-- pushed to dockerhub.

Try bootstrap with: --image "docker.io/ormandj/ceph:v16.2.4-mgrfix" if
you want to give it a shot, or you can set CEPHADM_IMAGE. We think
these should both work during any cephadm command, even if the
documentation doesn't make it clear.
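In practice that would be something like this (untested here, so treat it as a sketch; <mon-ip> is a placeholder):

# pass the image directly on the cephadm command line
cephadm --image docker.io/ormandj/ceph:v16.2.4-mgrfix bootstrap --mon-ip <mon-ip>

# or export the variable so every cephadm invocation picks it up
export CEPHADM_IMAGE=docker.io/ormandj/ceph:v16.2.4-mgrfix
cephadm bootstrap --mon-ip <mon-ip>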

On Tue, Jun 1, 2021 at 2:30 AM David Orman  wrote:
>
> I do not believe it was in 16.2.4. I will build another patched version of 
> the image tomorrow based on that version. I do agree, I feel this breaks new 
> deploys as well as existing, and hope a point release will come soon that 
> includes the fix.
>
> On May 31, 2021, at 15:33, Marco Pizzolo  wrote:
>
> 
> David,
>
> What I can confirm is that if this fix is already in 16.2.4 and 15.2.13, then 
> there's another issue resulting in the same situation, as it continues to 
> happen in the latest available images.
> We are going to try and see if we can install a 15.2.x release and 
> subsequently upgrade using a fixed image.  We were not finding a good way to 
> bootstrap directly with a custom image, but maybe we missed something.  
> cephadm bootstrap command didn't seem to support image path.
>
> Thanks for your help thus far.  I'll update later today or tomorrow when we 
> get the chance to go the upgrade route.
>
> Seems tragic that when an all-stopping, immediately reproducible issue such 
> as this occurs, adopters are allowed to flounder for so long.  Ceph has had a 
> tremendously positive impact for us since we began using it in 
> luminous/mimic, but situations such as this are hard to look past.  It's 
> really unfortunate as our existing production clusters have been rock solid 
> thus far, but this does shake one's confidence, and I would wager that I'm 
> not alone.
>
> Marco
>
>
>
>
>
>
> On Mon, May 31, 2021 at 3:57 PM David Orman  wrote:
>>
>> Does the image we built fix the problem for you? That's how we worked
>> around it. Unfortunately, it even bites you with less OSDs if you have
>> DB/WAL on other devices, we have 24 rotational drives/OSDs, but split
>> DB/WAL onto multiple NVMEs. We're hoping the remoto fix (since it's
>> merged upstream and pushed) will land in the next point release of
>> 16.x (and it sounds like 15.x), since this is a blocking issue without
>> using patched containers. I guess testing isn't done against clusters
>> with these kinds of configurations, as we can replicate it on any of
>> our dev/test clusters with this type of drive configuration. We
>> weren't able to upgrade any clusters/deploy new hosts on any clusters,
>> so it caused quite an issue until we figured out the problem and
>> resolved it.
>>
>> If you want to build your own images, this is the simple Dockerfile we
>> used to get beyond this issue:
>>
>> $ cat Dockerfile
>> FROM docker.io/ceph/ceph:v16.2.3
>> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>>
>> The process.py is the patched version we submitted here:
>> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f971f311146a3c1605494
>> (merged upstream).
>>
>> Hope this helps,
>> David
>>
>> On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo  
>> wrote:
>> >
>> > Unfortunately Ceph 16.2.4 is still not working for us.  We continue to 
>> > have issues where the 26th OSD is not fully created and started.  We've 
>> > confirmed that we do get the flock as described in:
>> >
>> > https://tracker.ceph.com/issues/50526
>> >
>> > -
>> >
>> > I have verified in our labs a way to reproduce easily the problem:
>> >
>> > 0. Please stop the cephadm orchestrator:
>> >
>> > In your bootstrap node:
>> >
>> > # cephadm shell
>> > # ceph mgr module disable cephadm
>> >
>> > 1. In one of the hosts where you want to create osds and you have a big 
>> > amount of devices:
>> >
>> > See if you have a "cephadm" filelock:
>> > for example:
>> >
>> > # lslocks | grep cephadm
>> > python3 1098782  FLOCK   0B WRITE 0 0   0 
>> > /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>> >
>> > if that is the case. just kill the process to start with a "clean" 
>> > situation
>> >
>> > 2. Go to the folder: /var/lib/ceph/
>> >
>> > you will find there a file called "cephadm.xx".
>> >
>> > execute:
>> >
>> > # python3 cephadm.xx ceph-volume inventory
>> >
>> > 3. If

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-06-01 Thread David Orman
I do not believe it was in 16.2.4. I will build another patched version of the 
image tomorrow based on that version. I do agree, I feel this breaks new 
deploys as well as existing, and hope a point release will come soon that 
includes the fix.

> On May 31, 2021, at 15:33, Marco Pizzolo  wrote:
> 
> 
> David,
> 
> What I can confirm is that if this fix is already in 16.2.4 and 15.2.13, then 
> there's another issue resulting in the same situation, as it continues to 
> happen in the latest available images.
> We are going to try and see if we can install a 15.2.x release and 
> subsequently upgrade using a fixed image.  We were not finding a good way to 
> bootstrap directly with a custom image, but maybe we missed something.  
> cephadm bootstrap command didn't seem to support image path.
> 
> Thanks for your help thus far.  I'll update later today or tomorrow when we 
> get the chance to go the upgrade route.
> 
> Seems tragic that when an all-stopping, immediately reproducible issue such 
> as this occurs, adopters are allowed to flounder for so long.  Ceph has had a 
> tremendously positive impact for us since we began using it in 
> luminous/mimic, but situations such as this are hard to look past.  It's 
> really unfortunate as our existing production clusters have been rock solid 
> thus far, but this does shake one's confidence, and I would wager that I'm 
> not alone.
> 
> Marco
> 
>  
> 
> 
>   
> 
>> On Mon, May 31, 2021 at 3:57 PM David Orman  wrote:
>> Does the image we built fix the problem for you? That's how we worked
>> around it. Unfortunately, it even bites you with less OSDs if you have
>> DB/WAL on other devices, we have 24 rotational drives/OSDs, but split
>> DB/WAL onto multiple NVMEs. We're hoping the remoto fix (since it's
>> merged upstream and pushed) will land in the next point release of
>> 16.x (and it sounds like 15.x), since this is a blocking issue without
>> using patched containers. I guess testing isn't done against clusters
>> with these kinds of configurations, as we can replicate it on any of
>> our dev/test clusters with this type of drive configuration. We
>> weren't able to upgrade any clusters/deploy new hosts on any clusters,
>> so it caused quite an issue until we figured out the problem and
>> resolved it.
>> 
>> If you want to build your own images, this is the simple Dockerfile we
>> used to get beyond this issue:
>> 
>> $ cat Dockerfile
>> FROM docker.io/ceph/ceph:v16.2.3
>> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>> 
>> The process.py is the patched version we submitted here:
>> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f971f311146a3c1605494
>> (merged upstream).
>> 
>> Hope this helps,
>> David
>> 
>> On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo  
>> wrote:
>> >
>> > Unfortunately Ceph 16.2.4 is still not working for us.  We continue to 
>> > have issues where the 26th OSD is not fully created and started.  We've 
>> > confirmed that we do get the flock as described in:
>> >
>> > https://tracker.ceph.com/issues/50526
>> >
>> > -
>> >
>> > I have verified in our labs a way to reproduce easily the problem:
>> >
>> > 0. Please stop the cephadm orchestrator:
>> >
>> > In your bootstrap node:
>> >
>> > # cephadm shell
>> > # ceph mgr module disable cephadm
>> >
>> > 1. In one of the hosts where you want to create osds and you have a big 
>> > amount of devices:
>> >
>> > See if you have a "cephadm" filelock:
>> > for example:
>> >
>> > # lslocks | grep cephadm
>> > python3 1098782  FLOCK   0B WRITE 0 0   0 
>> > /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>> >
>> > if that is the case. just kill the process to start with a "clean" 
>> > situation
>> >
>> > 2. Go to the folder: /var/lib/ceph/
>> >
>> > you will find there a file called "cephadm.xx".
>> >
>> > execute:
>> >
>> > # python3 cephadm.xx ceph-volume inventory
>> >
>> > 3. If the problem is present in your cephadm file, you will have the 
>> > command blocked and you will see again a cephadm filelock
>> >
>> > 4. In the case that the modification was not present. Change your 
>> > cephadm.xx file to include the modification I did (is just to 
>> > remove the verbosity parameter in the cal

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-05-31 Thread David Orman
Does the image we built fix the problem for you? That's how we worked
around it. Unfortunately, it even bites you with less OSDs if you have
DB/WAL on other devices, we have 24 rotational drives/OSDs, but split
DB/WAL onto multiple NVMEs. We're hoping the remoto fix (since it's
merged upstream and pushed) will land in the next point release of
16.x (and it sounds like 15.x), since this is a blocking issue without
using patched containers. I guess testing isn't done against clusters
with these kinds of configurations, as we can replicate it on any of
our dev/test clusters with this type of drive configuration. We
weren't able to upgrade any clusters/deploy new hosts on any clusters,
so it caused quite an issue until we figured out the problem and
resolved it.

If you want to build your own images, this is the simple Dockerfile we
used to get beyond this issue:

$ cat Dockerfile
FROM docker.io/ceph/ceph:v16.2.3
COPY process.py /lib/python3.6/site-packages/remoto/process.py

The process.py is the patched version we submitted here:
https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f971f311146a3c1605494
(merged upstream).

Hope this helps,
David

On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo  wrote:
>
> Unfortunately Ceph 16.2.4 is still not working for us.  We continue to have 
> issues where the 26th OSD is not fully created and started.  We've confirmed 
> that we do get the flock as described in:
>
> https://tracker.ceph.com/issues/50526
>
> -
>
> I have verified in our labs a way to reproduce easily the problem:
>
> 0. Please stop the cephadm orchestrator:
>
> In your bootstrap node:
>
> # cephadm shell
> # ceph mgr module disable cephadm
>
> 1. In one of the hosts where you want to create osds and you have a big 
> amount of devices:
>
> See if you have a "cephadm" filelock:
> for example:
>
> # lslocks | grep cephadm
> python3 1098782  FLOCK   0B WRITE 0 0   0 
> /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>
> if that is the case. just kill the process to start with a "clean" situation
>
> 2. Go to the folder: /var/lib/ceph/
>
> you will find there a file called "cephadm.xx".
>
> execute:
>
> # python3 cephadm.xx ceph-volume inventory
>
> 3. If the problem is present in your cephadm file, you will have the command 
> blocked and you will see again a cephadm filelock
>
> 4. In case the modification was not present, change your
> cephadm.xx file to include the modification I did (it just removes
> the verbosity parameter in the call_throws call)
>
> https://github.com/ceph/ceph/blob/2f4dc3147712f1991242ef0d059690b5fa3d8463/src/cephadm/cephadm#L4576
>
> go to step 1, to clean the filelock and try again... with the modification in 
> place it must work.
>
> -
>
> For us, it takes a few seconds but then the manual execution does come back, 
> and there are no file locks, however we remain unable to add any further OSDs.
>
> Furthermore, this is happening as part of the creation of a new Pacific 
> Cluster creation post bootstrap and adding one OSD daemon at a time and 
> allowing each OSD to be created, set in, and brought up.
>
> How is everyone else managing to get past this, or are we the only ones 
> (aside from David) using >25 OSDs per host?
>
> Our luck has been the same with 15.2.13 and 16.2.4, and using both Docker and 
> Podman on Ubuntu 20.04.2
>
> Thanks,
> Marco
>
>
>
> On Sun, May 30, 2021 at 7:33 AM Peter Childs  wrote:
>>
>> I've actually managed to get a little further with my problem.
>>
>> As I've said before these servers are slightly distorted in config.
>>
>> 63 drives and only 48g of memory.
>>
>> Once I create about 15-20 osds it continues to format the disks but won't 
>> actually create the containers or start any service.
>>
>> Worse than that, on reboot the disks disappear; they don't just stop working, they are
>> not detected by Linux, which makes me think I'm hitting some kernel limit.
>>
>> At this point I'm going to cut my losses and give up, and use the smaller,
>> slightly more powerful 30x drive systems I have (with 256g memory), maybe 
>> transplanting the larger disks if I need more capacity.
>>
>> Peter
>>
>> On Sat, 29 May 2021, 23:19 Marco Pizzolo,  wrote:
>>>
>>> Thanks David
>>> We will investigate the bugs as per your suggestion, and then will look to 
>>> test with the custom image.
>>>
>>> Appreciate it.
>>>
>>> On Sat, May 29, 2021, 4:11 PM David Orman  wrote:
>>>>
>>>> You may be running into the same issue we ran into (make sure 

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-05-29 Thread David Orman
You may be running into the same issue we ran into (make sure to read
the first issue, there's a few mingled in there), for which we
submitted a patch:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

If you're brave (YMMV, test first non-prod), we pushed an image with
the issue we encountered fixed as per above here:
https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 . We
'upgraded' to this when we encountered the mgr hanging on us after
updating ceph to v16 and experiencing this issue using: "ceph orch
upgrade start --image docker.io/ormandj/ceph:v16.2.3-mgrfix". I've not
tried to boostrap a new cluster with a custom image, and I don't know
when 16.2.4 will be released with this change (hopefully) integrated
as remoto accepted the patch upstream.

I'm not sure if this is your exact issue, see the bug reports and see
if you see the lock/the behavior matches, if so - then it may help you
out. The only change in that image is that patch to remoto being
overlaid on the default 16.2.3 image.

On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo  wrote:
>
> Peter,
>
> We're seeing the same issues as you are.  We have 2 new hosts Intel(R)
> Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED
> drives and we have tried both 15.2.13 and 16.2.4
>
> Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with
> Docker.
>
> Seems to be a bug in Cephadm and a product regression, as we have 4 near
> identical nodes on Centos running Nautilus (240 x 10TB SED drives) and had
> no problems.
>
> FWIW we had no luck yet with one-by-one OSD daemon additions through ceph
> orch either.  We also reproduced the issue easily in a virtual lab using
> small virtual disks on a single ceph VM with 1 mon.
>
> We are now looking into whether we can get past this with a manual buildout.
>
> If you, or anyone, has hit the same stumbling block and gotten past it, I
> would really appreciate some guidance.
>
> Thanks,
> Marco
>
> On Thu, May 27, 2021 at 2:23 PM Peter Childs  wrote:
>
> > In the end it looks like I might be able to get the node up to about 30
> > osds before it stops creating any more.
> >
> > Or rather, it formats the disks but freezes up starting the daemons.
> >
> > I suspect I'm missing something I can tune to get it working better.
> >
> > If I could see any error messages that might help, but I'm yet to spot
> > anything.
> >
> > Peter.
> >
> > On Wed, 26 May 2021, 10:57 Eugen Block,  wrote:
> >
> > > > If I add the osd daemons one at a time with
> > > >
> > > > ceph orch daemon add osd drywood12:/dev/sda
> > > >
> > > > It does actually work,
> > >
> > > Great!
> > >
> > > > I suspect what's happening is that when my rule for creating osds runs and
> > > > creates them all at once, it ties up the orch: it overloads cephadm and it
> > > > can't cope.
> > >
> > > It's possible, I guess.
> > >
> > > > I suspect what I might need to do at least to work around the issue is
> > > set
> > > > "limit:" and bring it up until it stops working.
> > >
> > > It's worth a try, yes, although the docs state you should try to avoid
> > > it, it's possible that it doesn't work properly, in that case create a
> > > bug report. ;-)
> > >
> > > > I did work out how to get ceph-volume to nearly work manually.
> > > >
> > > > cephadm shell
> > > > ceph auth get client.bootstrap-osd -o
> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > > >
> > > > but given I've now got "add osd" to work, I suspect I just need to fine
> > > > tune my osd creation rules, so it does not try and create too many osds
> > > on
> > > > the same node at the same time.
> > >
> > > I agree, no need to do it manually if there is an automated way,
> > > especially if you're trying to bring up dozens of OSDs.
> > >
> > >
> > > Zitat von Peter Childs :
> > >
> > > > After a bit of messing around. I managed to get it somewhat working.
> > > >
> > > > If I add the osd daemons one at a time with
> > > >
> > > > ceph orch daemon add osd drywood12:/dev/sda
> > > >
> > > > It does actually work,
> > > >
> > > > I suspect what's happening is that when my rule for creating osds runs and
> > > > creates them all at once, it ties up the orch: it overloads cephadm and it
> > > > can't cope.
> > > >
> > > > service_type: osd
> > > > service_name: osd.drywood-disks
> > > > placement:
> > > >   host_pattern: 'drywood*'
> > > > spec:
> > > >   data_devices:
> > > > size: "7TB:"
> > > >   objectstore: bluestore
> > > >
> > > > I suspect what I might need to do at least to work around the issue is
> > > set
> > > > "limit:" and bring it up until it stops working.
> > > >
> > > > I did work out how to get ceph-volume to nearly work manually.
> > > >
> > > > cephadm shell
> > > > ceph auth get client.bootstrap-osd -o
> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > > >
> > > > but given I've now 

[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-26 Thread David Orman
We've found that after doing the osd rm, you can use: "ceph-volume lvm
zap --osd-id 178 --destroy" on the server with that OSD as per:
https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/#removing-devices
and it will clean things up so they work as expected.
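Putting the whole flow together for the example above (osd.178), it looks roughly like this; run the zap on the host that carried the OSD:

# flag the OSD for removal and mark it for replacement
ceph orch osd rm 178 --replace
# on the host that held osd.178, once the drain/removal has finished
ceph-volume lvm zap --osd-id 178 --destroy
# let cephadm rescan devices so the existing spec can redeploy onto the new disk
ceph orch device ls --refresh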

On Tue, May 25, 2021 at 6:51 AM Kai Stian Olstad  wrote:
>
> Hi
>
> The server run 15.2.9 and has 15 HDD and 3 SSD.
> The OSDs was created with this YAML file
>
> hdd.yml
> 
> service_type: osd
> service_id: hdd
> placement:
>host_pattern: 'pech-hd-*'
> data_devices:
>rotational: 1
> db_devices:
>rotational: 0
>
>
> The result was that the 3 SSD is added to 1 VG with 15 LV on it.
>
> # vgs | egrep "VG|dbs"
>    VG                                                   #PV #LV #SN Attr   VSize  VFree
>    ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b   3  15   0 wz--n- <5.24t 48.00m
>
>
> One of the osd failed and I run rm with replace
>
> # ceph orch osd rm 178 --replace
>
> and the result is
>
> # ceph osd tree | grep "ID|destroyed"
> ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
> 178  hdd    12.82390  osd.178    destroyed  0         1.0
>
>
> But I'm not able to replace the disk with the same YAML file as shown
> above.
>
>
> # ceph orch apply osd -i hdd.yml --dry-run
> 
> OSDSPEC PREVIEWS
> 
> +-+--+--+--++-+
> |SERVICE  |NAME  |HOST  |DATA  |DB  |WAL  |
> +-+--+--+--++-+
> +-+--+--+--++-+
>
> I guess this is the wrong way to do it, but I can't find the answer in
> the documentation.
> So how can I replace this failed disk in Cephadm?
>
>
> --
> Kai Stian Olstad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-14 Thread David Orman
We've created a PR to fix the root cause of this issue:
https://github.com/alfredodeza/remoto/pull/63
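Until that lands in a release, the interim workaround is the same patched-image approach we've described on the list: overlay the fixed process.py from that PR on top of the stock image, e.g.

FROM docker.io/ceph/ceph:v16.2.3
COPY process.py /lib/python3.6/site-packages/remoto/process.py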

Thank you,
David

On Mon, May 10, 2021 at 7:29 PM David Orman  wrote:
>
> Hi Sage,
>
> We've got 2.0.27 installed. I restarted all the manager pods, just in
> case, and I have the same behavior afterwards.
>
> David
>
> On Mon, May 10, 2021 at 6:53 PM Sage Weil  wrote:
> >
> > The root cause is a bug in conmon.  If you can upgrade to >= 2.0.26
> > this will also fix the problem.  What version are you using?  The
> > kubic repos currently have 2.0.27.  See
> > https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable
> >
> > We'll make sure the next release has the verbosity workaround!
> >
> > sage
> >
> > On Mon, May 10, 2021 at 5:47 PM David Orman  wrote:
> > >
> > > I think I may have found the issue:
> > >
> > > https://tracker.ceph.com/issues/50526
> > > It seems it may be fixed in: https://github.com/ceph/ceph/pull/41045
> > >
> > > I hope this can be prioritized as an urgent fix as it's broken
> > > upgrades on clusters of a relatively normal size (14 nodes, 24x OSDs,
> > > 2x NVME for DB/WAL w/ 12 OSDs per NVME), even when new OSDs are not
> > > being deployed, as it still tries to apply the OSD specification.
> > >
> > > On Mon, May 10, 2021 at 4:03 PM David Orman  wrote:
> > > >
> > > > Hi,
> > > >
> > > > We are seeing the mgr attempt to apply our OSD spec on the various
> > > > hosts, then block. When we investigate, we see the mgr has executed
> > > > cephadm calls like so, which are blocking:
> > > >
> > > > root 1522444  0.0  0.0 102740 23216 ?S17:32   0:00
> > > >  \_ /usr/bin/python3
> > > > /var/lib/ceph/X/cephadm.30cb78bdbbafb384af862e1c2292b944f15942b586128e91262b43e91e11ae90
> > > > --image 
> > > > docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe
> > > > ceph-volume --fsid X -- lvm list --format json
> > > >
> > > > This occurs on all hosts in the cluster, following
> > > > starting/restarting/failing over a manager. It's blocking an
> > > > in-progress upgrade post-manager updates on one cluster, currently.
> > > >
> > > > Looking at the cephadm logs on the host(s) in question, we see the
> > > > last entry appears to be truncated, like:
> > > >
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.db_uuid": "1n2f5v-EEgO-1Kn6-hQd2-v5QF-AN9o-XPkL6b",
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.encrypted": "0",
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.osd_fsid": "",
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.osd_id": "205",
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.osdspec_affinity": "osd_spec",
> > > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > > "ceph.type": "block",
> > > >
> > > > The previous entry looks like this:
> > > >
> > > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > > "ceph.db_uuid": "TMTPD5-MLqp-06O2-raqp-S8o5-TfRG-hbFmpu",
> > > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > > "ceph.encrypted": "0",
> > > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > > "ceph.osd_fsid": "",
> > > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > > "ceph.osd_id": "195",
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> > > > "ceph.osdspec_affinity": "osd_spec",
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> > > > "ceph.type": "block",
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: 
> > > > "ceph.vdo": "0"
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: },
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "type": 
> > > > "block",
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "vg_name":
> > > > "ceph-ffd1a4a7-316c-4c85-acde-06459e26f2c4"
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: }
> > > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: ],
> > > >
> > > > We'd like to get to the bottom of this, please let us know what other
> > > > information we can provide.
> > > >
> > > > Thank you,
> > > > David
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-10 Thread David Orman
Hi Sage,

We've got 2.0.27 installed. I restarted all the manager pods, just in
case, and I have the same behavior afterwards.

David

On Mon, May 10, 2021 at 6:53 PM Sage Weil  wrote:
>
> The root cause is a bug in conmon.  If you can upgrade to >= 2.0.26
> this will also fix the problem.  What version are you using?  The
> kubic repos currently have 2.0.27.  See
> https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable
>
> We'll make sure the next release has the verbosity workaround!
>
> sage
>
> On Mon, May 10, 2021 at 5:47 PM David Orman  wrote:
> >
> > I think I may have found the issue:
> >
> > https://tracker.ceph.com/issues/50526
> > It seems it may be fixed in: https://github.com/ceph/ceph/pull/41045
> >
> > I hope this can be prioritized as an urgent fix as it's broken
> > upgrades on clusters of a relatively normal size (14 nodes, 24x OSDs,
> > 2x NVME for DB/WAL w/ 12 OSDs per NVME), even when new OSDs are not
> > being deployed, as it still tries to apply the OSD specification.
> >
> > On Mon, May 10, 2021 at 4:03 PM David Orman  wrote:
> > >
> > > Hi,
> > >
> > > We are seeing the mgr attempt to apply our OSD spec on the various
> > > hosts, then block. When we investigate, we see the mgr has executed
> > > cephadm calls like so, which are blocking:
> > >
> > > root 1522444  0.0  0.0 102740 23216 ?S17:32   0:00
> > >  \_ /usr/bin/python3
> > > /var/lib/ceph/X/cephadm.30cb78bdbbafb384af862e1c2292b944f15942b586128e91262b43e91e11ae90
> > > --image 
> > > docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe
> > > ceph-volume --fsid X -- lvm list --format json
> > >
> > > This occurs on all hosts in the cluster, following
> > > starting/restarting/failing over a manager. It's blocking an
> > > in-progress upgrade post-manager updates on one cluster, currently.
> > >
> > > Looking at the cephadm logs on the host(s) in question, we see the
> > > last entry appears to be truncated, like:
> > >
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.db_uuid": "1n2f5v-EEgO-1Kn6-hQd2-v5QF-AN9o-XPkL6b",
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.encrypted": "0",
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.osd_fsid": "",
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.osd_id": "205",
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.osdspec_affinity": "osd_spec",
> > > 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> > > "ceph.type": "block",
> > >
> > > The previous entry looks like this:
> > >
> > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > "ceph.db_uuid": "TMTPD5-MLqp-06O2-raqp-S8o5-TfRG-hbFmpu",
> > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > "ceph.encrypted": "0",
> > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > "ceph.osd_fsid": "",
> > > 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> > > "ceph.osd_id": "195",
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> > > "ceph.osdspec_affinity": "osd_spec",
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> > > "ceph.type": "block",
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "ceph.vdo": 
> > > "0"
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: },
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "type": "block",
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "vg_name":
> > > "ceph-ffd1a4a7-316c-4c85-acde-06459e26f2c4"
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: }
> > > 2021-05-10 17:32:06,470 INFO /usr/bin/podman: ],
> > >
> > > We'd like to get to the bottom of this, please let us know what other
> > > information we can provide.
> > >
> > > Thank you,
> > > David
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-10 Thread David Orman
I think I may have found the issue:

https://tracker.ceph.com/issues/50526
It seems it may be fixed in: https://github.com/ceph/ceph/pull/41045

I hope this can be prioritized as an urgent fix as it's broken
upgrades on clusters of a relatively normal size (14 nodes, 24x OSDs,
2x NVME for DB/WAL w/ 12 OSDs per NVME), even when new OSDs are not
being deployed, as it still tries to apply the OSD specification.

On Mon, May 10, 2021 at 4:03 PM David Orman  wrote:
>
> Hi,
>
> We are seeing the mgr attempt to apply our OSD spec on the various
> hosts, then block. When we investigate, we see the mgr has executed
> cephadm calls like so, which are blocking:
>
> root 1522444  0.0  0.0 102740 23216 ?S17:32   0:00
>  \_ /usr/bin/python3
> /var/lib/ceph/X/cephadm.30cb78bdbbafb384af862e1c2292b944f15942b586128e91262b43e91e11ae90
> --image 
> docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe
> ceph-volume --fsid X -- lvm list --format json
>
> This occurs on all hosts in the cluster, following
> starting/restarting/failing over a manager. It's blocking an
> in-progress upgrade post-manager updates on one cluster, currently.
>
> Looking at the cephadm logs on the host(s) in question, we see the
> last entry appears to be truncated, like:
>
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.db_uuid": "1n2f5v-EEgO-1Kn6-hQd2-v5QF-AN9o-XPkL6b",
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.encrypted": "0",
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.osd_fsid": "",
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.osd_id": "205",
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.osdspec_affinity": "osd_spec",
> 2021-05-10 17:32:06,471 INFO /usr/bin/podman:
> "ceph.type": "block",
>
> The previous entry looks like this:
>
> 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> "ceph.db_uuid": "TMTPD5-MLqp-06O2-raqp-S8o5-TfRG-hbFmpu",
> 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> "ceph.encrypted": "0",
> 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> "ceph.osd_fsid": "",
> 2021-05-10 17:32:06,469 INFO /usr/bin/podman:
> "ceph.osd_id": "195",
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> "ceph.osdspec_affinity": "osd_spec",
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman:
> "ceph.type": "block",
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "ceph.vdo": "0"
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: },
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "type": "block",
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: "vg_name":
> "ceph-ffd1a4a7-316c-4c85-acde-06459e26f2c4"
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: }
> 2021-05-10 17:32:06,470 INFO /usr/bin/podman: ],
>
> We'd like to get to the bottom of this, please let us know what other
> information we can provide.
>
> Thank you,
> David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-10 Thread David Orman
Hi,

We are seeing the mgr attempt to apply our OSD spec on the various
hosts, then block. When we investigate, we see the mgr has executed
cephadm calls like so, which are blocking:

root 1522444  0.0  0.0 102740 23216 ?S17:32   0:00
 \_ /usr/bin/python3
/var/lib/ceph/X/cephadm.30cb78bdbbafb384af862e1c2292b944f15942b586128e91262b43e91e11ae90
--image 
docker.io/ceph/ceph@sha256:694ba9cdcbe6cb7d25ab14b34113c42c2d1af18d4c79c7ba4d1f62cf43d145fe
ceph-volume --fsid X -- lvm list --format json

This occurs on all hosts in the cluster, following
starting/restarting/failing over a manager. It's blocking an
in-progress upgrade post-manager updates on one cluster, currently.

Looking at the cephadm logs on the host(s) in question, we see the
last entry appears to be truncated, like:

2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.db_uuid": "1n2f5v-EEgO-1Kn6-hQd2-v5QF-AN9o-XPkL6b",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.encrypted": "0",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osd_fsid": "",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osd_id": "205",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.osdspec_affinity": "osd_spec",
2021-05-10 17:32:06,471 INFO /usr/bin/podman:
"ceph.type": "block",

The previous entry looks like this:

2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.db_uuid": "TMTPD5-MLqp-06O2-raqp-S8o5-TfRG-hbFmpu",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.encrypted": "0",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.osd_fsid": "",
2021-05-10 17:32:06,469 INFO /usr/bin/podman:
"ceph.osd_id": "195",
2021-05-10 17:32:06,470 INFO /usr/bin/podman:
"ceph.osdspec_affinity": "osd_spec",
2021-05-10 17:32:06,470 INFO /usr/bin/podman:
"ceph.type": "block",
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "ceph.vdo": "0"
2021-05-10 17:32:06,470 INFO /usr/bin/podman: },
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "type": "block",
2021-05-10 17:32:06,470 INFO /usr/bin/podman: "vg_name":
"ceph-ffd1a4a7-316c-4c85-acde-06459e26f2c4"
2021-05-10 17:32:06,470 INFO /usr/bin/podman: }
2021-05-10 17:32:06,470 INFO /usr/bin/podman: ],

We'd like to get to the bottom of this, please let us know what other
information we can provide.

Thank you,
David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stuck OSD service specification - can't remove

2021-05-10 Thread David Orman
This turns out to be worse than we thought. We attempted another Ceph
upgrade (15.2.10->16.2.3) on another cluster, and have run into this
again. We're seeing strange behavior with the OSD specifications,
which also have a count that is #OSDs + #hosts, so for example, on a
504 OSD cluster (21 nodes of 24 OSDs), we see:

osd.osd_spec   504/525   6s   *

It never deletes, and we cannot apply a specification over it (we
attempt, and it stays in deleting state - and a --export does not show
any specification).

On 15.2.10 we didn't have this problem, it appears new in 16.2.x. We
are using 16.2.3.

Thanks,
David


On Fri, May 7, 2021 at 9:06 AM David Orman  wrote:
>
> Hi,
>
> I'm not attempting to remove the OSDs, but instead the
> service/placement specification. I want the OSDs/data to persist.
> --force did not work on the service, as noted in the original email.
>
> Thank you,
> David
>
> On Fri, May 7, 2021 at 1:36 AM mabi  wrote:
> >
> > Hi David,
> >
> > I had a similar issue yesterday where I wanted to remove an OSD on an OSD 
> > node which had 2 OSDs so for that I used "ceph orch osd rm" command which 
> > completed successfully but after rebooting that OSD node I saw it was still 
> > trying to start the systemd service for that OSD and one CPU core was 100% 
> > busy trying to do a "crun delete" which I suppose here is trying to delete 
> > an image or container. So what I did here is to kill this process and I 
> > also had to run the following command:
> >
> > ceph orch daemon rm osd.3 --force
> >
> > After that everything was fine again. This is a Ceph 15.2.11 cluster on 
> > Ubuntu 20.04 and podman.
> >
> > Hope that helps.
> >
> > ‐‐‐ Original Message ‐‐‐
> > On Friday, May 7, 2021 1:24 AM, David Orman  wrote:
> >
> > > Has anybody run into a 'stuck' OSD service specification? I've tried
> > > to delete it, but it's stuck in 'deleting' state, and has been for
> > > quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:
> > >
> > > NAME PORTS RUNNING REFRESHED AGE PLACEMENT
> > > osd.osd_spec 504/525  12m label:osd
> > > root@ceph01:/# ceph orch rm osd.osd_spec
> > > Removed service osd.osd_spec
> > >
> > > From active monitor:
> > >
> > > debug 2021-05-06T23:14:48.909+ 7f17d310b700 0
> > > log_channel(cephadm) log [INF] : Remove service osd.osd_spec
> > >
> > > Yet in ls, it's still there, same as above. --export on it:
> > >
> > > root@ceph01:/# ceph orch ls osd.osd_spec --export
> > > service_type: osd
> > > service_id: osd_spec
> > > service_name: osd.osd_spec
> > > placement: {}
> > > unmanaged: true
> > > spec:
> > > filter_logic: AND
> > > objectstore: bluestore
> > >
> > > We've tried --force, as well, with no luck.
> > >
> > > To be clear, the --export even prior to delete looks nothing like the
> > > actual service specification we're using, even after I re-apply it, so
> > > something seems 'bugged'. Here's the OSD specification we're applying:
> > >
> > > service_type: osd
> > > service_id: osd_spec
> > > placement:
> > > label: "osd"
> > > data_devices:
> > > rotational: 1
> > > db_devices:
> > > rotational: 0
> > > db_slots: 12
> > >
> > > I would appreciate any insight into how to clear this up (without
> > > removing the actual OSDs, we're just wanting to apply the updated
> > > service specification - we used to use host placement rules and are
> > > switching to label-based).
> > >
> > > Thanks,
> > > David
> > >
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: x-amz-request-id logging with beast + rgw (ceph 15.2.10/containerized)?

2021-05-07 Thread David Orman
Has anyone figured out an elegant way to emit this from inside
cephadm-managed/containerized Ceph, so it can be handled via the host's
journald and processed/shipped? We had gone down that path before, but
decided to hold off based on the suggestion that Lua-based scripting
might be a better option.
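
In the meantime, the closest we've come up with is streaming the ops-log
socket out of the rgw container and handing it to journald ourselves. A
rough sketch we have not put into production (the container name and
socket path are illustrative):

ceph config set global rgw_enable_ops_log true
ceph config set global rgw_ops_log_socket_path /var/run/ceph/rgw-ops.sock
# on the host: read the socket from inside the container and tag it for journald
podman exec -i ceph-<fsid>-rgw.<name> \
  socat -u UNIX-CONNECT:/var/run/ceph/rgw-ops.sock STDOUT | systemd-cat -t rgw-ops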

David

On Fri, May 7, 2021 at 4:21 PM Matt Benjamin  wrote:
>
> Hi David,
>
> I think the solution is most likely the ops log.  It is called for
> every op, and has the transaction id.
>
> Matt
>
> On Fri, May 7, 2021 at 4:58 PM David Orman  wrote:
> >
> > Hi Yuval,
> >
> > We've managed to get an upgrade done with the 16.2.3 release in a
> > testing cluster, and we've been able to implement some of the logging
> > I need via this mechanism, but the logs are emitted only when
> > debug_rgw is set to 20. I don't need to log any of that level of data
> > (we used centralized logging and the sheer volume of this output is
> > staggering); I'm just trying to get the full request log, to include
> > the transactionID, so I can match it up with the logging we do on our
> > load balancer solution. Is there another mechanism to emit these logs
> > at normal log levels? RGWDebugLog() doesn't appear to be what I'm
> > actually looking for. My intent is to emit JSON logs using this
> > mechanism, in the end, with all of the required fields for requests.
> > The current "beast: " log lines don't contain the information we need,
> > such as txid, which is what we're attempting to solve for - but can't
> > afford to have full debug logging enabled in production clusters.
> >
> > Thanks!
> > David
> >
> > On Thu, Apr 1, 2021 at 11:21 AM Yuval Lifshitz  wrote:
> > >
> > > Hi David,
> > > Don't have any good idea for "octopus" (other than ops log), but you can 
> > > do that (and more) in "pacific" using lua scripting on the RGW:
> > > https://docs.ceph.com/en/pacific/radosgw/lua-scripting/
> > >
> > > Yuval
> > >
> > > On Thu, Apr 1, 2021 at 7:11 PM David Orman  wrote:
> > >>
> > >> Hi,
> > >>
> > >> Is there any way to log the x-amz-request-id along with the request in
> > >> the rgw logs? We're using beast and don't see an option in the
> > >> configuration documentation to add headers to the request lines. We
> > >> use centralized logging and would like to be able to search all layers
> > >> of the request path (edge, lbs, ceph, etc) with a x-amz-request-id.
> > >>
> > >> Right now, all we see is this:
> > >>
> > >> debug 2021-04-01T15:55:31.105+ 7f54e599b700  1 beast:
> > >> 0x7f5604c806b0: x.x.x.x - - [2021-04-01T15:55:31.105455+] "PUT
> > >> /path/object HTTP/1.1" 200 556 - "aws-sdk-go/1.36.15 (go1.15.3; linux;
> > >> amd64)" -
> > >>
> > >> We've also tried this:
> > >>
> > >> ceph config set global rgw_enable_ops_log true
> > >> ceph config set global rgw_ops_log_socket_path /tmp/testlog
> > >>
> > >> After doing this, inside the rgw container, we can socat -
> > >> UNIX-CONNECT:/tmp/testlog and see the log entries being recorded that
> > >> we want, but there has to be a better way to do this, where the logs
> > >> are emitted like the request logs above by beast, so that we can
> > >> handle it using journald. If there's an alternative that would
> > >> accomplish the same thing, we're very open to suggestions.
> > >>
> > >> Thank you,
> > >> David
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: x-amz-request-id logging with beast + rgw (ceph 15.2.10/containerized)?

2021-05-07 Thread David Orman
Hi Yuval,

We've managed to get an upgrade done with the 16.2.3 release in a
testing cluster, and we've been able to implement some of the logging
I need via this mechanism, but the logs are emitted only when
debug_rgw is set to 20. I don't need to log any of that level of data
(we use centralized logging and the sheer volume of this output is
staggering); I'm just trying to get the full request log, to include
the transactionID, so I can match it up with the logging we do on our
load balancer solution. Is there another mechanism to emit these logs
at normal log levels? RGWDebugLog() doesn't appear to be what I'm
actually looking for. My intent is to emit JSON logs using this
mechanism, in the end, with all of the required fields for requests.
The current "beast: " log lines don't contain the information we need,
such as txid, which is what we're attempting to solve for - but can't
afford to have full debug logging enabled in production clusters.

Thanks!
David

On Thu, Apr 1, 2021 at 11:21 AM Yuval Lifshitz  wrote:
>
> Hi David,
> Don't have any good idea for "octopus" (other than ops log), but you can do 
> that (and more) in "pacific" using lua scripting on the RGW:
> https://docs.ceph.com/en/pacific/radosgw/lua-scripting/
>
> Yuval
>
> On Thu, Apr 1, 2021 at 7:11 PM David Orman  wrote:
>>
>> Hi,
>>
>> Is there any way to log the x-amz-request-id along with the request in
>> the rgw logs? We're using beast and don't see an option in the
>> configuration documentation to add headers to the request lines. We
>> use centralized logging and would like to be able to search all layers
>> of the request path (edge, lbs, ceph, etc) with a x-amz-request-id.
>>
>> Right now, all we see is this:
>>
>> debug 2021-04-01T15:55:31.105+ 7f54e599b700  1 beast:
>> 0x7f5604c806b0: x.x.x.x - - [2021-04-01T15:55:31.105455+] "PUT
>> /path/object HTTP/1.1" 200 556 - "aws-sdk-go/1.36.15 (go1.15.3; linux;
>> amd64)" -
>>
>> We've also tried this:
>>
>> ceph config set global rgw_enable_ops_log true
>> ceph config set global rgw_ops_log_socket_path /tmp/testlog
>>
>> After doing this, inside the rgw container, we can socat -
>> UNIX-CONNECT:/tmp/testlog and see the log entries being recorded that
>> we want, but there has to be a better way to do this, where the logs
>> are emitted like the request logs above by beast, so that we can
>> handle it using journald. If there's an alternative that would
>> accomplish the same thing, we're very open to suggestions.
>>
>> Thank you,
>> David
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stuck OSD service specification - can't remove

2021-05-07 Thread David Orman
Hi,

I'm not attempting to remove the OSDs, but instead the
service/placement specification. I want the OSDs/data to persist.
--force did not work on the service, as noted in the original email.

Thank you,
David

On Fri, May 7, 2021 at 1:36 AM mabi  wrote:
>
> Hi David,
>
> I had a similar issue yesterday where I wanted to remove an OSD on an OSD 
> node which had 2 OSDs so for that I used "ceph orch osd rm" command which 
> completed successfully but after rebooting that OSD node I saw it was still 
> trying to start the systemd service for that OSD and one CPU core was 100% 
> busy trying to do a "crun delete" which I suppose here is trying to delete an 
> image or container. So what I did here is to kill this process and I also had 
> to run the following command:
>
> ceph orch daemon rm osd.3 --force
>
> After that everything was fine again. This is a Ceph 15.2.11 cluster on 
> Ubuntu 20.04 and podman.
>
> Hope that helps.
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, May 7, 2021 1:24 AM, David Orman  wrote:
>
> > Has anybody run into a 'stuck' OSD service specification? I've tried
> > to delete it, but it's stuck in 'deleting' state, and has been for
> > quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:
> >
> > NAME PORTS RUNNING REFRESHED AGE PLACEMENT
> > osd.osd_spec 504/525  12m label:osd
> > root@ceph01:/# ceph orch rm osd.osd_spec
> > Removed service osd.osd_spec
> >
> > From active monitor:
> >
> > debug 2021-05-06T23:14:48.909+ 7f17d310b700 0
> > log_channel(cephadm) log [INF] : Remove service osd.osd_spec
> >
> > Yet in ls, it's still there, same as above. --export on it:
> >
> > root@ceph01:/# ceph orch ls osd.osd_spec --export
> > service_type: osd
> > service_id: osd_spec
> > service_name: osd.osd_spec
> > placement: {}
> > unmanaged: true
> > spec:
> > filter_logic: AND
> > objectstore: bluestore
> >
> > We've tried --force, as well, with no luck.
> >
> > To be clear, the --export even prior to delete looks nothing like the
> > actual service specification we're using, even after I re-apply it, so
> > something seems 'bugged'. Here's the OSD specification we're applying:
> >
> > service_type: osd
> > service_id: osd_spec
> > placement:
> > label: "osd"
> > data_devices:
> > rotational: 1
> > db_devices:
> > rotational: 0
> > db_slots: 12
> >
> > I would appreciate any insight into how to clear this up (without
> > removing the actual OSDs, we're just wanting to apply the updated
> > service specification - we used to use host placement rules and are
> > switching to label-based).
> >
> > Thanks,
> > David
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stuck OSD service specification - can't remove

2021-05-06 Thread David Orman
Has anybody run into a 'stuck' OSD service specification? I've tried
to delete it, but it's stuck in 'deleting' state, and has been for
quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:

NAME   PORTS  RUNNING  REFRESHED   AGE  PLACEMENT
osd.osd_spec          504/525  12m       label:osd
root@ceph01:/# ceph orch rm osd.osd_spec
Removed service osd.osd_spec

From active monitor:

debug 2021-05-06T23:14:48.909+ 7f17d310b700  0
log_channel(cephadm) log [INF] : Remove service osd.osd_spec

Yet in ls, it's still there, same as above. --export on it:

root@ceph01:/# ceph orch ls osd.osd_spec --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement: {}
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore

We've tried --force, as well, with no luck.

To be clear, the --export even prior to delete looks nothing like the
actual service specification we're using, even after I re-apply it, so
something seems 'bugged'. Here's the OSD specification we're applying:

service_type: osd
service_id: osd_spec
placement:
  label: "osd"
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 12
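
For reference, we apply this spec from a file with something like the
following (assuming it's saved as osd_spec.yaml):

ceph orch apply -i osd_spec.yaml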

I would appreciate any insight into how to clear this up (without
removing the actual OSDs, we're just wanting to apply the updated
service specification - we used to use host placement rules and are
switching to label-based).

Thanks,
David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed cephadm Upgrade - ValueError

2021-05-04 Thread David Orman
Can you please run: "cat /sys/kernel/security/apparmor/profiles"? See if
any of the lines have a label but no mode. Let us know what you find!
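
A quick way to spot such lines, if it helps (just a sketch; a field
count below two stands in for "label but no mode"):

awk 'NF < 2' /sys/kernel/security/apparmor/profiles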

Thanks,
David

On Mon, May 3, 2021 at 8:58 AM Ashley Merrick  wrote:

> Created BugTicket : https://tracker.ceph.com/issues/50616
> > On Mon May 03 2021 21:49:41 GMT+0800 (Singapore Standard Time), Ashley
> Merrick  wrote:
> > Just checked cluster logs and they are full of:cephadm exited with an
> error code: 1, stderr:Reconfig daemon osd.16 ... Traceback (most recent
> call last): File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 7931, in  main() File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 7919, in main r = ctx.func(ctx) File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 1717, in _default_image return func(ctx) File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 4162, in command_deploy c = get_container(ctx, ctx.fsid, daemon_type,
> daemon_id, File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a
>  3b697d119482", line 2451, in get_container
> volume_mounts=get_container_mounts(ctx, fsid, daemon_type, daemon_id), File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 2292, in get_container_mounts if HostFacts(ctx).selinux_enabled: File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 6451, in selinux_enabled return (self.kernel_security['type'] ==
> 'SELinux') and \ File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 6434, in kernel_security ret = _fetch_apparmor() File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 6415, in _fetch_apparmor item, mode = line.split(' ') ValueError: not
> enough values to unpack (expected 2, got 1) Traceback (most recent ca
>  ll last): File "/usr/share/ceph/mgr/cephadm/serve.py", line 1172, in
> _remote_connection yield (conn, connr) File
> "/usr/share/ceph/mgr/cephadm/serve.py", line 1087, in _run_cephadm code,
> '\n'.join(err))) orchestrator._interface.OrchestratorError: cephadm exited
> with an error code: 1, stderr:Reconfig daemon osd.16 ... Traceback (most
> recent call last): File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 7931, in  main() File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 7919, in main r = ctx.func(ctx) File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 1717, in _default_image return func(ctx) File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> lin
>  e 4162, in command_deploy c = get_container(ctx, ctx.fsid, daemon_type,
> daemon_id, File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 2451, in get_container volume_mounts=get_container_mounts(ctx, fsid,
> daemon_type, daemon_id), File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 2292, in get_container_mounts if HostFacts(ctx).selinux_enabled: File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 6451, in selinux_enabled return (self.kernel_security['type'] ==
> 'SELinux') and \ File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482",
> line 6434, in kernel_security ret = _fetch_apparmor() File
> "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484
>  bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482", line 6415, in
> _fetch_apparmor item, mode = line.split(' ') ValueError: not enough values
> to unpack (expected 2, got 1)being repeated over and over again for each
> OSD.Again listing "ValueError: not enough values to unpack (expected 2, got
> 1)"
> >> On Mon May 03 2021 17:20:59 GMT+0800 (Singapore Standard Time), Ashley
> Merrick  wrote:
> >> Hello,Wondering if anyone had any feedback on some commands I could try
> to manually update the current OSD that is down to 16.2.1 so I 

[ceph-users] Re: using ec pool with rgw

2021-05-03 Thread David Orman
We haven't found a more 'elegant' way, but the process we follow is:
pre-create all the pools prior to creating the realm/zonegroup/zone,
apply the period, remove the default zonegroup/zone, apply the period
again, then remove the default pools.
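
A rough sketch of the pre-creation step, assuming a zone named "myzone"
and an 8+3 EC profile (names, PG counts, and profile settings are
illustrative, and the non-data pools stay replicated):

ceph osd erasure-code-profile set rgw-ec83 k=8 m=3 crush-failure-domain=host
ceph osd pool create myzone.rgw.buckets.data 1024 1024 erasure rgw-ec83
ceph osd pool application enable myzone.rgw.buckets.data rgw
# index/meta/log/control/non-ec pools created the same way, but replicated, e.g.:
ceph osd pool create myzone.rgw.buckets.index 32 32 replicated
ceph osd pool application enable myzone.rgw.buckets.index rgw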

Hope this is at least somewhat helpful,
David

On Sat, May 1, 2021 at 11:39 AM Marco Savoca  wrote:

> Hi,
>
> I’m currently deploying a new cluster for cold storage with rgw.
>
> Is there actually a more elegant method to get the bucket data on an
> erasure coding pool other than moving the pool or creating the bucket.data
> pool prior to data upload?
>
> Thanks,
>
> Marco Savoca
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-09 Thread David Orman
That container (ceph-grafana) is not built for ARM-based processors,
only AMD64:
https://hub.docker.com/r/ceph/ceph-grafana/tags?page=1=last_updated
You'll probably need to disable that (I think it's tied to the dashboard
module, but I'm not certain - we run our own Prometheus/Grafana
infrastructure outside of Ceph).
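
If you want to keep the monitoring stack, another option might be to
point cephadm at an arm64-capable Grafana image; a sketch we have not
tested (the image reference and daemon name are placeholders):

# either drop the grafana service entirely...
ceph orch rm grafana
# ...or point cephadm at an arm64 image and redeploy the daemon
ceph config set mgr mgr/cephadm/container_image_grafana <arm64-grafana-image>
ceph orch daemon redeploy grafana.ceph1a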

On Fri, Apr 9, 2021 at 1:32 AM mabi  wrote:
>
> Thank you for confirming that podman 3.0.1 is fine.
>
> I have now bootstrapped my first node with cephadm bootstrap command on my 
> RasPi 4 (8GB RAM) with Ubuntu 20.04 LTS that worked well but in the logs I 
> can see that it fails to deploy the graphana container as you can see from 
> the log below:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1021, in 
> _remote_connection
> yield (conn, connr)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1168, in _run_cephadm
> code, '\n'.join(err)))
> orchestrator._interface.OrchestratorError: cephadm exited with an error code: 
> 1, stderr:Deploy daemon grafana.ceph1a ...
> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --net=host 
> --entrypoint stat -e CONTAINER_IMAGE=docker.io/ceph/ceph-grafana:6.7.4 -e 
> NODE_NAME=ceph1a docker.io/ceph/ceph-grafana:6.7.4 -c %u %g /var/lib/grafana
> stat: stderr {"msg":"exec container process `/usr/bin/stat`: Exec format 
> error","level":"error","time":"2021-04-09T06:17:54.000910863Z"}
> Traceback (most recent call last):
>   File "", line 6153, in 
>   File "", line 1412, in _default_image
>   File "", line 3431, in command_deploy
>   File "", line 3362, in extract_uid_gid_monitoring
>   File "", line 2099, in extract_uid_gid
> RuntimeError: uid/gid not found
>
> Does anyone have a clue what could be going wrong here?? and how to fix that?
>
> Right now my bootstraped node has the following containers running:
>
> $ sudo podman ps
> CONTAINER ID  IMAGE COMMAND   
> CREATED   STATUS   PORTS   NAMES
> 5985c38ef718  docker.io/ceph/ceph:v15   -n mon.ceph1a -f ...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-mon.ceph1a
> 0773ea8c6951  docker.io/ceph/ceph:v15   -n mgr.ceph1a.rzc...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-mgr.ceph1a.rzcwjd
> 941db20cdc2e  docker.io/ceph/ceph:v15   -n client.crash.c...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-crash.ceph1a
> 897286d3c80f  docker.io/prom/node-exporter:v0.18.1  --no-collector.ti...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-node-exporter.ceph1a
> 08c4e95c0c03  docker.io/prom/prometheus:v2.18.1 --config.file=/et...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-prometheus.ceph1a
> 19944dbf7a63  docker.io/prom/alertmanager:v0.20.0   --web.listen-addr...  15 
> hours ago  Up 15 hours ago  
> ceph-8d47792c-987d-11eb-9bb6-a5302e00e1fa-alertmanager.ceph1a
>
>
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, April 9, 2021 3:37 AM, David Orman  wrote:
>
> > The latest podman 3.0.1 release is fine (we have many production clusters 
> > running this). We have not tested 3.1 yet, however, but will soon.
> >
> > > On Apr 8, 2021, at 10:32, mabi m...@protonmail.ch wrote:
> > > Hello,
> > > I would like to install Ceph 15.2.10 using cephadm and just found the 
> > > following table by checking the requirements on the host:
> > > https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-podman-versions
> > > Do I understand this table correctly that I should be using podman 
> > > version 2.1?
> > > and what happens if I use the latest podman version 3.0
> > > Best regards,
> > > Mabi
> > >
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-08 Thread David Orman
The latest podman 3.0.1 release is fine (we have many production
clusters running it). We have not tested 3.1 yet, but will soon.

> On Apr 8, 2021, at 10:32, mabi  wrote:
> 
> Hello,
> 
> I would like to install Ceph 15.2.10 using cephadm and just found the 
> following table by checking the requirements on the host:
> 
> https://docs.ceph.com/en/latest/cephadm/compatibility/#compatibility-with-podman-versions
> 
> Do I understand this table correctly that I should be using podman version 
> 2.1?
> 
> and what happens if I use the latest podman version 3.0
> 
> Best regards,
> Mabi
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bluestore_min_alloc_size_hdd on Octopus (15.2.10) / XFS formatted RBDs

2021-04-07 Thread David Orman
Now that the hybrid allocator appears to be enabled by default in
Octopus, is it safe to change bluestore_min_alloc_size_hdd to 4k from
64k on Octopus 15.2.10 clusters, and then redeploy every OSD to switch
to the smaller allocation size, without massive performance impact for
RBD? We're seeing a lot of storage usage amplification on HDD-backed
EC 8+3 clusters, which lines up with a number of the mailing list posts
we've seen here. Upgrading to Pacific before making this
change is also a possibility once a more stable release arrives, if
that's necessary.
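
For concreteness, the change we're contemplating looks roughly like the
following (the OSD id is illustrative, and our understanding is that
min_alloc_size only takes effect when an OSD is recreated, hence the
redeploy):

ceph config set osd bluestore_min_alloc_size_hdd 4096
# recreate each OSD so it picks up the new allocation size at mkfs time;
# wait for draining/backfill between batches
ceph orch osd rm 205 --replace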

Second part of this question: we are currently using RBDs on the
impacted clusters. These have XFS filesystems on top, which detect the
RBD's sector size as 512 bytes, and XFS uses a 4k block size. With the
default of 64k for bluestore_min_alloc_size_hdd, say a 1G file is
written out to the XFS filesystem backed by the RBD. On the Ceph side,
is this seen as a lot of 4k objects, with significant space waste
occurring, or is RBD able to coalesce these into 64k objects, even
though XFS is using a 4k block size?

XFS details below, you can see the allocation groups are quite large:

meta-data=/dev/rbd0  isize=512agcount=501, agsize=268435440 blks
 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1finobt=1, sparse=1, rmapbt=0
 =   reflink=1
data =   bsize=4096   blocks=134217728000, imaxpct=1
 =   sunit=16 swidth=16 blks
naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=521728, version=2
 =   sectsz=512   sunit=16 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
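
For context on the second question, the RADOS object size the image
stripes into can be checked with rbd info (a sketch; the pool/image
names are made up):

rbd info rbd/volume1
# the "order 22 (4 MiB objects)" line shows the RADOS object size,
# which is independent of the XFS block size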

I'm curious if people have been tuning XFS on RBD for better
performance, as well.

Thank you!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

