[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Curt
Hey Mathew,

One more thing, out of curiosity: can you send the output of blockdev
--getbsz on the RBD device, along with the output of rbd info?

I'm using 16TB RBD images without issue, but I haven't updated to the Reef .2
release yet.
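
Something like this is what I mean (device and pool/image names here are
placeholders for yours):

~~~
# block size the kernel reports for the mapped device
blockdev --getbsz /dev/rbd0

# image details (object size, features, etc.) as the cluster sees them
rbd info my_pool/my_image
~~~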

Cheers,
Curt


On Sun, 24 Mar 2024, 11:12 duluxoz,  wrote:

> Hi Curt,
>
> Nope, no dropped packets or errors - sorry, wrong tree  :-)
>
> Thanks for chiming in.
>
> On 24/03/2024 20:01, Curt wrote:
> > I may be barking up the wrong tree, but if you run ip -s link show
> > yourNicID on this server or your OSDs do you see any
> > errors/dropped/missed?
>
>


[ceph-users] Re: Mounting An RBD Via Kernel Modules

2024-03-24 Thread Curt
I may be barking up the wrong tree, but if you run ip -s link show
yourNicID on this server or on your OSD hosts, do you see any errors/dropped/missed?

On Sun, 24 Mar 2024, 09:20 duluxoz,  wrote:

> Hi,
>
> Yeah, I've been testing various configurations since I sent my last
> email - all to no avail.
>
> So I'm back to the start with a brand new 4T image which is rbdmapped to
> /dev/rbd0.
>
> Its not formatted (yet) and so not mounted.
>
> Every time I attempt a mkfs.xfs /dev/rbd0 (or mkfs.xfs
> /dev/rbd/my_pool/my_image) I get the errors I previously mentioned, and the
> resulting image then becomes unusable (in every sense of the word).
>
> If I run a fdisk -l (before trying the mkfs.xfs) the rbd image shows up
> in the list - no, I don't actually do a full fdisk on the image.
>
> An rbd info my_pool:my_image shows the same expected values on both the
> host and ceph cluster.
>
> I've tried this with a whole bunch of different sized images from 100G
> to 4T and all fail in exactly the same way. (My previous successful 100G
> test I haven't been able to reproduce).
>
> I've also tried all of the above using an "admin" CephX account - I can
> always connect via rbdmap, but as soon as I try an mkfs.xfs it
> fails. This failure also occurs with mkfs.ext4 as well (all image sizes).
>
> The Ceph Cluster is good (self reported and there are other hosts
> happily connected via CephFS) and this host also has a CephFS mapping
> which is working.
>
> Between running experiments I've gone over the Ceph Doco (again) and I
> can't work out what's going wrong.
>
> There's also nothing obvious/helpful jumping out at me from the
> logs/journal (sample below):
>
> ~~~
>
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524773 0~65536 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524772 65536~4128768 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: blk_print_req_error: 119
> callbacks suppressed
> Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector
> 4298932352 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524774 0~65536 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524773 65536~4128768 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector
> 4298940544 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
> ~~~
>
> Any ideas what I should be looking at?
>
> And thank you for the help  :-)
>
> On 24/03/2024 17:50, Alexander E. Patrakov wrote:
> > Hi,
> >
> > Please test again, it must have been some network issue. A 10 TB RBD
> > image is used here without any problems.
> >


[ceph-users] Re: PG stuck at recovery

2024-02-23 Thread Curt
EC 2+2 & 4+2, HDD only.

On Tue, 20 Feb 2024, 00:25 Anthony D'Atri,  wrote:

> After wrangling with this myself, both with 17.2.7 and to an extent with
> 17.2.5, I'd like to follow up here and ask:
>
> Those who have experienced this, were the affected PGs
>
> * Part of an EC pool?
> * Part of an HDD pool?
> * Both?
>
>
> >
> > You don't say anything about the Ceph version you are running.
> > I had an similar issue with 17.2.7, and is seams to be an issue with
> mclock,
> > when I switch to wpq everything worked again.
> >
> > You can read more about it here
> >
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG
> >
> > - Kai Stian Olstad
> >
> >
> > On Tue, Feb 06, 2024 at 06:35:26AM -, LeonGao  wrote:
> >> Hi community
> >>
> >> We have a new Ceph cluster deployment with 100 nodes. When we are
> draining an OSD host from the cluster, we see a small amount of PGs that
> cannot make any progress to the end. From the logs and metrics, it seems
> like the recovery progress is stuck (0 recovery ops for several days).
> Would like to get some ideas on this. Re-peering and OSD restart do resolve
> to mitigate the issue but we want to get to the root cause of it as
> draining and recovery happen frequently.
> >>
> >> I have put some debugging information below. Any help is appreciated,
> thanks!
> >>
> >> ceph -s
> >>   pgs: 4210926/7380034104 objects misplaced (0.057%)
> >>41198 active+clean
> >>71active+remapped+backfilling
> >>12active+recovering
> >>
> >> One of the stuck PG:
> >> 6.38f1   active+remapped+backfilling [313,643,727]
>  313 [313,643,717] 313
> >>
> >> PG query result:
> >>
> >> ceph pg 6.38f1 query
> >> {
> >>   "snap_trimq": "[]",
> >>   "snap_trimq_len": 0,
> >>   "state": "active+remapped+backfilling",
> >>   "epoch": 246856,
> >>   "up": [
> >>   313,
> >>   643,
> >>   727
> >>   ],
> >>   "acting": [
> >>   313,
> >>   643,
> >>   717
> >>   ],
> >>   "backfill_targets": [
> >>   "727"
> >>   ],
> >>   "acting_recovery_backfill": [
> >>   "313",
> >>   "643",
> >>   "717",
> >>   "727"
> >>   ],
> >>   "info": {
> >>   "pgid": "6.38f1",
> >>   "last_update": "212333'38916",
> >>   "last_complete": "212333'38916",
> >>   "log_tail": "80608'37589",
> >>   "last_user_version": 38833,
> >>   "last_backfill": "MAX",
> >>   "purged_snaps": [],
> >>   "history": {
> >>   "epoch_created": 3726,
> >>   "epoch_pool_created": 3279,
> >>   "last_epoch_started": 243987,
> >>   "last_interval_started": 243986,
> >>   "last_epoch_clean": 220174,
> >>   "last_interval_clean": 220173,
> >>   "last_epoch_split": 3726,
> >>   "last_epoch_marked_full": 0,
> >>   "same_up_since": 238347,
> >>   "same_interval_since": 243986,
> >>   "same_primary_since": 3728,
> >>   "last_scrub": "212333'38916",
> >>   "last_scrub_stamp": "2024-01-29T13:43:10.654709+",
> >>   "last_deep_scrub": "212333'38916",
> >>   "last_deep_scrub_stamp": "2024-01-28T07:43:45.920198+",
> >>   "last_clean_scrub_stamp": "2024-01-29T13:43:10.654709+",
> >>   "prior_readable_until_ub": 0
> >>   },
> >>   "stats": {
> >>   "version": "212333'38916",
> >>   "reported_seq": 413425,
> >>   "reported_epoch": 246856,
> >>   "state": "active+remapped+backfilling",
> >>   "last_fresh": "2024-02-05T21:14:40.838785+",
> >>   "last_change": "2024-02-03T22:33:43.052272+",
> >>   "last_active": "2024-02-05T21:14:40.838785+",
> >>   "last_peered": "2024-02-05T21:14:40.838785+",
> >>   "last_clean": "2024-02-03T04:26:35.168232+",
> >>   "last_became_active": "2024-02-03T22:31:16.037823+",
> >>   "last_became_peered": "2024-02-03T22:31:16.037823+",
> >>   "last_unstale": "2024-02-05T21:14:40.838785+",
> >>   "last_undegraded": "2024-02-05T21:14:40.838785+",
> >>   "last_fullsized": "2024-02-05T21:14:40.838785+",
> >>   "mapping_epoch": 243986,
> >>   "log_start": "80608'37589",
> >>   "ondisk_log_start": "80608'37589",
> >>   "created": 3726,
> >>   "last_epoch_clean": 220174,
> >>   "parent": "0.0",
> >>   "parent_split_bits": 14,
> >>   "last_scrub": "212333'38916",
> >>   "last_scrub_stamp": "2024-01-29T13:43:10.654709+",
> >>   "last_deep_scrub": "212333'38916",
> >>   "last_deep_scrub_stamp": "2024-01-28T07:43:45.920198+",
> >>   "last_clean_scrub_stamp": "2024-01-29T13:43:10.654709+",
> >>   "objects_scrubbed": 17743,
> >>   "log_size": 1327,
> >>   "log_dups_size": 3000,
> >>   

[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-05 Thread Curt
I don't use Rocky, so this is a stab in the dark and probably not the issue, but
could SELinux be blocking the process?  Really long shot, but is python3 in
the standard location? And if you run python3 --version as your ceph user, what
does it return?
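
A couple of quick checks along those lines (just a sketch; adjust the user to
whatever cephadm actually connects as):

~~~
# is SELinux enforcing on the new host?
getenforce

# is python3 in the standard location, and does it run for that user?
which python3
sudo -u ceph python3 --version
~~~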

Probably not much help, but figured I'd throw it out there.

On Mon, 5 Feb 2024, 16:54 Gary Molenkamp,  wrote:

> I have verified the server's expected hostname (with `hostname`) matches
> the hostname I am trying to use.
> Just to be sure, I also ran:
>  cephadm check-host --expect-hostname 
> and it returns:
>  Hostname "" matches what is expected.
>
> On the current admin server where I am trying to add the host, the host
> is reachable, the shortname even matches proper IP with dns search order.
> Likewise, on the server where the mgr is running, I am able to confirm
> reachability and DNS resolution for the new server as well.
>
> I thought this may be a DNS/name resolution issue as well, but I don't
> see any errors in my setup wrt to host naming.
>
> Thanks
> Gary
>
>
> On 2024-02-03 06:46, Eugen Block wrote:
> > Hi,
> >
> > I found this blog post [1] which reports the same error message. It
> > seems a bit misleading because it appears to be about DNS. Can you check
> >
> > cephadm check-host --expect-hostname 
> >
> > Or is that what you already tried? It's not entirely clear how you
> > checked the hostname.
> >
> > Regards,
> > Eugen
> >
> > [1]
> >
> https://blog.mousetech.com/ceph-distributed-file-system-for-the-enterprise/ceph-bogus-error-cannot-allocate-memory/
> >
> > Zitat von Gary Molenkamp :
> >
> >> Happy Friday all.  I was hoping someone could point me in the right
> >> direction or clarify any limitations that could be impacting an issue
> >> I am having.
> >>
> >> I'm struggling to add a new set of hosts to my ceph cluster using
> >> cephadm and orchestration.  When trying to add a host:
> >> "ceph orch host add  172.31.102.41 --labels _admin"
> >> returns:
> >> "Error EINVAL: Can't communicate with remote host
> >> `172.31.102.41`, possibly because python3 is not installed there:
> >> [Errno 12] Cannot allocate memory"
> >>
> >> I've verified that the ceph ssh key works to the remote host, host's
> >> name matches that returned from `hostname`, python3 is installed, and
> >> "/usr/sbin/cephadm prepare-host" on the new hosts returns "host is
> >> ok".In addition, the cluster ssh key works between hosts and the
> >> existing hosts are able to ssh in using the ceph key.
> >>
> >> The existing ceph cluster is Pacific release using docker based
> >> containerization on RockyLinux8 base OS.  The new hosts are
> >> RockyLinux9 based, with the cephadm being installed from Quincy release:
> >> ./cephadm add-repo --release quincy
> >> ./cephadm install
> >> I did try installing cephadm from the Pacific release by changing the
> >> repo to el8,  but that did not work either.
> >>
> >> Is there a limitation is mixing RL8 and RL9 container hosts under
> >> Pacific?  Does this same limitation exist under Quincy? Is there a
> >> python version dependency?
> >> The reason for RL9 on the new hosts is to stage upgrading the OS's
> >> for the cluster.  I did this under Octopus for moving from Centos7 to
> >> RL8.
> >>
> >> Thanks and I appreciate any feedback/pointers.
> >> Gary
> >>
> >>
> >> I've added the log trace here in case that helps (from `ceph log last
> >> cephadm`)
> >>
> >>
> >>
> >> 2024-02-02T14:22:32.610048+ mgr.storage01.oonvfl (mgr.441023307)
> >> 4957871 : cephadm [ERR] Can't communicate with remote host
> >> `172.31.102.41`, possibly because python3 is not installed there:
> >> [Errno 12] Cannot allocate memory
> >> Traceback (most recent call last):
> >>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1524, in
> >> _remote_connection
> >> conn, connr = self.mgr._get_connection(addr)
> >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1370, in
> >> _get_connection
> >> sudo=True if self.ssh_user != 'root' else False)
> >>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py",
> >> line 35, in __init__
> >> self.gateway = self._make_gateway(hostname)
> >>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py",
> >> line 46, in _make_gateway
> >> self._make_connection_string(hostname)
> >>   File "/lib/python3.6/site-packages/execnet/multi.py", line 133, in
> >> makegateway
> >> io = gateway_io.create_io(spec, execmodel=self.execmodel)
> >>   File "/lib/python3.6/site-packages/execnet/gateway_io.py", line
> >> 121, in create_io
> >> io = Popen2IOMaster(args, execmodel)
> >>   File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 21,
> >> in __init__
> >> self.popen = p = execmodel.PopenPiped(args)
> >>   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line
> >> 184, in PopenPiped
> >> return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
> >>   File "/lib64/python3.6/subprocess.py", line 729, in __init__
> >> 

[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-04 Thread Curt
Out of curiosity, how are you mapping the RBD?  Have you tried using
guestmount?

I'm just spitballing; I have no experience with your issue, so probably not
much help.
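
In case it's useful, this is the kind of thing I had in mind with guestmount
(needs libguestfs-tools; the device and mountpoint are placeholders, and it's
only an idea, not something I've tested against your image):

~~~
# list whatever filesystems/LVs libguestfs can see inside the mapped image
guestfish --ro -a /dev/rbd0 run : list-filesystems

# mount the image read-only via inspection, without touching the host's LVM config
guestmount --ro -a /dev/rbd0 -i /mnt/rbd-inspect
~~~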

On Mon, 5 Feb 2024, 10:05 duluxoz,  wrote:

> ~~~
> Hello,
> I think that /dev/rbd* devices are filtered "out" or not filtered "in" by
> the filter
> option in the devices section of /etc/lvm/lvm.conf.
> So pvscan (pvs, vgs and lvs) don't look at your device.
> ~~~
>
> Hi Gilles,
>
> So the lvm filter from the lvm.conf file is set to the default of `filter
> = [ "a|.*|" ]`, so it accepts every block device, so no luck there  :-(
>
>
> ~~~
> For Ceph based LVM volumes, you would do this to import:
> Map every one of the RBDs to the host
> Include this in /etc/lvm/lvm.conf:
> types = [ "rbd", 1024 ]
> pvscan
> vgscan
> pvs
> vgs
> If you see the VG:
> vgimportclone -n  /dev/rbd0 /dev/rbd1 ... --import
> Now you should be able to vgchange -a y  and see the LVs
> ~~~
>
> Hi Alex,
>
> Did the above as you suggested - the rbd devices (3 of them, none of which
> were originally part of an lvm on the ceph servers - at least, not set up
> manually by me) still do not show up using pvscan, etc.
>
> So I still can't mount any of them (not without re-creating a fs, anyway,
> and thus losing the data I'm trying to read/import) - they all return the
> same error message (see original post).
>
> Anyone got any other ideas?   :-)
>
> Cheers
>
> Dulux-Oz


[ceph-users] physical vs osd performance

2024-01-10 Thread Curt
Hello all,

Looking at Grafana reports, can anyone point me to documentation that
outlines physical vs OSD metrics?  https://docs.ceph.com/en/latest/monitoring/
gives some basic info, but I'm trying to get a better understanding. For
instance, if physical latency is 20ms and OSD latency is 200ms (numbers made up
for this example), why the huge difference? The same question applies to bytes
or IOPS; I'm just using latency as an example.

Thanks,
Curt


[ceph-users] Re: EC Profiles & DR

2023-12-06 Thread Curt
Hi Patrick,

Yes, K and M are chunks, but the default crush map places one chunk per host,
which is probably the best way to do it, though I'm no expert. I'm not sure
why you would want to do a crush map with 2 chunks per host and min size 4,
as it's just asking for trouble at some point, in my opinion.  Anyway,
take a look at this post if you're interested in doing 2 chunks per host; it
will give you an idea of the crushmap setup:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NB3M22GNAC7VNWW7YBVYTH6TBZOYLTWA/
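
For what it's worth, my rough understanding of the shape of such a rule is the
sketch below, here for an EC 4+2 profile spread as 2 chunks on each of 3 hosts.
The id and name are placeholders and this is not something I run myself:

~~~
rule ec42-two-chunks-per-host {
    id 6
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    # pick 3 hosts, then 2 OSDs on each host = 6 chunks total
    step choose indep 3 type host
    step chooseleaf indep 2 type osd
    step emit
}
~~~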

Regards,
Curt


On Wed, Dec 6, 2023 at 6:26 PM Patrick Begou <
patrick.be...@univ-grenoble-alpes.fr> wrote:

> On 06/12/2023 at 00:11, Rich Freeman wrote:
> > On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
> >   wrote:
> >> Ok, so I've misunderstood the meaning of failure domain. If there is no
> >> way to request using 2 osd/node and node as failure domain, with 5 nodes
> >> k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a
> >> raid1  setup. A little bit better than replication in the point of view
> >> of global storage capacity.
> >>
> > I'm not sure what you mean by requesting 2osd/node.  If the failure
> > domain is set to the host, then by default k/m refer to hosts, and the
> > PGs will be spread across all OSDs on all hosts, but with any
> > particular PG only being present on one OSD on each host.  You can get
> > fancy with device classes and crush rules and such and be more
> > specific with how they're allocated, but that would be the typical
> > behavior.
> >
> > Since k/m refer to hosts, then k+m must be less than or equal to the
> > number of hosts or you'll have a degraded pool because there won't be
> > enough hosts to allocate them all.  It won't ever stack them across
> > multiple OSDs on the same host with that configuration.
> >
> > k=2,m=2 with min=3 would require at least 4 hosts (k+m), and would
> > allow you to operate degraded with a single host down, and the PGs
> > would become inactive but would still be recoverable with two hosts
> > down.  While strictly speaking only 4 hosts are required, you'd do
> > better to have more than that since then the cluster can immediately
> > recover from a loss, assuming you have sufficient space.  As you say
> > it is no more space-efficient than RAID1 or size=2, and it suffers
> > write amplification for modifications, but it does allow recovery
> > after the loss of up to two hosts, and you can operate degraded with
> > one host down which allows for somewhat high availability.
> >
> Hi Rich,
>
> My understanding was that k and m were for EC chunks, not hosts.  Of
> course, if k and m are hosts, the best choice would be k=2 and m=2.
>
> When Christian wrote:
> /For example if you run an EC=4+2 profile on 3 hosts you can structure
> your crushmap so that you have 2 chunks per host. This means even if one
> host is down you are still guaranteed to have 4 chunks available./
>
> This is what I had thought before (using 5 nodes instead of 3 as in
> Christian's example). But it does not match what you explain if k and m
> are nodes.
>
> I'm a little bit confused with crushmap settings.
>
> Patrick


[ceph-users] shrink db size

2023-11-13 Thread Curt
Hello,

As far as I can tell there is no way to shrink a db/wal after creation.

I recently added a new server to my cluster with SSDs for the WAL/DB and
just used the Ceph dashboard for deployment. I did not specify a DB size,
which was my mistake; it seems that by default it uses "block.db has no size
configuration, will fallback to using as much as possible".

So now my issue is that I added 2 more drives, but with no space left on the
SSDs I get "2 fast devices were passed, but none are available".  I had 7
4TB HDDs and 2 2TB SSDs and added 2 more 4TB drives. Distribution is 4 and 3
between the 2 SSDs.

I just want to confirm that the best option is to set
bluestore_block_db_size/wal_size, zap the OSDs, and let them be recreated with
sizes of 300 and 2. I chose those just because I have the space.  I'm not going
to do them all at the same time.
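
For reference, the sort of service spec I have in mind for the re-created OSDs,
rather than the dashboard default (host name and sizes are placeholders for my
own setup, not a recommendation):

~~~
service_type: osd
service_id: hdd-with-ssd-db
placement:
  hosts:
    - my-osd-host
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  block_db_size: 300G
  block_wal_size: 2G
~~~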

Cheers,
Curt


[ceph-users] Questions since updating to 18.0.2

2023-08-28 Thread Curt
Hello,

We recently upgraded our cluster to version 18 and I've noticed some things
that I'd like feedback on before I go down a rabbit hole for
non-issues. cephadm was used for the upgrade and there were no problems.
The cluster is 56 OSDs, all spinners, currently used only for RBD images.

I've noticed a lot of active scrubs/deep scrubs. I don't remember seeing a large
amount before, usually around 20-30 scrubs and 15 deep I think; now I will
have 70 scrubs and 70 deep scrubs happening. I thought these were limited
to 1 per OSD, or am I misunderstanding osd_max_scrubs?  Everything on the
cluster is currently at default values.
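
For reference, this is how I'm checking the values that are actually in effect
(osd.0 is just an example daemon):

~~~
ceph config get osd osd_max_scrubs
ceph config show osd.0 | grep -i scrub
~~~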

The other thing I've noticed is that since the upgrade, any time
backfill happens the client IO drops, and neither is high to begin with:
30MiB/s read/write client IO drops to 10-15 with 200MiB/s backfill. Before
upgrading, backfill would be hitting 5-600 with 30 client IO. I realize lots
of things could affect this and it could be separate from the cluster; I'm
still investigating, but wanted to mention it in case someone could
recommend a check or some change in Reef that could cause this. The mclock
profile is client_io.
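
These are the scheduler settings I'm looking at while investigating (just the
commands I'm using to check, not a fix):

~~~
ceph config get osd osd_op_queue
ceph config get osd osd_mclock_profile
~~~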

Thanks,
Curt


[ceph-users] Re: Blank dashboard

2023-07-31 Thread Curt
Hello,

Never mind, sorry to disturb everyone.  I just disabled the dashboard module and
re-enabled it and it now works; the console errors are gone. This is version
17.2.6, by the way.  If anyone has any insight into what might have caused this,
it would be interesting to know.
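
For anyone hitting the same thing, "disabled and re-enabled" was literally just:

~~~
ceph mgr module disable dashboard
ceph mgr module enable dashboard
~~~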

Thanks,
Curt


On Mon, Jul 31, 2023 at 8:04 PM Curt  wrote:

> Hello,
>
> This is a strange one for me. My ceph dashboard just stopped loading,
> nothing but a white page.  I don't see anything in the logs and on browser
> side the only error I see is Failed to load resource:
> net::ERR_CONTENT_LENGTH_MISMATCH in chrome and Uncaught SyntaxError: expected
> expression, got end of script for Firefox on the
> file main.ddd4de0999172734.js.
>
> Nothing in the logs files, put mgr up to 20 on the log level.  Any
> suggestions?
>
> Thanks,
> Curt
>


[ceph-users] Blank dashboard

2023-07-31 Thread Curt
Hello,

This is a strange one for me. My ceph dashboard just stopped loading,
nothing but a white page.  I don't see anything in the logs and on browser
side the only error I see is Failed to load resource:
net::ERR_CONTENT_LENGTH_MISMATCH in chrome and Uncaught SyntaxError: expected
expression, got end of script for Firefox on the
file main.ddd4de0999172734.js.

Nothing in the log files, even with the mgr log level turned up to 20.  Any
suggestions?

Thanks,
Curt


[ceph-users] Re: OSD stuck down

2023-06-15 Thread Curt
Hello,

Have you increased the OSD debug level to get more output?  Does dmesg on
the host machine report anything? Are there any SMART errors on the drive?
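
Concretely, the sort of checks I mean (the OSD id and device name are
placeholders):

~~~
# bump logging for the affected OSD (remember to revert afterwards)
ceph tell osd.3 config set debug_osd 10/10

# kernel-level complaints about the disk or controller
dmesg -T | tail -n 50

# drive health
smartctl -a /dev/sdX
~~~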

Regards,
Curt

On Thu, Jun 15, 2023, 13:30 Nicola Mori  wrote:

> Hi Dario,
>
> I think the connectivity is ok. My cluster has just a public interface,
> and all of the other services on the same machine (osds and mgr) work
> flawlessly so I guess the connectivity is ok. Or in other words, I don't
> know what to look for in the network since all the other services work,
> do you have any suggestion?
>
> Nicola


[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-17 Thread Curt
Hi,

I've been following this thread with interest, as it seems like a unique use
case to expand my knowledge. I don't use LRC or anything outside basic
erasure coding.

What is your current crush steps rule?  I know you made changes since your
first post, and I had some thoughts I wanted to share, but I wanted to see your
rule first so I could try to visualize the distribution better.  The only
way I can currently visualize it working is with more servers, I'm thinking
a minimum of 6 or 9 per data center, but that could be my lack of knowledge of
some of the step rules.

Thanks
Curt

On Tue, May 16, 2023 at 11:09 AM Michel Jouvin <
michel.jou...@ijclab.in2p3.fr> wrote:

> Hi Eugen,
>
> Yes, sure, no problem to share it. I attach it to this email (as it may
> clutter the discussion if inline).
>
> If somebody on the list has some clue on the LRC plugin, I'm still
> interested by understand what I'm doing wrong!
>
> Cheers,
>
> Michel
>
> > On 04/05/2023 at 15:07, Eugen Block wrote:
> > Hi,
> >
> > I don't think you've shared your osd tree yet, could you do that?
> > Apparently nobody else but us reads this thread or nobody reading this
> > uses the LRC plugin. ;-)
> >
> > Thanks,
> > Eugen
> >
> > Zitat von Michel Jouvin :
> >
> >> Hi,
> >>
> >> I had to restart one of my OSD server today and the problem showed up
> >> again. This time I managed to capture "ceph health detail" output
> >> showing the problem with the 2 PGs:
> >>
> >> [WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2
> >> pgs down
> >> pg 56.1 is down, acting
> >> [208,65,73,206,197,193,144,155,178,182,183,133,17,NONE,36,NONE,230,NONE]
> >> pg 56.12 is down, acting
> >>
> [NONE,236,28,228,218,NONE,215,117,203,213,204,115,136,181,171,162,137,128]
> >>
> >> I still doesn't understand why, if I am supposed to survive to a
> >> datacenter failure, I cannot survive to 3 OSDs down on the same host,
> >> hosting shards for the PG. In the second case it is only 2 OSDs down
> >> but I'm surprised they don't seem in the same "group" of OSD (I'd
> >> expected all the the OSDs of one datacenter to be in the same groupe
> >> of 5 if the order given really reflects the allocation done...
> >>
> >> Still interested by some explanation on what I'm doing wrong! Best
> >> regards,
> >>
> >> Michel
> >>
> >> On 03/05/2023 at 10:21, Eugen Block wrote:
> >>> I think I got it wrong with the locality setting, I'm still limited
> >>> by the number of hosts I have available in my test cluster, but as
> >>> far as I got with failure-domain=osd I believe k=6, m=3, l=3 with
> >>> locality=datacenter could fit your requirement, at least with
> >>> regards to the recovery bandwidth usage between DCs, but the
> >>> resiliency would not match your requirement (one DC failure). That
> >>> profile creates 3 groups of 4 chunks (3 data/coding chunks and one
> >>> parity chunk) across three DCs, in total 12 chunks. The min_size=7
> >>> would not allow an entire DC to go down, I'm afraid, you'd have to
> >>> reduce it to 6 to allow reads/writes in a disaster scenario. I'm
> >>> still not sure if I got it right this time, but maybe you're better
> >>> off without the LRC plugin with the limited number of hosts. Instead
> >>> you could use the jerasure plugin with a profile like k=4 m=5
> >>> allowing an entire DC to fail without losing data access (we have
> >>> one customer using that).
> >>>
> >>> Zitat von Eugen Block :
> >>>
> >>>> Hi,
> >>>>
> >>>> disclaimer: I haven't used LRC in a real setup yet, so there might
> >>>> be some misunderstandings on my side. But I tried to play around
> >>>> with one of my test clusters (Nautilus). Because I'm limited in the
> >>>> number of hosts (6 across 3 virtual DCs) I tried two different
> >>>> profiles with lower numbers to get a feeling for how that works.
> >>>>
> >>>> # first attempt
> >>>> ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc
> >>>> k=4 m=2 l=3 crush-failure-domain=host
> >>>>
> >>>> For every third OSD one parity chunk is added, so 2 more chunks to
> >>>> store ==> 8 chunks in total. Since my failure-domain is host and I
> >>>> only have 6 I get incomplete PGs.
> >

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-29 Thread Curt
Hello,

What is your current setup, 1 server per data center with 12 OSDs each? What
is your current crush rule and LRC crush rule?


On Fri, Apr 28, 2023, 12:29 Michel Jouvin 
wrote:

> Hi,
>
> I think I found a possible cause of my PG down but still don't understand why.
> As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
> m=6) but I have only 12 OSD servers in the cluster. To work around the
> problem I defined the failure domain as 'osd' with the reasoning that, as
> I was using the LRC plugin, I had the guarantee that I could lose a site
> without impact, thus the possibility to lose 1 OSD server. Am I wrong?
>
> Best regards,
>
> Michel
>
> On 24/04/2023 at 13:24, Michel Jouvin wrote:
> > Hi,
> >
> > I'm still interested in getting feedback from those using the LRC
> > plugin about the right way to configure it... Last week I upgraded
> > from Pacific to Quincy (17.2.6) with cephadm which is doing the
> > upgrade host by host, checking if an OSD is ok to stop before actually
> > upgrading it. I had the surprise to see 1 or 2 PGs down at some points
> > in the upgrade (happened not for all OSDs but for every
> > site/datacenter). Looking at the details with "ceph health detail", I
> > saw that for these PGs there was 3 OSDs down but I was expecting the
> > pool to be resilient to 6 OSDs down (5 for R/W access) so I'm
> > wondering if there is something wrong in our pool configuration (k=9,
> > m=6, l=5).
> >
> > Cheers,
> >
> > Michel
> >
> > On 06/04/2023 at 08:51, Michel Jouvin wrote:
> >> Hi,
> >>
> >> Is somebody using LRC plugin ?
> >>
> >> I came to the conclusion that LRC  k=9, m=3, l=4 is not the same as
> >> jerasure k=9, m=6 in terms of protection against failures and that I
> >> should use k=9, m=6, l=5 to get a level of resilience >= jerasure
> >> k=9, m=6. The example in the documentation (k=4, m=2, l=3) suggests
> >> that this LRC configuration gives something better than jerasure k=4,
> >> m=2 as it is resilient to 3 drive failures (but not 4 if I understood
> >> properly). So how many drives can fail in the k=9, m=6, l=5
> >> configuration first without loosing RW access and second without
> >> loosing data?
> >>
> >> Another thing that I don't quite understand is that a pool created
> >> with this configuration (and failure domain=osd, locality=datacenter)
> >> has a min_size=3 (max_size=18 as expected). It seems wrong to me, I'd
> >> expected something ~10 (depending on answer to the previous question)...
> >>
> >> Thanks in advance if somebody could provide some sort of
> >> authoritative answer on these 2 questions. Best regards,
> >>
> >> Michel
> >>
> >> On 04/04/2023 at 15:53, Michel Jouvin wrote:
> >>> Answering to myself, I found the reason for 2147483647: it's
> >>> documented as a failure to find enough OSD (missing OSDs). And it is
> >>> normal as I selected different hosts for the 15 OSDs but I have only
> >>> 12 hosts!
> >>>
> >>> I'm still interested by an "expert" to confirm that LRC  k=9, m=3,
> >>> l=4 configuration is equivalent, in terms of redundancy, to a
> >>> jerasure configuration with k=9, m=6.
> >>>
> >>> Michel
> >>>
> >>> On 04/04/2023 at 15:26, Michel Jouvin wrote:
>  Hi,
> 
>  As discussed in another thread (Crushmap rule for multi-datacenter
>  erasure coding), I'm trying to create an EC pool spanning 3
>  datacenters (datacenters are present in the crushmap), with the
>  objective to be resilient to 1 DC down, at least keeping the
>  readonly access to the pool and if possible the read-write access,
>  and have a storage efficiency better than 3 replica (let say a
>  storage overhead <= 2).
> 
>  In the discussion, somebody mentioned LRC plugin as a possible
>  jerasure alternative to implement this without tweaking the
>  crushmap rule to implement the 2-step OSD allocation. I looked at
>  the documentation
>  (https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/)
>  but I have some questions if someone has experience/expertise with
>  this LRC plugin.
> 
>  I tried to create a rule for using 5 OSDs per datacenter (15 in
>  total), with 3 (9 in total) being data chunks and others being
>  coding chunks. For this, based of my understanding of examples, I
>  used k=9, m=3, l=4. Is it right? Is this configuration equivalent,
>  in terms of redundancy, to a jerasure configuration with k=9, m=6?
> 
>  The resulting rule, which looks correct to me, is:
> 
>  
> 
>  {
>  "rule_id": 6,
>  "rule_name": "test_lrc_2",
>  "ruleset": 6,
>  "type": 3,
>  "min_size": 3,
>  "max_size": 15,
>  "steps": [
>  {
>  "op": "set_chooseleaf_tries",
>  "num": 5
>  },
>  {
>  "op": "set_choose_tries",
>  "num": 100
>  },
>  {
> 

[ceph-users] Re: Very slow backfilling

2023-03-02 Thread Curt
p_num 32 autoscale_mode on last_change
> 14979 flags hashpspool stripe_width 0 application rgw
> pool 11 'ncy.rgw.control' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 14981 flags hashpspool stripe_width 0 application rgw
> pool 12 'ncy.rgw.meta' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15105
> lfor 0/15105/15103 flags hashpspool stripe_width 0 pg_autoscale_bias 4
> pg_num_min 8 application rgw
> pool 13 'ncy.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 15236
> lfor 0/15236/15234 flags hashpspool stripe_width 0 pg_autoscale_bias 4
> pg_num_min 8 application rgw
> pool 14 'ncy.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 15241 flags hashpspool stripe_width 0 application rgw
>
> (EC32 is an erasure coding profile with 3 data and 2 coding chunks)
>
> No output with "ceph osd pool autoscale-status"
>
> On Thu, 2 Mar 2023 at 15:02, Curt  wrote:
>
>> Forgot to do a reply all.
>>
>> What does
>>
>> ceph osd df
>> ceph osd dump | grep pool return?
>>
>> Are you using auto scaling? 289pg with 272tb of data and 60 osds, that
>> seems like 3-4 pg per osd at almost 1TB each. Unless I'm thinking of this
>> wrong.
>>
>> On Thu, Mar 2, 2023, 17:37 Joffrey  wrote:
>>
>>> My Ceph Version is 17.2.5 and all configuration about osd_scrub* are
>>> defaults. I tried some updates on osd-max-backfills but no change.
>>> I have many HDD with NVME for db and all are connected in a 25G network.
>>>
>>> Yes, it's the same PG since 4 days.
>>>
>>> I got a failure on a HDD and get many days of recovery+backfilling last
>>> 2
>>> weeks.   Perhaps the 'not in time' warning is related to this.
>>>
>>> 'Jof
>>>
>>> On Thu, 2 Mar 2023 at 14:25, Anthony D'Atri  wrote:
>>>
>>> > Run `ceph health detail`.
>>> >
>>> > Is it the same PG backfilling for a long time, or a different one over
>>> > time?
>>> >
>>> > That it’s remapped makes me think that what you’re seeing is the
>>> balancer
>>> > doing its job.
>>> >
>>> > As far as the scrubbing, do you limit the times when scrubbing can
>>> happen?
>>> > Are these HDDs? EC?
>>> >
>>> > > On Mar 2, 2023, at 07:20, Joffrey  wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > I have many 'not {deep-}scrubbed in time' and a1 PG
>>> remapped+backfilling
>>> > > and I don't understand why this backfilling is taking so long.
>>> > >
>>> > > root@hbgt-ceph1-mon3:/# ceph -s
>>> > >  cluster:
>>> > >id: c300532c-51fa-11ec-9a41-0050569c3b55
>>> > >health: HEALTH_WARN
>>> > >15 pgs not deep-scrubbed in time
>>> > >13 pgs not scrubbed in time
>>> > >
>>> > >  services:
>>> > >mon: 3 daemons, quorum
>>> hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3
>>> > > (age 36h)
>>> > >mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys:
>>> > > hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
>>> > >osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
>>> > >rgw: 3 daemons active (3 hosts, 2 zones)
>>> > >
>>> > >  data:
>>> > >pools:   13 pools, 289 pgs
>>> > >objects: 67.74M objects, 127 TiB
>>> > >usage:   272 TiB used, 769 TiB / 1.0 PiB avail
>>> > >pgs: 288 active+clean
>>> > > 1   active+remapped+backfilling
>>> > >
>>> > >  io:
>>> > >client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
>>> > >recovery: 790 KiB/s, 0 objects/s
>>> > >
>>> > >
>>> > > What can I do to understand this slow recovery (is it the backfill
>>> > action ?)
>>> > >
>>> > > Thanks you
>>> > >
>>> > > 'Jof


[ceph-users] Re: Very slow backfilling

2023-03-02 Thread Curt
Forgot to do a reply all.

What does

ceph osd df
ceph osd dump | grep pool return?

Are you using autoscaling? 289 PGs with 272 TB of data and 60 OSDs, that
seems like 3-4 PGs per OSD at almost 1 TB each. Unless I'm thinking of this
wrong.

On Thu, Mar 2, 2023, 17:37 Joffrey  wrote:

> My Ceph Version is 17.2.5 and all configuration about osd_scrub* are
> defaults. I tried some updates on osd-max-backfills but no change.
> I have many HDD with NVME for db and all are connected in a 25G network.
>
> Yes, it's the same PG since 4 days.
>
> I got a failure on a HDD and get many days of recovery+backfilling last  2
> weeks.   Perhaps the 'not in time' warning is related to this.
>
> 'Jof
>
> On Thu, 2 Mar 2023 at 14:25, Anthony D'Atri  wrote:
>
> > Run `ceph health detail`.
> >
> > Is it the same PG backfilling for a long time, or a different one over
> > time?
> >
> > That it’s remapped makes me think that what you’re seeing is the balancer
> > doing its job.
> >
> > As far as the scrubbing, do you limit the times when scrubbing can
> happen?
> > Are these HDDs? EC?
> >
> > > On Mar 2, 2023, at 07:20, Joffrey  wrote:
> > >
> > > Hi,
> > >
> > > I have many 'not {deep-}scrubbed in time' and a1 PG
> remapped+backfilling
> > > and I don't understand why this backfilling is taking so long.
> > >
> > > root@hbgt-ceph1-mon3:/# ceph -s
> > >  cluster:
> > >id: c300532c-51fa-11ec-9a41-0050569c3b55
> > >health: HEALTH_WARN
> > >15 pgs not deep-scrubbed in time
> > >13 pgs not scrubbed in time
> > >
> > >  services:
> > >mon: 3 daemons, quorum
> hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3
> > > (age 36h)
> > >mgr: hbgt-ceph1-mon2.nteihj(active, since 2d), standbys:
> > > hbgt-ceph1-mon1.thrnnu, hbgt-ceph1-mon3.gmfzqm
> > >osd: 60 osds: 60 up (since 13h), 60 in (since 13h); 1 remapped pgs
> > >rgw: 3 daemons active (3 hosts, 2 zones)
> > >
> > >  data:
> > >pools:   13 pools, 289 pgs
> > >objects: 67.74M objects, 127 TiB
> > >usage:   272 TiB used, 769 TiB / 1.0 PiB avail
> > >pgs: 288 active+clean
> > > 1   active+remapped+backfilling
> > >
> > >  io:
> > >client:   3.3 KiB/s rd, 1.5 MiB/s wr, 3 op/s rd, 8 op/s wr
> > >recovery: 790 KiB/s, 0 objects/s
> > >
> > >
> > > What can I do to understand this slow recovery (is it the backfill
> > action ?)
> > >
> > > Thanks you
> > >
> > > 'Jof


[ceph-users] Re: Upgrade not doing anything...

2023-02-27 Thread Curt
Did any of your cluster get a partial upgrade? What about ceph -W cephadm:
does that return anything or just hang? Also, what does ceph health
detail show?  You can always try ceph orch upgrade pause and then ceph orch
upgrade resume; it might kick something loose, so to speak.
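
i.e. something along these lines:

~~~
ceph orch upgrade status
ceph -W cephadm          # watch the cephadm module's progress/errors live
ceph health detail
ceph orch upgrade pause
ceph orch upgrade resume
~~~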

On Tue, Feb 28, 2023, 10:39 Jeremy Hansen  wrote:

> {
> "target_image": "quay.io/ceph/ceph:v16.2.11",
> "in_progress": true,
> "services_complete": [],
> "progress": "",
> "message": ""
> }
>
> Hasn’t changed in the past two hours.
>
> -jeremy
>
>
>
> On Monday, Feb 27, 2023 at 10:22 PM, Curt  wrote:
> What does Ceph orch upgrade status return?
>
> On Tue, Feb 28, 2023, 10:16 Jeremy Hansen  wrote:
>
>> I’m trying to upgrade from 16.2.7 to 16.2.11.  Reading the documentation,
>> I cut and paste the orchestrator command to begin the upgrade, but I
>> mistakenly pasted directly from the docs and it initiated an “upgrade” to
>> 16.2.6.  I stopped the upgrade per the docs and reissued the command
>> specifying 16.2.11 but now I see no progress in ceph -s.  Cluster is
>> healthy but it feels like the upgrade process is just paused for some
>> reason.
>>
>> Thanks!
>> -jeremy
>>
>>
>>


[ceph-users] Re: Upgrade not doing anything...

2023-02-27 Thread Curt
What does 'ceph orch upgrade status' return?

On Tue, Feb 28, 2023, 10:16 Jeremy Hansen  wrote:

> I’m trying to upgrade from 16.2.7 to 16.2.11.  Reading the documentation,
> I cut and paste the orchestrator command to begin the upgrade, but I
> mistakenly pasted directly from the docs and it initiated an “upgrade” to
> 16.2.6.  I stopped the upgrade per the docs and reissued the command
> specifying 16.2.11 but now I see no progress in ceph -s.  Cluster is
> healthy but it feels like the upgrade process is just paused for some
> reason.
>
> Thanks!
> -jeremy
>
>
>


[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-27 Thread Curt
The 'allow class rbd metadata_list pool ...' cap needs to be inside the quotes
with your other osd caps, not after the closing ".
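
In other words, based on your command, something like this (one long osd cap
string, with the metadata_list clause before the closing quote):

~~~
ceph auth get-or-create client.${rbdName} mon "allow r" \
  osd "allow rwx pool ${rbdPoolName} object_prefix rbd_data.${imageID}; allow rwx pool ${rbdPoolName} object_prefix rbd_header.${imageID}; allow rx pool ${rbdPoolName} object_prefix rbd_id.${rbdName}; allow class rbd metadata_list pool ${rbdPoolName}" \
  -o /etc/ceph/ceph.client.${rbdName}.keyring
~~~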

On Mon, Feb 27, 2023, 16:55 Thomas Schneider <74cmo...@gmail.com> wrote:

> Hi,
>
> I get an error running this ceph auth get-or-create syntax:
>
> # ceph auth get-or-create client.${rbdName} mon  "allow r" osd "allow
> rwx pool ${rbdPoolName} object_prefix rbd_data.${imageID}; allow rwx
> pool ${rbdPoolName} object_prefix rbd_header.${imageID}; allow rx pool
> ${rbdPoolName} object_prefix rbd_id.${rbdName}"; allow class rbd
> metadata_list pool ${rbdPoolName} -o
> /etc/ceph/ceph.client.${rbdName}.keyring;
> [client.VCT]
>  key = AQDGp/xj5EKrFRAArU7SyOVF8NFUC4lRCWwmCQ==
> -bash: allow: command not found.
>
> THX
>
> On 26.02.2023 at 12:46, Ilya Dryomov wrote:
> > allow class rbd metadata_list pool hdb_backup


[ceph-users] Re: rbd map error: couldn't connect to the cluster!

2023-02-23 Thread Curt
What does 'rbd ls hdb_backup' return? Or is your pool VCT?  If that's
the case, those should be switched: 'rbd map VCT/hdb_backup --id VCT
--keyring /etc/ceph/ceph.client.VCT.keyring'

On Thu, Feb 23, 2023 at 6:54 PM Thomas Schneider <74cmo...@gmail.com> wrote:

> Hm... I'm not sure about the correct rbd command syntax, but I thought
> it's correct.
>
> Anyway, using a different ID fails, too:
> # rbd map hdb_backup/VCT --id client.VCT --keyring
> /etc/ceph/ceph.client.VCT.keyring
> rbd: couldn't connect to the cluster!
>
> # rbd map hdb_backup/VCT --id VCT --keyring
> /etc/ceph/ceph.client.VCT.keyring
> 2023-02-23T15:46:16.848+0100 7f222d19d700 -1
> librbd::image::GetMetadataRequest: 0x7f220c001ef0 handle_metadata_list:
> failed to retrieve image metadata: (1) Operation not permitted
> 2023-02-23T15:46:16.848+0100 7f222d19d700 -1
> librbd::image::RefreshRequest: failed to retrieve pool metadata: (1)
> Operation not permitted
> 2023-02-23T15:46:16.848+0100 7f222d19d700 -1 librbd::image::OpenRequest:
> failed to refresh image: (1) Operation not permitted
> 2023-02-23T15:46:16.848+0100 7f222c99c700 -1 librbd::ImageState:
> 0x5569d8a16ba0 failed to open image: (1) Operation not permitted
> rbd: error opening image VCT: (1) Operation not permitted
>
>
> > On 23.02.2023 at 15:30, Eugen Block wrote:
> > You don't specify which client in your rbd command:
> >
> >> rbd map hdb_backup/VCT --id client --keyring
> >> /etc/ceph/ceph.client.VCT.keyring
> >
> > Have you tried this (not sure about upper-case client names, haven't
> > tried that)?
> >
> > rbd map hdb_backup/VCT --id VCT --keyring
> > /etc/ceph/ceph.client.VCT.keyring
> >
> >
> > Zitat von Thomas Schneider <74cmo...@gmail.com>:
> >
> >> Hello,
> >>
> >> I'm trying to mount RBD using rbd map, but I get this error message:
> >> # rbd map hdb_backup/VCT --id client --keyring
> >> /etc/ceph/ceph.client.VCT.keyring
> >> rbd: couldn't connect to the cluster!
> >>
> >> Checking on Ceph server the required permission for relevant keyring
> >> exists:
> >> # ceph-authtool -l /etc/ceph/ceph.client.VCT.keyring
> >> [client.VCT]
> >> key = AQBj3LZjNGn/BhAAG8IqMyH0WLKi4kTlbjiW7g==
> >>
> >> # ceph auth get client.VCT
> >> [client.VCT]
> >> key = AQBj3LZjNGn/BhAAG8IqMyH0WLKi4kTlbjiW7g==
> >> caps mon = "allow r"
> >> caps osd = "allow rwx pool hdb_backup object_prefix
> >> rbd_data.b768d4baac048b; allow rwx pool hdb_backup object_prefix
> >> rbd_header.b768d4baac048b; allow rx pool hdb_backup object_prefix
> >> rbd_id.VCT"
> >> exported keyring for client.VCT
> >>
> >>
> >> Can you please advise how to fix this error?
> >>
> >>
> >> THX


[ceph-users] Re: Telegraf plugin reset

2022-09-22 Thread Curt
Hello,

If I had to guess, the ':' indicates a port number (like :443), so it's
expecting an int and you are passing a string.  Try changing 'https' to '443'.
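
i.e. keeping your hostname and just swapping the service name for the numeric
port, something like:

~~~
ceph config set mgr mgr/telegraf/address tcp://test.xyz.com:443
ceph telegraf config-show
~~~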

On Thu, Sep 22, 2022 at 8:24 PM Nikhil Mitra (nikmitra) 
wrote:

> Greetings,
>
> We are trying to use the telegraf module to send metrics to InfluxDB and
> we keep facing the below error. Any help will be appreciated, thank you.
>
> # ceph telegraf config-show
> Error EIO: Module 'telegraf' has experienced an error and cannot handle
> commands: invalid literal for int() with base 10: 'https'
>
> # ceph config dump | grep -i telegraf
>   mgr   advanced mgr/telegraf/address
>  tcp://test.xyz.com:https *
>
> ceph version 14.2.22-110.el7cp
>
> --
> Regards,
> Nikhil Mitra


[ceph-users] ceph orch device ls extents

2022-07-08 Thread Curt
Hello,

Ran into an interesting error today and I'm not sure of the best way to fix it.
When I run 'ceph orch device ls', I get the following error on every HDD:
"Insufficient space (<10 extents) on vgs, LVM detected, locked".

Here's the output of ceph-volume lvm list, in case it helps:
== osd.0 ===

  [block]
/dev/ceph-efb83a91-3c7b-4329-babc-017b0a00e95a/osd-block-b017780d-38f9-4da7-b9df-2da66e1aa0fd

  block device
 
/dev/ceph-efb83a91-3c7b-4329-babc-017b0a00e95a/osd-block-b017780d-38f9-4da7-b9df-2da66e1aa0fd
  block uuid8kIdfD-kQSh-Mhe4-zRIL-b1Pf-PTaC-CVosbE
  cephx lockbox secret
  cluster fsid  1684fe88-aae0-11ec-9593-df430e3982a0
  cluster name  ceph
  crush device classNone
  encrypted 0
  osd fsid  b017780d-38f9-4da7-b9df-2da66e1aa0fd
  osd id0
  osdspec affinity  dashboard-admin-1648152609405
  type  block
  vdo   0
  devices   /dev/sdb

== osd.10 ==

  [block]
/dev/ceph-a0e85035-cfe2-4070-b58a-a88ec964794c/osd-block-3c353f8c-ab0f-4589-9e98-4f840e86341a

  block device
 
/dev/ceph-a0e85035-cfe2-4070-b58a-a88ec964794c/osd-block-3c353f8c-ab0f-4589-9e98-4f840e86341a
  block uuidgvvrMV-O98L-P6Sl-dnJT-NVwM-P85e-Reqql4
  cephx lockbox secret
  cluster fsid  1684fe88-aae0-11ec-9593-df430e3982a0
  cluster name  ceph
  crush device classNone
  encrypted 0
  osd fsid  3c353f8c-ab0f-4589-9e98-4f840e86341a
  osd id10
  osdspec affinity  dashboard-admin-1648152609405
  type  block
  vdo   0
  devices   /dev/sdh

== osd.12 ==

lvdisplay
--- Logical volume ---
  LV Path
 
/dev/ceph-a0e85035-cfe2-4070-b58a-a88ec964794c/osd-block-3c353f8c-ab0f-4589-9e98-4f840e86341a
  LV Nameosd-block-3c353f8c-ab0f-4589-9e98-4f840e86341a
  VG Nameceph-a0e85035-cfe2-4070-b58a-a88ec964794c
  LV UUIDgvvrMV-O98L-P6Sl-dnJT-NVwM-P85e-Reqql4
  LV Write Accessread/write
  LV Creation host, time hyperion02, 2022-03-24 20:12:17 +
  LV Status  available
  # open 24
  LV Size<1.82 TiB
  Current LE 476932
  Segments   1
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device   253:4

Let me know if you need any other information.

Thanks,
Curt


[ceph-users] Re: Ceph recovery network speed

2022-07-01 Thread Curt
On Wed, Jun 29, 2022 at 11:22 PM Curt  wrote:

>
>
> On Wed, Jun 29, 2022 at 9:55 PM Stefan Kooman  wrote:
>
>> On 6/29/22 19:34, Curt wrote:
>> > Hi Stefan,
>> >
>> > Thank you, that definitely helped. I bumped it to 20% for now and
>> that's
>> > giving me around 124 PGs backfilling at 187 MiB/s, 47 Objects/s.  I'll
>> > see how that runs and then increase it a bit more if the cluster
>> handles
>> > it ok.
>> >
>> > Do you think it's worth enabling scrubbing while backfilling?
>>
>> If the cluster can cope with the extra load, sure. If it slows down the
>> backfilling to levels that are too slow ... temporarily disable it.
>>
>> Since
>> > this is going to take a while. I do have 1 inconsistent PG that has now
>> > become 10 as it splits.
>>
>> Hmm. Well, if it finds broken PGs, for sure pause backfilling (ceph osd
>> set nobackfill) and have it handle this ASAP: ceph pg repair $pg.
>> Something is wrong, and you want to have this fixed sooner rather than
>> later.
>>
>
>  When I try to run a repair nothing happens, if I try to list
> inconsistent-obj I get No scrub information available for 12.12.  If I tell
> it to run a deep scrub, nothing.  I'll set debug and see what I can find in
> the logs.
>
Just to give a quick update: this one was my fault, I missed a flag. Once it was
set correctly, the PG scrubbed and repaired.  It's now back to adding more PGs,
which continues to get a bit faster as it expands.  I'm now up to pg_num
1362 and pgp_num 1234, with backfills happening at 250-300 MB/s, 60-70
objects/s.

Thanks for all the help.

>
>> Not sure what hardware you have, but you might benefit from disabling
>> write caches, see this link:
>>
>> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
>>
>> Thanks, I'm disabling cache and I'll see if it helps at all.

> Gr. Stefan
>>
>


[ceph-users] Re: Ceph recovery network speed

2022-06-29 Thread Curt
On Wed, Jun 29, 2022 at 9:55 PM Stefan Kooman  wrote:

> On 6/29/22 19:34, Curt wrote:
> > Hi Stefan,
> >
> > Thank you, that definitely helped. I bumped it to 20% for now and that's
> > giving me around 124 PGs backfilling at 187 MiB/s, 47 Objects/s.  I'll
> > see how that runs and then increase it a bit more if the cluster handles
> > it ok.
> >
> > Do you think it's worth enabling scrubbing while backfilling?
>
> If the cluster can cope with the extra load, sure. If it slows down the
> backfilling to levels that are too slow ... temporarily disable it.
>
> Since
> > this is going to take a while. I do have 1 inconsistent PG that has now
> > become 10 as it splits.
>
> Hmm. Well, if it finds broken PGs, for sure pause backfilling (ceph osd
> set nobackfill) and have it handle this ASAP: ceph pg repair $pg.
> Something is wrong, and you want to have this fixed sooner rather than
> later.
>

 When I try to run a repair, nothing happens; if I try to list
inconsistent-obj I get "No scrub information available for 12.12".  If I tell
it to run a deep scrub, nothing.  I'll set debug and see what I can find in
the logs.
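
For reference, these are the commands I'm issuing (PG id taken from the health
output above):

~~~
rados list-inconsistent-obj 12.12 --format=json-pretty
ceph pg deep-scrub 12.12
ceph pg repair 12.12
~~~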

>
> Not sure what hardware you have, but you might benefit from disabling
> write caches, see this link:
>
> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
>
> Gr. Stefan
>


[ceph-users] Re: Ceph recovery network speed

2022-06-29 Thread Curt
Hi Stefan,

Thank you, that definitely helped. I bumped it to 20% for now and that's
giving me around 124 PGs backfilling at 187 MiB/s, 47 Objects/s.  I'll see
how that runs and then increase it a bit more if the cluster handles it ok.
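
(For the archives, the knob in question; bumping it to 20% is something like:)

~~~
ceph config set mgr target_max_misplaced_ratio 0.20
ceph config get mgr target_max_misplaced_ratio
~~~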

Do you think it's worth enabling scrubbing while backfilling, since this
is going to take a while?  I do have 1 inconsistent PG that has now become
10 as it splits.

ceph health detail
HEALTH_ERR 21 scrub errors; Possible data damage: 10 pgs inconsistent; 2
pgs not deep-scrubbed in time
[ERR] OSD_SCRUB_ERRORS: 21 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 10 pgs inconsistent
pg 12.12 is active+clean+inconsistent, acting [28,1,37,0]
pg 12.32 is active+clean+inconsistent, acting [37,3,14,22]
pg 12.52 is active+clean+inconsistent, acting [4,33,7,23]
pg 12.72 is active+remapped+inconsistent+backfilling, acting
[37,3,14,22]
pg 12.92 is active+remapped+inconsistent+backfilling, acting [28,1,37,0]
pg 12.b2 is active+remapped+inconsistent+backfilling, acting
[37,3,14,22]
pg 12.d2 is active+clean+inconsistent, acting [4,33,7,23]
pg 12.f2 is active+remapped+inconsistent+backfilling, acting
[37,3,14,22]
pg 12.112 is active+clean+inconsistent, acting [28,1,37,0]
pg 12.132 is active+clean+inconsistent, acting [37,3,14,22]
[WRN] PG_NOT_DEEP_SCRUBBED: 2 pgs not deep-scrubbed in time
pg 4.13 not deep-scrubbed since 2022-06-16T03:15:16.758943+
pg 7.1 not deep-scrubbed since 2022-06-16T20:51:12.211259+

Thanks,
Curt

On Wed, Jun 29, 2022 at 5:53 PM Stefan Kooman  wrote:

> On 6/29/22 15:14, Curt wrote:
>
>
> >
> > Hi Stefan,
> >
> > Good to know.  I see the default is .05 for misplaced_ratio.  What would
> > you recommend as a safe number to increase it to?
>
> It depends. It might be safe to put it to 1. But I would slowly increase
> it, have the manager increase pgp_num and see how the cluster copes with
> the increased load. If you have hardly any client workload you might
> bump this ratio quite a bit. At some point you would need to increase
> osd max backfill to avoid having PGs waiting on backfill.
>
> Gr. Stefan
>


[ceph-users] Re: Ceph recovery network speed

2022-06-29 Thread Curt
On Wed, Jun 29, 2022 at 4:42 PM Stefan Kooman  wrote:

> On 6/29/22 11:21, Curt wrote:
> > On Wed, Jun 29, 2022 at 1:06 PM Frank Schilder  wrote:
> >
> >> Hi,
> >>
> >> did you wait for PG creation and peering to finish after setting pg_num
> >> and pgp_num? They should be right on the value you set and not lower.
> >>
> > Yes, only thing going on was backfill. It's still just slowly expanding
> pg
> > and pgp nums.   I even ran the set command again.  Here's the current
> info
> > ceph osd pool get EC-22-Pool all
> > size: 4
> > min_size: 3
> > pg_num: 226
> > pgp_num: 98
>
> This is coded in the mons and works like that from nautilus onwards:
>
> src/mon/OSDMonitor.cc
>
> ...
>  if (osdmap.require_osd_release < ceph_release_t::nautilus) {
>// pre-nautilus osdmap format; increase pg_num directly
>assert(n > (int)p.get_pg_num());
>// force pre-nautilus clients to resend their ops, since they
>// don't understand pg_num_target changes form a new interval
>p.last_force_op_resend_prenautilus = pending_inc.epoch;
>// force pre-luminous clients to resend their ops, since they
>// don't understand that split PGs now form a new interval.
>p.last_force_op_resend_preluminous = pending_inc.epoch;
>p.set_pg_num(n);
>  } else {
>// set targets; mgr will adjust pg_num_actual and pgp_num later.
>// make pgp_num track pg_num if it already matches.  if it is set
>// differently, leave it different and let the user control it
>// manually.
>if (p.get_pg_num_target() == p.get_pgp_num_target()) {
>  p.set_pgp_num_target(n);
>}
>p.set_pg_num_target(n);
>  }
> ...
>
> So, when pg_num and pgp_num are the same when pg_num is increased, it
> will slowly change pgp_num. If pgp_num is different (smaller, as it
> cannot be bigger than pg_num) it will not touch pgp_num.
>
> You might speed up this process by increasing "target_max_misplaced_ratio"
>
> Gr. Stefan
>

Hi Stefan,

Good to know.  I see the default is .05 for misplaced_ratio.  What would you
recommend as a safe number to increase it to?

Thanks,
Curt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-29 Thread Curt
On Wed, Jun 29, 2022 at 1:06 PM Frank Schilder  wrote:

> Hi,
>
> did you wait for PG creation and peering to finish after setting pg_num
> and pgp_num? They should be right on the value you set and not lower.
>
Yes, the only thing going on was backfill. It's still just slowly expanding
pg_num and pgp_num.  I even ran the set command again.  Here's the current info:
ceph osd pool get EC-22-Pool all
size: 4
min_size: 3
pg_num: 226
pgp_num: 98
crush_rule: EC-22-Pool
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: EC-22-Pro
fast_read: 0
pg_autoscale_mode: off
eio: false
bulk: false

>
> > How do you set the upmap balancer per pool?
>
> I'm afraid the answer is RTFM. I don't use it, but I believe to remember
> one could configure it for equi-distribution of PGs for each pool.
>
> Ok, I'll dig around some more. I glanced at the balancer page and didn't
see it.


> Whenever you grow the cluster, you should make the same considerations
> again and select numbers of PG per pool depending on number of objects,
> capacity and performance.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 28 June 2022 16:33:24
> To: Frank Schilder
> Cc: Robert Gallop; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph recovery network speed
>
> Hi Frank,
>
> Thank you for the thorough breakdown. I have increased the pg_num and
> pgp_num to 1024 to start on the ec-22 pool. That is going to be my primary
> pool with the most data.  It looks like ceph slowly scales the pg up even
> with autoscaling off, since I see target_pg_num 2048, pg_num 199.
>
> root@cephmgr:/# ceph osd pool set EC-22-Pool pg_num 2048
> set pool 12 pg_num to 2048
> root@cephmgr:/# ceph osd pool set EC-22-Pool pgp_num 2048
> set pool 12 pgp_num to 2048
> root@cephmgr:/# ceph osd pool get EC-22-Pool all
> size: 4
> min_size: 3
> pg_num: 199
> pgp_num: 71
> crush_rule: EC-22-Pool
> hashpspool: true
> allow_ec_overwrites: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> erasure_code_profile: EC-22-Pro
> fast_read: 0
> pg_autoscale_mode: off
> eio: false
> bulk: false
>
> This cluster will be growing quite a bit over the next few months.  I am
> migrating data from their old Giant cluster to a new one, by the time I'm
> done it should be 16 hosts with about 400TB of data. I'm guessing I'll have
> to increase pg again later when I start adding more servers to the cluster.
>
> I will look into whether SSDs are an option.  How do you set the upmap
> balancer per pool?  Looking at ceph balancer status my mode is already
> upmap.
>
> Thanks again,
> Curt
>
> On Tue, Jun 28, 2022 at 1:23 AM Frank Schilder  fr...@dtu.dk>> wrote:
> Hi Curt,
>
> looking at what you sent here, I believe you are the victim of "the law of
> large numbers really only holds for large numbers". In other words, the
> statistics of small samples is biting you. The PG numbers of your pools are
> so low that they lead to a very large imbalance of data- and IO placement.
> In other words, in your cluster a few OSDs receive the majority of IO
> requests and bottleneck the entire cluster.
>
> If I see this correctly, the PG num per drive varies from 14 to 40. That's
> an insane imbalance. Also, on your EC pool PG_num is 128 but PGP_num is
> only 48. The autoscaler is screwing it up for you. It will slowly increase
> the number of active PGs, causing continuous relocation of objects for a
> very long time.
>
> I think the recovery speed you see for 8 objects per second is not too bad
> considering that you have an HDD only cluster. The speed does not increase,
> because it is a small number of PGs sending data - a subset of the 32 you
> had before. In addition, due to the imbalance of PGs per OSD, only a small
> number of PGs will be able to send data. You will need patience to get out
> of this corner.
>
> The first thing I would do is look at which pools are important for your
> workload in the long run. I see 2 pools having a significant number of
> objects: EC-22-Pool and default.rgw.buckets.data. EC-22-Pool has about 40
> times the number of objects and bytes as default.rgw.buckets.data. I would
> scale both up in PG count with emphasis on EC-22-Pool.
>
> Your cluster can safely operate between 1100 and 2200 PGs with replication
> <=4. If you don't plan to create more large pools, a good choice of
> distributin

[ceph-users] Re: Ceph recovery network speed

2022-06-28 Thread Curt
Hi Frank,

Thank you for the thorough breakdown. I have increased the pg_num and
pgp_num to 1024 to start on the ec-22 pool. That is going to be my primary
pool with the most data.  It looks like ceph slowly scales the pg up even
with autoscaling off, since I see target_pg_num 2048, pg_num 199.

root@cephmgr:/# ceph osd pool set EC-22-Pool pg_num 2048
set pool 12 pg_num to 2048
root@cephmgr:/# ceph osd pool set EC-22-Pool pgp_num 2048
set pool 12 pgp_num to 2048
root@cephmgr:/# ceph osd pool get EC-22-Pool all
size: 4
min_size: 3
pg_num: 199
pgp_num: 71
crush_rule: EC-22-Pool
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: EC-22-Pro
fast_read: 0
pg_autoscale_mode: off
eio: false
bulk: false
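
For reference, a hedged way to watch the manager walk pg_num/pgp_num up
toward the target, assuming a release new enough to expose the target values:

ceph osd pool ls detail | grep EC-22-Pool   # shows pg_num_target / pgp_num_target
ceph -s                                     # progress section tracks the PG scaling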

This cluster will be growing quite a bit over the next few months.  I am
migrating data from their old Giant cluster to a new one, by the time I'm
done it should be 16 hosts with about 400TB of data. I'm guessing I'll have
to increase pg again later when I start adding more servers to the cluster.

I will look into whether SSDs are an option.  How do you set the upmap balancer
per pool?  Looking at ceph balancer status my mode is already upmap.
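
For what it's worth, a sketch of scoping the balancer to one pool, assuming
the mgr balancer module in this release supports a pool list (ceph balancer -h
should confirm):

ceph balancer mode upmap
ceph balancer pool add EC-22-Pool
ceph balancer pool ls
ceph balancer status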

Thanks again,
Curt

On Tue, Jun 28, 2022 at 1:23 AM Frank Schilder  wrote:

> Hi Curt,
>
> looking at what you sent here, I believe you are the victim of "the law of
> large numbers really only holds for large numbers". In other words, the
> statistics of small samples is biting you. The PG numbers of your pools are
> so low that they lead to a very large imbalance of data- and IO placement.
> In other words, in your cluster a few OSDs receive the majority of IO
> requests and bottleneck the entire cluster.
>
> If I see this correctly, the PG num per drive varies from 14 to 40. That's
> an insane imbalance. Also, on your EC pool PG_num is 128 but PGP_num is
> only 48. The autoscaler is screwing it up for you. It will slowly increase
> the number of active PGs, causing continuous relocation of objects for a
> very long time.
>
> I think the recovery speed you see for 8 objects per second is not too bad
> considering that you have an HDD only cluster. The speed does not increase,
> because it is a small number of PGs sending data - a subset of the 32 you
> had before. In addition, due to the imbalance of PGs per OSD, only a small
> number of PGs will be able to send data. You will need patience to get out
> of this corner.
>
> The first thing I would do is look at which pools are important for your
> workload in the long run. I see 2 pools having a significant number of
> objects: EC-22-Pool and default.rgw.buckets.data. EC-22-Pool has about 40
> times the number of objects and bytes as default.rgw.buckets.data. I would
> scale both up in PG count with emphasis on EC-22-Pool.
>
> Your cluster can safely operate between 1100 and 2200 PGs with replication
> <=4. If you don't plan to create more large pools, a good choice of
> distributing this capacity might be
>
> EC-22-Pool: 1024 PGs (could be pushed up to 2048)
> default.rgw.buckets.data: 256 PGs
>
> That's towards the lower end of available PGs. Please make your own
> calculation and judgement.
>
> If you have settled on target numbers, change the pool sizes in one go,
> that is, set PG_num and PGP_num to the same value right away. You might
> need to turn autoscaler off for these 2 pools. The rebalancing will take a
> long time and also not speed up, because the few sending PGs are the
> bottleneck, not the receiving ones. You will have to sit it out.
>
> The goal is that, in the future, recovery and re-balancing are improved.
> In my experience, a reasonably high PG count will also reduce latency of
> client IO.
>
> Next thing to look at is distribution of PGs per OSD. This has an enormous
> performance impact, because a few too busy OSDs can throttle an entire
> cluster (its always the slowest disk that wins). I use the very simple
> reweight by utilization method, but my pools do not share OSDs as yours do.
> You might want to try the upmap balancer per pool to get PGs per pool
> evenly spread out over OSDs.
>
> Lastly, if you can afford it and your hosts have a slot left, consider
> buying one enterprise SSD per host for the meta-data pools to get this IO
> away from the HDDs. If you buy a bunch of 128G or 256G SATA SSDs, you can
> probably place everything except the EC-22-Pool on these drives, separating
> completely.
>
> Hope that helps and maybe someone else has ideas as well?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 27 Ju
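
For reference, a dry-run-first sketch of the reweight-by-utilization step
Frank mentions above; 120 is just the usual default threshold:

ceph osd test-reweight-by-utilization 120
ceph osd reweight-by-utilization 120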

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
   298 GiB5 KiB  1.4
GiB  1.5 TiB  16.05  0.61   26  up  osd.23
24hdd   1.81940   1.0  1.8 TiB   735 GiB   733 GiB8 KiB  2.3
GiB  1.1 TiB  39.45  1.50   33  up  osd.24
25hdd   1.81940   1.0  1.8 TiB   519 GiB   517 GiB5 KiB  1.4
GiB  1.3 TiB  27.85  1.06   26  up  osd.25
26hdd   1.81940   1.0  1.8 TiB   483 GiB   481 GiB  614 KiB  1.7
GiB  1.3 TiB  25.94  0.99   28  up  osd.26
27hdd   1.81940   1.0  1.8 TiB   226 GiB   225 GiB  1.5 MiB  1.0
GiB  1.6 TiB  12.11  0.46   17  up  osd.27
28hdd   1.81940   1.0  1.8 TiB   443 GiB   441 GiB   24 KiB  1.5
GiB  1.4 TiB  23.76  0.91   21  up  osd.28
29hdd   1.81940   1.0  1.8 TiB   801 GiB   799 GiB7 KiB  2.2
GiB  1.0 TiB  42.98  1.64   31  up  osd.29
30hdd   1.81940   1.0  1.8 TiB   523 GiB   522 GiB  174 KiB  1.2
GiB  1.3 TiB  28.09  1.07   29  up  osd.30
31hdd   1.81940   1.0  1.8 TiB   322 GiB   321 GiB4 KiB  1.2
GiB  1.5 TiB  17.30  0.66   26  up  osd.31
44hdd   1.81940   1.0  1.8 TiB   541 GiB   540 GiB  136 KiB  1.4
GiB  1.3 TiB  29.06  1.11   24  up  osd.44
-9 20.01337 -   20 TiB   5.3 TiB   5.2 TiB   25 MiB   16
GiB   15 TiB  26.25  1.00-  host hyperion04
33hdd   1.81940   1.0  1.8 TiB   466 GiB   465 GiB  469 KiB  1.4
GiB  1.4 TiB  25.02  0.95   28  up  osd.33
34hdd   1.81940   1.0  1.8 TiB   508 GiB   506 GiB2 KiB  1.8
GiB  1.3 TiB  27.28  1.04   30  up  osd.34
35hdd   1.81940   1.0  1.8 TiB   521 GiB   520 GiB2 KiB  1.4
GiB  1.3 TiB  27.98  1.07   32  up  osd.35
36hdd   1.81940   1.0  1.8 TiB   872 GiB   870 GiB3 KiB  2.3
GiB  991 GiB  46.81  1.78   40  up  osd.36
37hdd   1.81940   1.0  1.8 TiB   443 GiB   441 GiB  136 KiB  1.2
GiB  1.4 TiB  23.75  0.91   25  up  osd.37
38hdd   1.81940   1.0  1.8 TiB   138 GiB   137 GiB   24 MiB  647
MiB  1.7 TiB   7.40  0.28   27  up  osd.38
39hdd   1.81940   1.0  1.8 TiB   638 GiB   637 GiB  622 KiB  1.7
GiB  1.2 TiB  34.26  1.31   33  up  osd.39
40hdd   1.81940   1.0  1.8 TiB   444 GiB   443 GiB   14 KiB  1.4
GiB  1.4 TiB  23.85  0.91   25  up  osd.40
41hdd   1.81940   1.0  1.8 TiB   477 GiB   476 GiB  264 KiB  1.3
GiB  1.4 TiB  25.60  0.98   31  up  osd.41
42hdd   1.81940   1.0  1.8 TiB   514 GiB   513 GiB   35 KiB  1.2
GiB  1.3 TiB  27.61  1.05   29  up  osd.42
43hdd   1.81940   1.0  1.8 TiB   358 GiB   356 GiB  111 KiB  1.2
GiB  1.5 TiB  19.19  0.73   24  up  osd.43
TOTAL   80 TiB21 TiB21 TiB   32 MiB   69
GiB   59 TiB  26.23
MIN/MAX VAR: 0.12/2.36  STDDEV: 12.47

>
> The number of objects in flight looks small. Your objects seem to have an
> average size of 4MB and should recover with full bandwidth. Check with top
> how much IO wait percentage you have on the OSD hosts.
>
iowait is 3.3% and load avg is 3.7, nothing crazy from what I can tell.


>
> The one thing that jumps to my eye though is, that you only have 22 dirty
> PGs and they are all recovering/backfilling already. I wonder if you have a
> problem with your crush rules, they might not do what you think they do.
> You said you increased the PG count for EC-22-Pool to 128 (from what?) but
> it doesn't really look like a suitable number of PGs has been marked for
> backfilling. Can you post the output of "ceph osd pool get EC-22-Pool all"?
>
From 32 to 128
ceph osd pool get EC-22-Pool all
size: 4
min_size: 3
pg_num: 128
pgp_num: 48
crush_rule: EC-22-Pool
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: EC-22-Pro
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false
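
For reference, a minimal sketch of pinning the PG count by hand instead of
letting the autoscaler manage it, assuming the standard per-pool options
(which is what the thread ends up doing):

ceph osd pool set EC-22-Pool pg_autoscale_mode off
ceph osd pool set EC-22-Pool pg_num 128
ceph osd pool set EC-22-Pool pgp_num 128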



>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 27 June 2022 19:41:06
> To: Robert Gallop
> Cc: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph recovery network speed
>
> I would love to see those types of speeds. I tried setting it all the way
> to 0 and nothing, I did that before I sent the first email, maybe it was
> your old post I got it from.
>
> osd_recovery_sleep_hdd   0.00
>
>
>  override  (mon[0.00])
>
> On Mon, Jun 27, 2022 at 9:27 PM Robert Gallop  <mailto:robert.gal...@gmail.com>> wrote:
> I saw a major boost after having the sleep_hdd set to 0.  Only after that
> did I start staying at around 50

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
I would love to see those types of speeds. I tried setting it all the way
to 0 and nothing changed; I did that before I sent the first email, so maybe
it was your old post I got it from.

osd_recovery_sleep_hdd   0.00   override   (mon[0.00])
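
For reference, a minimal sketch of how the sleep value is typically set and
then verified on a running OSD, assuming the centralized config store (0.05 is
just an example value from earlier in the thread):

ceph config set osd osd_recovery_sleep_hdd 0.05
ceph config show osd.19 | grep osd_recovery_sleep_hdd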

On Mon, Jun 27, 2022 at 9:27 PM Robert Gallop 
wrote:

> I saw a major boost after having the sleep_hdd set to 0.  Only after that
> did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k
> obj/sec.
>
> Eventually it tapered back down, but for me sleep was the key, and
> specifically in my case:
>
> osd_recovery_sleep_hdd
>
> On Mon, Jun 27, 2022 at 11:17 AM Curt  wrote:
>
>> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:
>>
>> > I think this is just how ceph is. Maybe you should post the output of
>> > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
>> > idea whether what you look at is expected or not. As I wrote before,
>> object
>> > recovery is throttled and the recovery bandwidth depends heavily on
>> object
>> > size. The interesting question is, how many objects per second are
>> > recovered/rebalanced
>> >
>>  data:
>> pools:   11 pools, 369 pgs
>> objects: 2.45M objects, 9.2 TiB
>> usage:   20 TiB used, 60 TiB / 80 TiB avail
>> pgs: 512136/9729081 objects misplaced (5.264%)
>>  343 active+clean
>>  22  active+remapped+backfilling
>>
>>   io:
>> client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
>> recovery: 34 MiB/s, 8 objects/s
>>
>> Pool 12 is the only one with any stats.
>>
>> pool EC-22-Pool id 12
>>   510048/9545052 objects misplaced (5.344%)
>>   recovery io 36 MiB/s, 9 objects/s
>>   client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr
>>
>> --- RAW STORAGE ---
>> CLASSSIZE   AVAILUSED  RAW USED  %RAW USED
>> hdd80 TiB  60 TiB  20 TiB20 TiB  25.45
>> TOTAL  80 TiB  60 TiB  20 TiB20 TiB  25.45
>>
>> --- POOLS ---
>> POOLID  PGS   STORED  OBJECTS USED  %USED  MAX
>> AVAIL
>> .mgr 11  152 MiB   38  457 MiB  0
>>  9.2 TiB
>> 21BadPool3   328 KiB1   12 KiB  0
>> 18 TiB
>> .rgw.root4   32  1.3 KiB4   48 KiB  0
>>  9.2 TiB
>> default.rgw.log  5   32  3.6 KiB  209  408 KiB  0
>>  9.2 TiB
>> default.rgw.control  6   32  0 B8  0 B  0
>>  9.2 TiB
>> default.rgw.meta 78  6.7 KiB   20  203 KiB  0
>>  9.2 TiB
>> rbd_rep_pool 8   32  2.0 MiB5  5.9 MiB  0
>>  9.2 TiB
>> default.rgw.buckets.index98  2.0 MiB   33  5.9 MiB  0
>>  9.2 TiB
>> default.rgw.buckets.non-ec  10   32  1.4 KiB0  4.3 KiB  0
>>  9.2 TiB
>> default.rgw.buckets.data11   32  232 GiB   61.02k  697 GiB   2.41
>>  9.2 TiB
>> EC-22-Pool  12  128  9.8 TiB2.39M   20 TiB  41.55
>> 14 TiB
>>
>>
>>
>> > Maybe provide the output of the first two commands for
>> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a
>> bit
>> > after setting these and then collect the output). Include the applied
>> > values for osd_max_backfills* and osd_recovery_max_active* for one of
>> the
>> > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
>> > osd_recovery_max_active).
>> >
>>
>> I didn't notice any speed difference with sleep values changed, but I'll
>> grab the stats between changes when I have a chance.
>>
>> ceph config show osd.19 | egrep
>> 'osd_max_backfills|osd_recovery_max_active'
>> osd_max_backfills1000
>>
>>
>> override  mon[5]
>> osd_recovery_max_active  1000
>>
>>
>> override
>> osd_recovery_max_active_hdd  1000
>>
>>
>> override  mon[5]
>> osd_recovery_max_active_ssd  1000
>>
>>
>> override
>>
>> >
>> > I don't really know if on such a small cluster one can expect more than
>> > what you see. It has nothing to do with network speed 

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:

> I think this is just how ceph is. Maybe you should post the output of
> "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
> idea whether what you look at is expected or not. As I wrote before, object
> recovery is throttled and the recovery bandwidth depends heavily on object
> size. The interesting question is, how many objects per second are
> recovered/rebalanced
>
 data:
pools:   11 pools, 369 pgs
objects: 2.45M objects, 9.2 TiB
usage:   20 TiB used, 60 TiB / 80 TiB avail
pgs: 512136/9729081 objects misplaced (5.264%)
 343 active+clean
 22  active+remapped+backfilling

  io:
client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
recovery: 34 MiB/s, 8 objects/s

Pool 12 is the only one with any stats.

pool EC-22-Pool id 12
  510048/9545052 objects misplaced (5.344%)
  recovery io 36 MiB/s, 9 objects/s
  client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr

--- RAW STORAGE ---
CLASSSIZE   AVAILUSED  RAW USED  %RAW USED
hdd80 TiB  60 TiB  20 TiB20 TiB  25.45
TOTAL  80 TiB  60 TiB  20 TiB20 TiB  25.45

--- POOLS ---
POOLID  PGS   STORED  OBJECTS USED  %USED  MAX
AVAIL
.mgr 11  152 MiB   38  457 MiB  0
 9.2 TiB
21BadPool3   328 KiB1   12 KiB  0
18 TiB
.rgw.root4   32  1.3 KiB4   48 KiB  0
 9.2 TiB
default.rgw.log  5   32  3.6 KiB  209  408 KiB  0
 9.2 TiB
default.rgw.control  6   32  0 B8  0 B  0
 9.2 TiB
default.rgw.meta 78  6.7 KiB   20  203 KiB  0
 9.2 TiB
rbd_rep_pool 8   32  2.0 MiB5  5.9 MiB  0
 9.2 TiB
default.rgw.buckets.index98  2.0 MiB   33  5.9 MiB  0
 9.2 TiB
default.rgw.buckets.non-ec  10   32  1.4 KiB0  4.3 KiB  0
 9.2 TiB
default.rgw.buckets.data11   32  232 GiB   61.02k  697 GiB   2.41
 9.2 TiB
EC-22-Pool  12  128  9.8 TiB2.39M   20 TiB  41.55
14 TiB



> Maybe provide the output of the first two commands for
> osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a bit
> after setting these and then collect the output). Include the applied
> values for osd_max_backfills* and osd_recovery_max_active* for one of the
> OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
> osd_recovery_max_active).
>

I didn't notice any speed difference with sleep values changed, but I'll
grab the stats between changes when I have a chance.

ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
osd_max_backfills            1000   override   mon[5]
osd_recovery_max_active      1000   override
osd_recovery_max_active_hdd  1000   override   mon[5]
osd_recovery_max_active_ssd  1000   override

>
> I don't really know if on such a small cluster one can expect more than
> what you see. It has nothing to do with network speed if you have a 10G
> line. However, recovery is something completely different from a full
> link-speed copy.
>
> I can tell you that boatloads of tiny objects are a huge pain for
> recovery, even on SSD. Ceph doesn't raid up sections of disks against each
> other, but object for object. This might be a feature request: that PG
> space allocation and recovery should follow the model of LVM extends
> (ideally match with LVM extends) to allow recovery/rebalancing larger
> chunks of storage in one go, containing parts of a large or many small
> objects.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 27 June 2022 17:35:19
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph recovery network speed
>
> Hello,
>
> I had already increased/changed those variables previously.  I increased
> the pg_num to 128. Which increased the number of PG's backfilling, but
> speed is still only at 30 MiB/s avg and has been backfilling 23 pg for the
> last several hours.  Should I increase it higher than 128?
>
> I'm still trying to figure out if this is just how ceph is or if there is
> a bottleneck somewhere.  Like if I sftp a 10G file between servers it's
> done in a couple min or less.  Am I thinking of this wrong?
>
> Thanks,
> Curt
>
> On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder  fr...@dtu.dk>> wrote:
> Hi Curt,
>
> as far as I understood, a 2+2

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
Hello,

I had already increased/changed those variables previously.  I increased the
pg_num to 128, which increased the number of PGs backfilling, but speed is
still only about 30 MiB/s on average and it has been backfilling 23 PGs for
the last several hours.  Should I increase it higher than 128?

I'm still trying to figure out if this is just how ceph is or if there is a
bottleneck somewhere.  For comparison, if I sftp a 10G file between servers
it's done in a couple of minutes or less.  Am I thinking of this wrong?

Thanks,
Curt

On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder  wrote:

> Hi Curt,
>
> as far as I understood, a 2+2 EC pool is recovering, which makes 1 OSD per
> host busy. My experience is, that the algorithm for selecting PGs to
> backfill/recover is not very smart. It could simply be that it doesn't find
> more PGs without violating some of these settings:
>
> osd_max_backfills
> osd_recovery_max_active
>
> I have never observed the second parameter to change anything (try any
> ways). However, the first one has a large impact. You could try increasing
> this slowly until recovery moves faster. Another parameter you might want
> to try is
>
> osd_recovery_sleep_[hdd|ssd]
>
> Be careful as this will impact client IO. I could reduce the sleep for my
> HDDs to 0.05. With your workload pattern, this might be something you can
> tune as well.
>
> Having said that, I think you should increase your PG count on the EC pool
> as soon as the cluster is healthy. You have only about 20 PGs per OSD and
> large PGs will take unnecessarily long to recover. A higher PG count will
> also make it easier for the scheduler to find PGs for recovery/backfill.
> Aim for a number between 100 and 200. Give the pool(s) with most data
> (#objects) the most PGs.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 24 June 2022 19:04
> To: Anthony D'Atri; ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph recovery network speed
>
> 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB enterprise
> HD's.
>
> Take this log entry below, 72 minutes and still backfilling undersized?
> Should it be that slow?
>
> pg 12.15 is stuck undersized for 72m, current state
> active+undersized+degraded+remapped+backfilling, last acting
> [34,10,29,NONE]
>
> Thanks,
> Curt
>
>
> On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri 
> wrote:
>
> > Your recovery is slow *because* there are only 2 PGs backfilling.
> >
> > What kind of OSD media are you using?
> >
> > > On Jun 24, 2022, at 09:46, Curt  wrote:
> > >
> > > Hello,
> > >
> > > I'm trying to understand why my recovery is so slow with only 2 pg
> > > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G network.  I
> > > have tested the speed between machines with a few tools and all confirm
> > 10G
> > > speed.  I've tried changing various settings of priority and recovery
> > sleep
> > > hdd, but still the same. Is this a configuration issue or something
> else?
> > >
> > > It's just a small cluster right now with 4 hosts, 11 osd's per.  Please
> > let
> > > me know if you need more information.
> > >
> > > Thanks,
> > > Curt
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
On Sat, Jun 25, 2022 at 3:27 AM Anthony D'Atri 
wrote:

> The pg_autoscaler aims IMHO way too low and I advise turning it off.
>
>
>
> > On Jun 24, 2022, at 11:11 AM, Curt  wrote:
> >
> >> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or
> per
> > osd?
> > Sorry, 18TB of data and 273 PGs total.
> >
> >> `ceph osd df` will show you toward the right how many PGs are on each
> > OSD.  If you have multiple pools, some PGs will have more data than
> others.
> >> So take an average # of PGs per OSD and divide the actual HDD capacity
> > by that.
> > 20 pg on avg / 2TB(technically 1.8 I guess) which would be 10.
>
> I’m confused.  Is 20 what `ceph osd df` is reporting?  Send me the output
> of

Yes, 20 would be the avg pg count.
 ID  CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA  OMAP META
  AVAIL%USE   VAR   PGS  STATUS
 1hdd  1.81940   1.0  1.8 TiB  748 GiB   746 GiB  207 KiB  1.7 GiB
 1.1 TiB  40.16  1.68   21  up
 3hdd  1.81940   1.0  1.8 TiB  459 GiB   457 GiB3 KiB  1.2 GiB
 1.4 TiB  24.61  1.03   20  up
 5hdd  1.81940   1.0  1.8 TiB  153 GiB   152 GiB   32 KiB  472 MiB
 1.7 TiB   8.20  0.34   15  up
 7hdd  1.81940   1.0  1.8 TiB  471 GiB   470 GiB   83 KiB  1.0 GiB
 1.4 TiB  25.27  1.06   24  up
 9hdd  1.81940   1.0  1.8 TiB  1.0 TiB  1022 GiB  136 KiB  2.4 GiB
 838 GiB  54.99  2.30   19  up
11hdd  1.81940   1.0  1.8 TiB  443 GiB   441 GiB4 KiB  1.1 GiB
 1.4 TiB  23.76  0.99   20  up
13hdd  1.81940   1.0  1.8 TiB  438 GiB   437 GiB  310 KiB  1.0 GiB
 1.4 TiB  23.50  0.98   18  up
15hdd  1.81940   1.0  1.8 TiB  334 GiB   333 GiB  621 KiB  929 MiB
 1.5 TiB  17.92  0.75   15  up
17hdd  1.81940   1.0  1.8 TiB  310 GiB   309 GiB2 KiB  807 MiB
 1.5 TiB  16.64  0.70   20  up
19hdd  1.81940   1.0  1.8 TiB  433 GiB   432 GiB7 KiB  974 MiB
 1.4 TiB  23.23  0.97   25  up
45hdd  1.81940   1.0  1.8 TiB  169 GiB   169 GiB2 KiB  615 MiB
 1.7 TiB   9.09  0.38   18  up
 0hdd  1.81940   1.0  1.8 TiB  582 GiB   580 GiB  295 KiB  1.7 GiB
 1.3 TiB  31.24  1.31   21  up
 2hdd  1.81940   1.0  1.8 TiB  870 MiB21 MiB  112 KiB  849 MiB
 1.8 TiB   0.05  0.00   14  up
 4hdd  1.81940   1.0  1.8 TiB  326 GiB   325 GiB   14 KiB  947 MiB
 1.5 TiB  17.48  0.73   24  up
 6hdd  1.81940   1.0  1.8 TiB  450 GiB   448 GiB1 KiB  1.4 GiB
 1.4 TiB  24.13  1.01   17  up
 8hdd  1.81940   1.0  1.8 TiB  152 GiB   152 GiB  618 KiB  900 MiB
 1.7 TiB   8.18  0.34   20  up
10hdd  1.81940   1.0  1.8 TiB  609 GiB   607 GiB4 KiB  1.7 GiB
 1.2 TiB  32.67  1.37   25  up
12hdd  1.81940   1.0  1.8 TiB  333 GiB   332 GiB  175 KiB  1.5 GiB
 1.5 TiB  17.89  0.75   24  up
14hdd  1.81940   1.0  1.8 TiB  1.0 TiB   1.0 TiB1 KiB  2.2 GiB
 834 GiB  55.24  2.31   17  up
16hdd  1.81940   1.0  1.8 TiB  168 GiB   167 GiB4 KiB  1.2 GiB
 1.7 TiB   9.03  0.38   15  up
18hdd  1.81940   1.0  1.8 TiB  299 GiB   298 GiB  261 KiB  1.6 GiB
 1.5 TiB  16.07  0.67   15  up
32hdd  1.81940   1.0  1.8 TiB  873 GiB   871 GiB   45 KiB  2.3 GiB
 990 GiB  46.88  1.96   18  up
22hdd  1.81940   1.0  1.8 TiB  449 GiB   447 GiB  139 KiB  1.6 GiB
 1.4 TiB  24.10  1.01   22  up
23hdd  1.81940   1.0  1.8 TiB  299 GiB   298 GiB5 KiB  1.6 GiB
 1.5 TiB  16.06  0.67   20  up
24hdd  1.81940   1.0  1.8 TiB  887 GiB   885 GiB8 KiB  2.4 GiB
 976 GiB  47.62  1.99   23  up
25hdd  1.81940   1.0  1.8 TiB  451 GiB   449 GiB4 KiB  1.6 GiB
 1.4 TiB  24.20  1.01   17  up
26hdd  1.81940   1.0  1.8 TiB  602 GiB   600 GiB  373 KiB  2.0 GiB
 1.2 TiB  32.29  1.35   21  up
27hdd  1.81940   1.0  1.8 TiB  152 GiB   151 GiB  1.5 MiB  564 MiB
 1.7 TiB   8.14  0.34   14  up
28hdd  1.81940   1.0  1.8 TiB  330 GiB   328 GiB7 KiB  1.6 GiB
 1.5 TiB  17.70  0.74   12  up
29hdd  1.81940   1.0  1.8 TiB  726 GiB   723 GiB7 KiB  2.1 GiB
 1.1 TiB  38.94  1.63   16  up
30hdd  1.81940   1.0  1.8 TiB  596 GiB   594 GiB  173 KiB  2.0 GiB
 1.2 TiB  32.01  1.34   19  up
31hdd  1.81940   1.0  1.8 TiB  304 GiB   303 GiB4 KiB  1.6 GiB
 1.5 TiB  16.34  0.68   20  up
44hdd  1.81940   1.0  1.8 TiB  150 GiB   149 GiB  0 B  599 MiB
 1.7 TiB   8.03  0.34   12  up
33hdd  1.81940   1.0  1.8 TiB  451 GiB   449 GiB  462 KiB  1.8 GiB
 1.4 TiB  24.22  1.01   19  up
34hdd  1.81940   1.0  1.8 TiB  449 GiB   448 GiB2 KiB  966 MiB
 1.4 TiB  24.12  1.01   21  up
35hdd  1.81940   1.0  1.8 TiB  458 GiB   457 GiB2 KiB  1.5 GiB
 1.4 TiB  24.60  1.03   23  up
36hdd  1.81940   1.0  1.8 TiB  872 GiB   870 GiB3 KiB  2.4 Gi

[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
Nope, the majority of reads/writes happen at night, so it's doing less than 1
MiB/s of client IO right now, sometimes 0.

On Fri, Jun 24, 2022, 22:23 Stefan Kooman  wrote:

> On 6/24/22 20:09, Curt wrote:
> >
> >
> > On Fri, Jun 24, 2022 at 10:00 PM Stefan Kooman  > <mailto:ste...@bit.nl>> wrote:
> >
> > On 6/24/22 19:49, Curt wrote:
> >  > Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's
> > objects or keys recovering?
> >
> > ceph -s. wil tell you what type of recovery is going on.
> >
> > Is it a cephfs metadata pool? Or a rgw index pool?
> >
> > Gr. Stefan
> >
> >
> > object recovery, I guess I'm used to it always showing object, so didn't
> > know it could be key.
> >
> > rbd pool.
>
> recovery has lower priority than client IO. Is the cluster busy?
>
> Gr. Stefan
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
osd?
Sorry, 18TB of data and 273 PGs total.

> `ceph osd df` will show you toward the right how many PGs are on each
OSD.  If you have multiple pools, some PGs will have more data than others.
>  So take an average # of PGs per OSD and divide the actual HDD capacity
by that.
20 PGs on avg / 2TB (technically 1.8 I guess), which would be 10.  Shouldn't
usage be what counts though, not capacity? My usage is only 23% of capacity.
I thought ceph autoscaling PGs changed the size dynamically according to
usage?  I'm guessing I'm misunderstanding that part?
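
For what it's worth, a rough worked version of that division using the numbers
above: with roughly 20 PGs per OSD and 1.8 TiB drives, that's about 92 GiB of
capacity per PG (1.8 TiB / 20), while the data currently stored works out to
about 66 GB per PG (18 TB / 273 PGs).  The suggestion above divides drive
capacity by the PG count, not current usage.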

Thanks,
Curt

On Fri, Jun 24, 2022 at 9:48 PM Anthony D'Atri 
wrote:

>
> > Yes, SATA, I think my benchmark put it around 125, but that was a year
> ago, so could be misremembering
>
> A FIO benchmark, especially a sequential one on an empty drive, can
> mislead as to the real-world performance one sees on a fragmented drive.
>
> >  273 pg at 18TB so each PG would be 60G.
>
> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
> osd?
>
> >  Mainly used for RBD, using erasure coding.  cephadm bootstrap with
> docker images.
>
> Ack.  Have to account for replication.
>
> `ceph osd df` will show you toward the right how many PGs are on each
> OSD.  If you have multiple pools, some PGs will have more data than others.
>
> So take an average # of PGs per OSD and divide the actual HDD capacity by
> that.
>
>
>
>
> >
> > On Fri, Jun 24, 2022 at 9:21 PM Anthony D'Atri 
> wrote:
> >
> >
> > >
> > > 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB
> enterprise HD's.
> >
> > SATA? Figure they can write at 70 MB/s
> >
> > How big are your PGs?  What is your cluster used for?  RBD? RGW? CephFS?
> >
> > >
> > > Take this log entry below, 72 minutes and still backfilling
> undersized?  Should it be that slow?
> > >
> > > pg 12.15 is stuck undersized for 72m, current state
> active+undersized+degraded+remapped+backfilling, last acting [34,10,29,NONE]
> > >
> > > Thanks,
> > > Curt
> > >
> > >
> > > On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri <
> anthony.da...@gmail.com> wrote:
> > > Your recovery is slow *because* there are only 2 PGs backfilling.
> > >
> > > What kind of OSD media are you using?
> > >
> > > > On Jun 24, 2022, at 09:46, Curt  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm trying to understand why my recovery is so slow with only 2 pg
> > > > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G
> network.  I
> > > > have tested the speed between machines with a few tools and all
> confirm 10G
> > > > speed.  I've tried changing various settings of priority and
> recovery sleep
> > > > hdd, but still the same. Is this a configuration issue or something
> else?
> > > >
> > > > It's just a small cluster right now with 4 hosts, 11 osd's per.
> Please let
> > > > me know if you need more information.
> > > >
> > > > Thanks,
> > > > Curt
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
On Fri, Jun 24, 2022 at 10:00 PM Stefan Kooman  wrote:

> On 6/24/22 19:49, Curt wrote:
> > Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's
> > objects or keys recovering?
>
> ceph -s. wil tell you what type of recovery is going on.
>
> Is it a cephfs metadata pool? Or a rgw index pool?
>
> Gr. Stefan
>

Object recovery. I guess I'm used to it always showing objects, so I didn't
know it could be keys.

rbd pool.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's objects
or keys recovering?

Thanks,
Curt

On Fri, Jun 24, 2022 at 9:39 PM Stefan Kooman  wrote:

> On 6/24/22 19:04, Curt wrote:
> > 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB
> enterprise
> > HD's.
> >
> > Take this log entry below, 72 minutes and still backfilling undersized?
> > Should it be that slow?
> >
> > pg 12.15 is stuck undersized for 72m, current state
> > active+undersized+degraded+remapped+backfilling, last acting
> [34,10,29,NONE]
>
> What is in that pool 12? Is it objects that are recovering, or keys?
> OMAP data (keys) is slow.
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
2 PGs shouldn't take hours to backfill, in my opinion.  They're just 2TB
enterprise HDDs.

Take this log entry below, 72 minutes and still backfilling undersized?
Should it be that slow?

pg 12.15 is stuck undersized for 72m, current state
active+undersized+degraded+remapped+backfilling, last acting [34,10,29,NONE]

Thanks,
Curt


On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri 
wrote:

> Your recovery is slow *because* there are only 2 PGs backfilling.
>
> What kind of OSD media are you using?
>
> > On Jun 24, 2022, at 09:46, Curt  wrote:
> >
> > Hello,
> >
> > I'm trying to understand why my recovery is so slow with only 2 pg
> > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G network.  I
> > have tested the speed between machines with a few tools and all confirm
> 10G
> > speed.  I've tried changing various settings of priority and recovery
> sleep
> > hdd, but still the same. Is this a configuration issue or something else?
> >
> > It's just a small cluster right now with 4 hosts, 11 osd's per.  Please
> let
> > me know if you need more information.
> >
> > Thanks,
> > Curt
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph recovery network speed

2022-06-24 Thread Curt
Hello,

I'm trying to understand why my recovery is so slow with only 2 PGs
backfilling.  I'm only getting speeds of 3-4 MiB/s on a 10G network.  I
have tested the speed between machines with a few tools and all confirm 10G
speed.  I've tried changing various settings of priority and recovery sleep
hdd, but it's still the same. Is this a configuration issue or something else?
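
For reference, the settings referred to here can be checked on a live OSD with
something like the following (osd.0 is just an example daemon):

ceph config show osd.0 | egrep 'osd_max_backfills|osd_recovery_max_active|osd_recovery_sleep'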

It's just a small cluster right now with 4 hosts, 11 OSDs per host.  Please
let me know if you need more information.

Thanks,
Curt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io