The same osd crashed today:
0> 2022-10-24T06:30:00.875+ 7f0bbf3bc700 -1 *** Caught signal
(Segmentation fault) **
in thread 7f0bbf3bc700 thread_name:bstore_kv_final
ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific
(stable)
1: /lib64/libpthread.so.0(+0x12ce0)
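If the full report would be useful, I can pull it out of the crash module, e.g.:

ceph crash ls
ceph crash info <crash-id>

with <crash-id> being whichever entry matches today's timestamp.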
Hi Maged,
Thanks for taking the time to go into a detailed explanation. It's
certainly not as easy as working out the appropriate object to get via
rados. As you suggest, I'll have to look into ceph-objectstore-tool and
perhaps librados to get any further.
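For reference, something along these lines seems to be about the limit of what
plain rados shows (pool/image names are just examples):

rbd info libvirt/myimage    # note the block_name_prefix, e.g. rbd_data.abc123
rados -p libvirt listsnaps rbd_data.abc123.0000000000000000

listsnaps at least shows which snapshot clones exist for that one object, but
not the per-snapshot layout I'm really after.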
Thanks again,
Chris
On Mon, Oct
Hi,
The large omap alert seems to have resolved itself last week, although I don't
know the underlying reason.
When I got your email and tried to get the data, I noticed that the alerts had
stopped. OMAP was 0 Bytes as follows. To make sure, I ran a deep scrub and
waited for a while, but the alert has not
Looking at the device health info for the OSDs in our cluster sometimes shows
"No SMART data available". This appears to only occur for SCSI type disks in
our cluster. ATA disks have their full health SMART data displayed, but the
non-ATA do not.
The actual SMART data (JSON formatted) is
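For reference, the commands involved are along these lines (device id and
device path are placeholders):

ceph device ls
ceph device get-health-metrics <devid>
smartctl -a --json /dev/sdX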
I tried my luck and upgraded to 17.2.4 but unfortunately that didn't make
any difference here either.
I also took another, closer look at all kinds of client op and request stats
and whatnot, which only made me even more certain that this IO is not caused by
any clients.
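For reference, by "client op and request stats" I mean the sort of thing you
get from e.g. (mds name is a placeholder):

ceph tell mds.cephfs-a ops
ceph tell mds.cephfs-a dump_ops_in_flight
ceph tell mds.cephfs-a perf dump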
What internal mds operation or
On Wed, Oct 19, 2022 at 7:54 AM Frank Schilder wrote:
>
> Hi Dan,
>
> I know that "fs fail ..." is not ideal, but we will not have time for a clean
> "fs down true" plus wait-for-journal-flush procedure to complete (on our
> cluster this takes at least 20 minutes, which is way too long). My
Quick napkin math
for your 3-way replicated pool - e.g. pool 28 - you have 9.9 TB across 256
pgs ~= 10137 GB across 256 pgs ~= 39 GB per PG
for 4+2 EC on pool 51 - 32 TB across 128 pgs ~= 32768 GB across 128 pgs ~=
256 GB per pg - with the 4+2 profile each PG is spread across 4 data parts
~= 64 GB per part
Hello Sake,
Could you share the output of vgs / lvs commands?
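For example, run on the host(s) where the probe failed:

vgs
lvs -a -o +devices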
Also, I would suggest you open a tracker issue [1]
Thanks!
[1] https://tracker.ceph.com/projects/ceph-volume
On Mon, 24 Oct 2022 at 10:51, Sake Paulusma wrote:
> Last friday I upgrade the Ceph cluster from 17.2.3 to 17.2.5 with "ceph
Hi Zhongzhou,
I think most of the time it means that a device is not wiped correctly.
Can you check that?
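If that turns out to be it, zapping the device usually clears it up, e.g.
(double-check the device path first):

ceph-volume lvm zap --destroy /dev/sdX

or, more low-level:

wipefs -a /dev/sdX
sgdisk --zap-all /dev/sdX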
Thanks!
On Sat, 22 Oct 2022 at 01:01, Zhongzhou Cai wrote:
> Hi folks,
>
> I'm trying to install ceph on GCE VMs (debian/ubuntu) with PD-SSDs using
> ceph-ansible image. The installation
Hi Tim,
Ah, it didn't sink in for me at first how many pools there were here.
I think you might be hitting the issue that the author of
https://github.com/TheJJ/ceph-balancer ran into, and thus their
balancer might help in this case.
Josh
On Mon, Oct 24, 2022 at 8:37 AM Tim Bishop wrote:
>
>
Hi Joseph,
Here are some of the larger pools. Notably, the largest (pool 51, 32 TiB of
CephFS data) doesn't have the highest number of PGs.
POOL    ID  PGS  STORED   OBJECTS  USED    %USED  MAX AVAIL
pool28  28  256  9.9 TiB  2.61M    30 TiB  43.28  13 TiB
pool29
Hi Josh,
On Mon, Oct 24, 2022 at 07:20:46AM -0600, Josh Baergen wrote:
> > I've included the osd df output below, along with pool and crush rules.
>
> Looking at these, the balancer module should be taking care of this
> imbalance automatically. What does "ceph balancer status" say?
# ceph
Hi Tim,
You might want to check your pool utilization and see if there are
enough PGs in that pool. A higher GB-per-PG ratio can result in this scenario.
I am also assuming that you have the balancer module turned on; "ceph balancer
status" should tell you that as well.
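Something along these lines should show it (pool name is just an example):

ceph osd pool autoscale-status
ceph osd pool get pool51 pg_num
ceph balancer status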
If you have enough pgs in the bigger
Hey, Tim.
Visualization gives a much better sense of OSD fillage than a table
of numbers. A Grafana panel works, or a quick script:
Grab this from CERN:
https://gitlab.cern.ch/ceph/ceph-scripts/-/blob/master/tools/histogram.py
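If you just want a rough picture right away, even a one-liner does it (assumes
jq; the utilization field name is from the JSON output of ceph osd df and may
differ between releases):

ceph osd df -f json | jq -r '.nodes[].utilization' | sort -n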
Hi Tim,
> I've included the osd df output below, along with pool and crush rules.
Looking at these, the balancer module should be taking care of this
imbalance automatically. What does "ceph balancer status" say?
Josh
Hi all,
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)
We're having an issue with the spread of data across our OSDs. We have
108 OSDs in our cluster, all identical disk size, same number in each
server, and the same number of servers in each rack. So I'd hoped
Hi, thank you, we replaced the domain of the service in the text before
reporting the issue. Sorry, I should have mentioned that.
admin.ceph.example.com was turned into admin.ceph. for
privacy's sake.
Best Regards,
Martin Johansen
On Mon, Oct 24, 2022 at 2:53 PM Murilo Morais wrote:
> Hello Martin.
>
Hello Martin.
Apparently cephadm is not able to resolve `admin.ceph.`; check
/etc/hosts or your DNS, try to ping the host, and check that the IPs in `ceph
orch host ls` respond to ping without packet loss.
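For example, from the node where cephadm runs:

getent hosts admin.ceph.
ping -c 3 admin.ceph.
ceph orch host ls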
Try according to the documentation:
On 18/10/2022 01:24, Chris Dunlop wrote:
Hi,
Is there anywhere that describes exactly how rbd data (including
snapshots) are stored within a pool?
I can see how an rbd broadly stores its data in rados objects in the
pool, although the object map is opaque. But once an rbd snap is
created
Cheers again.
I am still stuck at this. Does anyone have an idea how to fix it?
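Would something along these lines be the right direction, or is it risky after
an interrupted reshard? (bucket name is a placeholder)

radosgw-admin reshard list
radosgw-admin reshard status --bucket=mybucket
radosgw-admin reshard stale-instances list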
On Fri, 7 Oct 2022 at 11:30, Boris Behrens wrote:
> Hi,
> I just wanted to reshard a bucket but mistyped the number of shards. In a
> reflex I hit ctrl-c and waited. It looked like the resharding did not
> finish
Hi, I deployed a Ceph cluster a week ago and have started experiencing
warnings. Any pointers as to how to further debug or fix it? Here is info
about the warnings:
# ceph version
ceph version 17.2.4 (1353ed37dec8d74973edc3d5d5908c20ad5a7332) quincy
(stable)
# ceph status
cluster:
id:
Hi,
In our Ceph Pacific clusters (16.2.10) (1 for OpenStack and S3, 2 for
backup on RBD and S3),
since the upgrade to Pacific, we regularly have the MGR stop responding and
disappear from ceph status.
The process is still there.
Nothing in the MGR log, just no more log entries.
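Next time it hangs, it's probably worth grabbing some state before restarting,
along the lines of (mgr daemon name is a placeholder, run on the node with the
active MGR):

ceph mgr dump
ceph daemon mgr.host1 perf dump
ceph config set mgr debug_mgr 20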
Restarting the
Last Friday I upgraded the Ceph cluster from 17.2.3 to 17.2.5 with "ceph orch
upgrade start --image
localcontainerregistry.local.com:5000/ceph/ceph:v17.2.5-20221017". After
some time (an hour?) I got a health warning: CEPHADM_REFRESH_FAILED: failed
to probe daemons or devices. I'm using only