Hi!
Could you please elaborate on what you meant by "adding another disc to the
recovery process"?
/Z
On Sat, 25 May 2024, 22:49 Mazzystr wrote:
> Well this was an interesting journey through the bowels of Ceph. I have
> put about 6 hours into tweaking every setting imaginable just to circle back
I ended up manually cleaning up the OSD host, removing stale LVs and DM
entries, and then purging the OSD with `ceph osd purge osd.19`. Looks like
it's gone for good.
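For reference, the manual cleanup described above can be sketched roughly as follows; the LV and device-mapper names are placeholders and must be read from `lvs`/`dmsetup ls` on the actual host:

```
# Find the stale logical volume and device-mapper entry left by the dead OSD:
lvs -o lv_name,vg_name | grep osd
dmsetup ls | grep osd

# Remove them (names below are placeholders):
lvremove /dev/<ceph-vg>/<osd-block-lv>
dmsetup remove <stale-dm-entry>

# Finally remove the OSD from the CRUSH map, OSD map and auth database:
ceph osd purge osd.19 --yes-i-really-mean-it
```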
/Z
On Sat, 4 May 2024 at 08:29, Zakhar Kirpichenko wrote:
> Hi!
>
> An OSD failed in our 16.2.15 cluster. I
Hi!
An OSD failed in our 16.2.15 cluster. I prepared it for removal and ran
`ceph orch daemon rm osd.19 --force`. Somehow that didn't work as expected,
so now we still have osd.19 in the crush map:
-10 122.66965 host ceph02
19 1.0 osd.19
>
> Zitat von Eugen Block :
>
> > You can use the extra container arguments I pointed out a few months
> > ago. Those work in my test clusters, although I haven’t enabled that
> > in production yet. But it shouldn’t make a difference if it’s a test
> > cluster or not.
ut issues?
> Do you know if this works also with reef (we see massive writes as well
> there)?
>
> Can you briefly tabulate the commands you used to persistently set the
> compression options?
>
> Thanks so much,
>
>Dietmar
>
>
> On 10/18/23 06:14, Zakhar K
Well, I've replaced the failed drives and that cleared the error. Arguably,
it was a better solution :-)
/Z
On Sat, 6 Apr 2024 at 10:13, wrote:
> did it help? Maybe you found a better solution?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
>
cluster as well, but of course you'd have to reweight
> new OSDs manually.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any comments regarding `osd noin`, please?
> >
> > /Z
> >
> > On Tue, 2 Apr 2024 at 16:09, Zakhar Kirpiche
Thanks, this is a good suggestion!
/Z
On Thu, 4 Apr 2024 at 10:29, Janne Johansson wrote:
> Den tors 4 apr. 2024 kl 06:11 skrev Zakhar Kirpichenko :
> > Any comments regarding `osd noin`, please?
> > >
> > > I'm adding a few OSDs to an existing cluster, the cluster i
Any comments regarding `osd noin`, please?
/Z
On Tue, 2 Apr 2024 at 16:09, Zakhar Kirpichenko wrote:
> Hi,
>
> I'm adding a few OSDs to an existing cluster, the cluster is running with
> `osd noout,noin`:
>
> cluster:
> id: 3f50555a-ae2a-11eb-a2fc-ffde
d drives
> it should work. I also don't expect an impact on the rest of the OSDs
> (except for backfilling, of course).
>
> Regards,
> Eugen
>
> [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#replacing-an-osd
>
> Zitat von Zakhar Kirpichenko :
>
> >
Hi,
I'm adding a few OSDs to an existing cluster, the cluster is running with
`osd noout,noin`:
cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_WARN
noout,noin flag(s) set
Specifically `noin` is documented as "prevents booting OSDs from being
marked
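A hedged sketch of the `noin` workflow being discussed (the OSD IDs are hypothetical):

```
ceph osd set noin           # new OSDs boot but are not marked "in"
# ... deploy the new OSDs ...
ceph osd unset noin         # stop suppressing automatic marking
ceph osd in osd.20 osd.21   # or mark specific OSDs in manually
```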
Hi,
Unfortunately, some of our HDDs failed and we need to replace these drives
which are parts of "combined" OSDs (DB/WAL on NVME, block storage on HDD).
All OSDs are defined with a service definition similar to this one:
```
service_type: osd
service_id: ceph02_combined_osd
service_name:
```
Hi,
A disk failed in our cephadm-managed 16.2.15 cluster, the affected OSD is
down, out and stopped with cephadm, I also removed the failed drive from
the host's service definition. The cluster has finished recovering but the
following warning persists:
[WRN] CEPHADM_FAILED_DAEMON: 1 failed
se it's an option that has to be present *during* mon
> startup, not *after* the startup when it can read the config store.
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi Eugen,
> >
> > It is correct that I manually added the configuration, but not to the
> > unit.run b
true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > I have upgraded my test and production cephadm-managed clusters from
> > 16.2.14 to 16.2.15. The u
Hi,
I have upgraded my test and production cephadm-managed clusters from
16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
There were a few things which I noticed after each upgrade:
1. RocksDB options, which I provided to each mon via their configuration
files, got
This is great news! Many thanks!
/Z
On Mon, 4 Mar 2024 at 17:25, Yuri Weinstein wrote:
> We're happy to announce the 15th, and expected to be the last,
> backport release in the Pacific series.
>
> https://ceph.io/en/news/blog/2024/v16-2-15-pacific-released/
>
> Notable Changes
>
Hi,
We randomly got several Pacific package updates to 16.2.15 available for
Ubuntu 20.04. As far as I can see, 16.2.15 hasn't been released and there's
been no release announcement. The updates seem to be no longer available.
What's going on with 16.2.15?
/Z
and apply it:
>
> ceph orch apply -i new-drivegroup.yml
>
> Zitat von Zakhar Kirpichenko :
>
> > Many thanks for your response, Eugen!
> >
> > I tried to fail mgr twice, unfortunately that had no effect on the issue.
> > Neither `cephadm ceph-volume inventory
Answering my own question: I exported the spec, removed the failed drive
and re-applied the spec again; the spec appears to have been updated
correctly and the warning is gone.
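The fix described above amounts to something like this (the filename is illustrative):

```
ceph orch ls osd --export > osd-spec.yml
# edit osd-spec.yml: remove the failed drive's path from the spec
ceph orch apply -i osd-spec.yml
```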
/Z
On Fri, 16 Feb 2024 at 14:33, Zakhar Kirpichenko wrote:
> Many thanks for your response, Eugen!
>
>
orch ls --export
>
> Does it contain specific device paths or something? Does 'cephadm ls'
> on that node show any traces of the previous OSD?
> I'd probably try to check some things like
>
> cephadm ceph-volume inventory
> ceph device ls-by-host
>
> Regards,
> E
Hi,
We had a physical drive malfunction in one of our Ceph OSD hosts managed by
cephadm (Ceph 16.2.14). I have removed the drive from the system, and the
kernel no longer sees it:
ceph03 ~]# ls -al /dev/sde
ls: cannot access '/dev/sde': No such file or directory
I have removed the corresponding
Indeed, it looks like it's been recently reopened. Thanks for this!
/Z
On Wed, 7 Feb 2024 at 15:43, David Orman wrote:
> That tracker's last update indicates it's slated for inclusion.
>
> On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote:
> > Hi,
> >
> >
Hi,
Please consider not leaving this behind:
https://github.com/ceph/ceph/pull/55109
It's a serious bug which potentially affects the stability of a whole node
if the affected mgr is colocated with OSDs. The bug has been known for quite
a while and really shouldn't be left unfixed.
/Z
On Thu, 1 Feb 2024
ere any zombie or unwanted process that make the ceph cluster busy or
> the IOPS budget of the disk that makes the cluster busy?
>
>
> On November 4, 2023 at 4:29 PM Zakhar Kirpichenko
> wrote:
>
> You have an IOPS budget, i.e. how much I/O your spinners can deliver.
> Space
I have to say that not including a fix for a serious issue in the last
minor release of Pacific is a rather odd decision.
/Z
On Thu, 25 Jan 2024 at 09:00, Konstantin Shalygin wrote:
> Hi,
>
> The backport to pacific was rejected [1], you may switch to reef, when [2]
> merged and released
>
>
I found that quickly restarting the affected mgr every 2 days is an okay
kludge. It takes less than a second to restart, and the process never grows
to the dangerous sizes at which it randomly starts ballooning.
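The kludge can be automated with a cron entry along these lines; the daemon name, path, and schedule are assumptions:

```
# /etc/cron.d/ceph-mgr-restart -- restart the mgr every 2 days at 03:00
0 3 */2 * * root ceph orch daemon restart mgr.ceph01.vankui
```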
/Z
On Thu, 25 Jan 2024, 03:12 changzhi tan, <544463...@qq.com> wrote:
> Is there any way to
thoughts?
/Z
On Mon, 11 Dec 2023 at 12:34, Zakhar Kirpichenko wrote:
> Hi,
>
> Another update: after 2 more weeks the mgr process grew to ~1.5 GB, which
> again was expected:
>
> mgr.ceph01.vankui ceph01 *:8443,9283 running (2w)  102s ago  2y
> 1519M  -
112M  -  16.2.14 fc0182d6cda5 1c3d2d83b6df
The cluster is healthy and operating normally, the mgr process is growing
slowly. It's still unclear what caused the ballooning and OOM issue under
very similar conditions.
/Z
On Sat, 25 Nov 2023 at 08:31, Zakhar Kirpichenko wrote:
>
now.
I've already checked the file descriptor numbers, the defaults already are
very high and the usage is rather low.
/Z
On Wed, 6 Dec 2023 at 03:24, Tyler Stachecki
wrote:
> On Tue, Dec 5, 2023 at 10:13 AM Zakhar Kirpichenko
> wrote:
> >
> > Any input from anyone?
&g
Any input from anyone?
/Z
On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko wrote:
> Hi,
>
> Just to reiterate, I'm referring to an OSD crash loop because of the
> following error:
>
> "2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
> /var/
ideas?
/Z
On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko wrote:
> Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
> for the subsequent crashes. The reason for those is described where you
> . I'm asking for help with that one.
>
> /Z
>
> On Sun, 3 Dec
Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
for the subsequent crashes. The reason for those is described where you
. I'm asking for help with that one.
/Z
On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad wrote:
> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zak
Hi,
One of our 16.2.14 cluster OSDs crashed again because of the dreaded
https://tracker.ceph.com/issues/53906 bug. Usually an OSD, which crashed
because of this bug, restarts within seconds and continues normal
operation. This time it failed to restart and kept crashing:
"assert_condition":
days, which likely
means that whatever triggers the issue happens randomly and quite suddenly.
I'll continue monitoring the mgr and get back with more observations.
/Z
On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko wrote:
> Thanks for this. This looks similar to what we're observing. Altho
Hi,
Please note that there are cases where the use of ceph.conf inside a
container is justified. For example, I was unable to set the monitor's
mon_rocksdb_options by any means except by providing them in the monitor's
own ceph.conf within the container; all other attempts to pass these
settings were
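For illustration, such a per-monitor ceph.conf fragment might look like this; the option values are examples only, modeled on the option string quoted elsewhere in the thread:

```
[mon]
mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2
```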
We use podman, could it
> > be some docker restriction?
> >
> > Zitat von Zakhar Kirpichenko :
> >
> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has
> >> 384
> >> GB of RAM, each OSD has a memory target of 16 GB, about 100
;
> Zitat von Zakhar Kirpichenko :
>
> > It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
> > GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
> > give or take, is available (mostly used by page cache) on each node
> during
&
gt; COMMAND
> 6077 ceph 20 0 6357560 4,522g 22316 S 12,00 1,797
> 57022:54 ceph-mgr
>
> In our own cluster (smaller than that and not really heavily used) the
> mgr uses almost 2 GB. So those numbers you have seem relatively small.
>
> Zitat von Zakhar Kirpichenko
2023 at 13:07, Eugen Block wrote:
> I see these progress messages all the time, I don't think they cause
> it, but I might be wrong. You can disable it just to rule that out.
>
> Zitat von Zakhar Kirpichenko :
>
> > Unfortunately, I don't have a full stack trace because ther
ock wrote:
> Do you have the full stack trace? The pastebin only contains the
> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
> comment in the tracker issue directly since Radek asked for someone
> with a similar problem in a newer release.
n’t this quite similar?
>
> https://tracker.ceph.com/issues/45136
>
> Zitat von Zakhar Kirpichenko :
>
> > Hi,
> >
> > I'm facing a rather new issue with our Ceph cluster: from time to time
> > ceph-mgr on one of the two mgr nodes gets oom-killed after consum
Hi,
I'm facing a rather new issue with our Ceph cluster: from time to time
ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
100 GB RAM:
[Nov21 15:02] tp_osd_tp invoked oom-killer:
gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ +0.10]
running (3d) 7m ago
> 12d  3039M  4096M  17.2.6 90a2664234e1 16e04a1da987
> osd.34 sg-osd02 running (3d) 7m ago
> 12d  2434M  4096M  17.2.6 90a2664234e1 014076e28182
>
>
>
> btw as you said, I feel this value does
out memory leak. A nice man, @Anthony D'Atri ,
> on this forum helped me to understand that it won't help to limit OSD usage.
>
> I set it to 1GB because I want to see how this option works.
>
> I will read and test with caches options.
>
> Nguyen Huu Khoi
>
>
> On Thu
Hi,
osd_memory_target is a "target", i.e. an OSD makes an effort to consume up
to the specified amount of RAM, but won't consume less than required for
its operation and caches, which have certain minimum values such as, for
example, osd_memory_cache_min, bluestore_cache_size,
bluestore_cache_size_hdd,
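For example, the target and one of its related floor values can be set and inspected via the config database; the 16 GiB figure is just an illustration:

```
ceph config set osd osd_memory_target 17179869184   # ~16 GiB, in bytes
ceph config get osd osd_memory_cache_min
```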
Take hints from this: "544 pgs not deep-scrubbed in time". Your OSDs are
unable to scrub their data in time, likely because they cannot cope with
the client + scrubbing I/O. I.e. there's too much data on too few and too
slow spindles.
You can play with osd_deep_scrub_interval and increase the
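A hedged example of the interval tuning mentioned above; the values are illustrative (the stock deep-scrub interval is one week):

```
# Allow deep scrubs to spread over four weeks instead of one:
ceph config set osd osd_deep_scrub_interval 2419200   # 4 weeks, in seconds
# Optionally confine scrubs to off-peak hours:
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6
```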
; Is there any parameter in the ceph osd log and ceph mon log that gives me
> the clue for the cluster business?
> Is there any zombie or unwanted process that make the ceph cluster busy or
> the IOPS budget of the disk that makes the cluster busy?
>
>
> On November 4, 2023 at 4:29 P
orage of 1.6 TB free in each of my OSD,
> that will not help in my IOPS issue right?
> Please guide me
>
> On November 2, 2023 at 12:47 PM Zakhar Kirpichenko
> wrote:
>
> >1. The calculated IOPS is for the rw operation right ?
>
> Total drive IOPS, read or write. Depending
om the output of ceph osd df tree that is count of
> pgs(45/OSD) and use% (65 to 67%). Is that not significant?
> Correct me if my queries are irrelevant
>
>
>
> On November 2, 2023 at 11:36 AM Zakhar Kirpichenko
> wrote:
>
> Sure, it's 36 OSDs at 200 IOPS each (tops, like
busy and OSDs aren't coping.
Also your nodes are not balanced.
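The IOPS budget mentioned here (36 OSDs at roughly 200 IOPS each) is simple arithmetic; the replication factor below is an assumption, not stated in the thread:

```python
# Rough IOPS budget for an all-HDD pool, per the figures in the thread.
osds = 36
iops_per_osd = 200           # ballpark ceiling for a 7.2k rpm spindle
replication = 3              # assumed pool size (not stated in the thread)

read_budget = osds * iops_per_osd            # reads hit one replica
write_budget = read_budget // replication    # each client write lands on 'size' OSDs

print(read_budget, write_budget)  # prints: 7200 2400
```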
/Z
On Thu, 2 Nov 2023 at 07:33, V A Prabha wrote:
> Can you please elaborate on your identifications and the statement.
>
>
> On November 2, 2023 at 9:40 AM Zakhar Kirpichenko
> wrote:
>
> I'm afraid you're s
I'm afraid you're simply hitting the I/O limits of your disks.
/Z
On Thu, 2 Nov 2023 at 03:40, V A Prabha wrote:
> Hi Eugen
> Please find the details below
>
>
> root@meghdootctr1:/var/log/ceph# ceph -s
> cluster:
> id: c59da971-57d1-43bd-b2b7-865d392412a5
> health: HEALTH_WARN
>
tick_period 10
> >
> > Regards,
> > Eugen
> >
> > Zitat von Chris Palmer :
> >
> >> I have just checked 2 quincy 17.2.6 clusters, and I see exactly the
> >> same. The pgmap version is bumping every two seconds (which ties in
> >> with th
rst.
>
> Alternatively you might consider building updated code yourself and make
> patched binaries on top of .14...
>
>
> Thanks,
>
> Igor
>
>
> On 20/10/2023 15:10, Zakhar Kirpichenko wrote:
>
> Thank you, Igor.
>
> It is somewhat disappointing
This should be coupled with enabling
>> > 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is pretty
>> > good spec on applying this mode to BlueStore attached to
>> > https://github.com/ceph/ceph/pull/37156.
>> >
>> >
>> > T
fast" mode. This should be coupled with enabling
> 'level_compaction_dynamic_level_bytes' mode in RocksDB - there is pretty
> good spec on applying this mode to BlueStore attached to
> https://github.com/ceph/ceph/pull/37156.
>
>
> Thanks,
>
> Igor
> On 20/10/2
16/10/2023 14:13, Zakhar Kirpichenko wrote:
>
> Many thanks, Igor. I found previously submitted bug reports and subscribed
> to them. My understanding is that the issue is going to be fixed in the
> next Pacific minor release.
>
> /Z
>
> On Mon, 16 Oct 2023 at 14:03, Igor
ers are healthy with
> > nothing apart from client IO happening.
> >
> > On 13/10/2023 12:09, Zakhar Kirpichenko wrote:
> >> Hi,
> >>
> >> I am investigating excessive mon writes in our cluster and wondering
> >> whether excessive pgmap updates could be t
>
> since it's a bit beyond the scope of basics, could you please post the
> complete ceph.conf config section for these changes for reference?
>
> Thanks!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> _________
confirm
> that compression works well for MONs too, compression could be enabled
> by default as well.
>
> Regards,
> Eugen
>
> https://tracker.ceph.com/issues/63229
>
> Zitat von Zakhar Kirpichenko :
>
> > With the help of community members, I managed to enable RocksDB
>
with the config database and which require this
> extra-entrypoint-argument.
>
> Thanks again, Mykola!
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/cephadm/services/#extra-entrypoint-arguments
>
> Zitat von Zakhar Kirpichenko :
>
> > Thanks for the sugg
adding compression to other monitors.
/Z
On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko wrote:
> The issue persists, although to a lesser extent. Any comments from the
> Ceph team please?
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko wrote:
>
>> &
set. The reason I think this is that rocksdb mount options are needed
> _before_ the mon is able to access any of the centralized conf data,
> which I believe is itself stored in rocksdb.
>
> Josh
>
> On Sun, Oct 15, 2023 at 10:29 PM Zakhar Kirpichenko
> wrote:
> >
> > Out of curi
The issue persists, although to a lesser extent. Any comments from the Ceph
team please?
/Z
On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko wrote:
> > Some of it is transferable to RocksDB on mons nonetheless.
>
> Please point me to relevant Ceph documentation, i.e. a descri
e similar issue at:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YNJ35HXN4HXF4XWB6IOZ2RKXX7EQCEIY/
>
>
> Thanks,
>
> Igor
>
> On 16/10/2023 09:26, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > After upgrading to Ceph 16.2.14 w
ed: 17024548864 unmapped: 4164534272
heap: 21189083136 old mem: 13797582406 new mem: 13797582406
There's plenty of RAM in the system, about 120 GB free and used for cache.
/Z
On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko wrote:
> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD c
Not sure how it managed to screw up formatting, OSD configuration in a more
readable form: https://pastebin.com/mrC6UdzN
/Z
On Mon, 16 Oct 2023 at 09:26, Zakhar Kirpichenko wrote:
> Hi,
>
> After upgrading to Ceph 16.2.14 we had several OSD crashes
> in bstore_kv_sync thread
Hi,
After upgrading to Ceph 16.2.14 we had several OSD crashes
in bstore_kv_sync thread:
"assert_thread_name": "bstore_kv_sync",
"backtrace": [
"/lib64/libpthread.so.0(+0x12cf0) [0x7ff2f6750cf0]",
"gsignal()",
"abort()",
"(ceph::__ceph_assert_fail(char
it if someone from the Ceph team could please chip in
and suggest a working way to enable RocksDB compression in Ceph monitors.
/Z
On Sat, 14 Oct 2023 at 19:16, Zakhar Kirpichenko wrote:
> Thanks for your response, Josh. Our ceph.conf doesn't have anything but
> the mon addresses, moder
I wonder if mon settings like
> this one won't actually apply the way you want because they're needed
> before the mon has the ability to obtain configuration from,
> effectively, itself.
>
> Josh
>
> On Sat, Oct 14, 2023 at 1:32 AM Zakhar Kirpichenko
> wrote:
>
from anyone, please?
/Z
On Fri, 13 Oct 2023 at 23:01, Zakhar Kirpichenko wrote:
> Hi,
>
> I'm still trying to fight large Ceph monitor writes. One option I
> considered is enabling RocksDB compression, as our nodes have more than
> sufficient RAM and CPU. Unfortunately
Hi,
I'm still trying to fight large Ceph monitor writes. One option I
considered is enabling RocksDB compression, as our nodes have more than
sufficient RAM and CPU. Unfortunately, monitors seem to completely ignore
the compression setting:
I tried:
- setting ceph config set mon.ceph05
;
>
> Please point me to such recommendations, if they're on docs.ceph.com I'll
> get them updated.
>
> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko wrote:
>
> Thank you, Anthony. As I explained to you earlier, the article you had
> sent is about RocksDB tuning for Bluestore OSD
a client SKU and really not suited for
> enterprise use. If you had the 1TB SKU you'd get much longer life, or you
> could change the overprovisioning on the ones you have.
>
> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko wrote:
>
> I would very much appreciate it if someone
would very much appreciate it if someone with a better understanding of
monitor internals and use of RocksDB could please chip in.
/Z
On Wed, 11 Oct 2023 at 19:00, Zakhar Kirpichenko wrote:
> Thank you, Frank. This confirms that monitors indeed do this, and
>
> Our boot drives in 3 systems
Thank you, Frank.
Tbh, I think it doesn't matter if the number of manual compactions is for
24h or for a smaller period, as long as it's over a reasonable period of
time, so that an average number of compactions per hour can be calculated.
/Z
On Fri, 13 Oct 2023 at 16:01, Frank Schilder wrote:
Hi,
I am investigating excessive mon writes in our cluster and wondering
whether excessive pgmap updates could be the culprit. Basically pgmap is
updated every few seconds, sometimes over ten times per minute, in a
healthy cluster with no OSD and/or PG changes:
Oct 13 11:03:03 ceph03 bash[4019]:
Hi!
Further to my thread "Ceph 16.2.x mon compactions, disk writes" (
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/XGCI2LFW5RH3GUOQFJ542ISCSZH3FRX2/)
where we have established that Ceph monitors indeed write considerable
amounts of data to disks, I would like to request fellow
very large and also provide extra
> endurance with SSDs with good controllers.
>
> I also think the recommendations on the ceph docs deserve a reality check.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
"hundreds of GB per day"? I see similar stats as
> Frank on different clusters with different client IO.
>
> Zitat von Zakhar Kirpichenko :
>
> > Sure, nothing unusual there:
> >
> > ---
> >
> > cluster:
> > id: 3f50555a-ae2a-11eb
> Can you add some more details as requested by Frank? Which mgr modules
> are enabled? What's the current 'ceph -s' output?
>
> > Is autoscaler running and doing stuff?
> > Is balancer running and doing stuff?
> > Is backfill going on?
> > Is recovery going on?
> >
an option to limit logging to the MON store?
>
> I don't recall at the moment, worth checking tough.
>
> Zitat von Zakhar Kirpichenko :
>
> > Thank you, Frank.
> >
> > The cluster is healthy, operating normally, nothing unusual is going on.
> We
>
n to limit logging to the MON store?
>
> For information to readers, we followed old recommendations from a Dell
> white paper for building a ceph cluster and have a 1TB Raid10 array on 6x
> write intensive SSDs for the MON stores. After 5 years we are below 10%
> wear. Average size of the M
probably wouldn't change too much, only if you know what you're doing.
> Maybe Igor can comment if some other tuning makes sense here.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any input from anyone, please?
> >
> > On Tue, 10 Oct 2023 at 09
Any input from anyone, please?
On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko wrote:
> Any input from anyone, please?
>
> It's another thing that seems to be rather poorly documented: it's unclear
> what to expect, what 'normal' behavior should be, and what can be done
> about
Any input from anyone, please?
It's another thing that seems to be rather poorly documented: it's unclear
what to expect, what 'normal' behavior should be, and what can be done
about the huge amount of writes by monitors.
/Z
On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko wrote:
>
Thanks for the suggestion. That pid belongs to the mon process. I.e. the
monitor is logging all client connections and commands.
/Z
On Mon, 9 Oct 2023 at 14:24, Kai Stian Olstad wrote:
> On 09.10.2023 10:05, Zakhar Kirpichenko wrote:
> > I did try to play with various debug settings.
s with
> > ceph daemon mon.a config show | grep debug_ | grep mgr
> >
> > ceph tell mon.* injectargs --$monk=0/0
> >
> > >
> > > Any input from anyone, please?
> > >
> > > This part of Ceph is
Hi,
Monitors in our 16.2.14 cluster appear to quite often run "manual
compaction" tasks:
debug 2023-10-09T09:30:53.888+ 7f48a329a700 4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
"num_memtables": 1, "num_entries": 715, "num_deletes": 251,
_ | grep mgr
>
> ceph tell mon.* injectargs --$monk=0/0
>
> >
> > Any input from anyone, please?
> >
> > This part of Ceph is very poorly documented. Perhaps there's a better
> place
> > to ask this question? Please let me know.
> >
> > /Z
> &
Any input from anyone, please?
This part of Ceph is very poorly documented. Perhaps there's a better place
to ask this question? Please let me know.
/Z
On Sat, 7 Oct 2023 at 22:00, Zakhar Kirpichenko wrote:
> Hi!
>
> I am still fighting excessive logging. I've reduced unnecessar
produces a significant part of the logging traffic.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 04/10/2023 20:51, Zakhar Kirpichenko wrote:
>> > Any input from anyone, please?
>> >
>> > On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpiche
?
/Z
On Wed, 4 Oct 2023 at 21:23, Igor Fedotov wrote:
> Hi Zakhar,
>
> to reduce RocksDB logging verbosity you might want to set debug_rocksdb
> to 3 (or 0).
>
> I presume it produces a significant part of the logging traffic.
>
>
> Thanks,
>
> Igor
>
>
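The suggestion above translates to either a persistent config-database setting or a runtime injection, e.g.:

```
ceph config set global debug_rocksdb 0/0             # persistent
ceph tell mon.* injectargs '--debug_rocksdb=0/0'     # runtime, mons only
```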
Any input from anyone, please?
On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko wrote:
> Hi,
>
> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
> detailed messages, Ceph logs alone on hosts with monitors and several OSDs
> have already eaten through 50% o
Many thanks for the clarification!
/Z
On Fri, 29 Sept 2023 at 16:43, Tyler Stachecki
wrote:
>
>
> On Fri, Sep 29, 2023, 9:40 AM Zakhar Kirpichenko wrote:
>
>> Thanks for the suggestion, Tyler! Do you think switching the progress
>> module off will have no material
Thanks for the suggestion, Tyler! Do you think switching the progress
module off will have no material impact on the operation of the cluster?
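For reference, the progress module is an always-on mgr module, so rather than unloading the module itself, its event reporting can be toggled (a sketch, to the best of my understanding):

```
ceph progress off   # stop generating/reporting progress events
ceph progress on    # re-enable later
```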
/Z
On Fri, 29 Sept 2023 at 14:13, Tyler Stachecki
wrote:
> On Fri, Sep 29, 2023, 5:55 AM Zakhar Kirpichenko wrote:
>
>> Thank you, Eugen.
would have, so maybe
> investigate first and then try just clearing it. Maybe a mgr failover
> would do the same, not sure.
>
> Regards,
> Eugen
>
> [1]
>
> https://github.com/ceph/ceph/blob/1d10b71792f3be8887a7631e69851ac2df3585af/src/pybind/mgr/progress/module.py#
Hi,
The mgr of my cluster logs this every few seconds:
[progress WARNING root] complete: ev 7de5bb74-790b-4fda-8838-e4af4af18c62
does not exist
[progress WARNING root] complete: ev fff93fce-b630-4141-81ee-19e7a3e61483
does not exist
[progress WARNING root] complete: ev
Any input from anyone, please?
On Tue, 19 Sept 2023 at 09:01, Zakhar Kirpichenko wrote:
> Hi,
>
> Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
> detailed messages, Ceph logs alone on hosts with monitors and several OSDs
> have already eaten through 50% o
Hi,
Our Ceph 16.2.x cluster managed by cephadm is logging a lot of very
detailed messages, Ceph logs alone on hosts with monitors and several OSDs
have already eaten through 50% of the endurance of the flash system drives
over a couple of years.
Cluster logging settings are default, and it seems