Hi Yuri,
wrt Round 1 - the ability to expand the block (main) device has been added
to Nautilus,
see: https://github.com/ceph/ceph/pull/25308
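For reference, a minimal sketch of the expand flow once the underlying device has been enlarged (<id> is a placeholder; the OSD must be stopped first, since ceph-bluestore-tool works offline):
  systemctl stop ceph-osd@<id>
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<id> bluefs-bdev-expand
  systemctl start ceph-osd@<id>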
wrt Round 2:
- not setting the 'size' label looks like a bug, although I recall fixing
it... Will double-check.
- the wrong stats output is probably related to a known issue; the fix
will land in Luminous once the mimic patch is approved.
See https://github.com/ceph/ceph/pull/27447
Thanks,
Igor
On 4/5/2019 4:07 PM, Yury Shevchuk wrote:
On Fri, Apr 05, 2019 at 02:42:53PM +0300, Igor Fedotov wrote:
wrt Round 1 - the ability to expand the block (main) device has been added
to Nautilus,
see: https://
It's ceph-bluestore-tool.
On 4/10/2019 10:27 AM, Wido den Hollander wrote:
On 4/10/19 9:25 AM, jes...@krogh.cc wrote:
On 4/10/19 9:07 AM, Charles Alva wrote:
Hi Ceph Users,
Is there a way to minimize RocksDB compaction events so that they
won't use all the spinning disk IO utilization an
ation here for now.
You can also note that the reported SIZE for osd.2 is 400GiB in your case,
which is absolutely in line with the slow device capacity. Hence no DB is involved.
Thanks for your help,
-- Yury
On Mon, Apr 08, 2019 at 10:17:24PM +0300, Igor Fedotov wrote:
Hi Yuri,
both issues from
.4 GiB 644 GiB 35.41
MIN/MAX VAR: 0.91/1.10 STDDEV: 3.37
It worked: AVAIL = 594+50 = 644. Great!
Thanks a lot for your help.
And one more question regarding your last remark is inline below.
On Wed, Apr 10, 2019 at 09:54:35PM +0300, Igor Fedotov wrote:
On 4/9/2019 1:59 PM, Yury Shevchuk wr
Hi Wido,
the main driver for this backport was multiple complaints about write op
latency increasing over time.
E.g. see thread named: "ceph osd commit latency increase over time,
until restart" here.
Or http://tracker.ceph.com/issues/38738
Most symptoms showed Stupid Allocator as the root cause.
Besides the already mentioned store_test.cc, one can also use the ceph
objectstore fio plugin
(https://github.com/ceph/ceph/tree/master/src/test/fio) to access a
standalone BlueStore instance from the FIO benchmarking tool.
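A minimal fio job sketch for that plugin, based on the examples shipped in src/test/fio (the library path and conf file name are assumptions - adjust them to your build; the referenced ceph conf must point 'osd data' at a scratch directory):
  [global]
  ioengine=external:/usr/local/lib/libfio_ceph_objectstore.so
  conf=ceph-bluestore.conf
  rw=randwrite
  bs=64k
  size=256m
  iodepth=16
  [bluestore]
  nr_files=64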
Thanks,
Igor
On 4/16/2019 7:58 AM, Can ZHANG wrote:
Hi,
I'd like to run a standa
On 4/15/2019 4:17 PM, Wido den Hollander wrote:
On 4/15/19 2:55 PM, Igor Fedotov wrote:
Hi Wido,
the main driver for this backport was multiple complaints about write op
latency increasing over time.
E.g. see thread named: "ceph osd commit latency increase over time,
until restart"
(https://tracker.ceph.com/issues/38360), the error seems to be
caused by mixed versions. My build environment is CentOS 7.5.1804 with
SCL devtoolset-7, and ceph is the latest master branch. Does anyone know
about the symbol?
Best,
Can Zhang
On Tue, Apr 16, 2019 at 8:37 PM Igor
Or try full rebuild?
On 4/17/2019 5:37 PM, Igor Fedotov wrote:
Could you please check if libfio_ceph_objectstore.so has been rebuilt
with your last build?
On 4/17/2019 6:37 AM, Can Zhang wrote:
Thanks for your suggestions.
I tried to build libfio_ceph_objectstore.so, but it fails to load
?
-- Try different allocator.
Ah, BTW, besides the memory allocator there's another option: the recently
backported bitmap allocator.
Igor Fedotov wrote that it's expected to have a smaller memory footprint
over time:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/034299.html
Hi Ashley,
the general rule is that the compression switch does not affect existing
data; it only controls how future write requests are processed.
You can enable/disable compression at any time.
Once disabled, no more compression happens, and data that has already been
compressed remains in that state until removal.
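In practice this is controlled per pool with the standard commands (the pool name is a placeholder):
  ceph osd pool set <pool> compression_algorithm snappy
  ceph osd pool set <pool> compression_mode aggressive
  ceph osd pool set <pool> compression_mode none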
4ab0) [0x55b2136b3ab0]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrato
Hi Paul,
could you please set both "debug bluestore" and "debug bluefs" to 20,
run again and share the resulting log.
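One way to do that at runtime, as a sketch (the osd id is a placeholder):
  ceph tell osd.<id> injectargs '--debug_bluestore 20/20 --debug_bluefs 20/20'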
Thanks,
Igor
On 5/9/2019 2:34 AM, Rawson, Paul L. wrote:
Hi Folks,
I'm having trouble getting some of my OSDs to boot. At some point, these
disks got very full. I fixed the
Hi Manuel,
Just in case - have you done any manipulation with the underlying
disk/partition/volume (resize, replacement, etc.)?
Thanks,
Igor
On 5/17/2019 3:00 PM, EDH - Manuel Rios Fernandez wrote:
Hi ,
Today we got some OSDs that crash after scrub. Version 14.2.1.
2019-05-17 12:49:40.955 7f
Hi Guillaume,
Could you please set debug-bluefs to 20, restart the OSD and collect the
whole log?
Thanks,
Igor
On 5/24/2019 4:50 PM, Guillaume Chenuet wrote:
Hi,
We are running a Ceph cluster with 36 OSDs split across 3 servers (12
OSDs per server) and Ceph version
12.2.11 (26dc3775efc7bb286a1
Konstantin,
one should resize the device before using the bluefs-bdev-expand command.
So the first question is: what's the backend for block.db - a simple
device partition, an LVM volume, or a raw file?
LVM volume and raw file resizing is quite simple, while a partition
might need manual data movemen
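For the LVM case the sequence is roughly this (a sketch with placeholder names; stop the OSD before expanding BlueFS):
  lvextend -L +30G /dev/<vg>/<db-lv>
  systemctl stop ceph-osd@<id>
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<id> bluefs-bdev-expand
  systemctl start ceph-osd@<id>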
Hi Jake,
just my 2 cents - I'd suggest using LVM for DB/WAL to be able to
seamlessly extend their sizes if needed.
Once you've configured it this way, and if you're able to add more NVMe
later, you're almost free to select any size at the initial stage.
Thanks,
Igor
On 5/28/2019 4:13 PM, Jake
Hi Max,
I don't think this is an allocator-related issue. The symptoms that
triggered us to start using the bitmap allocator over the stupid one were:
- write op latency gradually increasing over time (days, not hours)
- perf showing a significant amount of time spent in allocator-related
functions
- OSD
Hi Maged,
min_alloc_size determines allocation granularity, hence if the object size
isn't aligned with its value, allocation overhead still takes place.
E.g. with min_alloc_size = 16K and object size = 24K, the total allocation
(i.e. bluestore_allocated) would be 32K.
And yes, this overhead is permanent.
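Restating the rule above as arithmetic: bluestore_allocated = ceil(object_size / min_alloc_size) * min_alloc_size, so for the example ceil(24K / 16K) * 16K = 2 * 16K = 32K, i.e. 8K of permanent overhead per object.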
Hi Brett,
this issue has been with you since long before the upgrade to 14.2.1; the
upgrade just made the corresponding alert visible.
You can turn the alert off by setting
bluestore_warn_on_bluefs_spillover=false.
But generally this warning shows a DB data layout inefficiency - some data
is kept at the slow device
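On Nautilus the alert can be silenced cluster-wide via the config database (a sketch; the option can also go into ceph.conf instead):
  ceph config set osd bluestore_warn_on_bluefs_spillover false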
. I'm fine turning the warnings off, but it's curious that
only this cluster is showing the alerts. Is there any value in
rebuilding them with smaller SSD metadata volumes? Say 60GB or 30GB?
-Brett
On Tue, Jun 18, 2019 at 1:55 PM Igor Fedotov <ifedo...@suse.de>
Hi Dan,
bluestore_compression_max_blob_size is applied only to objects marked
with certain additional hints:
if ((alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_SEQUENTIAL_READ) &&
(alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_RANDOM_READ) == 0 &&
(alloc_hints & (CEPH_OSD_ALLOC_HINT_FLAG_IMMUTAB
I'd like to see more details (preferably backed with logs) on this...
On 6/20/2019 6:23 PM, Dan van der Ster wrote:
P.S. I know this has been discussed before, but the
compression_(mode|algorithm) pool options [1] seem completely broken
-- With the pool mode set to force, we see that sometimes t
On 6/20/2019 8:55 PM, Dan van der Ster wrote:
On Thu, Jun 20, 2019 at 6:55 PM Igor Fedotov wrote:
Hi Dan,
bluestore_compression_max_blob_size is applied only to objects marked
with certain additional hints:
if ((alloc_hints & CEPH_OSD_ALLOC_HINT_FLAG_SEQUENTIAL_
Hi Saulo,
looks like a disk I/O error.
Could you set debug_bluefs to 20 and collect the log, then share a few
lines prior to the assertion?
Checking the smartctl output might be a good idea too.
Thanks,
Igor
On 6/21/2019 11:30 AM, Saulo Silva wrote:
Hi,
After a power failure all OSDs from a po
ht but it looks like there is a bug: the osd compression algorithm isn't
applied when the osd compression mode is set to none. Hence there is no
compression if the pool lacks an explicit algorithm specification.
-- dan
-- Dan
On Thu, Jun 20, 2019 at 6:57 PM Igor Fedotov wrote:
I'd like to see more details
ped compressing, until I did
'ceph daemon osd.130 config set bluestore_compression_mode force',
where it restarted immediately.
FTR, it *should* compress with osd bluestore_compression_mode=none and
the pool's compression_mode=force, right?
-- dan
-- Dan
On Thu, Jun 20, 2019 at 6:
bluefs _read got 4096
2019-06-21 10:50:56.475440 7f462db84d00 10 bluefs _replay 0x104000:
txn(seq 332735 len 0xca5 crc 0x4715a5c6)
The entire file is 17M and I can send it if necessary,
Saulo Augusto Silva
On Fri, 21 Jun 2019 at 06:42, Igor Fedotov <ifedo...@suse.de>
Hi Andrei,
The most obvious reason is space usage overhead caused by BlueStore
allocation granularity, e.g. if bluestore_min_alloc_size is 64K and the
average object size is 16K, one will waste 48K per object on average.
This is rather speculation so far, as we lack the key information about
your
expectations (precise numbers to be verified soon).
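To make that arithmetic explicit: each 16K object still occupies a whole 64K allocation unit, so the waste per object is 64K - 16K = 48K; a million such objects would already account for roughly 46 GiB of otherwise unexplained usage.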
The issue seems to be only with the .rgw-buckets pool, where the "ceph
df" output shows 15TB of usage and the sum of all buckets in that
pool shows just over 6.5TB.
Cheers
Andrei
--
Cheers
*From: *"Igor Fedotov"
*To: *"andrei"
*Cc: *"ceph-users"
*Sent: *Wednesday, 3 July, 2019 12:29:33
*Subject: *Re: [ceph-users] troubleshooting space usage
Hi Andrei,
Additionally I'd like to see performan
:
Hi Igor.
The numbers are identical it seems:
.rgw.buckets 19 15 TiB 78.22 4.3 TiB *8786934*
# cat /root/ceph-rgw.buckets-rados-ls-all |wc -l
*8786934*
Cheers
*From: *"Igor Fedotov"
Hi Lukasz,
I've seen something like that - slow requests and the resulting OSD restarts
on suicide timeout - at least twice, with two different clusters. The root
cause was slow omap listing for some objects, which had started to happen
after massive removals from RocksDB.
To verify if this is the cas
Hi Brett,
looks like BlueStore is unable to allocate additional space for BlueFS
at main device. It's either lacking free space or it's too fragmented...
Would you share osd log, please?
Also please run "ceph-bluestore-tool --path <path-to-osd> bluefs-bdev-sizes" and share the output.
Thanks,
Igor
On 7/8/2019 8:00 PM, Igor Fedotov wrote:
Hi Brett,
looks like BlueStore is unable to allocate additional space for BlueFS
at main device. It's either lacking free space or it's too fragmented...
Would you share osd log, please?
Also please run "ceph-bluestore-tool --pa
Hi Lukasz,
if this is filestore then most probably my comments are irrelevant - the
issue I expected is BlueStore-specific.
Unfortunately I'm not an expert in filestore, hence unable to help with
further investigation. Sorry...
Thanks,
Igor
On 7/9/2019 11:39 AM, Luk wrote:
We have (stil
that a try. Is it something like...
ceph tell 'osd.*' bluestore_allocator stupid
ceph tell 'osd.*' bluefs_allocator stupid
And should I expect any issues doing this?
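For reference, the allocator is picked at OSD startup, so injecting it at runtime alone won't switch it; a sketch of the usual route, assuming Mimic or later with the config database in use:
  ceph config set osd bluestore_allocator stupid
  ceph config set osd bluefs_allocator stupid
  # then restart the OSDs one at a time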
On Mon, Jul 8, 2019 at 1:04 PM Igor Fedotov <ifedo...@suse.de> wrote:
I should read cal
ng
issues with OSDs crashing. Interestingly it seems that the dying OSDs
are always working on a pg from the .rgw.meta pool when they crash.
Log : https://pastebin.com/yuJKcPvX
On Tue, Jul 9, 2019 at 5:14 AM Igor Fedotov <ifedo...@suse.de> wrote:
Hi Brett,
in Nautilus
Hi Mark,
I doubt read-only mode would help here.
Log replay is required to build a consistent store state and one can't
bypass it. And it looks like your drive/controller still detects some errors
while reading.
For the second issue this PR might help (you'll be able to disable csum
verificati
Please try to set bluestore_bluefs_gift_ratio to 0.0002
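One way to apply that, assuming runtime injection works for this option (the osd id is a placeholder; otherwise set it in ceph.conf and restart):
  ceph tell osd.<id> injectargs '--bluestore_bluefs_gift_ratio 0.0002'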
On 7/9/2019 7:39 PM, Brett Chancellor wrote:
Too large for pastebin.. The problem is continually crashing new OSDs.
Here is the latest one.
On Tue, Jul 9, 2019 at 11:46 AM Igor Fedotov <ifedo...@suse.de> wrote:
? I can't find any
documentation on it. Also do you think this could be related to the
.rgw.meta pool having too many objects per PG? The disks that die
always seem to be backfilling a pg from that pool, and they have ~550k
objects per PG.
-Brett
On Tue, Jul 9, 2019 at 1:03 PM Igor Fe
ant to try manual rocksdb compaction using
ceph-kvstore-tool..
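The usual offline form of that compaction is (a sketch; stop the OSD first, the path is a placeholder):
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact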
Sent from my Huawei tablet
Original Message
Subject: Re: [ceph-users] 3 OSDs stopped and unable to restart
From: Brett Chancellor
To: Igor Fedotov
CC: Ceph Users
Once backfill
Hi Paul,
there was a post from Sage named "Pool stats issue with upgrades to
nautilus" recently.
Perhaps that's the case if you added a new OSD or repaired an existing one...
Thanks,
Igor
On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
Sometime after our upgrade to Nautilus our disk usage statistics
Forgot to provide a workaround...
If that's the case then you need to repair each OSD with the corresponding
command in ceph-objectstore-tool...
Thanks,
Igor.
On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
Sometime after our upgrade to Nautilus our disk usage statistics went
completely off the rails.
____
From: Igor Fedotov
Sent: Wednesday, July 17, 2019 11:33 AM
To: Paul Mezzanini; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] disk usage reported incorrectly
F
Hi Manuel,
this looks like either corrupted data in the BlueStore database or a
memory-related (some leakage?) issue.
This is reproducible, right?
Could you please create a ticket in the upstream tracker, rerun the repair
with debug bluestore set to 5/20 and upload the corresponding log.
Please observe memo
Hi Frank,
you can specify the new db size in the following way (<path-to-osd> and
<new-db-device> are placeholders):
CEPH_ARGS="--bluestore-block-db-size 107374182400" ceph-bluestore-tool
--path <path-to-osd> --dev-target <new-db-device> bluefs-bdev-new-db
Thanks,
Igor
On 7/26/2019 2:49 PM, Frank Rothenstein wrote:
Hi,
I'm running a small (3 hosts) ceph cluster. ATM I want to speed up my
Hi Sylvain,
have you upgraded to Nautilus recently?
Have you added/repaired any OSDs since then?
If so, then you're facing a known issue caused by a mixture of legacy and
new approaches to collecting pool statistics.
Sage shared detailed information on the issue in this mailing list under
"Pool
Hi Manuel,
as Brad pointed out, timeouts and suicides are rather consequences of
some other issues with the OSDs.
I recall at least two recent relevant tickets:
https://tracker.ceph.com/issues/36482
https://tracker.ceph.com/issues/40741 (see last comments)
Both had massive and slow reads from RocksDB
observed for l_bluestore_commit_lat, latency = 87.7928s, txc =
0x55eaa7a40600
Maybe moving OMAP + META from all OSDs to a 480GB NVMe per node would help
in this situation, but I'm not sure.
Manuel
*From:* Igor Fedotov
*Sent:* Wednesday, 7 August 2019 13:10
*To:* EDH - Manuel Rios Fernand
Hi Wido & Hermant.
On 8/14/2019 11:36 AM, Wido den Hollander wrote:
On 8/14/19 9:33 AM, Hemant Sonawane wrote:
Hello guys,
Thank you so much for your responses really appreciate it. But I would
like to mention one more thing which I forgot in my last email is that I
am going to use this stora
Hi Stefan,
this looks like a duplicate of
https://tracker.ceph.com/issues/37282
Actually the range of possible root causes is quite wide,
from HW issues to broken logic in RocksDB/BlueStore/BlueFS etc.
As far as I understand you have different OSDs failing, right?
Is the set of thes
see inline
On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
Hi Igor,
On 27.08.19 at 14:11, Igor Fedotov wrote:
Hi Stefan,
this looks like a duplicate of
https://tracker.ceph.com/issues/37282
Actually the range of possible root causes is quite wide,
from HW issues to broken logic
rth checking as well...
Igor
On 8/27/2019 4:52 PM, Stefan Priebe - Profihost AG wrote:
see inline
On 27.08.19 at 15:43, Igor Fedotov wrote:
see inline
On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
Hi Igor,
On 27.08.19 at 14:11, Igor Fedotov wrote:
Hi Stefan,
this looks like a
Hi,
this line:
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80 1
bluestore(/var/lib/ceph/osd/ceph-71) _open_alloc loaded 0 B in 0 extents
tells me that the OSD is unable to load the free list manager properly, i.e.
the list of free/allocated blocks is unavailable.
You might want to set 'debug bluestore
ed to prevent rebooting all ceph nodes.
Greets,
Stefan
On 27.08.19 at 16:20, Igor Fedotov wrote:
It sounds like the OSD is "recovering" after a checksum error.
I.e. the just-failed OSD shows no errors in fsck and is able to restart and
process new write requests for a long enough period (longer tha
Hi Massimo,
On 9/29/2019 9:13 AM, Massimo Sgaravatto wrote:
In my ceph cluster I am using spinning disks for bluestore OSDs and SSDs
just for the block.db.
If I have got it right, right now:
a) only 3, 30, or 300 GB can be used on the SSD; beyond that rocksdb spills
over to the slow device, so you don't have any ben
ceph crash ls
added the log+meta to this email
can these logs shed some light?
On Thu, Sep 12, 2019 at 7:20 PM Igor Fedotov <ifedo...@suse.de> wrote:
Hi,
this line:
-2> 2019-09-12 16:38:15.101 7fcd02fd1f80 1
Hi Lazuardi,
never seen that. Just wondering - what Ceph version are you running?
Thanks,
Igor
On 10/8/2019 3:52 PM, Lazuardi Nasution wrote:
Hi,
I get the following weird negative objects number on tiering. Why is this
happening? How do I get back to normal?
Best regards,
[root@management-a ~]
Hi Eugene,
this looks like https://tracker.ceph.com/issues/42223 indeed.
Would you please find the first crash for these OSDs and share the
corresponding logs in the ticket.
Unfortunately I don't know of a reliable enough way to recover an OSD after
such a failure. If one exists at all... :(
I've be
Hi Lars,
I've also seen interim space usage bursts during my experiments - up to 2x
the max level size when the topmost RocksDB level is L3 (i.e. 25GB
max). So I think 2x (which results in 60-64 GB for the DB) is a good estimate
when your DB is expected to be small or medium sized. Not sure this
mu
Hi Stefan,
would you please share a log snippet prior to the assertions? It looks like
RocksDB is failing during transaction submission...
Thanks,
Igor
On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
Hello,
does anybody know a fix for this ASSERT / crash?
2020-01-16 02:02:31.316394 7f
16
01:10:13.404113
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
ceph version 12.2.12-11-gd3eae83543
(d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x55c9a712d232]
2: (BlueSt
g but not limited to failure logs, perf counter dumps, system
resource reports etc) for future analysis.
On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
Hi Igor,
answers inline.
On 16.01.20 at 21:34, Igor Fedotov wrote:
you may want to run fsck against failing OSDs. Hopefully it
Hi Samuel,
wondering if you have the bluestore_fsck_on_mount option set to true? Do
you see a high read load on the OSD device(s) during startup?
If so, it might be fsck running, which takes that long.
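A quick way to check both settings via the admin socket (the osd id is a placeholder):
  ceph daemon osd.<id> config get bluestore_fsck_on_mount
  ceph daemon osd.<id> config get bluestore_fsck_on_mount_deep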
Thanks,
Igor
On 1/19/2020 11:53 AM, huxia...@horebdata.cn wrote:
Dear folks,
I had a stra
ges and/or prior restarts, etc) sometimes this
might provide some hints.
Thanks,
Igor
On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
Hi Igor,
On 17.01.20 at 12:10, Igor Fedotov wrote:
hmmm..
Just in case - I suggest checking for H/W errors with dmesg.
this happens on around 80 nodes
?
huxia...@horebdata.cn
*From:* Igor Fedotov <ifedo...@suse.de>
*Date:* 2020-01-19 11:41
*To:* huxia...@horebdata.cn;
ceph-users <ceph-users@lists.ceph.com>
*Subject:* Re: [ceph-u
t hw but
same ceph version and same kernel version.
Greets,
Stefan
On 19.01.2020 at 11:53, Igor Fedotov wrote:
So the intermediate summary is:
Any OSD in the cluster can experience an interim RocksDB checksum failure,
which isn't present after an OSD restart.
No HW issues observed, no pe
Hi Martin,
looks like a bug to me.
You might want to remove all custom settings from the config database and
try setting osd_memory_target only.
Would that help?
Thanks,
Igor
On 1/22/2020 3:43 PM, Martin Mlynář wrote:
On 21. 01. 20 at 21:12, Stefan Kooman wrote:
Quoting Martin Mlynář (nex