[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Szabo, Istvan (Agoda)
Is it possible to extend the block.db LV of that specific OSD with the lvextend command, or does it need some special BlueStore extend? I want to extend that LV by the size of the spillover, compact it, and migrate afterwards.
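For reference, a minimal sketch of that flow, in line with Igor's reply below; osd.0 and the VG/LV name vg_nvme/db-osd0 are hypothetical placeholders, and bluefs-bdev-expand is the usual step to make BlueFS pick up the added space:

  # stop the OSD before touching its volumes
  systemctl stop ceph-osd@0
  # grow the block.db LV, e.g. by 10 GiB
  lvextend -L +10G /dev/vg_nvme/db-osd0
  # let BlueFS detect and use the new size
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0
  systemctl start ceph-osd@0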

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Igor Fedotov
Yes. For DB volume expansion, enlarging the underlying device/LV should be enough... -- Igor Fedotov, Ceph Lead Developer

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
One more thing, what I'm doing at the moment: set noout and norebalance on one host, stop all OSDs, compact all the OSDs, migrate the DBs one by one, start the OSDs one by one.
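A hedged sketch of that per-host sequence (the OSD ids, paths, and compaction method are assumptions; offline compaction via ceph-kvstore-tool is one common way to do it):

  ceph osd set noout
  ceph osd set norebalance
  systemctl stop ceph-osd.target          # stops all OSDs on this host
  for id in 0 1 2; do                     # hypothetical OSD ids on the host
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$id compact
  done
  # migrate each DB one by one (see the lvm migrate examples further down), then:
  systemctl start ceph-osd.target
  ceph osd unset norebalance
  ceph osd unset noout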

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
I have 1 billion objects in the cluster and we are still growing, and we have faced spillovers all over the clusters. After 15-18 spilled-over OSDs (out of the 42-50), the OSDs started to die and flap. I tried to compact the spilled-over ones manually, but it didn't help; however, the not-spilled

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
Omg, I've already migrated 24 OSDs in each DC (72 altogether). What should I do then? 12 are left (36 altogether). In my case the slow device is faster in random-write IOPS than the one serving it.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
Hi Igor, I've attached it here, thank you in advance.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov
Istvan, so things with migrations are clear at the moment, right? As I mentioned, the migrate command in 15.2.14 has a bug that corrupts the OSD if a db->slow migration occurs on a spilled-over OSD. To work around that, you might want to migrate slow to db first, or try manual compaction.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov
Do you mean you ran migrate for these 72 OSDs and none of them starts any more? Or did you just upgrade them to Octopus and are experiencing performance issues? In the latter case, and if you have enough space on the DB device, you might want to try migrating data from slow to db first. Run fsck
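A sketch of that slow-to-db workaround plus the fsck check (osd.0, the fsid, and the DB LV name ceph-db-vg/db-osd0 are placeholders):

  # with the OSD stopped: move spilled-over data from the main device back to the DB volume
  ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from data --target ceph-db-vg/db-osd0
  # verify consistency before restarting
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0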

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov
Istvan, you're bitten by https://github.com/ceph/ceph/pull/43140 - it's not fixed in 15.2.14, and it has got a backport to the upcoming Octopus minor release. Please do not use the 'migrate' command from WAL/DB to the slow volume if some data is already present there... Thanks, Igor

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-11 Thread Igor Fedotov
No, that's just the backtrace of the crash - I'd like to see the full OSD log from process startup until the crash instead...

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-08 Thread Szabo, Istvan (Agoda)
Hi Igor, here is a bluestore tool fsck output: https://justpaste.it/7igrb - is this what you are looking for?

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
This 'unable to load table properties' right before the caught signal is also interesting: -16> 2021-10-05T20:31:28.484+0700 7f310cce5f00 2 rocksdb: [db/version_set.cc:1362] Unable to load table properties for file 247222 --- NotFound: -15> 2021-10-05T20:31:28.484+0700 7f310cce5f00 2 rocksdb:

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
Hmm, I've removed it from the cluster and the data is now rebalancing; I'll do it with the next one ☹

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
This one is from messages: https://justpaste.it/3x08z FYI, buffered_io is turned on by default in 15.2.14 Octopus.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
Hmm, I tried another one whose disk hasn't spilled over and it still core-dumped ☹ Is there anything special we need to do before we migrate the DB next to the block device? Our OSDs are using dmcrypt, is that an issue? { "backtrace": [ "(()+0x12b20) [0x7f310aa49b20]", "(gsignal()+0x10f)

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Igor Fedotov
Not sure dmcrypt is the culprit here. Could you please set debug-bluefs to 20 and collect an OSD startup log?
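One way to do that (osd.0 is a placeholder; assumes a running mon and default log locations):

  ceph config set osd.0 debug_bluefs 20/20
  systemctl restart ceph-osd@0
  # the startup log ends up in /var/log/ceph/ceph-osd.0.log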

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Eugen Block
Do you see OOM killers in dmesg on this host? This line indicates it: "(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) [0x7f310b7d8c96]"
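A quick way to check for that on the affected host:

  # -T prints human-readable timestamps
  dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'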

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread Szabo, Istvan (Agoda)
Seems like it cannot start any more once migrated ☹ https://justpaste.it/5hkot

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread 胡 玮文
Sorry, I read it again and found "tcmalloc: large alloc 94477368950784 bytes == (nil)". This unrealistically large malloc seems to indicate a bug, but I didn't find one in the tracker.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-02 Thread Szabo, Istvan (Agoda)
OK, so spillover and fewer than 5 PGs behind on deep-scrub/scrub shouldn't be an issue in the case of a minor update, right? Fewer than 5 PGs not scrubbed. I will update the complete OS as well, with kernel, python ... along with Ceph from 15.2.10. Usually I never update Ceph alone.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
Dear Igor, is the ceph-volume lvm migrate command in Octopus 15.2.14 smart enough to remove the DB (including the WAL) from the NVMe even if it is spilled over? I can't compact many disks back to normal so that they stop showing the spillover warning. I think Christian has the truth of the issue; my

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov
Hi Istvan, yeah, migrating both DB and WAL to the slow device is supported, and a spilled-over state isn't a showstopper for that.
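The removal itself would then look roughly like this (osd.0, the fsid, and the block LV name are placeholders; note Igor's warning elsewhere in this thread about running a db->slow migration on a spilled-over OSD under 15.2.14):

  systemctl stop ceph-osd@0
  # move both DB and WAL contents onto the main (slow) device
  ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db wal --target ceph-block-vg/block-osd0
  systemctl start ceph-osd@0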

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
3x SSD OSDs per NVMe.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
I have my dashboards and I can see that the DB NVMes are always running at 100% utilization (you can monitor it with iostat -x 1), and they generate iowait of between 1-3 all the time. I'm using NVMe in front of the SSDs.
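For anyone reproducing that measurement (the device name is a placeholder; watch the %util and await columns for the DB device):

  iostat -x 1 nvme0n1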

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov
And how many OSDs per single NVMe do you have?

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Victor Hooi
Hi, I'm curious - how did you tell that the separate WAL+DB volume was slowing things down? I assume you did some benchmarking - is there any chance you'd be willing to share results? (Or anybody else that's been in a similar situation). What sorts of devices are you using for the WAL+DB, versus

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Szabo, Istvan (Agoda)
Wow, it works like a charm. Thank you very much. I've tried it in my lab; however, I need to update the cluster to 15.2.14, because migrate only became available in that version. Not sure I can update in an error state, though. Very smooth: num=14;ceph-volume lvm migrate --osd-id $num --osd-fsid `cat
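The one-liner is cut off above; a hypothetical reconstruction of the pattern (the fsid file path and the target LV are assumptions, not the poster's exact command):

  num=14
  ceph-volume lvm migrate --osd-id $num \
      --osd-fsid $(cat /var/lib/ceph/osd/ceph-$num/fsid) \
      --from db wal --target ceph-block-vg/block-osd$num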

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Eugen Block
Yes, I believe for you it should work without containers although I haven't tried the migrate command in a non-containerized cluster yet. But I believe this is a general issue for containerized clusters with regards to maintenance. I haven't checked yet if there are existing tracker issues

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Szabo, Istvan (Agoda)
Actually, I don't have a containerized deployment; mine is a normal one. So lvm migrate should work.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
That's what I did, and I pasted the results in my previous comments.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
Yes. And the "cephadm shell" command does not depend on the running daemon; it will start a new container. So I think it is perfectly fine to stop the OSD first, then run the "cephadm shell" command and run ceph-volume in the new shell.
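A sketch of that containerized flow (osd.0 and the migrate arguments are placeholders):

  ceph orch daemon stop osd.0
  # opens a shell in a fresh container with osd.0's directories mapped in
  cephadm shell -n osd.0
  # inside the shell:
  ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db wal --target <vg/lv>
  # back on the host:
  ceph orch daemon start osd.0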

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
The OSD has to be stopped in order to migrate DB/WAL; it can't be done live, since ceph-volume requires a lock on the device. Quoting 胡 玮文: I've not tried it, but how about: cephadm shell -n osd.0, then run "ceph-volume" commands in the newly opened shell. The directory structure seems

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread Eugen Block
I tried this in my lab again with Nautilus and it worked as expected; I could start the new OSD immediately. I'll try with Octopus again tomorrow.

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread Szabo, Istvan (Agoda)
I gave it a try, and all 3 OSDs eventually failed :/ Not sure what went wrong. I did the normal maintenance things (ceph osd set noout, ceph osd set norebalance), stopped the OSD, and ran this command: ceph-bluestore-tool bluefs-bdev-migrate --dev-target /var/lib/ceph/osd/ceph-0/block --devs-source
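The command is cut off above; a hedged reconstruction of the typical full invocation (block.db as the source device is an assumption):

  # with osd.0 stopped: fold the external block.db back into the main device
  ceph-bluestore-tool bluefs-bdev-migrate \
      --path /var/lib/ceph/osd/ceph-0 \
      --dev-target /var/lib/ceph/osd/ceph-0/block \
      --devs-source /var/lib/ceph/osd/ceph-0/block.db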

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-27 Thread Eugen Block
Hi, I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use here. I haven't tried it in a production environment yet, only in virtual labs. Regards, Eugen. Quoting "Szabo, Istvan (Agoda)": Hi, it seems like in our config the NVMe device as a WAL+DB in front of the SSD