Re: [ceph-users] Ceph inside Docker containers inside VirtualBox
Hi!

I am not 100% sure, but I think --net=host does not propagate /dev/ inside the container. From the error message:

2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: ERROR- The device pointed by OSD_DEVICE (/dev/vdd) doesn't exist !

I would say you should add something like --device=/dev/vdd to the docker run command for the OSD.

Br

On 18.04.2019 at 14:46, Varun Singh wrote:

Hi,

I am trying to set up Ceph through Docker inside a VM. My host machine is a Mac, the VM is Ubuntu 18.04, and the Docker version is 18.09.5, build e8ff056. I am following the documentation on the ceph/daemon Docker Hub page. The idea is that if I spawn the containers as described there, I should get a Ceph setup without a KV store. I am not worried about the KV store, as I just want to try things out. These are the commands I run to bring the containers up:

Monitor:

docker run -d --net=host \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph/:/var/lib/ceph/ \
  -e MON_IP=10.0.2.15 \
  -e CEPH_PUBLIC_NETWORK=10.0.2.0/24 \
  ceph/daemon mon

Manager:

docker run -d --net=host \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph/:/var/lib/ceph/ \
  ceph/daemon mgr

OSD:

docker run -d --net=host --pid=host --privileged=true \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph/:/var/lib/ceph/ \
  -v /dev/:/dev/ \
  -e OSD_DEVICE=/dev/vdd \
  ceph/daemon osd

With the commands above, the monitor and manager come up properly. I verified this by running the following against both containers:

sudo docker exec d1ab985 ceph -s

I get the following output for both:

  cluster:
    id:     14a6e40a-8e54-4851-a881-661a84b3441c
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum serverceph-VirtualBox (age 62m)
    mgr: serverceph-VirtualBox(active, since 56m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

However, when I try to bring up the OSD with the command above, it doesn't work.
Docker logs show this output:

2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: static: does not generate config
2019-04-18 07:30:06  /opt/ceph-container/bin/entrypoint.sh: ERROR- The device pointed by OSD_DEVICE (/dev/vdd) doesn't exist !

I am not sure why the doc asks to pass /dev/vdd to the OSD_DEVICE env var. I know there are five different ways of spawning the OSD, but I am not able to figure out which one would be suitable for a simple deployment. If you could let me know how to spawn OSDs using Docker, it would help a lot.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
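Putting the suggestion from the reply above into the OSD command would look roughly like the following. This is only a sketch and untested; the device path and image are the ones from the thread:

```shell
# Pass the block device into the container explicitly with --device, in
# addition to the /dev bind mount, so the entrypoint can see /dev/vdd.
docker run -d --net=host --pid=host --privileged=true \
  --device=/dev/vdd \
  -v /etc/ceph:/etc/ceph \
  -v /var/lib/ceph/:/var/lib/ceph/ \
  -v /dev/:/dev/ \
  -e OSD_DEVICE=/dev/vdd \
  ceph/daemon osd
```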
Re: [ceph-users] v12.2.11 Luminous released
Hi!

We have now successfully upgraded (from 12.2.10) to 12.2.11. It seems to be quite stable. (We are using RBD, CephFS and RadosGW.) Most of our OSDs are still on Filestore.

Should we set the "pglog_hardlimit" flag (given that it must not be unset again afterwards)? What exactly will this limit? Are there any risks? Are any pre-checks recommended?

Br,
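For reference, a hedged sketch of the usual sequence; the key constraint is that all OSDs must already run a release that supports the flag, and the flag cannot be cleared once set (check the 12.2.11 release notes before doing this):

```shell
# Pre-check: confirm every OSD already runs 12.2.11 (or later)
# and that the cluster is healthy.
ceph tell osd.* version
ceph health

# Set the flag (irreversible) and confirm it appears in the OSD map flags.
ceph osd set pglog_hardlimit
ceph osd dump | grep flags
```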
[ceph-users] Ubuntu18 and RBD Kernel Module
Hi!

We are running a Ceph 12.2.7 cluster and use it for RBDs. We now have a few new servers installed with Ubuntu 18.04; the default kernel version is 4.15.0. When we create a new RBD and map/xfs-format/mount it, everything looks fine. But when we map/mount an RBD that already has data in it, it takes a very long time (>5 minutes) - sometimes to map, sometimes to mount. There seems to be a process taking 100% of a CPU core during that "hang":

3103 root 20 0 0 0 0 R 100.0 0.0 0:04.65 kworker/11:1

With the "ukuu" tool, we have tested some other kernel versions:

v4.16.18 - same behavior
v4.18.5  - same behavior

And then an older kernel:

4.4.152-0404152-generic - rbd map/mount/umount/unmap - looks fine!

ceph.conf already contains the line "rbd default features = 3" (on all servers). Is there a need to debug this further, or did we miss some parameter/feature that needs to be set differently on newer kernels?

Br,
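One thing worth checking in a situation like this (a hedged suggestion, not a confirmed fix for the hang): "rbd default features" only applies to newly created images, so older images may still carry features that the 4.15+ kernel client handles differently. The pool/image names below are placeholders:

```shell
# Show which features the affected image actually carries.
rbd info rbd/myimage | grep features

# Features the kernel client does not support can be disabled on an
# existing image; object-map, fast-diff and deep-flatten are the usual
# candidates (deep-flatten can be disabled but never re-enabled).
rbd feature disable rbd/myimage object-map fast-diff deep-flatten
```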
Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group
Hi!

We have now deleted all snapshots of the pool in question. With "ceph pg dump" we can see that pg 5.9b has a SNAPTRIMQ_LEN of 27826; all other PGs have 0. It looks like this value does not decrease. LAST_SCRUB and LAST_DEEP_SCRUB are both from 2018-04-24, almost one month ago. The OSD still crashes a while after we start it. The OSD log shows:

*** Caught signal (Aborted) **

and

/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

Any ideas how to fix this? Is there a way to "force" the snaptrim of the PG in question, or any other way to "clean" this PG? We have searched a lot in the mail archives but couldn't find anything that could help us in this case.

Br,

On 17.05.2018 at 00:12, Gregory Farnum wrote:

On Wed, May 16, 2018 at 6:49 AM Siegfried Höllrigl <siegfried.hoellr...@xidras.com> wrote:

Hi Greg!

Thank you for your fast reply. We have now deleted the PG on OSD.130 like you suggested and started it:

ceph-s-06 # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130/ --pgid 5.9b --op remove --force
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 5.9b_head removing 5.9b
Remove successful
ceph-s-06 # systemctl start ceph-osd@130.service

The cluster recovered again until it came to PG 5.9b; then OSD.130 crashed again. -> No change.

So we wanted to go the other way and export the PG from the primary (healthy) OSD (OSD.19), but that fails:

root@ceph-s-03:/tmp5.9b# ceph-objectstore-tool --op export --pgid 5.9b --data-path /var/lib/ceph/osd/ceph-19 --file /tmp5.9b/5.9b.export
OSD has the store locked

But we don't want to stop OSD.19 on this server, because this pool has size=3 and min_size=2 (this would make pg 5.9b inaccessible).

I'm a bit confused.
Are you saying that

1) the ceph-objectstore-tool command you pasted there successfully removed pg 5.9b from osd.130 (as it appears), AND
2) pg 5.9b was active with one of the other nodes as primary, so all data remained available, AND
3) when pg 5.9b got backfilled into osd.130, osd.130 crashed again? (But the other OSDs kept the PG fully available, without crashing?)

That sequence of events is *deeply* confusing and I really don't understand how it might happen. Sadly, I don't think you can grab a PG for export without stopping the OSD in question.

> When we query the pg, we can see a lot of "snap_trimq". Can this be cleaned somehow, even if the pg is undersized and degraded?

I *think* the PG will keep trimming snapshots even if undersized+degraded (though I don't remember for sure), but snapshot trimming is often heavily throttled, and I'm not aware of any way to specifically push one PG to the front. If you're interested in speeding snaptrimming up, you can search the archives or check the docs for the appropriate config options.
-Greg
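For completeness, the throttling Greg refers to is governed by a handful of OSD options; a hedged sketch of loosening them at runtime on Luminous (the values are illustrative, not recommendations):

```shell
# Remove the artificial sleep between snap-trim operations, allow more
# PGs/objects to be trimmed concurrently per OSD, and raise the priority
# of snap-trim work relative to client I/O.
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0'
ceph tell osd.* injectargs '--osd_max_trimming_pgs 4'
ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 4'
ceph tell osd.* injectargs '--osd_snap_trim_priority 10'
```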
Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group
On 17.05.2018 at 00:12, Gregory Farnum wrote:

> I'm a bit confused. Are you saying that
> 1) the ceph-objectstore-tool you pasted there successfully removed pg 5.9b from osd.130 (as it appears), AND

Yes. The ceph-osd process for osd.130 was not running in that phase.

> 2) pg 5.9b was active with one of the other OSDs as primary, so all data remained available, AND

Yes. pg 5.9b is active all of the time (on two other OSDs). I think OSD.19 is the primary for that pg. "ceph pg 5.9b query" tells me:

    "up": [ 19, 166 ],
    "acting": [ 19, 166 ],
    "actingbackfill": [ "19", "166" ],

> 3) when pg 5.9b got backfilled into osd.130, osd.130 crashed again? (But the other OSDs kept the PG fully available, without crashing?)

Yes. It crashes again with the following lines in the OSD log:

    -2> 2018-05-16 11:11:59.639980 7fe812ffd700  5 -- 10.7.2.141:6800/173031 >> 10.7.2.49:6836/3920 conn(0x5619ed76c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24047 cs=1 l=0). rx osd.19 seq 24 0x5619eebd6d00 pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0ba3:56c06) v3
    -1> 2018-05-16 11:11:59.639995 7fe812ffd700  1 -- 10.7.2.141:6800/173031 <== osd.19 10.7.2.49:6836/3920 24 pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0ba3:56c06) v3 955+0+0 (3741758263 0 0) 0x5619eebd6d00 con 0x5619ed76c000
     0> 2018-05-16 11:11:59.645952 7fe7fe7eb700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fe7fe7eb700 time 2018-05-16 11:11:59.640238
    /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

    ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5619c11b1a02]
    2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr, bool, ObjectStore::Transaction*)+0xd63) [0x5619c0d1f873]
    3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2da) [0x5619c0eb15ca]
    4: (ReplicatedBackend::_do_push(boost::intrusive_ptr)+0x12e) [0x5619c0eb17fe]
    5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2c1) [0x5619c0ec0d71]
    6: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) [0x5619c0dcc440]
    7: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x543) [0x5619c0d30853]
    8: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3a9) [0x5619c0ba7539]
    9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr const&)+0x57) [0x5619c0e50f37]
    10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x5619c0bd5847]
    11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5619c11b67f4]
    12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5619c11b9830]
    13: (()+0x76ba) [0x7fe8173746ba]
    14: (clone()+0x6d) [0x7fe8163eb41d]
    NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

> That sequence of events is *deeply* confusing and I really don't understand how it might happen. Sadly I don't think you can grab a PG for export without stopping the OSD in question.
>
>> When we query the pg, we can see a lot of "snap_trimq". Can this be cleaned somehow, even if the pg is undersized and degraded?
>
> I *think* the PG will keep trimming snapshots even if undersized+degraded (though I don't remember for sure), but snapshot trimming is often heavily throttled and I'm not aware of any way to specifically push one PG to the front. If you're interested in speeding snaptrimming up you can search the archives or check the docs for the appropriate config options.
> -Greg

Ok. I think we should try that next. Thank you!
[ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group
Hi!

We have upgraded our Ceph cluster (3 mon servers, 9 OSD servers, 190 OSDs total) from 10.2.10 to Ceph 12.2.4 and then to 12.2.5 (a mixture of Ubuntu 14 and 16 with the repos from https://download.ceph.com/debian-luminous/).

Now we have the problem that one OSD is crashing again and again (approx. once per day); systemd restarts it. We could now probably identify the problem: it looks like one placement group (5.9b) causes the crash, and it seems not to matter whether it is running on a Filestore or a Bluestore OSD. We could even break it down to some RBDs that were in this pool. They are already deleted, but it looks like some objects are left on the OSD, and we can't delete them:

rados -p rbd ls > radosrbdls.txt
grep -vE "($(rados -p rbd ls | grep rbd_header | grep -o "\.[0-9a-f]*" | sed -e :a -e '$!N; s/\n/|/; ta' -e 's/\./\\./g'))" radosrbdls.txt | grep -E '(rbd_data|journal|rbd_object_map)'
rbd_data.112913b238e1f29.0e3f
rbd_data.112913b238e1f29.09d2
rbd_data.112913b238e1f29.0ba3

rados -p rbd rm rbd_data.112913b238e1f29.0e3f
error removing rbd>rbd_data.112913b238e1f29.0e3f: (2) No such file or directory
rados -p rbd rm rbd_data.112913b238e1f29.09d2
error removing rbd>rbd_data.112913b238e1f29.09d2: (2) No such file or directory
rados -p rbd rm rbd_data.112913b238e1f29.0ba3
error removing rbd>rbd_data.112913b238e1f29.0ba3: (2) No such file or directory

In the "current" directory of the OSD there are a lot more files with this RBD prefix. Is there any chance to delete this obviously orphaned stuff before the PG becomes healthy? (It is running now on only 2 of 3 OSDs.) What else could cause such a crash? We attach (hopefully all) of the relevant logs.

    -103> 2018-05-14 13:01:50.514850 7f389894c700  5 -- 10.7.2.141:6801/139719 >> 10.7.2.49:0/2866 conn(0x55a13fd0d000 :6801 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=453 cs=1 l=1). rx osd.60 seq 2720 0x55a13e7bac00 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.511610) v4
    -102> 2018-05-14 13:01:50.514878 7f389894c700  1 -- 10.7.2.141:6801/139719 <== osd.60 10.7.2.49:0/2866 2720 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.511610) v4 2004+0+0 (1134770966 0 0) 0x55a13e7bac00 con 0x55a13fd0d000
    -101> 2018-05-14 13:01:50.514896 7f389894c700  1 -- 10.7.2.141:6801/139719 --> 10.7.2.49:0/2866 -- osd_ping(ping_reply e502962 stamp 2018-05-14 13:01:50.511610) v4 -- 0x55a13fd27200 con 0
    -100> 2018-05-14 13:01:50.525876 7f389894c700  5 -- 10.7.2.141:6801/139719 >> 10.7.2.144:0/2988 conn(0x55a13f2dd000 :6801 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=865 cs=1 l=1). rx osd.179 seq 2652 0x55a13e442600 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4
    -99> 2018-05-14 13:01:50.525902 7f389894c700  1 -- 10.7.2.141:6801/139719 <== osd.179 10.7.2.144:0/2988 2652 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4 2004+0+0 (3454691771 0 0) 0x55a13e442600 con 0x55a13f2dd000
    -98> 2018-05-14 13:01:50.525917 7f389894c700  1 -- 10.7.2.141:6801/139719 --> 10.7.2.144:0/2988 -- osd_ping(ping_reply e502962 stamp 2018-05-14 13:01:50.531899) v4 -- 0x55a13fd27200 con 0
    -97> 2018-05-14 13:01:50.526649 7f389914d700  5 -- 10.0.0.28:6801/139719 >> 10.0.0.24:0/2988 conn(0x55a13f2de800 :6801 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=869 cs=1 l=1). rx osd.179 seq 2652 0x55a17bd8a200 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4
    -96> 2018-05-14 13:01:50.526675 7f389914d700  1 -- 10.0.0.28:6801/139719 <== osd.179 10.0.0.24:0/2988 2652 osd_ping(ping e502962 stamp 2018-05-14 13:01:50.531899) v4 2004+0+0 (3454691771 0 0) 0x55a17bd8a200 con 0x55a13f2de800
    -95> 2018-05-14 13:01:50.526688 7f389914d700  1 -- 10.0.0.28:6801/139719 --> 10.0.0.24:0/2988 -- osd_ping(ping_reply e502962 stamp 2018-05-14 13:01:50.531899) v4 -- 0x55a13e43ec00 con 0
    -94> 2018-05-14 13:01:50.546508 7f389994e700  5 -- 10.7.2.141:6800/139719 >> 10.7.2.50:6802/2519 conn(0x55a13e724000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18716 cs=1 l=0). rx osd.47 seq 4894 0x55a13ec9d000 MOSDScrubReserve(3.111 REQUEST e502962) v1
    -93> 2018-05-14 13:01:50.546537 7f389994e700  1 -- 10.7.2.141:6800/139719 <== osd.47 10.7.2.50:6802/2519 4894 MOSDScrubReserve(3.111 REQUEST e502962) v1 43+0+0 (327031511 0 0) 0x55a13ec9d000 con 0x55a13e724000
    -92> 2018-05-14 13:01:50.546655 7f3883138700  1 -- 10.7.2.141:6800/139719 --> 10.7.2.50:6802/2519 -- MOSDScrubReserve(3.111 REJECT e502962) v1 -- 0x55a13e8fd200 con 0
    -91> 2018-05-14 13:01:50.547685 7f389994e700  5 -- 10.7.2.141:6800/139719 >> 10.7.2.50:6802/2519 conn(0x55a13e724000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=18716 cs=1 l=0). rx osd.47 seq 4895 0x55a13e8fd200 MOSDScrubReserve(3.111 RELEASE e502962) v1
    -90> 2018-05-14 13:01:50.547714 7f389994e700  1 --
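The orphan filter used in this message can be reproduced self-contained: keep only image-id prefixes that still have an rbd_header object, then list data objects whose prefix is not among them. A sketch with an illustrative listing (only the 112913b238e1f29 prefix comes from the thread; the other names are made up):

```shell
# Illustrative object listing: one live image and one leftover data object.
cat > radosrbdls.txt <<'EOF'
rbd_header.aaaa1111
rbd_data.aaaa1111.0000000000000001
rbd_data.112913b238e1f29.0e3f
EOF

# Build an alternation of image-id prefixes that still have a header,
# then print data objects whose prefix has no surviving header (orphans).
live=$(grep rbd_header radosrbdls.txt | grep -o '\.[0-9a-f]*' | paste -sd'|' - | sed 's/\./\\./g')
grep -E '(rbd_data|journal|rbd_object_map)' radosrbdls.txt | grep -vE "($live)"
```

Here only rbd_data.112913b238e1f29.0e3f should be printed, since its header object is gone from the listing.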