Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error
Thanks for the pointer to ceph-objectstore-tool - it turns out that exporting and removing the PG from all three disks was enough to make them boot! I've exported the three copies of the bad PG, so let me know if you'd like me to upload them anywhere for inspection. All data has been recovered (since I was originally removing the pool that contained pg 25.0 anyway) and all systems are go on my end. Ceph's architecture is very solid.

$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --op export-remove --pgid 25.0 --file pg_25_0_from_osd_6.bin
Exporting 25.0
info 25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593)
Export successful
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 25.0_head removing 25.0
Remove successful

On Fri, Apr 26, 2019 at 8:33 PM Elise Burke wrote:
> Using ceph-objectstore-info on PG 25.0 (which indeed, was the one I
> remember having the error) shows this:
> [pg info trimmed - the quoted message appears in full later in this digest]
[ceph-users] How to enable TRIM on dmcrypt bluestore ssd devices
Hello, I am using "ceph-deploy osd create --dmcrypt --bluestore" to create the OSDs. I know there is some security concern with enabling TRIM/discard on encrypted devices, but I would rather have the performance increase. How do I enable TRIM in this scenario?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
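Not a definitive answer, but a sketch of the two layers that typically need attention with a LUKS/dm-crypt BlueStore OSD: the crypt mapping has to allow discards to pass through, and BlueStore has to be told to actually issue them. Device names and paths below are placeholders, and I have not verified this against the exact mappings ceph-deploy creates:

```shell
# 1) Allow discards through the dm-crypt layer.
#    For a mapping opened manually (device and name are illustrative):
cryptsetup open --allow-discards /dev/sdb2 osd-data-crypt
#    For mappings opened at boot, add the 'discard' option in /etc/crypttab:
#    osd-data-crypt UUID=... /path/to/keyfile luks,discard

# 2) Tell BlueStore to issue discards on its block device
#    (off by default) -- in ceph.conf, [osd] section:
#    bdev_enable_discard = true

# Sanity check: non-zero DISC-GRAN/DISC-MAX for the dm device means
# discards are propagating down the stack.
lsblk --discard
```

Whether you want the OSD issuing discards inline also depends on the drive; some SSDs stall badly on queued discards, which is part of why it defaults off.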
Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error
Using ceph-objectstore-info on PG 25.0 (which indeed, was the one I remember having the error) shows this:

struct_v 10
{
    "pgid": "25.0",
    "last_update": "7592'106",
    "last_complete": "7592'106",
    "log_tail": "0'0",
    "last_user_version": 106,
    "last_backfill": "MIN",
    "last_backfill_bitwise": 1,
    "purged_snaps": [],
    "history": {
        "epoch_created": 5191,
        "epoch_pool_created": 5191,
        "last_epoch_started": 7489,
        "last_interval_started": 7488,
        "last_epoch_clean": 7489,
        "last_interval_clean": 7488,
        "last_epoch_split": 0,
        "last_epoch_marked_full": 0,
        "same_up_since": 7593,
        "same_interval_since": 7593,
        "same_primary_since": 7593,
        "last_scrub": "7592'106",
        "last_scrub_stamp": "2019-04-25 21:34:52.079721",
        "last_deep_scrub": "7485'70",
        "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
        "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548"
    },
    "stats": {
        "version": "7592'105",
        "reported_seq": "2621",
        "reported_epoch": "7592",
        "state": "active+clean+inconsistent",
        "last_fresh": "2019-04-25 20:02:55.620028",
        "last_change": "2019-04-24 19:52:45.072473",
        "last_active": "2019-04-25 20:02:55.620028",
        "last_peered": "2019-04-25 20:02:55.620028",
        "last_clean": "2019-04-25 20:02:55.620028",
        "last_became_active": "2019-04-22 17:55:37.578239",
        "last_became_peered": "2019-04-22 17:55:37.578239",
        "last_unstale": "2019-04-25 20:02:55.620028",
        "last_undegraded": "2019-04-25 20:02:55.620028",
        "last_fullsized": "2019-04-25 20:02:55.620028",
        "mapping_epoch": 7593,
        "log_start": "0'0",
        "ondisk_log_start": "0'0",
        "created": 5191,
        "last_epoch_clean": 7489,
        "parent": "0.0",
        "parent_split_bits": 0,
        "last_scrub": "7592'88",
        "last_scrub_stamp": "2019-04-24 19:52:45.072367",
        "last_deep_scrub": "7485'70",
        "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
        "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548",
        "log_size": 105,
        "ondisk_log_size": 105,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "manifest_stats_invalid": false,
        "snaptrimq_len": 0,
        "stat_sum": {
            "num_bytes": 0,
            "num_objects": 9,
            "num_object_clones": 0,
            "num_object_copies": 27,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 0,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 9,
            "num_whiteouts": 0,
            "num_read": 87,
            "num_read_kb": 87,
            "num_write": 98,
            "num_write_kb": 98,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 0,
            "num_bytes_recovered": 0,
            "num_keys_recovered": 0,
            "num_objects_omap": 9,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0,
            "num_large_omap_objects": 0,
            "num_objects_manifest": 0,
            "num_omap_bytes": 0,
            "num_omap_keys": 0
        },
        "up": [],
        "acting": [],
        "blocked_by": [],
        "up_primary": -1,
        "acting_primary": -1,
        "purged_snaps": []
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 1,
    "last_epoch_started": 7489,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}

On Fri, Apr 26, 2019 at 7:35 PM Elise Burke wrote:
> Thanks for the suggestions. I've uploaded the surprisingly large (1.5G!)
> log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7
> [log excerpt trimmed - the quoted message appears in full later in this digest]
Re: [ceph-users] clock skew
> @Janne: i will checkout/implement the peer config per your suggestion.
> However what confuses us is that chrony thinks the clocks match, and
> only ceph feels it doesn't. So we are not sure if the peer config will
> actually help in this situation. But time will tell. Ar ar.

Chrony thinks that the clocks match *what*, though? That each system matches the public pool against which it's synced?

Something I've noticed, especially when using the public pool in Asia, is that DNS rotation causes the pool FQDNs to resolve differently multiple times a day, and that the quality of those servers varies considerably. Naturally the ones in that pool that I set up a few years ago are spot-on, but I digress ;)

Consider this scenario:

  Ceph mon A resolves the pool FQDN to serverX, which is 10ms slow
  Ceph mon B resolves the pool FQDN to serverY, which is 20ms fast with lots of jitter

That can get you a 30ms spread right there. This is the benefit of having the mons peer with each other as well as with upstream servers of varying stratum/quality: worst case, they will select one of their own to sync with. With iburst and modern client polling backoff, there usually isn't much reason not to configure a bunch of peers. Multiple public/vendor pool FQDNs are reasonable to include, but I also like to hardcode a few known-good public servers as well, even one in a different region if necessary. Have your systems peer against each other too.

Depending on the size of your operation, consider deploying your own timeserver on-prem, though antenna placement can be a hassle. This is horribly non-enterprise, but I also suggest picking up one of these:

https://www.netburner.com/products/network-time-server/pk70-ex-ntp-network-time-server/

It's cheap and it can't handle tens of thousands of clients, but it doesn't have to. Stick it in an office window and add it to your peer lists. If you have a larger number of clients, have your internal NTP servers sync against it (as well as with each other, a complete mesh). If you don't, include it in their local peer constellation. Best case you have an excellent low-stratum source for your systems for cheap. Worst case you are no worse off than you were before.

Now, whether this situation is what you're seeing I can't say without more info, but it is at least plausible.
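For what it's worth, the constellation described above might look something like this in chrony.conf. The directives (pool, server, peer, iburst, maxsources) are standard chrony ones; all hostnames are placeholders:

```
# Upstream: a public/vendor pool plus a couple of hand-picked servers
pool pool.ntp.org iburst maxsources 4
server ntp-known-good.example.net iburst
server 192.0.2.10 iburst          # on-prem GPS/radio appliance, if you have one

# Peer with the other mon hosts so that, worst case,
# the mons at least agree with each other
peer mon-b.example.net
peer mon-c.example.net
```

With this shape, a bad pool server can be outvoted by the hand-picked sources, and the mons converge on each other even if all upstreams misbehave.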
Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error
Thanks for the suggestions. I've uploaded the surprisingly large (1.5G!) log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7

Looks like these are the relevant lines:

   -52> 2019-04-26 19:23:05.190 7fb2657dc700 20 osd.2 op_wq(2) _process empty q, waiting
   -51> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) queued
   -50> 2019-04-26 19:23:05.190 7fb2717f4700 20 bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread sleep
   -49> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process 25.0 to_process waiting <> waiting_peering {}
   -48> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) pg 0x7c20a0
   -47> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648 pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}] do_peering_event: epoch_sent: 7648 epoch_requested: 7648 DeleteSome
   -46> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648 pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}] _delete_some
   -45> 2019-04-26 19:23:05.190 7fb2667de700 15 bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start GHMIN end GHMAX max 30
   -44> 2019-04-26 19:23:05.190 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range 0x7f7fe5 to 0x7f7fe5 and 0x7f8019 to 0x7f8019 start GHMIN
   -43> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list pend 0x7f7fe5
   -42> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 0x7f7fff100020'P!osdmap.7114!='0x'o' >= GHMAX
   -41> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid #25:head# end GHMAX
   -40> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid #25:b08b92bdhead# end GHMAX
   -39> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key 0x80800c1c0021213dfffe'o' >= GHMAX
   -38> 2019-04-26 19:23:05.191 7fb2667de700 10 bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start GHMAX end GHMAX max 30 = 0, ls.size() = 2, next = GHMAX
   -37> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648 pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}] _delete_some [#25:head#,#25:b08b92bdhead#]
   -36> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648 pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) [] r=-1 lpr=7648 DELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}] _delete_some finished
   -35> 2019-04-26 19:23:05.191 7fb2667de700 10 bluestore(/var/lib/ceph/osd/ceph-2) queue_transactions ch 0x7c27d94b40 25.0_head
   -34> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_create osr 0x7c274f6c60 = 0x7c2ef34c00 seq 3
   -33> 2019-04-26 19:23:05.191 7fb2667de700 15 bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head #25:head# onode 0x7c20aaf200 txc 0x7c2ef34c00
   -32> 2019-04-26 19:23:05.191 7fb2667de700 15 bluestore(/var/lib/ceph/osd/ceph-2) _do_truncate 25.0_head #25:head# 0x0
   -31> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore.onode(0x7c20aaf200).flush flush done
   -30> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _do_omap_clear remove range start: 0x00b0295a'-' end: 0x00b0295a'~'
   -29> 2019-04-26 19:23:05.191 7fb2667de700 10 bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head #25:head# = 0
   -28> 2019-04-26 19:23:05.191 7fb2667de700 15 bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 25.0_head
   -27> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore.OnodeSpace(0x7c27d94c98 in 0x7c1f7e10a0) map_any
   -26> 2019-04-26 19:23:05.191 7fb2667de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range 0x7f7fe5 to 0x7f7fe5 and 0x7f8019 to 0x7f8019 start GHMIN
   -25> 2019-04-26 19:23:05.192 7fb2667de700 20
Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem
> On Apr 26, 2019, at 1:50 PM, Gregory Farnum wrote:
>
> Hmm yeah, it's probably not using UTC. (Despite it being good
> practice, it's actually not an easy default to adhere to.) cephx
> requires synchronized clocks and probably the same timezone (though I
> can't swear to that.)

Apps don't "see" timezones; timezones are a rendering transform of an absolute time. The instant "now" is the same throughout space and time, regardless of how that instant is quantified. UNIX wall time is just one such quantification.

Problems ensue when the rendered time is incorrect for the time zone shown in the rendering. If a machine that is "not using time zones" shows that the time is 3PM UTC and its operator lives in London, the internal time will be correct. On the other hand, if the operator lives in NYC and set the clock so that it reads 3PM there, the internal time is incorrect. That is to say, 15:00 UTC rendered as 3PM in NYC is very wrong *because it's not 3PM in London*, where UTC holds.

tl;dr: Make sure that your clock shows the correct time for whatever time zone appears in the rendering. It doesn't matter where the system actually resides or whether the TZ matches its geographic location.
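A quick way to see the rendering point above on any Linux box (GNU date assumed): the same epoch value renders differently per zone, while the underlying instant never changes. The epoch value here is arbitrary but fixed:

```shell
# One absolute instant (a UNIX epoch value) rendered in two zones.
epoch=1556300000
TZ=UTC              date -d "@$epoch"     # Fri Apr 26 17:33:20 UTC 2019
TZ=America/New_York date -d "@$epoch"     # Fri Apr 26 13:33:20 EDT 2019

# The internal representation is identical either way; only the
# rendering differs:
TZ=UTC              date -d "@$epoch" +%s   # 1556300000
TZ=America/New_York date -d "@$epoch" +%s   # 1556300000
```

A clock is "wrong" exactly when the rendered wall time does not match the zone label next to it, regardless of which zone that is.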
Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error
You'll probably want to generate a log with "debug osd = 20" and "debug bluestore = 20", then share that or upload it with ceph-post-file, to get more useful info about which PGs are breaking (is it actually the ones that were supposed to be deleted?).

If there's a particular set of PGs you need to rescue, you can also look at using the ceph-objectstore-tool to export them off the busted OSD stores and import them into OSDs that still work.

On Fri, Apr 26, 2019 at 12:01 PM Elise Burke wrote:
> Hi,
>
> I upgraded to Nautilus a week or two ago and things had been mostly fine. I
> was interested in trying the device health stats feature and enabled it.
> [rest of the quoted report and traceback trimmed - the original message appears in full later in this digest]
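For the archive, the export/import path suggested above looks roughly like this with ceph-objectstore-tool. The destination OSD id is a placeholder, and both OSD daemons must be stopped while the tool runs against their stores:

```shell
# On the broken OSD's host, with that ceph-osd daemon stopped:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --op export --pgid 25.0 --file /root/pg_25.0.export

# Copy the file to a host with a healthy OSD (osd.7 is a placeholder),
# stop that daemon, then:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
    --op import --file /root/pg_25.0.export
```

The export-remove op used later in this thread is the same export followed by removal of the PG from the source store in one step, which is what lets the crashing OSD start again.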
Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem
On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic wrote:
> Hi,
>
> yesterday my cluster reported slow requests for minutes, and after restarting
> the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The
> whole cluster was not responding and IO stopped.
>
> I also noticed that the problem was with cephx - all OSDs were reporting the
> same thing (even the same secret_id number):
>
> cephx: verify_authorizer could not get service secret for service osd secret_id=14086
> .. conn(0x559e15a5 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: got bad authorizer
> auth: could not find secret_id=14086
>
> My questions are:
>
> Why did that happen?
> Can I prevent the cluster from stopping work (with cephx enabled)?
> How quickly are keys rotated/expired, and can I check for problems with that anyhow?
>
> I'm running NTP on the nodes (and also on the ceph monitors), so time should
> not be the issue. I noticed that some monitor nodes have no timezone set,
> but I hope MONs use UTC to distribute keys to clients. Or can a different
> timezone between MON and OSD cause the problem?

Hmm yeah, it's probably not using UTC. (Despite it being good practice, it's actually not an easy default to adhere to.) cephx requires synchronized clocks and probably the same timezone (though I can't swear to that.)

> I "fixed" the problem by restarting the monitors.
>
> It happened for the second time during the last 3 months, so I'm reporting it
> as an issue that can happen.
>
> I also noticed in all OSD logs
>
> 2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2019-04-25 09:06:55.65)
>
> approximately 7 hours before the problem occurred. I can see that it's related
> to the issue. But why 7 hours? Is there some timeout or grace period of old
> key usage before they are invalidated?

7 hours shouldn't be directly related. IIRC by default a new rotating key is issued every hour, it gives out the current and next key on request, and daemons accept keys within a half-hour offset of what they believe the current time to be. Something like that.
-Greg

> Thank you
>
> With regards
>
> Jan Pekar
>
> --
> Ing. Jan Pekař
> jan.pe...@imatic.cz
>
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
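If it recurs, it may help to compare the cluster's own view of clock skew with what the NTP daemon reports. These are standard ceph/chronyc commands (substitute ntpq if you run ntpd):

```shell
# Monitor-side view of clock offsets between the mons:
ceph time-sync-status

# Health detail will call out any mon clock skew warnings explicitly:
ceph health detail

# chrony's view of its own sync state and sources:
chronyc tracking
chronyc sources -v
```

If chrony reports sub-millisecond offsets but the mons disagree with each other, that points at the mons syncing against different (or differently wrong) upstream sources rather than at chrony itself.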
[ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error
Hi,

I upgraded to Nautilus a week or two ago and things had been mostly fine. I was interested in trying the device health stats feature and enabled it. In doing so it created a pool, device_health_metrics, which contained zero bytes.

Unfortunately this pool developed a PG that could not be repaired with `ceph pg repair`. That's okay, I thought, this pool is empty (zero bytes), so I'll just remove it and discard the PG entirely.

So I did: `ceph osd pool rm device_health_metrics device_health_metrics --yes-i-really-really-mean-it`

Within a few seconds three OSDs had gone missing (this pool was size=3) and now crashloop at startup.

Any assistance in getting these OSDs up (such as by discarding the errant PG) would be appreciated. I'm most concerned about the other pools in the system, as losing three OSDs at once has not been ideal.

This is made more difficult as these are in the Bluestore configuration and were set up with ceph-deploy to bare metal (using LVM mode).

Here's the traceback as noted in journalctl:

Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902 7f8a00866d80 -1 Falling back to public interface
Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction error (39) Directory not empty not handled on operation 21 (op 1, counting from 0)
Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209 7f89dee53700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
Apr 26 11:01:43 databox ceph-osd[1878533]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector >&, boost::intrusive_ptr
Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ObjectStore::queue_transaction(boost::intrusive_ptr&, ObjectStore::Transaction&&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6: (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
Apr 26 11:01:43 databox ceph-osd[1878533]: 7: (boost::statechart::simple_state, boost::statechart::null_exception_translator>::process_event(boost
Apr 26 11:01:43 databox ceph-osd[1878533]: 9: (PG::do_peering_event(std::shared_ptr, PG::RecoveryCtx*)+0x119) [0xfc65dac99]
Apr 26 11:01:43 databox ceph-osd[1878533]: 10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr, ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
Apr 26 11:01:43 databox ceph-osd[1878533]: 11: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
Apr 26 11:01:43 databox ceph-osd[1878533]: 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0xfc6509c14]
Apr 26 11:01:43 databox ceph-osd[1878533]: 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0xfc6b01f43]
Apr 26 11:01:43 databox ceph-osd[1878533]: 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) [0x7f89fc376ead]
Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217 7f89dee53700 -1 *** Caught signal (Aborted) **
Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700 thread_name:tp_osd_tp
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) [0x7f89fd4b85d0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37) [0x7f89fc2af207]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148) [0x7f89fc2b08f8]
Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x19c) [0xfc63aff04]
Apr 26 11:01:43 databox ceph-osd[1878533]: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6:
[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode
Hi list,

In conjunction with taking a new storage server online we observed that a whole bunch of the SSD OSDs we use for metadata went offline, and crash every time they try to restart with an abort signal in OSDMap::decode - brief log below:

2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493, next_file_number is 1496, last_sequence is 45904669, log_number is 0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491
2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 1492
2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168125624, "job": 1, "event": "recovery_started", "log_files": [1494]}
2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log #1494 mode 0
2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest 1496
2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"}
2019-04-26 17:56:08.127 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5634c2f6
2019-04-26 17:56:08.127 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2019-04-26 17:56:08.135 7f4f2956ae00 1 freelist init
2019-04-26 17:56:08.143 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation metadata
2019-04-26 17:56:08.147 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233 extents
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (()+0x92b730) [0x5634c0151730]
 2: (()+0x12890) [0x7f4f1f02b890]
 3: (gsignal()+0xc7) [0x7f4f1df06e97]
 4: (abort()+0x141) [0x7f4f1df08801]
 5: (()+0x8c8b7) [0x7f4f1e8fb8b7]
 6: (()+0x92a06) [0x7f4f1e901a06]
 7: (()+0x92a41) [0x7f4f1e901a41]
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]
 14: (main()+0x383d) [0x5634bfaef8cd]
 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97]
 16: (_start()+0x2a) [0x5634bfbb97aa]

We have seen this at least once in the past, and I suspect it might be related to high load (?) in the servers when lots of PGs are peering and/or large amounts of backfilling happen. In that case it was only a single disk, so we "fixed" it by just recreating that OSD - but this time we need to get them working to avoid losing metadata. Based on previous posts to the mailing list and the bug tracker, I guessed this might be due to a corrupt osdmap for these OSDs.
We managed to copy an osdmap from another OSD with ceph-objectstore-tool, but when trying to read the new osdmap back from the OSD (with get-osdmap) I still get an error:

root@storage05:/var/log/ceph# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 --op get-osdmap --file=/tmp/osdmap
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what(): buffer::malformed_input: bad crc, actual 3828477398 != expected 3773790681
*** Caught signal (Aborted) **
 in thread 7f7611698f80 thread_name:ceph-objectstor

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
 1: (()+0x9895c0) [0x562e4aa3d5c0]
 2: (()+0x12890) [0x7f760715c890]
 3: (gsignal()+0xc7) [0x7f7606033e97]
 4: (abort()+0x141) [0x7f7606035801]
 5: (()+0x8c8b7) [0x7f7606a288b7]
 6: (()+0x92a06) [0x7f7606a2ea06]
 7: (()+0x92a41) [0x7f7606a2ea41]
 8: (()+0x92c74) [0x7f7606a2ec74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f7607af2694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f7607af3af1]
 11: (get_osdmap(ObjectStore*, unsigned int, OSDMap&, ceph::buffer::list&)+0x1e5) [0x562e4a473655]
 12: (main()+0x39be) [0x562e4a3a43fe]
 13: (__libc_start_main()+0xe7) [0x7f7606016b97]
 14: (_start()+0x2a) [0x562e4a4727aa]
Aborted (core dumped)

Any clues or pointers on what we should try next? We have one unimportant OSD where we can play around, but for the others I guess it makes sense to first try to duplicate (?) the data to avoid data loss when testing various things; pointers on how best to do that would be appreciated.
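For what it's worth, the workflow usually suggested for a corrupt osdmap on a BlueStore OSD is to pull the same epoch from a healthy OSD and inject it into the broken one. A sketch follows, with &lt;PGID&gt; and &lt;EPOCH&gt; as placeholders (the epoch the broken OSD trips over is visible in its crash log); back up the PGs first, since set-osdmap rewrites on-disk state - treat this as a sketch, not a verified procedure:

```shell
# Back up each PG on the broken OSD first (one file per PG).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 \
    --op export --pgid <PGID> --file /root/backup_pg_<PGID>.bin

# Extract the suspect epoch from a healthy OSD...
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-54 \
    --op get-osdmap --epoch <EPOCH> --file /tmp/osdmap.good

# ...and inject it into the broken OSD, replacing the corrupt copy.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 \
    --op set-osdmap --epoch <EPOCH> --file /tmp/osdmap.good
```

Both OSDs must be stopped while the tool runs, and the per-PG export files double as the "duplicate the data" safety net asked about above.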
[ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem
Hi,

Yesterday my cluster reported slow requests for several minutes, and after I restarted the OSDs reporting them it got stuck with peering PGs. The whole cluster stopped responding and IO halted. I also noticed that the problem was with cephx - all OSDs were reporting the same error (even the same secret_id):

cephx: verify_authorizer could not get service secret for service osd secret_id=14086
.. conn(0x559e15a5 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: got bad authorizer
auth: could not find secret_id=14086

My questions are: Why did that happen? Can I prevent the cluster from grinding to a halt like this (with cephx enabled)? How quickly do the keys rotate/expire, and can I check for problems with that somehow?

I'm running NTP on the nodes (including the ceph monitors), so time should not be the issue. I noticed that some monitor nodes have no timezone set, but I hope the MONs use UTC when distributing keys to clients. Or can a timezone difference between a MON and an OSD cause the problem?

I "fixed" the problem by restarting the monitors. It has happened twice in the last 3 months, so I'm reporting it as an issue that can occur. I also noticed in all OSD logs:

2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2019-04-25 09:06:55.65)

approximately 7 hours before the problem occurred. I assume it is related to the issue. But why 7 hours? Is there some timeout or grace period during which old keys are still usable before they are invalidated?

Thank you
With regards
Jan Pekar

--
Ing. Jan Pekař
jan.pe...@imatic.cz
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
--
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
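On the rotation question: cephx service keys rotate on a fixed period (auth_service_ticket_ttl, one hour by default), and daemons hold a small sliding window of valid secret_ids, which is why a modest clock problem is tolerated but a large one eventually leaves a daemon asking for a secret_id its peers have discarded. The toy model below illustrates that bookkeeping only - it is not Ceph's actual code, and the window size is made up for the example:

```python
# Toy model of rotating shared secrets: each TTL-sized time window gets its
# own secret_id; verifiers accept only a small sliding window of ids.
TTL = 3600  # rotation period in seconds (illustrative)

def secret_id(now: float) -> int:
    """The rotation slot a clock at `now` would use."""
    return int(now // TTL)

def can_verify(sender_now: float, receiver_now: float, window: int = 2) -> bool:
    """A receiver accepts ids within `window` rotation slots of its own clock."""
    return abs(secret_id(sender_now) - secret_id(receiver_now)) <= window

# A few seconds of skew is harmless...
assert can_verify(1_000_000, 1_000_030)
# ...but a peer effectively ~7 hours behind asks for a long-discarded id.
assert not can_verify(1_000_000 - 7 * 3600, 1_000_000)
```

In this model a daemon whose rotating keys stopped refreshing keeps working until its newest key ages out of everyone's window - which is one plausible reading of the multi-hour gap between the _check_auth_rotating warning and the outage.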
[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode
Hi list, In conjunction with taking a new storage server online we observed that a whole bunch of the SSD OSDs we use for metadata went offline, and crash every time they try to restart with an abort signal in OSDMap::decode - brief log below. We have seen this at least once in the past, and I suspect it might be related to high load (?) in the servers when lots of PGs are peering and/or large amounts of backfilling happens. In that case it was only a single disk, so we "fixed" it by just recreating that OSD - but this time we need to get them working to avoid losing metadata :-) Based on previous posts to the mailing list and the bugtracker, I would guess this might be due to a corrupt osdmap for these OSDs. Should we try to replace the osdmap, and if so: how do we do that for bluestore OSDs? Cheers, Erik 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493, next_file_number is 1496, last_sequence is 45904669, log_number is 0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 1492 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168125624, "job": 1, "event": "recovery_started", "log_files": [1494]} 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log #1494 mode 0 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest 1496 2019-04-26 17:56:08.123 7f4f2956ae00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"} 2019-04-26 17:56:08.127 7f4f2956ae00 4 rocksdb: [/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer 0x5634c2f6 2019-04-26 
17:56:08.127 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152 2019-04-26 17:56:08.135 7f4f2956ae00 1 freelist init 2019-04-26 17:56:08.143 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation metadata 2019-04-26 17:56:08.147 7f4f2956ae00 1 bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233 extents 2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) ** in thread 7f4f2956ae00 thread_name:ceph-osd ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable) 1: (()+0x92b730) [0x5634c0151730] 2: (()+0x12890) [0x7f4f1f02b890] 3: (gsignal()+0xc7) [0x7f4f1df06e97] 4: (abort()+0x141) [0x7f4f1df08801] 5: (()+0x8c8b7) [0x7f4f1e8fb8b7] 6: (()+0x92a06) [0x7f4f1e901a06] 7: (()+0x92a41) [0x7f4f1e901a41] 8: (()+0x92c74) [0x7f4f1e901c74] 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694] 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1] 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8] 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe] 13: (OSD::init()+0x1d5f) [0x5634bfc048ef] 14: (main()+0x383d) [0x5634bfaef8cd] 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97] 16: (_start()+0x2a) [0x5634bfbb97aa] Cheers, Erik ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] clock skew
Hi all,

Thanks for all the replies!

@Huang: ceph time-sync-status is exactly what I was looking for, thanks!
@Janne: I will check out/implement the peer config per your suggestion. However, what confuses us is that chrony thinks the clocks match and only ceph feels they don't, so we are not sure the peer config will actually help in this situation. But time will tell.
@John: Thanks for the maxsources suggestion.
@Bill: Thanks for the interesting article, will check it out!

MJ

On 4/25/19 5:47 PM, Bill Sharer wrote:

If you are just syncing to the outside pool, the three hosts may end up latching on to different outside servers as their definitive sources. You might want to make one of the three a higher-priority source for the other two and possibly have only it sync to the outside sources. Also, for hardware newer than about five years old, you might want to look at enabling the NIC clocks using LinuxPTP to keep clock jitter down inside your LAN. I wrote this article on the Gentoo wiki on enabling PTP in chrony: https://wiki.gentoo.org/wiki/Chrony_with_hardware_timestamping

Bill Sharer

On 4/25/19 6:33 AM, mj wrote:

Hi all,

On our three-node cluster, we have set up chrony for time sync, and even though chrony reports that it is synced to NTP time, ceph occasionally reports time skews that can last several hours.
See for example:

root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)
root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
    mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s (latency 0.000591877s)
root@ceph2:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum         : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time     : 0.00133 seconds slow of NTP time
Last offset     : -0.00524 seconds
RMS offset      : 0.00524 seconds
Frequency       : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 0.00 seconds
Root dispersion : 0.00 seconds
Update interval : 1.4 seconds
Leap status     : Normal
root@ceph2:~#

For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced similarly with NTP on the two other nodes. We don't understand this... I have now injected mon_clock_drift_allowed 0.7, so at least we have HEALTH_OK again (to stop upsetting my monitoring system). But two questions:

- Can anyone explain why this is happening? It looks as if ceph and NTP/chrony disagree on just how time-synced the servers are.
- How do I determine the current clock skew from Ceph's perspective? "ceph health detail" does not show it when the cluster is HEALTH_OK. (I want to start monitoring it continuously, to see if I can find some sort of pattern.)

Thanks!
MJ
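On the second question: since Luminous the monitors measure skew among themselves, and that measurement (not chrony's) is what the MON_CLOCK_SKEW check reports. Two commands worth knowing - a sketch, to be run from any node with an admin keyring:

```shell
# Per-mon skew/latency as currently estimated by the monitor quorum -
# this is Ceph's own number, independent of what chrony reports.
ceph time-sync-status

# The threshold in effect on a given mon (default 0.05 s).
ceph daemon mon.$(hostname -s) config get mon_clock_drift_allowed
```

Polling time-sync-status continuously should give exactly the per-mon skew series wanted for spotting a pattern.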
[ceph-users] Luminous 12.2.8, active+undersized+degraded+inconsistent
Hello,

I am running a Ceph cluster on Luminous 12.2.8 with 36 OSDs. Today a deep-scrub found an error on PG 25.60, and later one of the OSDs failed. Now PG 25.60 is stuck in the active+undersized+degraded+inconsistent state. I can't repair it with "ceph pg repair 25.60" - the repair process does not start at all. What is the correct recovery process for this situation?

=== ceph health detail ===
HEALTH_ERR 1 osds down; 1 scrub errors; Possible data damage: 1 pg inconsistent; Degraded data redundancy: 188063/5900718 objects degraded (3.187%), 117 pgs degraded, 117 pgs undersized
OSD_DOWN 1 osds down
    osd.6 (root=default,host=hv203) is down
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 25.60 is active+undersized+degraded+inconsistent, acting [25,4]

=== ceph.log ===
2019-04-26 04:01:35.129464 osd.25 osd.25 10.4.5.207:6800/2469060 167 : cluster [ERR] 25.60 shard 6: soid 25:065e49e9:::rbd_data.3759266b8b4567.00018202:head candidate had a read error
2019-04-26 04:03:31.533671 osd.25 osd.25 10.4.5.207:6800/2469060 168 : cluster [ERR] 25.60 deep-scrub 0 missing, 1 inconsistent objects
2019-04-26 04:03:31.533677 osd.25 osd.25 10.4.5.207:6800/2469060 169 : cluster [ERR] 25.60 deep-scrub 1 errors

=== ceph-osd.6.log ===
2019-04-26 04:53:17.939436 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 284] Compacting 4@0 + 4@1 files to L1, score 1.00
2019-04-26 04:53:17.939715 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 283 Base level 0, inputs: [31929(25MB) 31927(21MB) 31925(22MB) 31923(26MB)], [31912(65MB) 31913(65MB) 31914(65MB) 31915(29MB)]
2019-04-26 04:53:17.939978 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243597939747, "job": 284, "event": "compaction_started", "files_L0": [31929, 31927, 31925, 31923], "files_L1": [31912, 31913, 31914, 31915], "score": 1, "input_data_size": 339668148}
2019-04-26 04:53:21.500373 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31930: 380678 keys, 69567323 bytes
2019-04-26 04:53:21.500410 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243601500399, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31930, "file_size": 69567323, "table_properties": {"data_size": 67110779, "index_size": 1349659, "filter_size": 1105896, "raw_key_size": 22641147, "raw_average_key_size": 59, "raw_value_size": 59452413, "raw_average_value_size": 156, "num_data_blocks": 16601, "num_entries": 380678, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "161626", "kMergeOperands": "0"}}
2019-04-26 04:53:24.294928 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31931: 118877 keys, 69059681 bytes
2019-04-26 04:53:24.294964 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243604294950, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31931, "file_size": 69059681, "table_properties": {"data_size": 67109949, "index_size": 1495694, "filter_size": 453050, "raw_key_size": 10391245, "raw_average_key_size": 87, "raw_value_size": 63028568, "raw_average_value_size": 530, "num_data_blocks": 16621, "num_entries": 118877, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "1266", "kMergeOperands": "0"}}
2019-04-26 04:53:27.979518 7f6a8ae48700 4 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 284] Generated table #31932: 119238 keys, 69066929 bytes
2019-04-26 04:53:27.979545 7f6a8ae48700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1556243607979532, "cf_name": "default", "job": 284, "event": "table_file_creation", "file_number": 31932, "file_size": 69066929, "table_properties": {"data_size": 67112338, "index_size": 1499661, "filter_size": 453942, "raw_key_size": 10424324, "raw_average_key_size": 87, "raw_value_size": 63036698, "raw_average_value_size": 528, "num_data_blocks": 16599, "num_entries": 119238, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "3045", "kMergeOperands": "0"}}
2019-04-26 04:53:31.014387 7f6a8ae48700 3 rocksdb: [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
2019-04-26 04:53:31.014409 7f6a8ae48700 4 rocksdb: (Original Log Time 2019/04/26-04:53:31.012695) [/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[4 4 16 0 0 0 0] max score 0.29, MB/sec: 26.0 rd, 21.2 wr, level 1, files in(4, 4) out(4) MB in(97.3, 226.6) out(263.9), read-write-amplify(6.0) write-amplify(2.7) Corruption: block checksum mismatch, records in: 975162,
[ceph-users] Nautilus - The Manager Daemon spams its logfile with level 0 messages
I updated the test cluster from Luminous to Nautilus, and now the ceph manager daemon spams its logfile with log level 0 messages. There is a new entry every two seconds:

2019-04-26 12:27:18.889 7f8af1afe700 0 log_channel(cluster) log [DBG] : pgmap v15: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB avail
2019-04-26 12:27:20.889 7f8af1afe700 0 log_channel(cluster) log [DBG] : pgmap v16: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB avail
2019-04-26 12:27:22.899 7f8af1afe700 0 log_channel(cluster) log [DBG] : pgmap v17: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB avail

How can I stop these messages, or at least change the interval from two seconds to every hour? I tried to change the log level of the manager daemon log from 0 to -1, but this is not allowed.

Regards
Markus Baier

--
Markus Baier
Systemadministrator
Fachgebiet Bioinspired Communication Systems
TU Darmstadt, Germany
S3|19 1.6
Rundeturmstrasse 12
64283 Darmstadt
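On the interval question: the two-second cadence matches mgr_tick_period (default 2 s), the interval at which the mgr publishes a new pgmap. Raising it stretches the logging interval, but note the caveat: this slows all periodic mgr work, not just the log entries, so it is a blunt instrument rather than a clean fix. A sketch of the ceph.conf fragment:

```ini
[mgr]
# Publish pgmap (and run other periodic mgr work) every 30 s instead of 2 s.
mgr_tick_period = 30
```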
[ceph-users] RGW Beast frontend and ipv6 options
Currently RGW's Beast frontend supports IPv6 via the endpoint configurable; the port option will bind to IPv4 _only_. http://docs.ceph.com/docs/master/radosgw/frontends/#options

Since many Linux systems default the net.ipv6.bindv6only sysctl to false, specifying a v6 endpoint usually means binding to both v4 and v6. But it also means that deployment systems must be aware of this when configuring, depending on whether both v4 and v6 endpoints need to work or not. Specifying both a v4 and a v6 endpoint, or a port (v4) plus an endpoint on the same v6 port, will currently fail because the system has already bound the v6 port to both v4 and v6.

This leaves us with a few options:

1. Keep the implicit behaviour as it is and document it; systems are already aware of the sysctl and will expect a v6 endpoint to bind to both v4 and v6.

2. Be explicit with endpoints and configuration: have Beast itself override the socket option so that a v6 endpoint binds to v6 *only*, and binding to v4 needs an explicit specification (there is a PR in progress for this: https://github.com/ceph/ceph/pull/27270).

Any more suggestions on how systems handle this are also welcome.

--
Abhishek
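The mechanism underneath this discussion is the per-socket IPV6_V6ONLY flag (whose system-wide default is the net.ipv6.bindv6only sysctl): with it cleared, one v6 listener also accepts v4 clients as v4-mapped addresses, and a later attempt to bind the same port on v4 fails with "address in use". A small standalone illustration, independent of RGW/Beast (assumes a host with an IPv6 stack):

```python
import errno
import socket

# A v6 listener with IPV6_V6ONLY cleared behaves dual-stack.
srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 0))          # port 0: let the kernel pick a free port
srv.listen(1)
port = srv.getsockname()[1]

# A plain IPv4 client can reach it; the peer shows up v4-mapped.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
conn, peer = srv.accept()
print(peer[0])               # a v4-mapped address such as ::ffff:127.0.0.1

# A second, explicit v4 bind of the same port fails - the failure mode
# described above for configuring both a v4 and a v6 endpoint.
v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    v4.bind(("0.0.0.0", port))
    v4_bind_failed = False
except OSError as e:
    v4_bind_failed = e.errno == errno.EADDRINUSE

for s in (cli, conn, v4, srv):
    s.close()
```

Setting IPV6_V6ONLY to 1 on srv instead would make the v4 connect fail and the v4 bind succeed, which is exactly the trade-off between options 1 and 2.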