Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Thanks for the pointer to ceph-objectstore-tool, it turns out that removing
and exporting the PG from all three disks was enough to make it boot! I've
exported the three copies of the bad PG, let me know if you'd like me to
upload them anywhere for inspection.

All data has been recovered (since I was originally removing the pool that
contained pg 25.0 anyway) and all systems are go on my end. Ceph's
architecture is very solid.


$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --op
export-remove --pgid 25.0 --file pg_25_0_from_osd_6.bin
Exporting 25.0 info 25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise)
local-lis/les=7488/7489 n=9 ec=5191/5191 lis/c 7488/7488 les/c/f
7489/7489/0 7593/7593/7593)
Export successful
 marking collection for removal
setting '_remove' omap key
finish_remove_pgs 25.0_head removing 25.0
Remove successful


On Fri, Apr 26, 2019 at 8:33 PM Elise Burke  wrote:

> Using ceph-objectstore-tool (--op info) on PG 25.0 (which indeed was the one I
> remember having the error) shows this:
>
> struct_v 10
> {
> "pgid": "25.0",
> "last_update": "7592'106",
> "last_complete": "7592'106",
> "log_tail": "0'0",
> "last_user_version": 106,
> "last_backfill": "MIN",
> "last_backfill_bitwise": 1,
> "purged_snaps": [],
> "history": {
> "epoch_created": 5191,
> "epoch_pool_created": 5191,
> "last_epoch_started": 7489,
> "last_interval_started": 7488,
> "last_epoch_clean": 7489,
> "last_interval_clean": 7488,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 7593,
> "same_interval_since": 7593,
> "same_primary_since": 7593,
> "last_scrub": "7592'106",
> "last_scrub_stamp": "2019-04-25 21:34:52.079721",
> "last_deep_scrub": "7485'70",
> "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
> "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548"
> },
> "stats": {
> "version": "7592'105",
> "reported_seq": "2621",
> "reported_epoch": "7592",
> "state": "active+clean+inconsistent",
> "last_fresh": "2019-04-25 20:02:55.620028",
> "last_change": "2019-04-24 19:52:45.072473",
> "last_active": "2019-04-25 20:02:55.620028",
> "last_peered": "2019-04-25 20:02:55.620028",
> "last_clean": "2019-04-25 20:02:55.620028",
> "last_became_active": "2019-04-22 17:55:37.578239",
> "last_became_peered": "2019-04-22 17:55:37.578239",
> "last_unstale": "2019-04-25 20:02:55.620028",
> "last_undegraded": "2019-04-25 20:02:55.620028",
> "last_fullsized": "2019-04-25 20:02:55.620028",
> "mapping_epoch": 7593,
> "log_start": "0'0",
> "ondisk_log_start": "0'0",
> "created": 5191,
> "last_epoch_clean": 7489,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "7592'88",
> "last_scrub_stamp": "2019-04-24 19:52:45.072367",
> "last_deep_scrub": "7485'70",
> "last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
> "last_clean_scrub_stamp": "2019-04-19 14:52:44.047548",
> "log_size": 105,
> "ondisk_log_size": 105,
> "stats_invalid": false,
> "dirty_stats_invalid": false,
> "omap_stats_invalid": false,
> "hitset_stats_invalid": false,
> "hitset_bytes_stats_invalid": false,
> "pin_stats_invalid": false,
> "manifest_stats_invalid": false,
> "snaptrimq_len": 0,
> "stat_sum": {
> "num_bytes": 0,
> "num_objects": 9,
> "num_object_clones": 0,
> "num_object_copies": 27,
> "num_objects_missing_on_primary": 0,
> "num_objects_missing": 0,
> "num_objects_degraded": 0,
> "num_objects_misplaced": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 9,
> "num_whiteouts": 0,
> "num_read": 87,
> "num_read_kb": 87,
> "num_write": 98,
> "num_write_kb": 98,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0,
> "num_objects_omap": 9,
> "num_objects_hit_set_archive": 0,
> "num_bytes_hit_set_archive": 0,
> "num_flush": 0,
> "num_flush_kb": 0,
> "num_evict": 0,
> "num_evict_kb": 0,
> "num_promote": 0,
> "num_flush_mode_high": 0,
> "num_flush_mode_low": 0,
> "num_evict_mode_some": 0,
> "num_evict_mode_full": 0,
> "num_objects_pinned": 0,
> "num_legacy_snapsets": 0,
> "num_large_omap_objects": 0,
> 

[ceph-users] How to enable TRIM on dmcrypt bluestore ssd devices

2019-04-26 Thread Kári Bertilsson
Hello

I am using "ceph-deploy osd create --dmcrypt --bluestore" to create the
OSDs.

I know there is some security concern when enabling TRIM/discard on
encrypted devices, but I would rather get the performance increase.

Wondering how to enable TRIM in this scenario?
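
For context, discards have to be allowed at two layers before a TRIM reaches the SSD:
the dm-crypt mapping (dm-crypt does not pass discards through unless the device was
opened with allow-discards) and BlueStore itself, which does not issue discards by
default. A rough sketch of checking and enabling the latter, assuming a Mimic/Nautilus
cluster and that your build has the bdev_enable_discard option:

$ lsblk --discard
    # DISC-GRAN/DISC-MAX of 0 on the dm-crypt mapping means discards are not passed through
$ ceph config set osd bdev_enable_discard true
    # BlueStore-level discard, off by default; verify the option exists on your version first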


Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Using ceph-objectstore-tool (--op info) on PG 25.0 (which indeed was the one I
remember having the error) shows this:

struct_v 10
{
"pgid": "25.0",
"last_update": "7592'106",
"last_complete": "7592'106",
"log_tail": "0'0",
"last_user_version": 106,
"last_backfill": "MIN",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 5191,
"epoch_pool_created": 5191,
"last_epoch_started": 7489,
"last_interval_started": 7488,
"last_epoch_clean": 7489,
"last_interval_clean": 7488,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 7593,
"same_interval_since": 7593,
"same_primary_since": 7593,
"last_scrub": "7592'106",
"last_scrub_stamp": "2019-04-25 21:34:52.079721",
"last_deep_scrub": "7485'70",
"last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
"last_clean_scrub_stamp": "2019-04-19 14:52:44.047548"
},
"stats": {
"version": "7592'105",
"reported_seq": "2621",
"reported_epoch": "7592",
"state": "active+clean+inconsistent",
"last_fresh": "2019-04-25 20:02:55.620028",
"last_change": "2019-04-24 19:52:45.072473",
"last_active": "2019-04-25 20:02:55.620028",
"last_peered": "2019-04-25 20:02:55.620028",
"last_clean": "2019-04-25 20:02:55.620028",
"last_became_active": "2019-04-22 17:55:37.578239",
"last_became_peered": "2019-04-22 17:55:37.578239",
"last_unstale": "2019-04-25 20:02:55.620028",
"last_undegraded": "2019-04-25 20:02:55.620028",
"last_fullsized": "2019-04-25 20:02:55.620028",
"mapping_epoch": 7593,
"log_start": "0'0",
"ondisk_log_start": "0'0",
"created": 5191,
"last_epoch_clean": 7489,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "7592'88",
"last_scrub_stamp": "2019-04-24 19:52:45.072367",
"last_deep_scrub": "7485'70",
"last_deep_scrub_stamp": "2019-04-22 10:15:40.532014",
"last_clean_scrub_stamp": "2019-04-19 14:52:44.047548",
"log_size": 105,
"ondisk_log_size": 105,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 0,
"num_objects": 9,
"num_object_clones": 0,
"num_object_copies": 27,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 9,
"num_whiteouts": 0,
"num_read": 87,
"num_read_kb": 87,
"num_write": 98,
"num_write_kb": 98,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objects_omap": 9,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0,
"num_large_omap_objects": 0,
"num_objects_manifest": 0,
"num_omap_bytes": 0,
"num_omap_keys": 0
},
"up": [],
"acting": [],
"blocked_by": [],
"up_primary": -1,
"acting_primary": -1,
"purged_snaps": []
},
"empty": 0,
"dne": 0,
"incomplete": 1,
"last_epoch_started": 7489,
"hit_set_history": {
"current_last_update": "0'0",
"history": []
}
}


On Fri, Apr 26, 2019 at 7:35 PM Elise Burke  wrote:

> Thanks for the suggestions. I've uploaded the surprisingly large (1.5G!)
> log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7
>
> Looks like these are the relevant lines:
>
>-52> 2019-04-26 19:23:05.190 7fb2657dc700 20 osd.2 op_wq(2) _process
> empty q, waiting
>-51> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process
> OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) queued
>-50> 2019-04-26 19:23:05.190 7fb2717f4700 20
> bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread sleep
>-49> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process
> 25.0 to_process  e7648)> 

Re: [ceph-users] clock skew

2019-04-26 Thread Anthony D'Atri
> @Janne: i will checkout/implement the peer config per your suggestion. 
> However what confuses us is that chrony thinks the clocks match, and 
> only ceph feels it doesn't. So we are not sure if the peer config will 
> actually help in this situation. But time will tell.

Ar ar.

Chrony thinks that the clocks match *what*, though?  That each system matches 
the public pool against which it’s synced?

Something I’ve noticed, especially when using the public pool in Asia, is that 
DNS rotation causes the pool FQDNs to resolve differently multiple times a 
day, and that the quality of those servers varies considerably.  Naturally the 
ones in that pool that I set up a few years ago are spot-on, but I digress ;)

Consider this scenario:

Ceph mon A resolves the pool FQDN to serverX, which is 10ms slow
Ceph mon B resolves the pool FQDN to serverY, which is 20ms fast with lots of 
jitter

That can get you a 30ms spread right there.  This is the benefit of having the 
mons peer with each other as well as with upstream servers of varying 
stratum/quality — worst case, they will select one of their own to sync with.

With iburst and modern client polling backoff, there usually isn’t much reason 
not to configure a bunch of peers.  Multiple public/vendor pool FQDNs are 
reasonable to include, but I also like to hardcode a few known-good public 
peers as well, even one in a different region if necessary.  Have your systems 
peer against each other too.
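
A minimal chrony.conf along those lines (host names are illustrative; each node
lists the other nodes, not itself):

# vendor/public pool plus a known-good public server you trust
pool 2.pool.ntp.org iburst maxsources 4
server ntp.example.org iburst
# the other Ceph nodes
peer mon-b
peer mon-c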

Depending on the size of your operation, consider deploying your own timeserver 
on-prem, though antenna placement can be a hassle.

This is horribly non-enterprise, but I also suggest picking up one of these:

https://www.netburner.com/products/network-time-server/pk70-ex-ntp-network-time-server/

It’s cheap and it can’t handle tens of thousands of clients, but it doesn’t 
have to.  Stick it in an office window and add it to your peer lists.  If you 
have a larger number of clients, have your internal NTP servers sync to it 
(as well as to each other, K-complete).  If you don’t, include it in their local 
peer constellation.  Best case you have an excellent low-stratum source for 
your systems for cheap.  Worst case you are no worse off than you were before.

Now, whether this situation is what you’re seeing I can’t say without more 
info, but it is at least plausible.




Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Thanks for the suggestions. I've uploaded the surprisingly large (1.5G!)
log file: ceph-post-file: 2d8d22f4-580b-4b57-a13a-f49dade34ba7

Looks like these are the relevant lines:

   -52> 2019-04-26 19:23:05.190 7fb2657dc700 20 osd.2 op_wq(2) _process
empty q, waiting
   -51> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process
OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) queued
   -50> 2019-04-26 19:23:05.190 7fb2717f4700 20
bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread sleep
   -49> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process
25.0 to_process  waiting <> waiting_peering {}
   -48> 2019-04-26 19:23:05.190 7fb2667de700 20 osd.2 op_wq(0) _process
OpQueueItem(25.0 PGDelete(25.0 e7648) prio 5 cost 1048576 e7648) pg
0x7c20a0
   -47> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648
pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489
n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) []
r=-1 lpr=7648 D
ELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
do_peering_event: epoch_sent: 7648 epoch_requested: 7648 DeleteSome
   -46> 2019-04-26 19:23:05.190 7fb2667de700 10 osd.2 pg_epoch: 7648
pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489
n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) []
r=-1 lpr=7648 D
ELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
_delete_some
   -45> 2019-04-26 19:23:05.190 7fb2667de700 15
bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start GHMIN
end GHMAX max 30
   -44> 2019-04-26 19:23:05.190 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range
0x7f7fe5 to 0x7f7fe5 and
0x7f8019 to 0x7f8019
 start GHMIN
   -43> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list pend
0x7f7fe5
   -42> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key
0x7f7fff100020'P!osdmap.7114!='0x'o'
>= GHMAX
   -41> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid
#25:head# end GHMAX
   -40> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list oid
#25:b08b92bdhead# end GHMAX
   -39> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list key
0x80800c1c0021213dfffe'o' >=
GHMAX
   -38> 2019-04-26 19:23:05.191 7fb2667de700 10
bluestore(/var/lib/ceph/osd/ceph-2) collection_list 25.0_head start GHMAX
end GHMAX max 30 = 0, ls.size() = 2, next = GHMAX
   -37> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648
pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489
n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) []
r=-1 lpr=7648 D
ELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
_delete_some [#25:head#,#25:b08b92bdhead#]
   -36> 2019-04-26 19:23:05.191 7fb2667de700 20 osd.2 pg_epoch: 7648
pg[25.0( v 7592'106 (0'0,7592'106] lb MIN (bitwise) local-lis/les=7488/7489
n=9 ec=5191/5191 lis/c 7488/7488 les/c/f 7489/7489/0 7593/7593/7593) []
r=-1 lpr=7648 D
ELETING pi=[7488,7593)/1 crt=7592'106 lcod 0'0 unknown NOTIFY mbc={}]
_delete_some finished
   -35> 2019-04-26 19:23:05.191 7fb2667de700 10
bluestore(/var/lib/ceph/osd/ceph-2) queue_transactions ch 0x7c27d94b40
25.0_head
   -34> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _txc_create osr 0x7c274f6c60 =
0x7c2ef34c00 seq 3
   -33> 2019-04-26 19:23:05.191 7fb2667de700 15
bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head #25:head#
onode 0x7c20aaf200 txc 0x7c2ef34c00
   -32> 2019-04-26 19:23:05.191 7fb2667de700 15
bluestore(/var/lib/ceph/osd/ceph-2) _do_truncate 25.0_head
#25:head# 0x0
   -31> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore.onode(0x7c20aaf200).flush flush done
   -30> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _do_omap_clear remove range start:
0x00b0295a'-' end: 0x00b0295a'~'
   -29> 2019-04-26 19:23:05.191 7fb2667de700 10
bluestore(/var/lib/ceph/osd/ceph-2) _remove 25.0_head #25:head#
= 0
   -28> 2019-04-26 19:23:05.191 7fb2667de700 15
bluestore(/var/lib/ceph/osd/ceph-2) _remove_collection 25.0_head
   -27> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore.OnodeSpace(0x7c27d94c98 in 0x7c1f7e10a0) map_any
   -26> 2019-04-26 19:23:05.191 7fb2667de700 20
bluestore(/var/lib/ceph/osd/ceph-2) _collection_list range
0x7f7fe5 to 0x7f7fe5 and
0x7f8019 to 0x7f8019
 start GHMIN
   -25> 2019-04-26 19:23:05.192 7fb2667de700 20

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Brian Topping
> On Apr 26, 2019, at 1:50 PM, Gregory Farnum  wrote:
> 
> Hmm yeah, it's probably not using UTC. (Despite it being good
> practice, it's actually not an easy default to adhere to.) cephx
> requires synchronized clocks and probably the same timezone (though I
> can't swear to that.)

Apps don’t “see” timezones; timezones are a rendering transform of an absolute 
time. The instant “now” is the same throughout space and time, regardless of 
how that instant is quantified. UNIX wall time is just one such quantification.

Problems ensue when the rendered time is incorrect for the time zone shown in 
the rendering. If a machine that is “not using time zones” shows that the time 
is 3PM UTC and one lives in London, the internal time will be correct. On the 
other hand, if they live in NYC, the internal time is incorrect. This is to say 
15:00UTC rendered at 3PM in NYC is very wrong *because it’s not 3PM in London*, 
where UTC is true.

tl;dr: Make sure that your clock is set for the correct time in the time zone 
in whatever rendering is set. It doesn’t matter where the system actually 
resides or whether the TZ matches its geographic location.
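
A quick way to compare the underlying clocks directly, independent of any rendering
(run on each node; timedatectl assumes systemd):

$ date -u       # UTC rendering; should agree across nodes to within a fraction of a second
$ date +%s      # seconds since the epoch, timezone-independent
$ timedatectl   # shows the configured zone and whether the clock is NTP-synchronized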


Re: [ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Gregory Farnum
You'll probably want to generate a log with "debug osd = 20" and
"debug bluestore = 20", then share that or upload it with
ceph-post-file, to get more useful info about which PGs are breaking
(is it actually the ones that were supposed to delete?).

If there's a particular set of PGs you need to rescue, you can also
look at using the ceph-objectstore-tool to export them off the busted
OSD stores and import them into OSDs that still work.
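
A rough sketch of both suggestions (OSD ids, PG id and paths are illustrative; the
OSD has to be stopped for the objectstore-tool steps):

$ ceph tell osd.2 injectargs '--debug_osd 20 --debug_bluestore 20'
    # only works while the daemon is up; for a crashing OSD set "debug osd = 20" and
    # "debug bluestore = 20" under [osd] in ceph.conf and restart it once
$ ceph-post-file /var/log/ceph/ceph-osd.2.log
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op export --pgid 25.0 --file /tmp/pg25.0.export
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op import --file /tmp/pg25.0.export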


On Fri, Apr 26, 2019 at 12:01 PM Elise Burke  wrote:
>
> Hi,
>
> I upgraded to Nautilus a week or two ago and things had been mostly fine. I 
> was interested in trying the device health stats feature and enabled it. In 
> doing so it created a pool, device_health_metrics, which contained zero bytes.
>
> Unfortunately this pool developed a PG that could not be repaired with `ceph 
> pg repair`. That's okay, I thought, this pool is empty (zero bytes), so I'll 
> just remove it and discard the PG entirely.
>
> So I did: `ceph osd pool rm device_health_metrics device_health_metrics 
> --yes-i-really-really-mean-it`
>
> Within a few seconds three OSDs had gone missing (this pool was size=3) and 
> now crashloop at startup.
>
> Any assistance in getting these OSDs up (such as by discarding the errant PG) 
> would be appreciated. I'm most concerned about the other pools in the system, 
> as losing three OSDs at once has not been ideal.
>
> This is made more difficult as these are in the Bluestore configuration and 
> were set up with ceph-deploy to bare metal (using LVM mode).
>
> Here's the traceback as noted in journalctl:
>
> Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902 
> 7f8a00866d80 -1 Falling back to public interface
> Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835 
> 7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
> Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203 
> 7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction 
> error (39) Directory not empty not handled on operation 21 (op 1, counting 
> from 0)
> Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209 
> 7f89dee53700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
> Apr 26 11:01:43 databox ceph-osd[1878533]: 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
> Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0 
> (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
> Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char 
> const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 2: 
> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 3: 
> (BlueStore::queue_transactions(boost::intrusive_ptr&,
>  std::vector std::allocator >&, boost::intrusive_ptr
> Apr 26 11:01:43 databox ceph-osd[1878533]: 4: 
> (ObjectStore::queue_transaction(boost::intrusive_ptr&,
>  ObjectStore::Transaction&&, boost::intrusive_ptr, 
> ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
> Apr 26 11:01:43 databox ceph-osd[1878533]: 5: 
> (PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 6: 
> (PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38) [0xfc65cf528]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 7: 
> (boost::statechart::simple_state PG::RecoveryState::ToDelete, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na
> Apr 26 11:01:43 databox ceph-osd[1878533]: 8: 
> (boost::statechart::state_machine PG::RecoveryState::Initial, std::allocator, 
> boost::statechart::null_exception_translator>::process_event(boost
> Apr 26 11:01:43 databox ceph-osd[1878533]: 9: 
> (PG::do_peering_event(std::shared_ptr, 
> PG::RecoveryCtx*)+0x119) [0xfc65dac99]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 10: 
> (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr, 
> ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 11: 
> (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, 
> ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 12: 
> (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) 
> [0xfc6509c14]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 13: 
> (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) 
> [0xfc6b01f43]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 14: 
> (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
> Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d) [0x7f89fc376ead]
> Apr 26 

Re: [ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Gregory Farnum
On Fri, Apr 26, 2019 at 10:55 AM Jan Pekař - Imatic  wrote:
>
> Hi,
>
> yesterday my cluster reported slow request for minutes and after restarting 
> OSDs (reporting slow requests) it stuck with peering PGs. Whole
> cluster was not responding and IO stopped.
>
> I also notice, that problem was with cephx - all OSDs were reporting the same 
> (even the same number of secret_id)
>
> cephx: verify_authorizer could not get service secret for service osd 
> secret_id=14086
> .. conn(0x559e15a5 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH 
> pgs=0 cs=0 l=1).handle_connect_msg: got bad authorizer
> auth: could not find secret_id=14086
>
> My questions are:
>
> Why happened that?
> Can I prevent cluster from stopping to work (with cephx enabled)?
> How quickly are keys rotating/expiring and can I check problems with that 
> anyhow?
>
> I'm running NTP on nodes (and also ceph monitors), so time should not be the 
> issue. I noticed, that some monitor nodes has no timezone set,
> but I hope MONs are using UTC to distribute keys to clients. Or different 
> timezone between MON and OSD can cause the problem?

Hmm yeah, it's probably not using UTC. (Despite it being good
practice, it's actually not an easy default to adhere to.) cephx
requires synchronized clocks and probably the same timezone (though I
can't swear to that.)

>
> I "fixed" the problem by restarting monitors.
>
> It happened for the second time during last 3 months, so I'm reporting it as 
> issue, that can happen.
>
> I also noticed in all OSDs logs
>
> 2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating 
> possible clock skew, rotating keys expired way too early (before
> 2019-04-25 09:06:55.65)
>
> approximately 7 hours before problem occurred. I can see, that it related to 
> the issue. But why 7 hours? Is there some timeout or grace
> period of old keys usage before they are invalidated?

7 hours shouldn't be directly related. IIRC by default a new rotating
key is issued every hour; it gives out the current and next key on
request, and daemons accept keys within a half-hour offset of what
they believe the current time to be. Something like that.
-Greg
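
For checking, assuming an admin socket on an OSD node (option name and default are
from memory, so verify on your version):

$ ceph time-sync-status
    # the monitors' own view of clock offsets
$ ceph daemon osd.0 config get auth_service_ticket_ttl
    # rotating-key/ticket lifetime, 3600 seconds by default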

> Thank you
>
> With regards
>
> Jan Pekar
>
> --
> 
> Ing. Jan Pekař
> jan.pe...@imatic.cz
> 
> Imatic | Jagellonská 14 | Praha 3 | 130 00
> http://www.imatic.cz
> 
> --
>


[ceph-users] Nautilus (14.2.0) OSDs crashing at startup after removing a pool containing a PG with an unrepairable error

2019-04-26 Thread Elise Burke
Hi,

I upgraded to Nautilus a week or two ago and things had been mostly fine. I
was interested in trying the device health stats feature and enabled it. In
doing so it created a pool, device_health_metrics, which contained zero
bytes.

Unfortunately this pool developed a PG that could not be repaired with `ceph
pg repair`. That's okay, I thought, this pool is empty (zero bytes), so
I'll just remove it and discard the PG entirely.

So I did: `ceph osd pool rm device_health_metrics device_health_metrics
--yes-i-really-really-mean-it`

Within a few seconds three OSDs had gone missing (this pool was size=3) and
now crashloop at startup.

Any assistance in getting these OSDs up (such as by discarding the errant
PG) would be appreciated. I'm most concerned about the other pools in the
system, as losing three OSDs at once has not been ideal.
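
For reference, the usual way to keep the cluster from backfilling the down OSDs'
data elsewhere while they are being recovered is the noout flag; a sketch:

$ ceph osd set noout      # stop the down OSDs from being marked out
$ ceph osd unset noout    # once they are back up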

This is made more difficult as these are in the Bluestore configuration and
were set up with ceph-deploy to bare metal (using LVM mode).

Here's the traceback as noted in journalctl:

Apr 26 11:01:43 databox ceph-osd[1878533]: -5381> 2019-04-26 11:01:08.902
7f8a00866d80 -1 Falling back to public interface
Apr 26 11:01:43 databox ceph-osd[1878533]: -4241> 2019-04-26 11:01:41.835
7f8a00866d80 -1 osd.2 7630 log_to_monitors {default=true}
Apr 26 11:01:43 databox ceph-osd[1878533]: -3> 2019-04-26 11:01:43.203
7f89dee53700 -1 bluestore(/var/lib/ceph/osd/ceph-2) _txc_add_transaction
error (39) Directory not empty not handled on operation 21 (op 1, counting
from 0)
Apr 26 11:01:43 databox ceph-osd[1878533]: -1> 2019-04-26 11:01:43.209
7f89dee53700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14
Apr 26 11:01:43 databox ceph-osd[1878533]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.0/rpm/el7/BUILD/ceph-14.2.0/src/os/bluest
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0
(3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (ceph::__ceph_abort(char
const*, int, char const*, std::string const&)+0xd8) [0xfc63afe40]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2:
(BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3:
(BlueStore::queue_transactions(boost::intrusive_ptr&,
std::vector >&, boost::intrusive_ptr
Apr 26 11:01:43 databox ceph-osd[1878533]: 4:
(ObjectStore::queue_transaction(boost::intrusive_ptr&,
ObjectStore::Transaction&&, boost::intrusive_ptr,
ThreadPool::TPHandle*)+0x7f) [0xfc656b81f
Apr 26 11:01:43 databox ceph-osd[1878533]: 5:
(PG::_delete_some(ObjectStore::Transaction*)+0x83d) [0xfc65ce70d]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6:
(PG::RecoveryState::Deleting::react(PG::DeleteSome const&)+0x38)
[0xfc65cf528]
Apr 26 11:01:43 databox ceph-osd[1878533]: 7:
(boost::statechart::simple_state,
boost::statechart::null_exception_translator>::process_event(boost
Apr 26 11:01:43 databox ceph-osd[1878533]: 9:
(PG::do_peering_event(std::shared_ptr,
PG::RecoveryCtx*)+0x119) [0xfc65dac99]
Apr 26 11:01:43 databox ceph-osd[1878533]: 10:
(OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr,
ThreadPool::TPHandle&)+0x1b4) [0xfc6515494]
Apr 26 11:01:43 databox ceph-osd[1878533]: 11:
(OSD::dequeue_delete(OSDShard*, PG*, unsigned int,
ThreadPool::TPHandle&)+0x234) [0xfc65158d4]
Apr 26 11:01:43 databox ceph-osd[1878533]: 12:
(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4)
[0xfc6509c14]
Apr 26 11:01:43 databox ceph-osd[1878533]: 13:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433)
[0xfc6b01f43]
Apr 26 11:01:43 databox ceph-osd[1878533]: 14:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfc6b04fe0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 15: (()+0x7dd5) [0x7f89fd4b0dd5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 16: (clone()+0x6d)
[0x7f89fc376ead]
Apr 26 11:01:43 databox ceph-osd[1878533]: 0> 2019-04-26 11:01:43.217
7f89dee53700 -1 *** Caught signal (Aborted) **
Apr 26 11:01:43 databox ceph-osd[1878533]: in thread 7f89dee53700
thread_name:tp_osd_tp
Apr 26 11:01:43 databox ceph-osd[1878533]: ceph version 14.2.0
(3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus (stable)
Apr 26 11:01:43 databox ceph-osd[1878533]: 1: (()+0xf5d0) [0x7f89fd4b85d0]
Apr 26 11:01:43 databox ceph-osd[1878533]: 2: (gsignal()+0x37)
[0x7f89fc2af207]
Apr 26 11:01:43 databox ceph-osd[1878533]: 3: (abort()+0x148)
[0x7f89fc2b08f8]
Apr 26 11:01:43 databox ceph-osd[1878533]: 4: (ceph::__ceph_abort(char
const*, int, char const*, std::string const&)+0x19c) [0xfc63aff04]
Apr 26 11:01:43 databox ceph-osd[1878533]: 5:
(BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x2a85) [0xfc698e5f5]
Apr 26 11:01:43 databox ceph-osd[1878533]: 6:

[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

2019-04-26 Thread Erik Lindahl
Hi list,

In conjunction with taking a new storage server online we observed that a
whole bunch of the SSD OSDs we use for metadata went offline, and crash
every time they try to restart with an abort signal in OSDMap::decode -
brief log below:



2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from
manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493,
next_file_number is 1496, last_sequence is 45904669, log_number is
0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family
[default] (ID 0), log number is 1492

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556294168125624, "job": 1, "event": "recovery_started",
"log_files": [1494]}
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log
#1494 mode 0
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest
1496

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"}
2019-04-26 17:56:08.127 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer
0x5634c2f6
2019-04-26 17:56:08.127 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db
options
compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2019-04-26 17:56:08.135 7f4f2956ae00  1 freelist init
2019-04-26 17:56:08.143 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation
metadata
2019-04-26 17:56:08.147 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233
extents
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)
 1: (()+0x92b730) [0x5634c0151730]
 2: (()+0x12890) [0x7f4f1f02b890]
 3: (gsignal()+0xc7) [0x7f4f1df06e97]
 4: (abort()+0x141) [0x7f4f1df08801]
 5: (()+0x8c8b7) [0x7f4f1e8fb8b7]
 6: (()+0x92a06) [0x7f4f1e901a06]
 7: (()+0x92a41) [0x7f4f1e901a41]
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]
 14: (main()+0x383d) [0x5634bfaef8cd]
 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97]
 16: (_start()+0x2a) [0x5634bfbb97aa]



We have seen this at least once in the past, and I suspect it might be
related to high load (?) in the servers when lots of PGs are peering and/or
large amounts of backfilling happens. In that case it was only a single
disk, so we "fixed" it by just recreating that OSD - but this time we need
to get them working to avoid losing metadata.



Based on previous posts to the mailing list and the bugtracker, I guessed
this might be due to a corrupt osdmap for these OSDs. We managed to copy an
osdmap from another OSD with ceph-objectstore-tool, but when trying to read
the new osdmap again from the OSD (with get-osdmap) I still get an error:


root@storage05:/var/log/ceph# ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-55 --op get-osdmap --file=/tmp/osdmap
terminate called after throwing an instance of
'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: bad crc, actual 3828477398 != expected
3773790681
*** Caught signal (Aborted) **
 in thread 7f7611698f80 thread_name:ceph-objectstor
 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)
 1: (()+0x9895c0) [0x562e4aa3d5c0]
 2: (()+0x12890) [0x7f760715c890]
 3: (gsignal()+0xc7) [0x7f7606033e97]
 4: (abort()+0x141) [0x7f7606035801]
 5: (()+0x8c8b7) [0x7f7606a288b7]
 6: (()+0x92a06) [0x7f7606a2ea06]
 7: (()+0x92a41) [0x7f7606a2ea41]
 8: (()+0x92c74) [0x7f7606a2ec74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f7607af2694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f7607af3af1]
 11: (get_osdmap(ObjectStore*, unsigned int, OSDMap&,
ceph::buffer::list&)+0x1e5) [0x562e4a473655]
 12: (main()+0x39be) [0x562e4a3a43fe]
 13: (__libc_start_main()+0xe7) [0x7f7606016b97]
 14: (_start()+0x2a) [0x562e4a4727aa]
Aborted (core dumped)
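
For anyone trying the same thing, the osdmap copy procedure we mean is roughly this
(epoch and OSD ids are placeholders; both OSDs stopped):

$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-GOOD --op get-osdmap --epoch 12345 --file /tmp/osdmap.12345
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-55 --op set-osdmap --epoch 12345 --file /tmp/osdmap.12345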



Any clues or pointers what we should try? We have one nonimportant OSD
where we can play around, but for the others I guess it makes sense to
first try and duplicate (?) the data to avoid data loss when testing
various things - pointers how to best do that would be 

[ceph-users] PG stuck peering - OSD cephx: verify_authorizer key problem

2019-04-26 Thread Jan Pekař - Imatic

Hi,

yesterday my cluster reported slow requests for minutes, and after restarting the OSDs (the ones reporting slow requests) it got stuck with peering PGs. The whole
cluster was not responding and IO stopped.


I also noticed that the problem was with cephx - all OSDs were reporting the same
error (even the same secret_id number)

cephx: verify_authorizer could not get service secret for service osd 
secret_id=14086
.. conn(0x559e15a5 :6800 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 
cs=0 l=1).handle_connect_msg: got bad authorizer
auth: could not find secret_id=14086

My questions are:

Why did that happen?
Can I prevent the cluster from stopping work (with cephx enabled)?
How quickly do keys rotate/expire, and can I check for problems with that
somehow?

I'm running NTP on the nodes (including the ceph monitors), so time should not be the issue. I noticed that some monitor nodes have no timezone set,
but I hope MONs use UTC to distribute keys to clients. Or can a different timezone between MON and OSD cause the problem?


I "fixed" the problem by restarting monitors.

This is the second time it has happened during the last 3 months, so I'm reporting it as
an issue that can happen.

I also noticed in all OSDs logs

2019-04-25 10:06:55.652239 7faf00096700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 
2019-04-25 09:06:55.65)


approximately 7 hours before the problem occurred. I can see that it is related to the issue. But why 7 hours? Is there some timeout or grace
period during which old keys can still be used before they are invalidated?


Thank you

With regards

Jan Pekar

--

Ing. Jan Pekař
jan.pe...@imatic.cz

Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz

--



[ceph-users] Mimic/13.2.5 bluestore OSDs crashing during startup in OSDMap::decode

2019-04-26 Thread Erik Lindahl
Hi list,

In conjunction with taking a new storage server online we observed that a
whole bunch of the SSD OSDs we use for metadata went offline, and crash
every time they try to restart with an abort signal in OSDMap::decode -
brief log below.

We have seen this at least once in the past, and I suspect it might be
related to high load (?) in the servers when lots of PGs are peering and/or
large amounts of backfilling happens. In that case it was only a single
disk, so we "fixed" it by just recreating that OSD - but this time we need
to get them working to avoid losing metadata :-)


Based on previous posts to the mailing list and the bugtracker, I would
guess this might be due to a corrupt osdmap for these OSDs.

Should we try to replace the osdmap, and if so: how do we do that for
bluestore OSDs?

Cheers,

Erik


2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3362] Recovered from
manifest file:db/MANIFEST-001493 succeeded,manifest_file_number is 1493,
next_file_number is 1496, last_sequence is 45904669, log_number is
0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 1491

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:3370] Column family
[default] (ID 0), log number is 1492

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556294168125624, "job": 1, "event": "recovery_started",
"log_files": [1494]}
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:551] Recovering log
#1494 mode 0
2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/version_set.cc:2863] Creating manifest
1496

2019-04-26 17:56:08.123 7f4f2956ae00  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556294168126875, "job": 1, "event": "recovery_finished"}
2019-04-26 17:56:08.127 7f4f2956ae00  4 rocksdb:
[/build/ceph-13.2.5/src/rocksdb/db/db_impl_open.cc:1218] DB pointer
0x5634c2f6
2019-04-26 17:56:08.127 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_db opened rocksdb path db
options
compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2019-04-26 17:56:08.135 7f4f2956ae00  1 freelist init
2019-04-26 17:56:08.143 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc opening allocation
metadata
2019-04-26 17:56:08.147 7f4f2956ae00  1
bluestore(/var/lib/ceph/osd/ceph-126) _open_alloc loaded 223 GiB in 233
extents
2019-04-26 17:56:08.151 7f4f2956ae00 -1 *** Caught signal (Aborted) **
 in thread 7f4f2956ae00 thread_name:ceph-osd

 ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
(stable)
 1: (()+0x92b730) [0x5634c0151730]
 2: (()+0x12890) [0x7f4f1f02b890]
 3: (gsignal()+0xc7) [0x7f4f1df06e97]
 4: (abort()+0x141) [0x7f4f1df08801]
 5: (()+0x8c8b7) [0x7f4f1e8fb8b7]
 6: (()+0x92a06) [0x7f4f1e901a06]
 7: (()+0x92a41) [0x7f4f1e901a41]
 8: (()+0x92c74) [0x7f4f1e901c74]
 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1864) [0x7f4f20aff694]
 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f4f20b00af1]
 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5634bfbf73a8]
 12: (OSDService::get_map(unsigned int)+0x1e) [0x5634bfc56ffe]
 13: (OSD::init()+0x1d5f) [0x5634bfc048ef]
 14: (main()+0x383d) [0x5634bfaef8cd]
 15: (__libc_start_main()+0xe7) [0x7f4f1dee9b97]
 16: (_start()+0x2a) [0x5634bfbb97aa]

Cheers,

Erik


Re: [ceph-users] clock skew

2019-04-26 Thread mj

Hi all,

Thanks for all replies!

@Huang: ceph time-sync-status is exactly what I was looking for, thanks!

@Janne: I will check out/implement the peer config per your suggestion. 
However what confuses us is that chrony thinks the clocks match, and 
only ceph feels it doesn't. So we are not sure if the peer config will 
actually help in this situation. But time will tell.


@John: Thanks for the maxsources suggestion

@Bill: thanks for the interesting article, will check it out!

MJ
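
A minimal way to compare the two views side by side on a mon node (names are
illustrative):

$ ceph time-sync-status   # skew as the monitors see it, even when health is OK
$ chronyc tracking        # chrony's view of its own offset
$ chronyc sources -v      # which upstream servers this node actually picked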

On 4/25/19 5:47 PM, Bill Sharer wrote:
If you are just synching to the outside pool, the three hosts may end up 
latching on to different outside servers as their definitive sources. 
You might want to make one of the three a higher priority source to the 
other two and possibly just have it use the outside sources as sync. 
Also for hardware newer than about five years old, you might want to 
look at enabling the NIC clocks using LinuxPTP to keep clock jitter down 
inside your LAN.  I wrote this article on the Gentoo wiki on enabling 
PTP in chrony.


https://wiki.gentoo.org/wiki/Chrony_with_hardware_timestamping

Bill Sharer


On 4/25/19 6:33 AM, mj wrote:

Hi all,

On our three-node cluster, we have setup chrony for time sync, and 
even though chrony reports that it is synced to ntp time, at the same 
time ceph occasionally reports time skews that can last several hours.


See for example:


root@ceph2:~# ceph -v
ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) 
luminous (stable)

root@ceph2:~# ceph health detail
HEALTH_WARN clock skew detected on mon.1
MON_CLOCK_SKEW clock skew detected on mon.1
    mon.1 addr 10.10.89.2:6789/0 clock skew 0.506374s > max 0.5s 
(latency 0.000591877s)

root@ceph2:~# chronyc tracking
Reference ID    : 7F7F0101 ()
Stratum : 10
Ref time (UTC)  : Wed Apr 24 19:05:28 2019
System time : 0.00133 seconds slow of NTP time
Last offset : -0.00524 seconds
RMS offset  : 0.00524 seconds
Frequency   : 12.641 ppm slow
Residual freq   : +0.000 ppm
Skew    : 0.000 ppm
Root delay  : 0.00 seconds
Root dispersion : 0.00 seconds
Update interval : 1.4 seconds
Leap status : Normal
root@ceph2:~# 


For the record: mon.1 = ceph2 = 10.10.89.2, and time is synced 
similarly with NTP on the two other nodes.


We don't understand this...

I have now injected mon_clock_drift_allowed 0.7, so at least we have 
HEALTH_OK again. (to stop upsetting my monitoring system)


But two questions:

- can anyone explain why this is happening? It looks as if ceph and 
NTP/chrony disagree on just how time-synced the servers are...


- how to determine the current clock skew from ceph's perspective? 
Because "ceph health detail" does not show it when health is OK.
(I want to start monitoring it continuously, to see if I can find some 
sort of pattern)


Thanks!

MJ


[ceph-users] Luminous 12.2.8, active+undersized+degraded+inconsistent

2019-04-26 Thread Slava Astashonok
Hello,
I am running a Ceph cluster on Luminous 12.2.8 with 36 OSDs.
Today a deep-scrub found an error on PG 25.60 and later one of the OSDs failed.
Now PG 25.60 is stuck in the active+undersized+degraded+inconsistent state. I
can't repair it with ceph pg repair 25.60 – the repair process does not
start at all. What is the correct recovery process for this situation?
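
For completeness, the per-object detail behind the scrub error can be pulled like
this (a sketch, assuming the scrub results are still available):

$ rados list-inconsistent-obj 25.60 --format=json-pretty   # shows which shard had the read error
$ ceph pg 25.60 query                                      # acting set and recovery/peering state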


=== ceph health detail ===
HEALTH_ERR 1 osds down; 1 scrub errors; Possible data damage: 1 pg
inconsistent; Degraded data redundancy: 188063/5900718 objects degraded
(3.187%), 117 pgs degraded, 117 pgs undersized
OSD_DOWN 1 osds down
osd.6 (root=default,host=hv203) is down
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 25.60 is active+undersized+degraded+inconsistent, acting [25,4]


=== ceph.log ===
2019-04-26 04:01:35.129464 osd.25 osd.25 10.4.5.207:6800/2469060 167 :
cluster [ERR] 25.60 shard 6: soid
25:065e49e9:::rbd_data.3759266b8b4567.00018202:head candidate
had a read error
2019-04-26 04:03:31.533671 osd.25 osd.25 10.4.5.207:6800/2469060 168 :
cluster [ERR] 25.60 deep-scrub 0 missing, 1 inconsistent objects
2019-04-26 04:03:31.533677 osd.25 osd.25 10.4.5.207:6800/2469060 169 :
cluster [ERR] 25.60 deep-scrub 1 errors


=== ceph-osd.6.log ===
2019-04-26 04:53:17.939436 7f6a8ae48700  4 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1403]
[default] [JOB 284] Compacting 4@0 + 4@1 files to L1, score 1.00
2019-04-26 04:53:17.939715 7f6a8ae48700  4 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1407]
[default] Compaction start summary: Base version 283 Base level 0,
inputs: [31929(25MB) 31927(21MB) 31925(22MB) 31923(26MB)], [31912(65MB)
31913(65MB) 31914(65MB) 31915(29MB)]

2019-04-26 04:53:17.939978 7f6a8ae48700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556243597939747, "job": 284, "event":
"compaction_started", "files_L0": [31929, 31927, 31925, 31923],
"files_L1": [31912, 31913, 31914, 31915], "score": 1, "input_data_size":
339668148}
2019-04-26 04:53:21.500373 7f6a8ae48700  4 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116]
[default] [JOB 284] Generated table #31930: 380678 keys, 69567323 bytes
2019-04-26 04:53:21.500410 7f6a8ae48700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556243601500399, "cf_name": "default", "job": 284,
"event": "table_file_creation", "file_number": 31930, "file_size":
69567323, "table_properties": {"data_size": 67110779, "index_size":
1349659, "filter_size": 1105896, "raw_key_size": 22641147,
"raw_average_key_size": 59, "raw_value_size": 59452413,
"raw_average_value_size": 156, "num_data_blocks": 16601, "num_entries":
380678, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
"kDeletedKeys": "161626", "kMergeOperands": "0"}}
2019-04-26 04:53:24.294928 7f6a8ae48700  4 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116]
[default] [JOB 284] Generated table #31931: 118877 keys, 69059681 bytes
2019-04-26 04:53:24.294964 7f6a8ae48700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556243604294950, "cf_name": "default", "job": 284,
"event": "table_file_creation", "file_number": 31931, "file_size":
69059681, "table_properties": {"data_size": 67109949, "index_size":
1495694, "filter_size": 453050, "raw_key_size": 10391245,
"raw_average_key_size": 87, "raw_value_size": 63028568,
"raw_average_value_size": 530, "num_data_blocks": 16621, "num_entries":
118877, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
"kDeletedKeys": "1266", "kMergeOperands": "0"}}
2019-04-26 04:53:27.979518 7f6a8ae48700  4 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:1116]
[default] [JOB 284] Generated table #31932: 119238 keys, 69066929 bytes
2019-04-26 04:53:27.979545 7f6a8ae48700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1556243607979532, "cf_name": "default", "job": 284,
"event": "table_file_creation", "file_number": 31932, "file_size":
69066929, "table_properties": {"data_size": 67112338, "index_size":
1499661, "filter_size": 453942, "raw_key_size": 10424324,
"raw_average_key_size": 87, "raw_value_size": 63036698,
"raw_average_value_size": 528, "num_data_blocks": 16599, "num_entries":
119238, "filter_policy_name": "rocksdb.BuiltinBloomFilter",
"kDeletedKeys": "3045", "kMergeOperands": "0"}}
2019-04-26 04:53:31.014387 7f6a8ae48700  3 rocksdb:
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
Compaction error: Corruption: block checksum mismatch
2019-04-26 04:53:31.014409 7f6a8ae48700  4 rocksdb: (Original Log Time
2019/04/26-04:53:31.012695)
[/mnt/npool/a.antreich/ceph/ceph-12.2.8/src/rocksdb/db/compaction_job.cc:621]
[default] compacted to: base level 1 max bytes base 268435456 files[4 4
16 0 0 0 0] max score 0.29, MB/sec: 26.0 rd, 21.2 wr, level 1, files
in(4, 4) out(4) MB in(97.3, 226.6) out(263.9), read-write-amplify(6.0)
write-amplify(2.7) Corruption: block checksum mismatch, records in:
975162, 

[ceph-users] Nautilus - The Manager Daemon spams its logfile with level 0 messages

2019-04-26 Thread Markus Baier

I updated the test cluster from luminous to nautilus
and now the ceph manager daemon has started to spam its logfile
with log level 0 messages.
There is a new entry every two seconds:

2019-04-26 12:27:18.889 7f8af1afe700  0 log_channel(cluster) log [DBG] : pgmap 
v15: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB 
avail
2019-04-26 12:27:20.889 7f8af1afe700  0 log_channel(cluster) log [DBG] : pgmap 
v16: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB 
avail
2019-04-26 12:27:22.899 7f8af1afe700  0 log_channel(cluster) log [DBG] : pgmap 
v17: 128 pgs: 128 active+clean; 2.1 KiB data, 1.0 GiB used, 7.0 TiB / 7.0 TiB 
avail


How can I stop these messages or at least change the interval
from two seconds to every hour?
I tried to change the loglevel of the manager daemon log
from 0 to -1 but this is not allowed.


Regards
Markus Baier

--
Markus Baier
Systemadministrator
Fachgebiet Bioinspired Communication Systems
TU Darmstadt, Germany
S3|19 1.6
Rundeturmstrasse 12
64283 Darmstadt



[ceph-users] RGW Beast frontend and ipv6 options

2019-04-26 Thread Abhishek Lekshmanan


Currently RGW's beast frontend supports ipv6 via the endpoint
configurable. The port option will bind to ipv4 _only_.

http://docs.ceph.com/docs/master/radosgw/frontends/#options

Since most Linux systems default the net.ipv6.bindv6only sysctl to false,
specifying a v6 endpoint will usually bind to both v4 and v6. But this also
means that deployment systems must be aware of this while configuring,
depending on whether both v4 and v6 endpoints need to work or not. Specifying
both a v4 and a v6 endpoint, or a port (v4) plus an endpoint with the same
port on v6, will currently lead to a failure, as the system would've already
bound the v6 port to both v4 and v6. This leaves us with a few options.

1. Keep the implicit behaviour as it is and document it, since systems are
already aware of the sysctl and will expect that a v6 endpoint will bind to
both v4 and v6 (a small example of this is sketched below).

2. Be explicit with endpoints & configuration: Beast itself overrides the
socket option that binds both v4 and v6, which means that a v6 endpoint
will bind to v6 *only* and binding to v4 will need an explicit
specification. (There is a PR in progress for this:
https://github.com/ceph/ceph/pull/27270)
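
To make option 1 concrete, a sketch of the implicit behaviour as it stands
(section name and port are illustrative):

$ sysctl net.ipv6.bindv6only
# 0, the common default, means a v6 socket also accepts v4-mapped connections

# ceph.conf
[client.rgw.gateway]
rgw frontends = beast endpoint=[::]:8080
# with bindv6only=0 this single endpoint serves both v4 and v6 on port 8080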

Any more suggestions on how systems handle this are also welcome.

--
Abhishek 