Re: [ceph-users] Unexpected "out" OSD behaviour

2019-12-22 Thread Jonas Jelten
hi!

I've also noticed that behavior and have submitted a patch some time ago that 
should fix (2):
https://github.com/ceph/ceph/pull/27288

But it may well be that there are more cases where PGs are not discovered on devices that do have them.
Just recently a lot of my data was degraded and then recreated, even though it would have been available
on a node that had taken very long to reboot.

What you can also do is mark your OSD "in" and then "out" again right away; the data is discovered then.
With my patch that shouldn't be necessary any more. Hope this helps you.
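
For illustration, here is a hedged sketch of both approaches on the plain CLI (osd.12 is just a
placeholder id; this is not part of the patch):

  # re-discover PGs that are still physically present on a restarted "out" OSD
  ceph osd in 12
  ceph osd out 12

  # drain without relying on the "out" flag at all: keep the OSD "in",
  # stop automatic balancing and lower its CRUSH weight instead
  ceph balancer off
  ceph osd crush reweight osd.12 0.0
  # wait until "ceph -s" shows no misplaced objects, do the maintenance,
  # then re-enable the balancer
  ceph balancer on

That way a restart of the drained OSD should not cost redundancy, since it remains a valid source
until it holds no data.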

Cheers
  -- Jonas


On 22/12/2019 19.48, Oliver Freyermuth wrote:
> Dear Cephers,
> 
> I realized the following behaviour only recently:
> 
> 1. Marking an OSD "out" sets its weight to zero and allows data to be migrated away (as long as it is up),
>    i.e. it is still considered a "source" and nothing goes to degraded state (so far, everything expected). 
> 2. Restarting an "out" OSD, however, means it will come back with "0 pgs", and if data was not fully migrated away yet,
>    the PGs which were still kept on it before will enter degraded state since they now lack a copy / shard.
> 
> Is (2) expected? 
> 
> If so, my understanding that one can take an OSD "out" to let the data be migrated away without losing any redundancy
> is wrong, since redundancy will be lost as soon as the "out" OSD is restarted (e.g. due to a crash, a node reboot, ...),
> and the only safe options would be:
> 1. Disable the automatic balancer. 
> 2. Either adjust the weights of the OSDs to drain to zero, or use pg upmap to 
> drain them. 
> 3. Reenable the automatic balancer only after having fully drained those OSDs 
> and performing the necessary intervention
>(in our case, recreating the OSDs with a faster blockdb). 
> 
> Is this correct? 
> 
> Cheers,
>   Oliver
> 
> 



[ceph-users] krbd namespace missing in /dev

2019-06-10 Thread Jonas Jelten
When I run:

  rbd map --name client.lol poolname/somenamespace/imagename

The image is mapped to /dev/rbd0 and

  /dev/rbd/poolname/imagename

I would expect the rbd to be mapped to (the rbdmap tool tries this name):

  /dev/rbd/poolname/somenamespace/imagename

The current mapping would not allow same-named images in different namespaces, and the automatic
mapping via the rbdmap tool fails because of this.
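
Until the symlink naming handles namespaces, a hedged sketch of a manual workaround (using the names
from above; the actual /dev/rbdN number has to be taken from the mapping output):

  rbd map --name client.lol poolname/somenamespace/imagename
  # check which /dev/rbdN the image got ("rbd device list", a.k.a. "rbd showmapped")
  rbd device list
  # create the namespace-aware symlink that rbdmap would expect, e.g. for /dev/rbd0
  mkdir -p /dev/rbd/poolname/somenamespace
  ln -s /dev/rbd0 /dev/rbd/poolname/somenamespace/imagename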


Are there plans to fix this?


Cheers
-- Jonas


Re: [ceph-users] PG scrub stamps reset to 0.000000 in 14.2.1

2019-06-05 Thread Jonas Jelten
Hi!

I'm also affected by this:

HEALTH_WARN 13 pgs not deep-scrubbed in time; 13 pgs not scrubbed in time
PG_NOT_DEEP_SCRUBBED 13 pgs not deep-scrubbed in time
pg 6.b1 not deep-scrubbed since 0.00
pg 7.ac not deep-scrubbed since 0.00
pg 7.a0 not deep-scrubbed since 0.00
pg 6.96 not deep-scrubbed since 0.00
pg 7.92 not deep-scrubbed since 0.00
pg 6.86 not deep-scrubbed since 0.00
pg 7.74 not deep-scrubbed since 0.00
pg 7.75 not deep-scrubbed since 0.00
pg 7.49 not deep-scrubbed since 0.00
pg 7.47 not deep-scrubbed since 0.00
pg 6.2a not deep-scrubbed since 0.00
pg 6.26 not deep-scrubbed since 0.00
pg 6.b not deep-scrubbed since 0.00
PG_NOT_SCRUBBED 13 pgs not scrubbed in time
pg 6.b1 not scrubbed since 0.00
pg 7.ac not scrubbed since 0.00
pg 7.a0 not scrubbed since 0.00
pg 6.96 not scrubbed since 0.00
pg 7.92 not scrubbed since 0.00
pg 6.86 not scrubbed since 0.00
pg 7.74 not scrubbed since 0.00
pg 7.75 not scrubbed since 0.00
pg 7.49 not scrubbed since 0.00
pg 7.47 not scrubbed since 0.00
pg 6.2a not scrubbed since 0.00
pg 6.26 not scrubbed since 0.00
pg 6.b not scrubbed since 0.00


A week ago this status was:


HEALTH_WARN 6 pgs not deep-scrubbed in time; 6 pgs not scrubbed in time
PG_NOT_DEEP_SCRUBBED 6 pgs not deep-scrubbed in time
pg 7.b1 not deep-scrubbed since 0.00
pg 7.7e not deep-scrubbed since 0.00
pg 6.6e not deep-scrubbed since 0.00
pg 7.8 not deep-scrubbed since 0.00
pg 7.40 not deep-scrubbed since 0.00
pg 6.f5 not deep-scrubbed since 0.00
PG_NOT_SCRUBBED 6 pgs not scrubbed in time
pg 7.b1 not scrubbed since 0.00
pg 7.7e not scrubbed since 0.00
pg 6.6e not scrubbed since 0.00
pg 7.8 not scrubbed since 0.00
pg 7.40 not scrubbed since 0.00
pg 6.f5 not scrubbed since 0.00
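
As a stopgap, the affected PGs can be scrubbed by hand so they get valid stamps again; a hedged
sketch using PG ids from the health output above:

  # list the complaining PGs
  ceph health detail | grep 'not deep-scrubbed since'
  # kick off manual scrubs per PG
  ceph pg deep-scrub 6.b1
  ceph pg scrub 6.b1
  # repeat (or loop) for the remaining PGs from the health output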


Is this a known problem already? I can't find a bug report.


Cheers

-- Jonas



On 16/05/2019 01.13, Brett Chancellor wrote:
> After upgrading from 14.2.0 to 14.2.1, I've noticed PGs are frequently 
> resetting their scrub and deep scrub time stamps
> to 0.00.  It's extra strange because the peers show timestamps for deep 
> scrubs.
> 
> ## First entry from a pg list at 7pm
> $ grep 11.2f2 ~/pgs-active.7pm 
> 11.2f2     691        0         0       0 2897477632           0          0 2091 active+clean    3h  7378'12291  8048:36261    [1,6,37]p1    [1,6,37]p1 2019-05-14 21:01:29.172460 2019-05-14 21:01:29.172460 
> 
> ## Next Entry 3 minutes later
> $ ceph pg ls active |grep 11.2f2
> 11.2f2     695        0         0       0 2914713600           0          0 2091 active+clean    6s  7378'12291  8049:36330    [1,6,37]p1    [1,6,37]p1                   0.00                   0.00 
> 
> ## PG Query
> {
>     "state": "active+clean",
>     "snap_trimq": "[]",
>     "snap_trimq_len": 0,
>     "epoch": 8049,
>     "up": [
>         1,
>         6,
>         37
>     ],
>     "acting": [
>         1,
>         6,
>         37
>     ],
>     "acting_recovery_backfill": [
>         "1",
>         "6",
>         "37"
>     ],
>     "info": {
>         "pgid": "11.2f2",
>         "last_update": "7378'12291",
>         "last_complete": "7378'12291",
>         "log_tail": "1087'10200",
>         "last_user_version": 12291,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 1,
>         "purged_snaps": [],
>         "history": {
>             "epoch_created": 1549,
>             "epoch_pool_created": 216,
>             "last_epoch_started": 6148,
>             "last_interval_started": 6147,
>             "last_epoch_clean": 6148,
>             "last_interval_clean": 6147,
>             "last_epoch_split": 6147,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 6126,
>             "same_interval_since": 6147,
>             "same_primary_since": 6126,
>             "last_scrub": "7378'12291",
>             "last_scrub_stamp": "0.00",
>             "last_deep_scrub": "6103'12186",
>             "last_deep_scrub_stamp": "0.00",
>             "last_clean_scrub_stamp": "2019-05-15 23:08:17.014575"
>         },
>         "stats": {
>             "version": "7378'12291",
>             "reported_seq": "36700",
>             "reported_epoch": "8049",
>             "state": "active+clean",
>             "last_fresh": "2019-05-15 23:08:17.014609",
>             "last_change": "2019-05-15 23:08:17.014609",
>             "last_active": "2019-05-15 23:08:17.014609",
>             "last_peered": "2019-05-15 23:08:17.014609",
>             "last_clean": "2019-05-15 23:08:17.014609",
>             "last_became_active": "2019-05-15 19:25:01.484322",
>             "last_became_peered": "2019-05-15 19:25:01.484322",
>             "last_unstale": "2019-05-15 23:08:17.014609",
>      

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Jonas Jelten
On 02/04/2019 15.05, Yan, Zheng wrote:
> I don't use this feature.  We don't have plans to mark this feature
> stable. (Probably we will remove this feature in the future.)

Oh no! We have activated inline_data since our cluster does have lots of small 
files (but also big ones), and
performance increased significantly, especially when deleting those small files 
on CephFS.

I hope inline_data is not removed, but improved (or at least marked stable) 
instead!

Of course, if CephFS can handle small files with the same performance after 
inline_data was removed, that would also be
okay. But stepping back would hurt :)
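
For reference, the flag lives in the file system settings; a hedged sketch of checking and enabling
it (the value syntax and a possible extra confirmation flag may differ between releases, so treat
this as an assumption rather than the exact incantation):

  # show whether inline data is currently enabled
  ceph fs dump | grep -i inline
  # enable it for the file system named "cephfs"
  ceph fs set cephfs inline_data yes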


-- Jonas



[ceph-users] mount.ceph replacement in Python

2019-02-15 Thread Jonas Jelten
Hi!

I've created a mount.ceph.c replacement in Python which also utilizes the kernel keyring and does
name resolution. You can mount a CephFS without installing Ceph that way (and without using the
legacy secret= mount option).

https://github.com/SFTtech/ceph-mount

When you place the script (or a symlink) in /sbin/mount.ceph, you can mount 
CephFS with systemd .mount units.
I hope it's useful for somebody here someday :)
Currently it's not optimized for proper packaging (no setup.py yet).
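
A hedged sketch of the hook-up (script path, monitor host and client id are placeholders); mount(8)
and systemd run whatever sits at /sbin/mount.ceph for Type=ceph mounts:

  # make the script available as the mount helper
  ln -s /opt/ceph-mount/mount.ceph /sbin/mount.ceph
  # manual test: the helper resolves the monitor names and puts the key into
  # the kernel keyring instead of passing secret= on the command line
  mount -t ceph mon1.example.com:/ /mnt/cephfs -o name=lol
  # a systemd unit named after the mount point (e.g. mnt-cephfs.mount) with
  # Type=ceph and the same Options= line then goes through the same helper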

If things don't work or you wanna change something, just open bugs or pull 
requests please.


   -- Jonas



Re: [ceph-users] Automated Deep Scrub always inconsistent

2018-11-12 Thread Jonas Jelten
Maybe you are hitting the kernel bug worked around by 
https://github.com/ceph/ceph/pull/23273
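
In the meantime the inconsistent flag can usually be cleared by re-running the scrub by hand; a
hedged sketch (pool name and PG id are placeholders):

  # find the PGs currently flagged inconsistent
  rados list-inconsistent-pg mypool
  # re-run the deep scrub on an affected PG
  ceph pg deep-scrub 2.1a
  # if the inconsistency persists, inspect and repair it
  rados list-inconsistent-obj 2.1a
  ceph pg repair 2.1a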

-- Jonas


On 12/11/2018 16.39, Ashley Merrick wrote:
> Is anyone else seeing this?
> 
> I have just setup another cluster to check on completely different hardware 
> and everything running EC still.
> 
> And getting inconsistent PG’s flagged after an auto deep scrub which can be 
> fixed by just running another deep-scrub.
> 
> On Thu, 8 Nov 2018 at 4:23 PM, Ashley Merrick wrote:
> 
> In the past few days I have noticed that every single automated deep scrub comes back as inconsistent; once I run a
> manual deep-scrub it finishes fine and the PG is marked as clean.
> 
> I am running the latest mimic but have noticed someone else under luminous is facing the same issue:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031166.html
> 
> I don't believe this is any form of hardware failure, as every time it is different OSDs and every time a manually
> started deep-scrub finishes without issue.
> 
> Is there something that was released in the most recent mimic and 
> luminous patches that could be linked to this? Is
> it somehow linked with the main issue with the 12.2.9 release?


Re: [ceph-users] Priority for backfilling misplaced and degraded objects

2018-11-01 Thread Jonas Jelten
Hm I'm not so sure, because we did have a disk outage indeed.

When we added many new disks, 50% of objects were misplaced.
Then the disk failed and ~2% of objects were degraded.
The recovery went on fine, but I would expect that fixing the degraded objects should have priority over the data
migrations, which it seems it doesn't. The ratio (50%/2% then, 20%/0.8% now) remained the same.

Maybe there's just some configuration option I haven't found yet?
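
For what it's worth, recent releases can at least push individual PGs to the front of the queue; a
hedged sketch (the PG ids are placeholders, and I assume the force-recovery commands that were added
around Luminous):

  # list the degraded PGs, then force their recovery ahead of the backfills
  ceph pg ls degraded
  ceph pg force-recovery 1.2a 1.3f
  # misplaced-only PGs can be reordered with force-backfill; both can be
  # undone with cancel-force-recovery / cancel-force-backfill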


-- Jonas


On 01/11/2018 08.47, Janne Johansson wrote:
> I think that all the misplaced PGs that are in the queue that get
> writes _while_ waiting for backfill will get the "degraded" status,
> meaning that before they were just on the wrong place, now they are on
> the wrong place, AND the newly made PG they should backfill into will
> get an old dump made first, then an incremental with all the changes
> that came in while waiting or while finishing the first backfill, then
> become active+clean.
> Nothing to worry about, that is how recovery looks on all clusters.
> 
> On Wed, 31 Oct 2018 at 22:29, Jonas Jelten wrote:
>>
>> Hello!
>>
>> My cluster currently has this health state:
>>
>> 2018-10-31 21:20:13.694633 mon.lol [WRN] Health check update: 
>> 39010709/192173470 objects misplaced (20.300%)
>> (OBJECT_MISPLACED)
>> 2018-10-31 21:20:13.694684 mon.lol [WRN] Health check update: Degraded data 
>> redundancy: 1624786/192173470 objects
>> degraded (0.845%), 49 pgs degraded, 57 pgs undersized (PG_DEGRADED)
>> [...]
>> 2018-10-31 21:39:24.113440 mon.lol [WRN] Health check update: 
>> 38897646/192173470 objects misplaced (20.241%)
>> (OBJECT_MISPLACED)
>> 2018-10-31 21:39:24.113526 mon.lol [WRN] Health check update: Degraded data 
>> redundancy: 1613658/192173470 objects
>> degraded (0.840%), 49 pgs degraded, 57 pgs undersized (PG_DEGRADED)
>>
>>
>> It is recovering slowly, but apparently it does not recover the 0.8% of degraded objects first. Instead it recovers
>> both at the same relative rate, which even means that the degraded objects are recovered way slower (in absolute
>> numbers) than the misplaced objects!
>>
>> Is there a way to recover the degraded objects first?
>>
>>
>> Cheers
>>   -- Jonas



[ceph-users] Priority for backfilling misplaced and degraded objects

2018-10-31 Thread Jonas Jelten
Hello!

My cluster currently has this health state:

2018-10-31 21:20:13.694633 mon.lol [WRN] Health check update: 
39010709/192173470 objects misplaced (20.300%)
(OBJECT_MISPLACED)
2018-10-31 21:20:13.694684 mon.lol [WRN] Health check update: Degraded data 
redundancy: 1624786/192173470 objects
degraded (0.845%), 49 pgs degraded, 57 pgs undersized (PG_DEGRADED)
[...]
2018-10-31 21:39:24.113440 mon.lol [WRN] Health check update: 
38897646/192173470 objects misplaced (20.241%)
(OBJECT_MISPLACED)
2018-10-31 21:39:24.113526 mon.lol [WRN] Health check update: Degraded data 
redundancy: 1613658/192173470 objects
degraded (0.840%), 49 pgs degraded, 57 pgs undersized (PG_DEGRADED)


It is recovering slowly, but apparently it does not recover the 0.8% of degraded objects first. Instead it recovers
both at the same relative rate, which even means that the degraded objects are recovered way slower (in absolute
numbers) than the misplaced objects!

Is there a way to recover the degraded objects first?


Cheers
  -- Jonas


Re: [ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-09 Thread Jonas Jelten
Yay!

I've managed to get the cluster up and running again.
Deep scrub is still fixing inconsistencies.

I had to do a depth-first-search in the tree of startup errors.
My procedure was the already described one: Find and delete PGs from OSDs which 
trigger the assertion.
I've created a script to automate the seek-and-destroy of the broken PGs.

https://gist.github.com/TheJJ/c6be62e612ac4782bd0aa279d8c82197
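
For reference, a hedged sketch of doing one such removal by hand with ceph-objectstore-tool (this is
not the linked script; osd.4 and pg 3.8 from the log below are only examples, and the PG is exported
first so nothing is lost for good):

  systemctl stop ceph-osd@4
  # keep a copy of the PG before touching it
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
      --pgid 3.8 --op export --file /root/osd4-pg3.8.export
  # remove the PG replica that makes the OSD assert on startup
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
      --pgid 3.8 --op remove --force
  systemctl start ceph-osd@4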

Cheers
JJ


On 04/10/2018 18.29, Jonas Jelten wrote:
> Hello!
> 
> Unfortunately, our single-node-"Cluster" with 11 OSDs is broken because some 
> OSDs crash when they start peering.
> I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2).
> 
> The problem was induced when RAM filled up and OSD processes then crashed 
> because of memory allocation failures.
> 
> No weird commands (e.g. force_create_pg) were used on this cluster and it was 
> set up with 13.2.1 initially.
> The affected pool seems to be a replicated pool with size=3 and min_size=2 
> (which haven't been changed).
> 
> Crash log of osd.4 (only the crashed thread):
> 
> 99424: -1577> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
> 99425: -1576> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 
> queue_want_up_thru want 1433 <= queued 1433, currently 1426
> 99427: -1574> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> 3.8 to_process <> waiting <>
> waiting_peering {}
> 99428: -1573> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 
> sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) queued
> 99430: -1571> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> 3.8 to_process  PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 
> notify: (query:1433 sent:1433 3.8( v
> 866'122691 (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 
> lis/c 1401/859 les/c/f 1402/860/0
> 1433/1433/1433)) features: 0x3ffddff8ffa4fffb ([859,1432] 
> intervals=([1213,1215] acting 0,2),([1308,1311] acting
> 4,10),([1401,1403] acting 2,10),([1426,1428] acting 2,4)) +create_info) prio 
> 255 cost 10 e1433)> waiting <>
> waiting_peering {}
> 99433: -1568> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
> OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 
> sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) pg 
> 0x56013bc87400
> 99437: -1564> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
> do_peering_event: epoch_sent: 1433 epoch_requested:
> 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 
> (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 
> 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info
> 99440: -1561> 2018-10-04 13:40:11.024 7f3838417700  7 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
> state: handle_pg_notify from osd.2
> 99444: -1557> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
> pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
> 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}]  got dup 
> osd.2 info 3.8( v 866'122691
> (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 
> 1401/859 les/c/f 1402/860/0 1433/1433/1433),
> identical to ours
> 99445: -

Re: [ceph-users] daahboard

2018-10-08 Thread Jonas Jelten
You need to add or generate a certificate; without it the dashboard doesn't start.
The procedure is described in the documentation.
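
A hedged sketch of the self-signed variant, as far as I remember the Mimic documentation (check
"ceph mgr services" afterwards for the URL):

  # generate and install a self-signed certificate for the dashboard
  ceph dashboard create-self-signed-cert
  # restart the module so it picks the certificate up
  ceph mgr module disable dashboard
  ceph mgr module enable dashboard
  # the active mgr should now report the dashboard URL and port
  ceph mgr services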

-- JJ

On 09/10/2018 00.05, solarflow99 wrote:
> seems like it did, yet I don't see anything listening on the port it should 
> be for dashboard.
> 
> # ceph mgr module ls
> {
>     "enabled_modules": [
>     "dashboard",
>     "status"
>     ],
> 
> 
> 
> # ceph status
>   cluster:
>     id: d36fd17c-174e-40d6-95b9-86bdd196b7d2
>     health: HEALTH_OK
> 
>   services:
>     mon: 3 daemons, quorum cephmgr101,cephmgr102,cephmgr103
>     mgr: cephmgr103(active), standbys: cephmgr102, cephmgr101
>     mds: cephfs-1/1/1 up  {0=cephmgr103=up:active}, 2 up:standby
>     osd: 3 osds: 3 up, 3 in
> 
>   data:
>     pools:   3 pools, 192 pgs
>     objects: 2.02 k objects, 41 MiB
>     usage:   6.5 GiB used, 86 GiB / 93 GiB avail
>     pgs: 192 active+clean
> 
> 
> 
> # netstat -tlpn | grep ceph
> tcp    0  0 172.20.3.23:6789     0.0.0.0:*   LISTEN  8422/ceph-mon
> tcp    0  0 172.20.3.23:6800     0.0.0.0:*   LISTEN  21250/ceph-mds
> tcp    0  0 172.20.3.23:6801     0.0.0.0:*   LISTEN  16562/ceph-mgr
> 
> 
> On Mon, Oct 8, 2018 at 2:48 AM John Spray wrote:
> 
> Assuming that ansible is correctly running "ceph mgr module enable
> dashboard", then the next place to look is in "ceph status" (any
> errors?) and "ceph mgr module ls" (any reports of the module unable to
> run?)
> 
> John
> On Sat, Oct 6, 2018 at 1:53 AM solarflow99 wrote:
> >
> > I enabled the dashboard module in ansible but I don't see ceph-mgr listening on a port for it.
> > Is there something else I missed?
> >



Re: [ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-07 Thread Jonas Jelten
Thanks, I did that now: https://tracker.ceph.com/issues/36337

On 05/10/2018 19.12, Neha Ojha wrote:
> Hi JJ,
> 
> In this case, the condition olog.head >= log.tail is not true,
> therefore it crashes. Could you please open a tracker
> issue (https://tracker.ceph.com/) and attach the osd logs and the pg
> dump output?
> 
> Thanks,
> Neha
> 


[ceph-users] Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

2018-10-04 Thread Jonas Jelten
Hello!

Unfortunately, our single-node-"Cluster" with 11 OSDs is broken because some 
OSDs crash when they start peering.
I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2).

The problem was induced when RAM filled up and OSD processes then crashed 
because of memory allocation failures.
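
As a hedged side note on avoiding the OOM in the first place: the OSD memory footprint can be capped
via configuration (which option is available depends on the release, so treat the names below as
assumptions for your version):

  # newer releases: give each OSD a memory budget (2 GiB here, a placeholder)
  ceph config set osd osd_memory_target 2147483648
  # older releases without osd_memory_target: shrink the BlueStore cache instead
  ceph config set osd bluestore_cache_size 1073741824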

No weird commands (e.g. force_create_pg) were used on this cluster and it was 
set up with 13.2.1 initially.
The affected pool seems to be a replicated pool with size=3 and min_size=2 
(which haven't been changed).

Crash log of osd.4 (only the crashed thread):

99424: -1577> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
99425: -1576> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 
queue_want_up_thru want 1433 <= queued 1433, currently 1426
99427: -1574> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
3.8 to_process <> waiting <>
waiting_peering {}
99428: -1573> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 
3.8( v 866'122691 (569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] 
acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) queued
99430: -1571> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
3.8 to_process  waiting <>
waiting_peering {}
99433: -1568> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 
3.8( v 866'122691 (569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] 
acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) pg 
0x56013bc87400
99437: -1564> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
do_peering_event: epoch_sent: 1433 epoch_requested:
1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 
(569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] 
acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info
99440: -1561> 2018-10-04 13:40:11.024 7f3838417700  7 osd.4 pg_epoch: 1433 
pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
state: handle_pg_notify from osd.2
99444: -1557> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}]  got dup 
osd.2 info 3.8( v 866'122691
(569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 
1401/859 les/c/f 1402/860/0 1433/1433/1433),
identical to ours
99445: -1556> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
99446: -1555> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 
queue_want_up_thru want 1433 <= queued 1433, currently 1426
99448: -1553> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
3.8 to_process <> waiting <>
waiting_peering {}
99450: -1551> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) 
queued
99456: -1545> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
3.8 to_process  waiting <>
waiting_peering {}
99458: -1543> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 
OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) 
pg 0x56013bc87400
99461: -1540> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 
1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] 
do_peering_event: epoch_sent: 1433 epoch_requested:
1433 MLogRec from 2 +create_info
99465: -1536> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 
pg[3.8( v 866'127774