Re: [PVE-User] Ceph - PG expected clone missing
Hi Karsten,

On Mon, Feb 19, 2018 at 06:54:38PM +0100, Karsten Becker wrote:
> Hmm, a little bit:
>
> > 2018-02-19 14:29:50.309181 7fc3e82ef700 1 osd.29 pg_epoch: 48372
> > pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> > local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362
> > les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> > pi=[48362,48372)/1 luod=0'0 crt=48371'1976510 mlcod 0'0 active]
> > start_peering_interval up [29,10,22] -> [29,10,22], acting
> > [10,22,32] -> [29,10,22], acting_primary 10 -> 29, up_primary 29 -> 29,
> > role -1 -> 0, features acting 2305244844532236283 upacting
> > 2305244844532236283
> > 2018-02-19 14:29:50.309317 7fc3e82ef700 1 osd.29 pg_epoch: 48372
> > pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> > local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362
> > les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> > pi=[48362,48372)/1 crt=48371'1976510 mlcod 0'0 inconsistent]
> > state: transitioning to Primary
> > 2018-02-19 14:30:34.445237 7fc3e6aec700 0 log_channel(cluster) log
> > [DBG] : 10.7b9 repair starts
> > 2018-02-19 14:31:07.147350 7fc3e6aec700 -1 osd.29 pg_epoch: 48373
> > pg[10.7b9( v 48373'1976520 (48031'1975009,48373'1976520]
> > local-lis/les=48372/48373 n=1816 ec=36999/1069 lis/c 48372/48372
> > les/c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> > crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> > active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
> > for 10:9deb7da1:::rbd_data.966489238e1f29.4619:18 (have MIN)
> > 2018-02-19 14:31:23.281765 7fc3e6aec700 -1 log_channel(cluster) log
> > [ERR] : repair 10.7b9
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> > 2018-02-19 14:31:23.281780 7fc3e6aec700 0 log_channel(cluster) log
> > [INF] : repair 10.7b9
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
> > clone(s)
> > 2018-02-19 14:32:05.166585 7fc3e6aec700 -1 log_channel(cluster) log
> > [ERR] : 10.7b9 repair 1 errors, 0 fixed
>
> Whereas this should be the additional info that may help:
>
> > c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> > crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> > active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
>
> During the night, an automated qm snapshot of a Windows Server VM seems
> to have failed. But it's suboptimal if this crashes Ceph in this way...
>
> Best
> Karsten

I guess one of your snapshots is corrupt; maybe you are hitting the
following issue:

https://www.spinics.net/lists/ceph-users/msg41266.html
http://tracker.ceph.com/issues/19413

--
Cheers,
Alwin

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
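The workaround discussed in the tracker issue linked above removes the stale
clone metadata with ceph-objectstore-tool. A rough sketch of that procedure,
assuming a default Proxmox OSD layout and the PG and clone ids from the
repair log in this thread (the exact invocation can differ between Ceph
releases, and this rewrites OSD metadata, so verify against the docs for
your version first):

```shell
# CAUTION: this modifies OSD metadata directly. Stop the affected OSD
# first so the object store is quiescent, and have backups.
systemctl stop ceph-osd@29

# Find the exact JSON object spec for the damaged head object
# (the data path assumes the default OSD directory layout).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 10.7b9 --op list | grep 2313975238e1f29.0002cbb5

# Feed that JSON spec back in to drop the metadata of the missing clone.
# 1614 == 0x64e, the clone id from the repair log; check whether your
# tool version expects the id in decimal or hex.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 --pgid 10.7b9 \
    '<json-spec-from-the-list-step>' remove-clone-metadata 1614

# Restart the OSD and re-run the repair.
systemctl start ceph-osd@29
ceph pg repair 10.7b9
```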
Re: [PVE-User] Ceph - PG expected clone missing
Hmm, a little bit:

> 2018-02-19 14:29:50.309181 7fc3e82ef700 1 osd.29 pg_epoch: 48372
> pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362
> les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> pi=[48362,48372)/1 luod=0'0 crt=48371'1976510 mlcod 0'0 active]
> start_peering_interval up [29,10,22] -> [29,10,22], acting
> [10,22,32] -> [29,10,22], acting_primary 10 -> 29, up_primary 29 -> 29,
> role -1 -> 0, features acting 2305244844532236283 upacting
> 2305244844532236283
> 2018-02-19 14:29:50.309317 7fc3e82ef700 1 osd.29 pg_epoch: 48372
> pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362
> les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> pi=[48362,48372)/1 crt=48371'1976510 mlcod 0'0 inconsistent]
> state: transitioning to Primary
> 2018-02-19 14:30:34.445237 7fc3e6aec700 0 log_channel(cluster) log
> [DBG] : 10.7b9 repair starts
> 2018-02-19 14:31:07.147350 7fc3e6aec700 -1 osd.29 pg_epoch: 48373
> pg[10.7b9( v 48373'1976520 (48031'1975009,48373'1976520]
> local-lis/les=48372/48373 n=1816 ec=36999/1069 lis/c 48372/48372
> les/c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
> for 10:9deb7da1:::rbd_data.966489238e1f29.4619:18 (have MIN)
> 2018-02-19 14:31:23.281765 7fc3e6aec700 -1 log_channel(cluster) log
> [ERR] : repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> 2018-02-19 14:31:23.281780 7fc3e6aec700 0 log_channel(cluster) log
> [INF] : repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
> clone(s)
> 2018-02-19 14:32:05.166585 7fc3e6aec700 -1 log_channel(cluster) log
> [ERR] : 10.7b9 repair 1 errors, 0 fixed

Whereas this should be the additional info that may help:

> c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head

During the night, an automated qm snapshot of a Windows Server VM seems
to have failed. But it's suboptimal if this crashes Ceph in this way...

Best
Karsten

On 19.02.2018 16:01, Alwin Antreich wrote:
> Hi Karsten,
>
> On Mon, Feb 19, 2018 at 02:36:41PM +0100, Karsten Becker wrote:
>> Hi,
>>
>> I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How
>> do I fix this?
>>
>>> 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub
>>> errors; Possible data damage: 1 pg inconsistent
>>> 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible
>>> data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>>> 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub
>>> errors; Possible data damage: 1 pg inconsistent, 1 pg repair
>>> 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
>>> 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
>>> clone(s)
>>> 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
>>> 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible
>>> data damage: 1 pg inconsistent (PG_DAMAGED)
>>
>> "ceph pg repair 10.7b9" fails and is not able to fix it. A manually
>> started scrub "ceph pg scrub 10.7b9" fails as well.
>>
>> size=3 min_size=2... if it's of interest.
>>
>> Any help appreciated.
>>
>> Best from Berlin/Germany
>> Karsten
>>
> Check your osd.29, the disk may be faulty.
>
> Can you see more in the log of the osd.29?
>
> --
> Cheers,
> Alwin

Ecologic Institut gemeinnuetzige GmbH
Pfalzburger Str. 43/44, D-10717 Berlin
Geschaeftsfuehrerin / Director: Dr. Camilla Bausch
Sitz der Gesellschaft / Registered Office: Berlin (Germany)
Registergericht / Court of Registration: Amtsgericht Berlin
(Charlottenburg), HRB 57947
Re: [PVE-User] Ceph - PG expected clone missing
Hi Karsten,

On Mon, Feb 19, 2018 at 02:36:41PM +0100, Karsten Becker wrote:
> Hi,
>
> I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How
> do I fix this?
>
> > 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub
> > errors; Possible data damage: 1 pg inconsistent
> > 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible
> > data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
> > 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub
> > errors; Possible data damage: 1 pg inconsistent, 1 pg repair
> > 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> > 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
> > clone(s)
> > 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
> > 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible
> > data damage: 1 pg inconsistent (PG_DAMAGED)
>
> "ceph pg repair 10.7b9" fails and is not able to fix it. A manually
> started scrub "ceph pg scrub 10.7b9" fails as well.
>
> size=3 min_size=2... if it's of interest.
>
> Any help appreciated.
>
> Best from Berlin/Germany
> Karsten
>
Check your osd.29, the disk may be faulty.

Can you see more in the log of the osd.29?

--
Cheers,
Alwin
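To dig more context out of osd.29, something along these lines works on a
default Proxmox/Ceph install (the log path and systemd unit name assume
defaults, and /dev/sdX is a placeholder for whatever device backs osd.29):

```shell
# Search the OSD's own log file for repair/error lines around the event
# (default Ceph log location).
grep -E 'ERR|repair|10\.7b9' /var/log/ceph/ceph-osd.29.log | tail -n 50

# Or, on a systemd-managed setup, the journal for the OSD unit.
journalctl -u ceph-osd@29 --since "2018-02-19" | grep -iE 'err|repair'

# Check the underlying disk for hardware trouble; replace /dev/sdX with
# the device backing osd.29.
smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'
```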
[PVE-User] Ceph - PG expected clone missing
Hi,

I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How do
I fix this?

> 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub
> errors; Possible data damage: 1 pg inconsistent
> 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible
> data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
> 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub
> errors; Possible data damage: 1 pg inconsistent, 1 pg repair
> 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
> clone(s)
> 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
> 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible
> data damage: 1 pg inconsistent (PG_DAMAGED)

"ceph pg repair 10.7b9" fails and is not able to fix it. A manually
started scrub "ceph pg scrub 10.7b9" fails as well.

size=3 min_size=2... if it's of interest.

Any help appreciated.

Best from Berlin/Germany
Karsten
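For anyone landing on this thread with the same symptom, the usual first
steps to gather more detail about an inconsistent PG look roughly like this
(the PG id is taken from the logs above; output format varies by Ceph
release):

```shell
# Show which PGs are flagged inconsistent and which OSDs hold them.
ceph health detail

# Per-object, per-shard error details for the affected PG.
rados list-inconsistent-obj 10.7b9 --format=json-pretty

# For snapshot/clone problems specifically, the snapset view often
# pinpoints the missing clone.
rados list-inconsistent-snapset 10.7b9 --format=json-pretty

# Re-run a deep scrub, then attempt the repair again.
ceph pg deep-scrub 10.7b9
ceph pg repair 10.7b9
```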