Hi Karsten,

On Mon, Feb 19, 2018 at 06:54:38PM +0100, Karsten Becker wrote:
> Hmm, a little bit:
>
> > 2018-02-19 14:29:50.309181 7fc3e82ef700 1 osd.29 pg_epoch: 48372 pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510] local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372 pi=[48362,48372)/1 luod=0'0 crt=48371'1976510 mlcod 0'0 active] start_peering_interval up [29,10,22] -> [29,10,22], acting [10,22,32] -> [29,10,22], acting_primary 10 -> 29, up_primary 29 -> 29, role -1 -> 0, features acting 2305244844532236283 upacting 2305244844532236283
> > 2018-02-19 14:29:50.309317 7fc3e82ef700 1 osd.29 pg_epoch: 48372 pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510] local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372 pi=[48362,48372)/1 crt=48371'1976510 mlcod 0'0 inconsistent] state<Start>: transitioning to Primary
> > 2018-02-19 14:30:34.445237 7fc3e6aec700 0 log_channel(cluster) log [DBG] : 10.7b9 repair starts
> > 2018-02-19 14:31:07.147350 7fc3e6aec700 -1 osd.29 pg_epoch: 48373 pg[10.7b9( v 48373'1976520 (48031'1975009,48373'1976520] local-lis/les=48372/48373 n=1816 ec=36999/1069 lis/c 48372/48372 les/c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372 crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head for 10:9deb7da1:::rbd_data.966489238e1f29.0000000000004619:18 (have MIN)
> > 2018-02-19 14:31:23.281765 7fc3e6aec700 -1 log_channel(cluster) log [ERR] : repair 10.7b9 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head expected clone 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:64e 1 missing
> > 2018-02-19 14:31:23.281780 7fc3e6aec700 0 log_channel(cluster) log [INF] : repair 10.7b9 10:9defb021:::rbd_data.2313975238e1f29.000000000002cbb5:head 1 missing clone(s)
> > 2018-02-19 14:32:05.166585 7fc3e6aec700 -1 log_channel(cluster) log [ERR] : 10.7b9 repair 1 errors, 0 fixed
>
> Whereas this should be the additional info that may help:
>
> > c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372 crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
>
> During a night an automated qm snapshots of a Windows Server VM seems to
> have failed. But it's suboptimal if this crashes Ceph in this way...
>
> Best
> Karsten

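As an aside, if you want to keep an eye on the repair results in the OSD log, a small script along these lines could pull out the summary lines. This is only a rough sketch: the regex and the `repair_summaries` name are my own (nothing shipped with Ceph), and it simply assumes the "<pgid> repair N errors, M fixed" wording seen in the log quoted above.

```python
import re

# Assumption: summary lines look like the one in the quoted log, e.g.
#   "... log [ERR] : 10.7b9 repair 1 errors, 0 fixed"
# The pattern is hand-written for this thread, not taken from Ceph itself.
SUMMARY_RE = re.compile(
    r"log \[ERR\] : (?P<pgid>\d+\.[0-9a-f]+) repair "
    r"(?P<errors>\d+) errors, (?P<fixed>\d+) fixed"
)


def repair_summaries(lines):
    """Yield (pgid, errors, fixed) for each repair summary line found."""
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            yield (
                match.group("pgid"),
                int(match.group("errors")),
                int(match.group("fixed")),
            )
```

You could feed it the lines of the OSD log file; any PG where the first number is larger than the second still has unrepaired errors.
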
I guess one of your snapshots is corrupt; maybe you are hitting the
following issue:

https://www.spinics.net/lists/ceph-users/msg41266.html
http://tracker.ceph.com/issues/19413

--
Cheers,
Alwin
