Re: [PVE-User] Ceph - PG expected clone missing

2018-02-20 Thread Alwin Antreich
Hi Karsten,

On Mon, Feb 19, 2018 at 06:54:38PM +0100, Karsten Becker wrote:
> Hmm, a little bit:
>
>
> > 2018-02-19 14:29:50.309181 7fc3e82ef700  1 osd.29 pg_epoch: 48372
> pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/
> > c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> pi=[48362,48372)/1 luod=0'0 crt=48371'1976510 mlcod 0'0 active]
> start_peering_interval up [29,10,22] -> [29,10,22], acting [10,22,32]
> -> [29,10,22], acting_primary 10 -> 29, up_primary 29 -> 29,
> role -1 -> 0, features acting 2305244844532236283 upacting
> 2305244844532236283
> > 2018-02-19 14:29:50.309317 7fc3e82ef700  1 osd.29 pg_epoch: 48372
> pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
> local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/
> > c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> pi=[48362,48372)/1 crt=48371'1976510 mlcod 0'0 inconsistent]
> state: transitioning to Primary
> > 2018-02-19 14:30:34.445237 7fc3e6aec700  0 log_channel(cluster) log
> [DBG] : 10.7b9 repair starts
> > 2018-02-19 14:31:07.147350 7fc3e6aec700 -1 osd.29 pg_epoch: 48373
> pg[10.7b9( v 48373'1976520 (48031'1975009,48373'1976520]
> local-lis/les=48372/48373 n=1816 ec=36999/1069 lis/c 48372/48372 les/
> > c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
> > for 10:9deb7da1:::rbd_data.966489238e1f29.4619:18 (have MIN)
> > 2018-02-19 14:31:23.281765 7fc3e6aec700 -1 log_channel(cluster) log
> [ERR] : repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> > 2018-02-19 14:31:23.281780 7fc3e6aec700  0 log_channel(cluster) log
> [INF] : repair 10.7b9
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
> clone(s)
> > 2018-02-19 14:32:05.166585 7fc3e6aec700 -1 log_channel(cluster) log
> [ERR] : 10.7b9 repair 1 errors, 0 fixed
>
>
> This should be the additional info that may help:
>
> > c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
> crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
> active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
>
>
> During the night, an automated qm snapshot of a Windows Server VM seems to
> have failed. But it's suboptimal if this breaks Ceph in this way...
>
>
> Best
> Karsten

I guess one of your snapshots is corrupt; maybe you are hitting the
following issue:

https://www.spinics.net/lists/ceph-users/msg41266.html
http://tracker.ceph.com/issues/19413
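
If it is that issue, a first look at the affected object's snapshot set can
be taken with rados; the workaround usually mentioned for an orphaned clone
is ceph-objectstore-tool's remove-clone-metadata, run only against the
stopped OSD and only as a last resort. A rough sketch (pool name, object
name, paths and clone id are placeholders):

  # what snapshots does rados still record for the affected object?
  rados -p <pool> listsnaps <rbd_data-object-from-the-repair-error>

  # last-resort workaround for an orphaned clone, on the stopped OSD:
  systemctl stop ceph-osd@29
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 --pgid 10.7b9 \
      '<object>' remove-clone-metadata <clone-id>
  systemctl start ceph-osd@29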

--
Cheers,
Alwin



Re: [PVE-User] Ceph - PG expected clone missing

2018-02-19 Thread Karsten Becker
Hmm, a little bit:


> 2018-02-19 14:29:50.309181 7fc3e82ef700  1 osd.29 pg_epoch: 48372
pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/
> c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
pi=[48362,48372)/1 luod=0'0 crt=48371'1976510 mlcod 0'0 active]
start_peering_interval up [29,10,22] -> [29,10,22], acting [10,22,32]
-> [29,10,22], acting_primary 10 -> 29, up_primary 29 -> 29,
role -1 -> 0, features acting 2305244844532236283 upacting
2305244844532236283
> 2018-02-19 14:29:50.309317 7fc3e82ef700  1 osd.29 pg_epoch: 48372
pg[10.7b9( v 48371'1976510 (48031'1975009,48371'1976510]
local-lis/les=48362/48363 n=1816 ec=36999/1069 lis/c 48362/48362 les/
> c/f 48363/48371/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
pi=[48362,48372)/1 crt=48371'1976510 mlcod 0'0 inconsistent]
state: transitioning to Primary
> 2018-02-19 14:30:34.445237 7fc3e6aec700  0 log_channel(cluster) log
[DBG] : 10.7b9 repair starts
> 2018-02-19 14:31:07.147350 7fc3e6aec700 -1 osd.29 pg_epoch: 48373
pg[10.7b9( v 48373'1976520 (48031'1975009,48373'1976520]
local-lis/les=48372/48373 n=1816 ec=36999/1069 lis/c 48372/48372 les/
> c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head
> for 10:9deb7da1:::rbd_data.966489238e1f29.4619:18 (have MIN)
> 2018-02-19 14:31:23.281765 7fc3e6aec700 -1 log_channel(cluster) log
[ERR] : repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> 2018-02-19 14:31:23.281780 7fc3e6aec700  0 log_channel(cluster) log
[INF] : repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing
clone(s)
> 2018-02-19 14:32:05.166585 7fc3e6aec700 -1 log_channel(cluster) log
[ERR] : 10.7b9 repair 1 errors, 0 fixed


This should be the additional info that may help:

> c/f 48373/48373/12767 48361/48372/48372) [29,10,22] r=0 lpr=48372
crt=48373'1976520 lcod 48373'1976519 mlcod 48373'1976519
active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no head


During the night, an automated qm snapshot of a Windows Server VM seems to
have failed. But it's suboptimal if this breaks Ceph in this way...
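
To map the rbd_data prefix from the repair error back to an image/VM,
something along these lines should work (pool name and VMID are
placeholders):

  # find the RBD image whose block_name_prefix matches the prefix in the log
  for img in $(rbd ls <pool>); do
      rbd info <pool>/$img | grep -q 2313975238e1f29 && echo $img
  done
  # list the snapshots Ceph still has for that image
  rbd snap ls <pool>/<image>
  # and compare with the snapshots Proxmox thinks exist for the VM
  qm listsnapshot <vmid>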


Best
Karsten





On 19.02.2018 16:01, Alwin Antreich wrote:
> Hi Karsten,
> 
> On Mon, Feb 19, 2018 at 02:36:41PM +0100, Karsten Becker wrote:
>> Hi,
>>
>> I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How do I
>> fix this?
>>
>>
>>> 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
>>> Possible data damage: 1 pg inconsistent
>>> 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible data 
>>> damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
>>> 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
>>> Possible data damage: 1 pg inconsistent, 1 pg repair
>>> 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9 
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone 
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
>>> 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9 
>>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing 
>>> clone(s)
>>> 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
>>> 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible data 
>>> damage: 1 pg inconsistent (PG_DAMAGED)
>>
>>
>> "ceph pg repair 10.7b9" fails and is not able to fix ist. A manually
>> started scrub "ceph pg scrub 10.7b9" also.
>>
>> size=3 min_size=2... if it's of interest.
>>
>> Any help appreciated.
>>
>> Best from Berlin/Germany
>> Karsten
>>
> Check your osd.29; the disk may be faulty.
> 
> Can you see more in the log of osd.29?
> 
> --
> Cheers,
> Alwin
> 




Re: [PVE-User] Ceph - PG expected clone missing

2018-02-19 Thread Alwin Antreich
Hi Karsten,

On Mon, Feb 19, 2018 at 02:36:41PM +0100, Karsten Becker wrote:
> Hi,
>
> I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How do I
> fix this?
>
>
> > 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
> > Possible data damage: 1 pg inconsistent
> > 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible data 
> > damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
> > 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
> > Possible data damage: 1 pg inconsistent, 1 pg repair
> > 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9 
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone 
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> > 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9 
> > 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing 
> > clone(s)
> > 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
> > 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible data 
> > damage: 1 pg inconsistent (PG_DAMAGED)
>
>
> "ceph pg repair 10.7b9" fails and is not able to fix ist. A manually
> started scrub "ceph pg scrub 10.7b9" also.
>
> size=3 min_size=2... if it's of interest.
>
> Any help appreciated.
>
> Best from Berlin/Germany
> Karsten
>
Check your osd.29; the disk may be faulty.

Can you see more in the log of osd.29?
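
A rough sequence for that check (host, time window and device name are
placeholders):

  # find the host that carries osd.29
  ceph osd find 29
  ceph osd metadata 29 | grep hostname
  # on that host, follow the OSD log around the scrub/repair window
  journalctl -u ceph-osd@29 --since "2018-02-19 14:25" --until "2018-02-19 14:35"
  # and check the health of the underlying disk
  smartctl -a /dev/sdX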

--
Cheers,
Alwin



[PVE-User] Ceph - PG expected clone missing

2018-02-19 Thread Karsten Becker
Hi,

I have one damaged PG in my Ceph cluster. All OSDs are BlueStore. How do I
fix this?


> 2018-02-19 14:30:24.371058 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
> Possible data damage: 1 pg inconsistent
> 2018-02-19 14:30:37.733236 mon.0 [ERR] Health check update: Possible data 
> damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
> 2018-02-19 14:31:24.371286 mon.0 [ERR] overall HEALTH_ERR 1 scrub errors; 
> Possible data damage: 1 pg inconsistent, 1 pg repair
> 2018-02-19 14:31:23.281772 osd.29 [ERR] repair 10.7b9 
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected clone 
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing
> 2018-02-19 14:31:23.281784 osd.29 [INF] repair 10.7b9 
> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head 1 missing 
> clone(s)
> 2018-02-19 14:32:05.166591 osd.29 [ERR] 10.7b9 repair 1 errors, 0 fixed
> 2018-02-19 14:32:05.580906 mon.0 [ERR] Health check update: Possible data 
> damage: 1 pg inconsistent (PG_DAMAGED)


"ceph pg repair 10.7b9" fails and is not able to fix ist. A manually
started scrub "ceph pg scrub 10.7b9" also.

size=3 min_size=2... if it's of interest.
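
For reference, more detail on the flagged objects can usually be pulled
with something like this (PG id taken from the log above):

  # current health details and the flagged PG
  ceph health detail
  # list the objects the scrub marked inconsistent in PG 10.7b9
  rados list-inconsistent-obj 10.7b9 --format=json-pretty
  # re-run a deep scrub on the PG after any change
  ceph pg deep-scrub 10.7b9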

Any help appreciated.

Best from Berlin/Germany
Karsten
