Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-20 Thread 赵赵贺东
Sorry for my late reply, and thank you for yours.
Yes, this error only occurs when the backend is XFS;
ext4 does not trigger the error.



> On Mar 12, 2018, at 6:31 PM, Peter Woodman wrote:
> 
> From what I've heard, XFS has problems on ARM. Use btrfs, or (I
> believe?) ext4+bluestore will work.
> 
> On Sun, Mar 11, 2018 at 9:49 PM, Christian Wuerdig
>  wrote:
>> Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
>> storage? Literally everything posted on this list in relation to HW
>> requirements and related problems will tell you that this simply isn't going
>> to work. The slightest hint of a problem will simply kill the OSD nodes with
>> OOM. Have you tried with smaller disks - like 1TB models (or even smaller
>> like 256GB SSDs) and see if the same problem persists?
>> 
>> 
>> On Tue, 6 Mar 2018 at 10:51, 赵赵贺东  wrote:
>>> 
>>> Hello ceph-users,
>>> 
>>> It is a really, really tough problem for our team.
>>> We have investigated it for a long time and tried many things, but we
>>> cannot solve it; even the root cause of the problem is still unclear
>>> to us!
>>> So any solution, suggestion, or opinion whatsoever will be highly
>>> appreciated!
>>> 
>>> Problem Summary:
>>> When we activate an OSD, there is metadata corruption on the
>>> activating disk; the probability is 100%!
>>> 
>>> Admin node:
>>> Platform: X86
>>> OS: Ubuntu 16.04
>>> Kernel: 4.12.0
>>> Ceph: Luminous 12.2.2
>>> 
>>> OSD nodes:
>>> Platform: armv7
>>> OS:   Ubuntu 14.04
>>> Kernel:   4.4.39
>>> Ceph: Luminous 12.2.2
>>> Disk: 10T+10T
>>> Memory: 2GB
>>> 
>>> Deploy log:
>>> 
>>> 
>>> dmesg log: (Sorry, the arms001-01 dmesg log has been lost, but the
>>> metadata corruption messages on arms003-10 are the same as on
>>> arms001-01)
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.534232] XFS (sda1): Unmount and
>>> run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.539100] XFS (sda1): First 64
>>> bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.545504] eb82f000: 58 46 53 42 00
>>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.553569] eb82f010: 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.561624] eb82f020: fc 4e e3 89 50
>>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.569706] eb82f030: 00 00 00 00 80
>>> 00 00 07 ff ff ff ff ff ff ff ff  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.58] XFS (sda1): metadata I/O
>>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.602944] XFS (sda1): Metadata
>>> corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data
>>> block 0x48b9ff80
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.614170] XFS (sda1): Unmount and
>>> run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.619030] XFS (sda1): First 64
>>> bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.625403] eb901000: 58 46 53 42 00
>>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.633441] eb901010: 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.641474] eb901020: fc 4e e3 89 50
>>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.649519] eb901030: 00 00 00 00 80
>>> 00 00 07 ff ff ff ff ff ff ff ff  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.657554] XFS (sda1): metadata I/O
>>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.675056] XFS (sda1): Metadata
>>> corruption detected at xfs_dir3_data_read_verify+0x58/0xd0, xfs_dir3_data
>>> block 0x48b9ff80
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.686228] XFS (sda1): Unmount and
>>> run xfs_repair
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.691054] XFS (sda1): First 64
>>> bytes of corrupted metadata buffer:
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.697425] eb901000: 58 46 53 42 00
>>> 00 10 00 00 00 00 00 91 73 fe fb  XFSB.s..
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.705459] eb901010: 00 00 00 00 00
>>> 00 00 00 00 00 00 00 00 00 00 00  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.713489] eb901020: fc 4e e3 89 50
>>> 8f 42 aa be bc 07 0c 6e fa 83 2f  .N..P.B.n../
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.721520] eb901030: 00 00 00 00 80
>>> 00 00 07 ff ff ff ff ff ff ff ff  
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.729558] XFS (sda1): metadata I/O
>>> error: block 0x48b9ff80 ("xfs_trans_read_buf_map") error 117 numblks 8
>>> Mar  5 11:08:49 arms003-10 kernel: [  252.741953] XFS 
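An aside on the log above: the "error 117" in the xfs_trans_read_buf_map lines is a plain Linux errno value, not an XFS-specific code. A quick way to decode it, assuming a Linux host with Python 3 available:

```shell
# "error 117" in the XFS metadata I/O messages is the Linux errno
# EUCLEAN ("Structure needs cleaning"), which XFS returns when its
# read verifiers detect corrupt on-disk metadata.
python3 -c 'import errno, os; print(errno.errorcode[117], "-", os.strerror(117))'
```

This matches the kernel's advice to unmount and run xfs_repair: the read verifier refuses to use the buffer until the on-disk structure is cleaned.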

Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-20 Thread 赵赵贺东


> On Mar 12, 2018, at 9:49 AM, Christian Wuerdig wrote:
> 
> Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of storage? 
> Literally everything posted on this list in relation to HW requirements and 
> related problems will tell you that this simply isn't going to work. The 
> slightest hint of a problem will simply kill the OSD nodes with OOM. Have you 
> tried with smaller disks - like 1TB models (or even smaller like 256GB SSDs) 
> and see if the same problem persists?

Thank you for your reply, and sorry for the late response.
You are right: when the backend is BlueStore, there was OOM from time to time.
We will now upgrade our hardware to see whether we can avoid the OOM.
Besides, after we upgraded the kernel from 4.4.39 to 4.4.120, the XFS error
while activating the OSD seems to be fixed.
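Since the problem tracked with the kernel version, it is worth recording the running kernel on every OSD node before and after the upgrade. A minimal sketch (the version strings are the ones from this thread; the package names for a trusty ARM kernel upgrade vary by vendor, so they are not shown):

```shell
# Print the kernel currently running on this OSD node; the corruption
# appeared on 4.4.39 and went away after moving to 4.4.120.
uname -r

# Version-sort the bad and good kernels from this thread to confirm
# which is newer; 4.4.120 wins the comparison.
printf '%s\n' 4.4.39 4.4.120 | sort -V | tail -n1
```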


Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-12 Thread Peter Woodman
From what I've heard, XFS has problems on ARM. Use btrfs, or (I
believe?) ext4+bluestore will work.


Re: [ceph-users] XFS Metadata corruption while activating OSD

2018-03-11 Thread Christian Wuerdig
Hm, so you're running OSD nodes with 2GB of RAM and 2x10TB = 20TB of
storage? Literally everything posted on this list in relation to HW
requirements and related problems will tell you that this simply isn't
going to work. The slightest hint of a problem will simply kill the OSD
nodes with OOM. Have you tried with smaller disks - like 1TB models (or
even smaller like 256GB SSDs) and see if the same problem persists?



[ceph-users] XFS Metadata corruption while activating OSD

2018-03-05 Thread 赵赵贺东
Hello ceph-users,

It is a really, really tough problem for our team.
We have investigated it for a long time and tried many things, but we cannot
solve it; even the root cause of the problem is still unclear to us!
So any solution, suggestion, or opinion whatsoever will be highly appreciated!

Problem Summary:
When we activate an OSD, there is metadata corruption on the activating disk;
the probability is 100%!

Admin node:
Platform: X86
OS:       Ubuntu 16.04
Kernel:   4.12.0
Ceph:     Luminous 12.2.2

OSD nodes:
Platform: armv7
OS:       Ubuntu 14.04
Kernel:   4.4.39
Ceph:     Luminous 12.2.2
Disk:     10T+10T
Memory:   2GB

Deploy log:
root@mnc000:/home/mnvadmin/ceph# ceph-deploy disk zap arms001-01:sda
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.39): /usr/bin/ceph-deploy disk zap 
arms001-01:sda
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : zap
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : 
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : 
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] disk : [('arms001-01', '/dev/sda', None)]
[ceph_deploy.osd][DEBUG ] zapping /dev/sda on arms001-01
[arms001-01][DEBUG ] connection detected need for sudo
[arms001-01][DEBUG ] connected to host: arms001-01
[arms001-01][DEBUG ] detect platform information from remote host
[arms001-01][DEBUG ] detect machine type
[arms001-01][DEBUG ] find the location of an executable
[arms001-01][INFO ] Running command: sudo /sbin/initctl version
[arms001-01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: Ubuntu 14.04 trusty
[arms001-01][DEBUG ] zeroing last few blocks of device
[arms001-01][DEBUG ] find the location of an executable
[arms001-01][INFO ] Running command: sudo /usr/local/bin/ceph-disk zap /dev/sda
[arms001-01][WARNIN] 
/usr/local/lib/python2.7/dist-packages/ceph_disk-1.0.0-py2.7.egg/ceph_disk/main.py:5653:
 UserWarning:
[arms001-01][WARNIN] 
***
[arms001-01][WARNIN] This tool is now deprecated in favor of ceph-volume.
[arms001-01][WARNIN] It is recommended to use ceph-volume for OSD deployments. 
For details see:
[arms001-01][WARNIN]
[arms001-01][WARNIN] http://docs.ceph.com/docs/master/ceph-volume/#migrating
[arms001-01][WARNIN]
[arms001-01][WARNIN] 
***
[arms001-01][WARNIN]
[arms001-01][DEBUG ] 4 bytes were erased at offset 0x0 (xfs)
[arms001-01][DEBUG ] they were: 58 46 53 42
[arms001-01][WARNIN] 10+0 records in
[arms001-01][WARNIN] 10+0 records out
[arms001-01][WARNIN] 10485760 bytes (10 MB) copied, 0.0610462 s, 172 MB/s
[arms001-01][WARNIN] 10+0 records in
[arms001-01][WARNIN] 10+0 records out
[arms001-01][WARNIN] 10485760 bytes (10 MB) copied, 0.129642 s, 80.9 MB/s
[arms001-01][WARNIN] Caution: invalid backup GPT header, but valid main header; 
regenerating
[arms001-01][WARNIN] backup header from main header.
[arms001-01][WARNIN]
[arms001-01][WARNIN] Warning! Main and backup partition tables differ! Use the 
'c' and 'e' options
[arms001-01][WARNIN] on the recovery & transformation menu to examine the two 
tables.
[arms001-01][WARNIN]
[arms001-01][WARNIN] Warning! One or more CRCs don't match. You should repair 
the disk!
[arms001-01][WARNIN]
[arms001-01][DEBUG ] 

[arms001-01][DEBUG ] Caution: Found protective or hybrid MBR and corrupt GPT. 
Using GPT, but disk
[arms001-01][DEBUG ] verification and recovery are STRONGLY recommended.
[arms001-01][DEBUG ] 

[arms001-01][DEBUG ] GPT data structures destroyed! You may now partition the 
disk using fdisk or
[arms001-01][DEBUG ] other utilities.
[arms001-01][DEBUG ] Creating new GPT entries.
[arms001-01][DEBUG ] The operation has completed successfully.
[arms001-01][WARNIN] 
/usr/local/lib/python2.7/dist-packages/ceph_disk-1.0.0-py2.7.egg/ceph_disk/main.py:5685:
 UserWarning:
[arms001-01][WARNIN] 
***
[arms001-01][WARNIN] This tool is now deprecated in favor of ceph-volume.
[arms001-01][WARNIN] It is recommended to use ceph-volume for OSD deployments. 
For details see:
[arms001-01][WARNIN]
[arms001-01][WARNIN] http://docs.ceph.com/docs/master/ceph-volume/#migrating
[arms001-01][WARNIN]
[arms001-01][WARNIN] 
***
[arms001-01][WARNIN]


root@mnc000:/home/mnvadmin/ceph# ceph-deploy osd prepare --filestore 
arms001-01:sda
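For context on the zap step in the log above: the "4 bytes were erased at offset 0x0" (58 46 53 42) are simply the XFS superblock magic number, the same "XFSB" string visible at the start of every corrupted-metadata hex dump in the dmesg output. A quick check:

```shell
# Decode the four bytes ceph-disk wiped at offset 0 of /dev/sda:
# 0x58 0x46 0x53 0x42 is the ASCII XFS superblock magic "XFSB".
printf '\x58\x46\x53\x42'
echo
```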