Re: Need help with potential ~45TB dataloss

2018-12-04 Chris Murphy
On Tue, Dec 4, 2018 at 3:09 AM Patrick Dijkgraaf wrote:
>
> Hi Chris,
>
> See the output below. Any suggestions based on it?

If they're SATA drives, they may not support SCT ERC; and if they're SAS, depending on what controller they're behind, smartctl might need a hint to properly ask the drive
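
Regarding the "hint" for drives behind a controller: smartctl accepts an explicit device type through its `-d` option (for example `sat` for SCSI-to-ATA Translation, or `megaraid,N` for disk N behind a MegaRAID controller). The sketch below only assembles the argument list for an SCT ERC query. The helper name `scterc_query_argv` and the device paths are hypothetical, and which `-d` value a particular controller needs is an assumption to verify against the smartctl documentation.

```python
from typing import List, Optional


def scterc_query_argv(device: str, dev_type: Optional[str] = None) -> List[str]:
    """Build argv for `smartctl -l scterc`, optionally forcing a device type.

    Hypothetical helper: dev_type is passed through unchanged as smartctl's
    -d argument (e.g. "sat" or "megaraid,3"); None leaves detection to smartctl.
    """
    argv = ["smartctl", "-l", "scterc"]
    if dev_type is not None:
        argv += ["-d", dev_type]  # explicit device-type hint
    argv.append(device)
    return argv


# Example: a drive that smartctl cannot identify on its own.
print(scterc_query_argv("/dev/sdX", "sat"))
# prints ['smartctl', '-l', 'scterc', '-d', 'sat', '/dev/sdX']
```

The list can be passed to `subprocess.run(..., capture_output=True, text=True)`; keeping construction separate from execution makes it easy to try several `-d` values against the same drive.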

Re: Need help with potential ~45TB dataloss

2018-12-04 Patrick Dijkgraaf
Hi Chris,

See the output below. Any suggestions based on it?

Thanks!

--
Groet / Cheers,
Patrick Dijkgraaf

On Mon, 2018-12-03 at 20:16 -0700, Chris Murphy wrote:
> Also useful information for autopsy, perhaps not for fixing, is to
> know whether the SCT ERC value for every drive is less than

Re: Need help with potential ~45TB dataloss

2018-12-04 Patrick Dijkgraaf
Hi, thanks again. Please see answers inline.

--
Groet / Cheers,
Patrick Dijkgraaf

On Mon, 2018-12-03 at 08:35 +0800, Qu Wenruo wrote:
>
> On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
> > Hi Qu,
> >
> > Thanks for helping me!
> >
> > Please see the responses in-line.
> > Any suggestions

Re: Need help with potential ~45TB dataloss

2018-12-03 Chris Murphy
Also useful information for autopsy, perhaps not for fixing, is to know whether the SCT ERC value for every drive is less than the kernel's SCSI driver block device command timeout value. It's super important that the drive reports an explicit read failure before the read command is considered
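
The comparison Chris describes can be made concrete. SCT ERC limits are reported in units of 100 ms (so `70` means 7.0 seconds), while the kernel's per-device SCSI command timeout is exposed in `/sys/block/<dev>/device/timeout` in whole seconds (30 by default). A minimal sketch, assuming `smartctl -l scterc` output shaped like the `SAMPLE` string in the code; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def parse_scterc_read_ds(smartctl_output: str) -> Optional[int]:
    """Return the SCT ERC read limit in deciseconds (100 ms units).

    Assumes `smartctl -l scterc` prints a line such as
    "           Read:     70 (7.0 seconds)". Returns None when the value is
    "Disabled" or the line is missing (e.g. ERC unsupported).
    """
    match = re.search(r"^\s*Read:\s+(\d+)\b", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def kernel_timeout_s(dev: str) -> int:
    """Read the SCSI command timeout in seconds for a device name like "sda"."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())


def drive_gives_up_first(erc_ds: Optional[int], timeout_s: int) -> bool:
    """True if the drive should report a read error before the kernel times out.

    With ERC disabled or unsupported (None) the drive's internal retry time is
    unknown, so this conservatively returns False.
    """
    if erc_ds is None:
        return False
    return erc_ds / 10 < timeout_s


SAMPLE = """SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
"""

print(drive_gives_up_first(parse_scterc_read_ds(SAMPLE), 30))  # prints True
```

The case the check is meant to catch is ERC disabled, or set longer than the kernel timeout, where the kernel may give up on the command and reset the device before the drive reports the bad sector.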

Re: Need help with potential ~45TB dataloss

2018-12-02 Qu Wenruo
On 2018/12/3 4:30 AM, Andrei Borzenkov wrote:
> 02.12.2018 23:14, Patrick Dijkgraaf writes:
>> I have some additional info.
>>
>> I found the reason the FS got corrupted. It was a single failing drive,
>> which caused the entire cabinet (containing 7 drives) to reset. So the
>> FS suddenly lost 7

Re: Need help with potential ~45TB dataloss

2018-12-02 Qu Wenruo
On 2018/12/3 8:35 AM, Qu Wenruo wrote:
>
>
> On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
>> Hi Qu,
>>
>> Thanks for helping me!
>>
>> Please see the responses in-line.
>> Any suggestions based on this?
>>
>> Thanks!
>>
>>
>> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
>>> On 2018/11/30

Re: Need help with potential ~45TB dataloss

2018-12-02 Qu Wenruo
On 2018/12/2 5:03 PM, Patrick Dijkgraaf wrote:
> Hi Qu,
>
> Thanks for helping me!
>
> Please see the responses in-line.
> Any suggestions based on this?
>
> Thanks!
>
>
> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
>> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
>>> Hi all,
>>>
>>>

Re: Need help with potential ~45TB dataloss

2018-12-02 Andrei Borzenkov
02.12.2018 23:14, Patrick Dijkgraaf writes:
> I have some additional info.
>
> I found the reason the FS got corrupted. It was a single failing drive,
> which caused the entire cabinet (containing 7 drives) to reset. So the
> FS suddenly lost 7 drives.
>

This remains a mystery to me. btrfs is

Re: Need help with potential ~45TB dataloss

2018-12-02 Patrick Dijkgraaf
I have some additional info.

I found the reason the FS got corrupted. It was a single failing drive, which caused the entire cabinet (containing 7 drives) to reset. So the FS suddenly lost 7 drives.

I have removed the failed drive, so the RAID is now degraded. I hope the data is still
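
Patrick's numbers can be checked against the redundancy of the profile: RAID6 keeps two parity blocks per stripe, so it can reconstruct data with at most two devices missing. A cabinet reset that drops 7 of the 20 devices at once is far outside that, while a single removed drive is within it. A minimal sketch, with a hypothetical helper name and a lookup table that covers only the classic btrfs profiles:

```python
# Devices a btrfs profile can lose while every block stays reconstructable.
# Assumption: only the classic profiles are listed; raid10 can sometimes
# survive more, but only one missing device is guaranteed.
MAX_MISSING = {
    "single": 0,
    "raid0": 0,
    "raid1": 1,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def within_redundancy(profile: str, missing: int) -> bool:
    """True if `missing` absent devices is within what `profile` tolerates."""
    return missing <= MAX_MISSING[profile.lower()]


print(within_redundancy("raid6", 7))  # cabinet reset, 7 of 20 gone: prints False
print(within_redundancy("raid6", 1))  # one failed drive removed: prints True
```

The count says nothing about which writes were in flight when the cabinet dropped out; it only shows that 7 simultaneous losses exceed what RAID6 parity can cover, while running with one drive removed does not.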

Re: Need help with potential ~45TB dataloss

2018-12-02 Patrick Dijkgraaf
Hi Qu,

Thanks for helping me!

Please see the responses in-line.
Any suggestions based on this?

Thanks!


On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > Hi all,
> >
> > I have been a happy BTRFS user for quite some time. But now I'm
> >

Re: Need help with potential ~45TB dataloss

2018-11-30 Qu Wenruo
On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> Hi all,
>
> I have been a happy BTRFS user for quite some time. But now I'm facing
> a potential ~45TB dataloss... :-(
> I hope someone can help!
>
> I have Server A and Server B. Both have a 20-device BTRFS RAID6
> filesystem. Because of