Re: [PATCH] Btrfs: fix heavy delalloc related deadlock
On Wed, 14 Aug 2013 11:41:00 -0400, Josef Bacik wrote:
> I added a patch where we started taking the ordered operations mutex when we
> waited on ordered extents.  We need this because we splice the list and process
> it, so if a flusher came in during this scenario it would think the list was
> empty and we'd usually get an early ENOSPC.  The problem with this is that this
> lock is used in transaction committing.  So we end up with something like this
>
> Transaction commit
> 	-> wait on writers
>
> Delalloc flusher
> 	-> run_ordered_operations (holds mutex)
> 		->wait for filemap-flush to do its thing
>
> flush task
> 	-> cow_file_range
> 		->wait on btrfs_join_transaction because we're committing
>
> some other task
> 	-> commit_transaction because we notice trans->transaction->flush is set
> 		-> run_ordered_operations (hang on mutex)

Sorry, I cannot follow this explanation.

As far as I know, if the flush task is waiting in btrfs_join_transaction(),
the transaction is being committed (state = TRANS_STATE_COMMIT_DOING) and all
the external writers (TRANS_START/TRANS_ATTACH/TRANS_USERSPACE) have already
left the current transaction, so no one would try to call
run_ordered_operations().

Could you show us the steps to reproduce it?

Thanks
Miao

>
> We need to disentangle the ordered operations flushing from the delalloc
> flushing, since they are separate things.  This solves the deadlock issue I was
> seeing.  Thanks,
>
> Signed-off-by: Josef Bacik
> ---
>  fs/btrfs/ctree.h        | 7 +++
>  fs/btrfs/disk-io.c      | 1 +
>  fs/btrfs/ordered-data.c | 4 ++--
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ea4cc16..d79e32c 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1418,6 +1418,13 @@ struct btrfs_fs_info {
>  	 * before jumping into the main commit.
>  	 */
>  	struct mutex ordered_operations_mutex;
> +
> +	/*
> +	 * Same as ordered_operations_mutex except this is for ordered extents
> +	 * and not the operations.
> +	 */
> +	struct mutex ordered_extent_flush_mutex;
> +
>  	struct rw_semaphore extent_commit_sem;
>
>  	struct rw_semaphore cleanup_work_sem;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c82025d..880dcde 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2288,6 +2288,7 @@ int open_ctree(struct super_block *sb,
>
>
>  	mutex_init(&fs_info->ordered_operations_mutex);
> +	mutex_init(&fs_info->ordered_extent_flush_mutex);
>  	mutex_init(&fs_info->tree_log_mutex);
>  	mutex_init(&fs_info->chunk_mutex);
>  	mutex_init(&fs_info->transaction_kthread_mutex);
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 8136982..b52b2c4 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -671,7 +671,7 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
>  	INIT_LIST_HEAD(&splice);
>  	INIT_LIST_HEAD(&works);
>
> -	mutex_lock(&root->fs_info->ordered_operations_mutex);
> +	mutex_lock(&root->fs_info->ordered_extent_flush_mutex);
>  	spin_lock(&root->fs_info->ordered_root_lock);
>  	list_splice_init(&cur_trans->ordered_operations, &splice);
>  	while (!list_empty(&splice)) {
> @@ -718,7 +718,7 @@ out:
>  		list_del_init(&work->list);
>  		btrfs_wait_and_free_delalloc_work(work);
>  	}
> -	mutex_unlock(&root->fs_info->ordered_operations_mutex);
> +	mutex_unlock(&root->fs_info->ordered_extent_flush_mutex);
>  	return ret;
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: restore passing of super_bytenr to device scan
Hi,

On Thu, 15 Aug 2013 20:42:42 -0400, Jeff Mahoney wrote:
> Commit 615f2867 (Btrfs-progs: cleanup similar code in open_ctree_*
> and close_ctree) introduced a regression in btrfs-convert.

Wang has already fixed this problem:

  [PATCH] Btrfs-progs: fix wrong arg sb_bytenr for btrfs_scan_fs_devices()

Thanks
Miao

> open_ctree takes a sb_bytenr argument to specify where to find the
> superblock. Under normal conditions, this will be at BTRFS_SUPER_INFO_OFFSET,
> and that commit assumed as much under all conditions.
>
> make_btrfs allows the caller to specify which blocks to use for
> certain blocks (including the superblock) and this is used by btrfs-convert
> to avoid overwriting the source file system's superblock until the
> conversion is complete.
>
> When btrfs-convert goes to open the newly initialized file system, it
> fails with: "No valid btrfs found" since its superblock wasn't written
> to the normal location.
>
> This patch restores the passing down of super_bytenr to
> btrfs_scan_one_device.
>
> Signed-off-by: Jeff Mahoney
> ---
>  btrfs-find-root.c |  2 +-
>  cmds-chunk.c      |  2 +-
>  disk-io.c         | 10 +++---
>  disk-io.h         |  3 ++-
>  4 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/btrfs-find-root.c b/btrfs-find-root.c
> index 9b3d7df..374cf81 100644
> --- a/btrfs-find-root.c
> +++ b/btrfs-find-root.c
> @@ -82,7 +82,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
>  		return NULL;
>  	}
>
> -	ret = btrfs_scan_fs_devices(fd, device, &fs_devices);
> +	ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0);
>  	if (ret)
>  		goto out;
>
> diff --git a/cmds-chunk.c b/cmds-chunk.c
> index 03314de..6ada328 100644
> --- a/cmds-chunk.c
> +++ b/cmds-chunk.c
> @@ -1291,7 +1291,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
>  		goto fail_free_sb;
>  	}
>
> -	ret = btrfs_scan_fs_devices(fd, path, &fs_devices);
> +	ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0);
>  	if (ret)
>  		goto fail_free_sb;
>
> diff --git a/disk-io.c b/disk-io.c
> index 13dbe27..1b91de6 100644
> --- a/disk-io.c
> +++ b/disk-io.c
> @@ -909,13 +909,17 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
>  }
>
>  int btrfs_scan_fs_devices(int fd, const char *path,
> -			  struct btrfs_fs_devices **fs_devices)
> +			  struct btrfs_fs_devices **fs_devices,
> +			  u64 super_bytenr)
>  {
>  	u64 total_devs;
>  	int ret;
>
> +	if (super_bytenr == 0)
> +		super_bytenr = BTRFS_SUPER_INFO_OFFSET;
> +
>  	ret = btrfs_scan_one_device(fd, path, fs_devices,
> -				    &total_devs, BTRFS_SUPER_INFO_OFFSET);
> +				    &total_devs, super_bytenr);
>  	if (ret) {
>  		fprintf(stderr, "No valid Btrfs found on %s\n", path);
>  		return ret;
> @@ -1001,7 +1005,7 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
>  	if (restore)
>  		fs_info->on_restoring = 1;
>
> -	ret = btrfs_scan_fs_devices(fp, path, &fs_devices);
> +	ret = btrfs_scan_fs_devices(fp, path, &fs_devices, sb_bytenr);
>  	if (ret)
>  		goto out;
>
> diff --git a/disk-io.h b/disk-io.h
> index effaa9f..d7792e0 100644
> --- a/disk-io.h
> +++ b/disk-io.h
> @@ -59,7 +59,8 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info,
>  void btrfs_release_all_roots(struct btrfs_fs_info *fs_info);
>  void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info);
>  int btrfs_scan_fs_devices(int fd, const char *path,
> -			  struct btrfs_fs_devices **fs_devices);
> +			  struct btrfs_fs_devices **fs_devices,
> +			  u64 super_bytenr);
>  int btrfs_setup_chunk_tree_and_device_map(struct btrfs_fs_info *fs_info);
>
>  struct btrfs_root *open_ctree(const char *filename, u64 sb_bytenr, int writes);
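The "0 means use the default location" convention that the patch adds to
btrfs_scan_fs_devices() can be looked at in isolation. The following is a
minimal sketch, not btrfs-progs code: effective_super_bytenr() and
SUPER_INFO_OFFSET_DEFAULT are hypothetical names introduced for
illustration, and the 64 KiB value is assumed to match
BTRFS_SUPER_INFO_OFFSET.

```c
#include <stdint.h>

/* Assumed to equal BTRFS_SUPER_INFO_OFFSET: the primary superblock at 64 KiB. */
#define SUPER_INFO_OFFSET_DEFAULT (64ULL * 1024)

/*
 * Hypothetical helper (not part of btrfs-progs): pick the offset to scan.
 * A caller that has no preference passes 0 and gets the default location;
 * a caller such as btrfs-convert passes the offset it actually wrote to.
 */
uint64_t effective_super_bytenr(uint64_t super_bytenr)
{
	return super_bytenr ? super_bytenr : SUPER_INFO_OFFSET_DEFAULT;
}
```

With this convention the two call sites that never relocate the superblock
can keep passing 0, while __open_ctree_fd() forwards whatever sb_bytenr it
was given.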
Re: uncorrectable errors after btrfs replace
This is just a comment from someone following all of this from the
sidelines: I see so much going on in this procedure that it scares me.
Once a single operation reaches a certain degree of complexity I get really
scared, because all it takes is a single misstep and my data is gone. And
that happens so easily as complexity increases and confusion sets in.

In this particular situation, my solution would probably have been to
create a new btrfs partition from scratch on the new drive, mount the
source partition/drive ro, and rsync the data across to the target
partition/drive rather than trying to do the btrfs replace operation. That
way I could have verified the target drive before erasing the source drive,
and I would not have had to worry about partition sizes, encryption, etc.

That said, I am certainly thankful that this was backup data and not
working data. But I think it serves as a cautionary tale about not assuming
that something should be done just because it theoretically can be done. I
am not really familiar with btrfs replace, but I would imagine that it is
intended more for use in a raid situation than for simply moving data from
one drive to another.

On 08/18/2013 05:42 PM, Chris Murphy wrote:
> On Aug 18, 2013, at 4:35 PM, Stuart Pook wrote:
>
>>> You first shrank a 2TB btrfs file system on dmcrypt device to 590GB.
>>> But then you didn't resize the dm device or the partition?
>>
>> no, I had no need to resize the dm device or partition.
>
> OK well it's unusual to resize a file system and then not resize the
> containing block device. I don't know if Btrfs cares about this or not.
>
>> I ran a badblocks scan on the raw device (not the luks device) and
>> didn't get any errors.
>
> badblocks depends on the drive itself declaring a persistent read failure
> for a sector, and doing so before the SCSI block layer times out.
> Since the Linux SCSI driver timeout is 30 seconds, and most consumer
> drive ECT is 120 seconds, the bus is reset before the drive has a chance
> to report a bad sector. So I think you're better off using smartctl
> -t long tests to find bad sectors on a disk. Further, smartctl -x may
> show SATA Phy Event Counters, which should have 0's or very low numbers;
> if not, that's also an indicator of hardware problems.
>
>> The data was written to the WD-Blue (640Gb) disk and then copied off it.
>> The only errors I saw concerned the WD-Blue. If the errors were data
>> corruption on writing or reading the WD-Blue then I would have thought
>> that the checksums would have told me that there was something wrong.
>> btrfs didn't give me an IO error until I started to read the files when
>> the data was on a final disk.
>
> How does Btrfs know there's been a failure during write if the hardware
> hasn't detected it? Btrfs doesn't re-read everything it just wrote to the
> drive to confirm it was written correctly. It assumes it was unless
> there's a hardware error. It wouldn't know this until a Btrfs scrub is
> done on the written drive.
>
> What I can't tell you is how Btrfs behaves, and whether it behaves
> correctly, when writing data to hardware having transient errors. I don't
> know what it does when the hardware reports the error, but presumably if
> the hardware doesn't report an error Btrfs can't do anything about that
> except on the next read or scrub.
>
>> Just to be clear. This is the series of btrfs replace I did:
>>
>>   backups : HD204UI  -> WD-Blue
>>   /mnt    : WD-Black -> HD204UI
>>   backups : WD-Blue  -> WD-Black
>>
>> I guess that my backups were corrupted as they were written to or read
>> from the WD-Blue. Wouldn't the checksums have detected this problem
>> before the data was written to the WD-Black?
>
> When you first encountered the btrfs reported csum errors, what operation
> was occurring?
>
>>> There's only so much software can do to overcome blatant hardware
>>> problems.
>>
>> I was hoping to be informed of them
>
> Well you were informed of them in dmesg, by virtue of the controller
> having problems talking to a SATA rev 2 drive at rev 2 speed, with a
> negotiated fallback to rev 1 speed.
>
>>> But, it seems unlikely such a high percent of errors would go
>>> undetected to result in so many uncorrectable errors, so there may be
>>> user error here along with a bug.
>>
>> I'm not sure how I could have done it better. Does "btrfs replace" check
>> that the data is correctly written to the new disk before it is removed
>> from the old disk?
>
> That's a valid question. Hopefully someone more knowledgeable can answer
> what the expected error handling behavior is supposed to be.
>
>> Should I have used the 2 disks to make a RAID-1 and then done a scrub
>> before removing the old disk?
>
> Good question. Possibly it's best practice to use btrfs replace with an
> existing raid1, rather than using it as a way to move a single copy of
> data from one disk to another. I think you'd have been better off using
> btrfs send and receive for this operation.
>
> A full dmesg might also be enlightening even if it is really long. Just
> put it
Re: uncorrectable errors after btrfs replace
On Aug 18, 2013, at 4:35 PM, Stuart Pook wrote:

>> You first shrank a 2TB btrfs file system on dmcrypt device to 590GB.
>> But then you didn't resize the dm device or the partition?
>
> no, I had no need to resize the dm device or partition.

OK well it's unusual to resize a file system and then not resize the
containing block device. I don't know if Btrfs cares about this or not.

> I ran a badblocks scan on the raw device (not the luks device) and didn't
> get any errors.

badblocks depends on the drive itself declaring a persistent read failure
for a sector, and doing so before the SCSI block layer times out. Since the
Linux SCSI driver timeout is 30 seconds, and most consumer drive ECT is 120
seconds, the bus is reset before the drive has a chance to report a bad
sector. So I think you're better off using smartctl -t long tests to find
bad sectors on a disk. Further, smartctl -x may show SATA Phy Event
Counters, which should have 0's or very low numbers; if not, that's also an
indicator of hardware problems.

> The data was written to the WD-Blue (640Gb) disk and then copied off it.
> The only errors I saw concerned the WD-Blue. If the errors were data
> corruption on writing or reading the WD-Blue then I would have thought
> that the checksums would have told me that there was something wrong.
> btrfs didn't give me an IO error until I started to read the files when
> the data was on a final disk.

How does Btrfs know there's been a failure during write if the hardware
hasn't detected it? Btrfs doesn't re-read everything it just wrote to the
drive to confirm it was written correctly. It assumes it was unless there's
a hardware error. It wouldn't know this until a Btrfs scrub is done on the
written drive.

What I can't tell you is how Btrfs behaves, and whether it behaves
correctly, when writing data to hardware having transient errors.
I don't know what it does when the hardware reports the error, but
presumably if the hardware doesn't report an error Btrfs can't do anything
about that except on the next read or scrub.

> Just to be clear. This is the series of btrfs replace I did:
>
>   backups : HD204UI  -> WD-Blue
>   /mnt    : WD-Black -> HD204UI
>   backups : WD-Blue  -> WD-Black
>
> I guess that my backups were corrupted as they were written to or read
> from the WD-Blue. Wouldn't the checksums have detected this problem
> before the data was written to the WD-Black?

When you first encountered the btrfs reported csum errors, what operation
was occurring?

>> There's only so much software can do to overcome blatant hardware
>> problems.
>
> I was hoping to be informed of them

Well you were informed of them in dmesg, by virtue of the controller having
problems talking to a SATA rev 2 drive at rev 2 speed, with a negotiated
fallback to rev 1 speed.

>> But, it seems unlikely such a high percent of errors would go
>> undetected to result in so many uncorrectable errors, so there may be
>> user error here along with a bug.
>
> I'm not sure how I could have done it better. Does "btrfs replace" check
> that the data is correctly written to the new disk before it is removed
> from the old disk?

That's a valid question. Hopefully someone more knowledgeable can answer
what the expected error handling behavior is supposed to be.

> Should I have used the 2 disks to make a RAID-1 and then done a scrub
> before removing the old disk?

Good question. Possibly it's best practice to use btrfs replace with an
existing raid1, rather than using it as a way to move a single copy of data
from one disk to another. I think you'd have been better off using btrfs
send and receive for this operation.

A full dmesg might also be enlightening even if it is really long. Just put
it in its own email without comment. I think pasting it out of forum is
less preferred.
Chris Murphy
Re: uncorrectable errors after btrfs replace
hi Chris

thanks for your reply. I was unable to save the filesystem. Even after
deleting all but 4Gb I still had too many errors so I just reformatted the
device. I'm glad that it was my backups and not my data.

On 18/08/13 23:43, Chris Murphy wrote:
> On Aug 18, 2013, at 1:12 PM, Stuart Pook wrote:
>> 6 btrfs filesystem resize 580g .
>
> You first shrank a 2TB btrfs file system on dmcrypt device to 590GB.
> But then you didn't resize the dm device or the partition?

no, I had no need to resize the dm device or partition. I just read that
when doing a replace the new device must be no smaller than the old device.
So I shrank the old device using "btrfs filesystem resize". Once the resize
worked I was able to do the replace, but I didn't try to replace before
resizing.

This is what btrfs(1) says on Debian: "The targetdev needs to be same size
or larger than the srcdev." I may be confused here.

>> 9 time btrfs balance start -musage=1 -dusage=1 . && time btrfs filesystem resize 580g .

I was surprised that the resize to 580Gb didn't work so I tried a magical
rebalance before doing the resize to 580 again. It still didn't work (not
enough space) but a resize to 590 Gb did.

>> 10 time btrfs filesystem resize 590g .

this worked

> You followed the resize of the fs, but not the underlying devices, with a
> balance, then resized it two more times?

The resize to 580 didn't work. So I did a balance. The resize to 580 still
didn't work so I resized to 590.

> This is weird, but also makes the sequence difficult to follow.
>
>> 13 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups
>> 14 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups
>
> Why is this command repeated? What's with the numbering system that skips
> numbers?

The command is repeated because I cancelled it by mistake by setting the
filesystem to readonly. I'm not sure if I restarted it by rerunning the
replace or just by remounting the filesystem readwrite in another window.
I'll put all of the commands at the end of this list.

>> Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>
> Bad connection so libata is dropping the link from 3 Gbps to 1.5 Gbps.
>
>> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12080
>
> This confirms that both ends of the cable are sensing communication
> problems between drive and controller. The cable needs to be replaced;
> the connector is more likely at fault than the cable itself.

I think that I should stop using my SATA dock with the SATA ports on my
motherboard, which are probably not designed to be hot plugged.

>> I guess that /disks/backup is mostly dead and that I should just
>> reformat it. What do you think?
>
> Well I think I'd try to simplify this drastically and see if you've got a
> reproducible bug.

I ran a badblocks scan on the raw device (not the luks device) and didn't
get any errors.

> The steps you've got I find mostly incoherent, so I can't try to do what
> you did to see if it's reproducible.

yes, this was the first time I've tried this. And just to make this more
difficult some commands were typed in a different window.

>> Next time I'll watch /var/log/syslog but I would have preferred that
>> "btrfs replace" stop when getting errors.
>
> The errors should be self correcting, but the mere fact they're happening
> means that some errors could be occurring but aren't detected. If the
> data is corrupted in transit, but the drive or controller didn't report a
> problem, then btrfs has no way of knowing it was written incorrectly.

The data was written to the WD-Blue (640Gb) disk and then copied off it.
The only errors I saw concerned the WD-Blue. If the errors were data
corruption on writing or reading the WD-Blue then I would have thought that
the checksums would have told me that there was something wrong. btrfs
didn't give me an IO error until I started to read the files when the data
was on a final disk.

Does "btrfs replace" check the checksums as it reads the data from the disk
that is being replaced?

Just to be clear.
This is the series of btrfs replace I did:

  backups : HD204UI  -> WD-Blue
  /mnt    : WD-Black -> HD204UI
  backups : WD-Blue  -> WD-Black

I guess that my backups were corrupted as they were written to or read from
the WD-Blue. Wouldn't the checksums have detected this problem before the
data was written to the WD-Black?

> There's only so much software can do to overcome blatant hardware
> problems.

I was hoping to be informed of them

> But, it seems unlikely such a high percent of errors would go undetected
> to result in so many uncorrectable errors, so there may be user error
> here along with a bug.

I'm not sure how I could have done it better. Does "btrfs replace" check
that the data is correctly written to the new disk before it is removed
from the old disk?

Should I have used the 2 disks to make a RAID-1 and then done a scrub
before removing the old disk?

Here is the complete list of co
Re: uncorrectable errors after btrfs replace
On Aug 18, 2013, at 1:12 PM, Stuart Pook wrote:

> 6 btrfs filesystem resize 580g .

You first shrank a 2TB btrfs file system on dmcrypt device to 590GB. But
then you didn't resize the dm device or the partition?

> 9 time btrfs balance start -musage=1 -dusage=1 . && time btrfs filesystem resize 580g .
> 10 time btrfs filesystem resize 590g .

You followed the resize of the fs, but not the underlying devices, with a
balance, then resized it two more times? This is weird, but also makes the
sequence difficult to follow.

> 13 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups
> 14 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups

Why is this command repeated? What's with the numbering system that skips
numbers?

> [...]
> Aug 18 12:28:03 kooka kernel: [54125.020262] ata10: hard resetting link
> Aug 18 12:28:03 kooka kernel: [54125.512032] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Aug 18 12:28:03 kooka kernel: [54125.523759] ata10.00: configured for UDMA/133
> Aug 18 12:28:03 kooka kernel: [54125.536380] ata10: EH complete
> Aug 18 12:28:04 kooka kernel: [54125.770176] ata10.00: exception Emask 0x10 SAct 0x7fff SErr 0x780100 action 0x6
> Aug 18 12:28:04 kooka kernel: [54125.770181] ata10.00: irq_stat 0x0800
> Aug 18 12:28:04 kooka kernel: [54125.770184] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
> [...]
> Aug 18 12:28:17 kooka kernel: [54138.957095] ata10.00: status: { DRDY }
> Aug 18 12:28:17 kooka kernel: [54138.957100] ata10: hard resetting link
> Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> Aug 18 12:28:17 kooka kernel: [54139.449972] ata10.00: configured for UDMA/133
> Aug 18 12:28:17 kooka kernel: [54139.464065] ata10: EH complete

Bad connection so libata is dropping the link from 3 Gbps to 1.5 Gbps.

> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12080

This confirms that both ends of the cable are sensing communication
problems between drive and controller. The cable needs to be replaced; the
connector is more likely at fault than the cable itself.

> I guess that /disks/backup is mostly dead and that I should just reformat
> it. What do you think?

Well I think I'd try to simplify this drastically and see if you've got a
reproducible bug. The steps you've got I find mostly incoherent, so I can't
try to do what you did to see if it's reproducible.

> Next time I'll watch /var/log/syslog but I would have preferred that
> "btrfs replace" stop when getting errors.

The errors should be self correcting, but the mere fact they're happening
means that some errors could be occurring but aren't detected. If the data
is corrupted in transit, but the drive or controller didn't report a
problem, then btrfs has no way of knowing it was written incorrectly.
There's only so much software can do to overcome blatant hardware problems.

But, it seems unlikely such a high percent of errors would go undetected to
result in so many uncorrectable errors, so there may be user error here
along with a bug.

Chris Murphy
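The speed drop can be read directly from the SStatus values in the log.
The following is a minimal sketch, assuming the standard SATA SStatus
layout in which bits 7:4 (SPD) hold the negotiated generation and that the
kernel prints the register in hex; sata_link_speed_x10() is a hypothetical
helper name, not a libata function.

```c
#include <stdint.h>

/*
 * Decode the SPD field (bits 7:4) of a SATA SStatus register value.
 * The log prints the register in hex, so "SStatus 123" is 0x123.
 * Returns the negotiated link speed in units of 0.1 Gbps, or 0 when
 * the field holds a value this sketch does not know about.
 */
unsigned int sata_link_speed_x10(uint32_t sstatus)
{
	switch ((sstatus >> 4) & 0xf) {
	case 1:
		return 15;	/* Gen1, 1.5 Gbps */
	case 2:
		return 30;	/* Gen2, 3.0 Gbps */
	case 3:
		return 60;	/* Gen3, 6.0 Gbps */
	default:
		return 0;	/* no link established, or unknown value */
	}
}
```

Applied to the two log lines above, 0x123 decodes to 3.0 Gbps and 0x113 to
1.5 Gbps, which matches the "SATA link up" text printed next to each value.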
uncorrectable errors after btrfs replace
hi all

I moved my btrfs filesystems around using btrfs replace and now I have
errors (lots of errors)

[63724.419779] BTRFS info (device dm-12): csum failed ino 9340 off 8192 csum 717036259 private 94677163

: root; time btrfs scrub start -Bd /disks/backups
scrub device /dev/dm-11 (id 1) done
	scrub started at Sun Aug 18 15:17:50 2013 and finished after 4487 seconds
	total bytes scrubbed: 576.46GB with 261883 errors
	error details: csum=261883
	corrected errors: 0, uncorrectable errors: 261883, unverified errors: 0

I had two 2 Tb disks whose data I needed to swap (/mnt on a WD-Black &
/disks/backup on a HD204UI). Both had btrfs systems but /disks/backup was
encrypted using luks. I had a spare 640 Gb WD-Blue disk that I plugged into
a SATA dock for this operation.

I "btrfs resize"d /disks/backup to fit in 590 GB, then I "btrfs replace"d
/disks/backup to a new luks partition on the WD-Blue disk. Then I "btrfs
replace"d /mnt to the HD204UI. Then I "btrfs replace"d the backup data to a
new luks partition on the WD-Black.

I then got IO Errors reading /disks/backup.

I'm using:
Linux kooka 3.10-2-amd64 #1 SMP Debian 3.10.5-1 (2013-08-07) x86_64 GNU/Linux
and btrfs-tools 0.19+20130315-5

rsync: write failed on "/disks/backups/snapshot_rsync/stuart/secret/current/.purple/accounts.xml": Input/output error (5)

Lots of files on /disks/backup have errors. smartctl says passed for all
the drives.

This is a summary of what I did:

    6 btrfs filesystem resize 580g .
    9 time btrfs balance start -musage=1 -dusage=1 . && time btrfs filesystem resize 580g .
   10 time btrfs filesystem resize 590g .
   12 cryptsetup luksOpen /dev/sdd2 640Gb
   13 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups
   14 time btrfs replace start /dev/dm-11 /dev/dm-12 -B /disks/backups
   18 cryptsetup remove _dev_sdc2
   19 fdisk /dev/sdc
   32 time btrfs replace start /dev/sdb1 /dev/sdc2 -B /mnt
   34 btrfs filesystem label /dev/dm-12
   36 btrfs filesystem label /disks/backups backups2Tb
   38 btrfs filesystem label /disks/backups
   39 cryptsetup luksFormat /dev/sdb2
   40 cryptsetup luksAddKey /dev/sdb2
   41 cryptsetup open /dev/sdb2 newbackups
   43 time btrfs replace start /dev/dm-12 /dev/dm-11 -B /disks/backups
   44 btrfs filesystem show
   45 cryptsetup status 640Gb
   46 cryptsetup remove 640Gb
   47 btrfs filesystem show
   49 btrfs filesystem resize max /disks/backups/
   54 /etc/local/backups # errors !
   57 time btrfs scrub start -Bd /disks/backups

Lots of errors in /var/log/syslog

Aug 18 12:27:51 kooka kernel: [54113.507151] btrfs: dev_replace from /dev/mapper/640Gb (devid 1) to /dev/dm-11) started
Aug 18 12:27:51 kooka kernel: [54113.601334] device label backups2Tb devid 1 transid 39282 /dev/dm-12
Aug 18 12:28:03 kooka kernel: [54125.020038] ata10.00: exception Emask 0x10 SAct 0x3dfe0ff0 SErr 0x780100 action 0x6
Aug 18 12:28:03 kooka kernel: [54125.020043] ata10.00: irq_stat 0x0800
Aug 18 12:28:03 kooka kernel: [54125.020047] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
Aug 18 12:28:03 kooka kernel: [54125.020050] ata10.00: failed command: READ FPDMA QUEUED
Aug 18 12:28:03 kooka kernel: [54125.020056] ata10.00: cmd 60/18:20:c0:18:0b/00:00:00:00:00/40 tag 4 ncq 12288 in
Aug 18 12:28:03 kooka kernel: [54125.020056]          res 40/00:5c:f0:1a:0b/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 12:28:03 kooka kernel: [54125.020059] ata10.00: status: { DRDY }
[...]
Aug 18 12:28:03 kooka kernel: [54125.020262] ata10: hard resetting link
Aug 18 12:28:03 kooka kernel: [54125.512032] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 18 12:28:03 kooka kernel: [54125.523759] ata10.00: configured for UDMA/133
Aug 18 12:28:03 kooka kernel: [54125.536380] ata10: EH complete
Aug 18 12:28:04 kooka kernel: [54125.770176] ata10.00: exception Emask 0x10 SAct 0x7fff SErr 0x780100 action 0x6
Aug 18 12:28:04 kooka kernel: [54125.770181] ata10.00: irq_stat 0x0800
Aug 18 12:28:04 kooka kernel: [54125.770184] ata10: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
[...]
Aug 18 12:28:17 kooka kernel: [54138.957095] ata10.00: status: { DRDY }
Aug 18 12:28:17 kooka kernel: [54138.957100] ata10: hard resetting link
Aug 18 12:28:17 kooka kernel: [54139.448029] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 12:28:17 kooka kernel: [54139.449972] ata10.00: configured for UDMA/133
Aug 18 12:28:17 kooka kernel: [54139.464065] ata10: EH complete
[...]
Aug 18 12:38:31 kooka kernel: [54753.527070] btrfs: checksum error at logical 52642709504 on dev /dev/dm-12, sector 104931328, root 1281, inode 42152, offset 0, length 4096, links 1 (path: X)
[...]
Aug 18 12:38:31 kooka kernel: [54753.606566] btrfs: bdev /dev/dm-12 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[...]
Aug 18 12:38:32 kooka kernel: [54753.679513] btrfs: bdev /dev/dm-12 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Aug 1
Re: [PATCH 10/15] btrfs-progs: fix qgroup realloc inheritance
On 08/15/13 01:16, Zach Brown wrote:
> qgroup.c:82:23: warning: memcpy with byte count of 0
> qgroup.c:83:23: warning: memcpy with byte count of 0
>
> The inheritance wasn't copying qgroups[] because a confused sizeof()
> gave 0 byte memcpy()s.  It's been like this for the year since it was
> merged, so I guess this isn't a very important thing to do :).

It only seems to hit if you give -[cx] before -i. I guess only very few
people use these options in the first place. They are primarily for hosting
providers.

Reviewed-by: Arne Jansen

>
> Signed-off-by: Zach Brown
> ---
>  qgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qgroup.c b/qgroup.c
> index 038c4dc..86fe2b2 100644
> --- a/qgroup.c
> +++ b/qgroup.c
> @@ -74,7 +74,7 @@ qgroup_inherit_realloc(struct btrfs_qgroup_inherit **inherit, int n, int pos)
>
>  	if (*inherit) {
>  		struct btrfs_qgroup_inherit *i = *inherit;
> -		int s = sizeof(out->qgroups);
> +		int s = sizeof(out->qgroups[0]);
>
>  		out->num_qgroups = i->num_qgroups;
>  		out->num_ref_copies = i->num_ref_copies;
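Why the original sizeof() produced a 0 byte memcpy() can be shown with a
small stand-in. This is a sketch under assumptions: struct inherit_demo is
a hypothetical reduction of struct btrfs_qgroup_inherit, qgroups is
assumed to be declared as a GNU C zero-length trailing array, and the copy
length is assumed to be the element count multiplied by s.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct btrfs_qgroup_inherit. */
struct inherit_demo {
	uint64_t num_qgroups;
	uint64_t qgroups[0];	/* zero-length trailing array (GNU C extension) */
};

/* Pre-patch form: sizeof of the array member itself, which is 0. */
size_t buggy_copy_bytes(uint64_t n)
{
	return n * sizeof(((struct inherit_demo *)0)->qgroups);
}

/* Patched form: sizeof of one element, so the length scales with n. */
size_t fixed_copy_bytes(uint64_t n)
{
	return n * sizeof(((struct inherit_demo *)0)->qgroups[0]);
}
```

Under these assumptions the pre-patch length is 0 for any count, so the
existing qgroup ids were never carried over into the reallocated structure,
while the patched length is 8 bytes per id.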