Re: Regarding handling of file renames in Btrfs
On 2017-09-10 14:41, Qu Wenruo wrote:
> On 2017-09-10 07:50, Rohan Kadekodi wrote:
>> Hello,
>>
>> I was trying to understand how file renames are handled in Btrfs. I
>> read the code documentation, but had a problem understanding a few
>> things.
>>
>> During a file rename, btrfs_commit_transaction() is called, because
>> Btrfs has to commit the whole FS before storing the information
>> related to the renamed file. It has to commit the FS because a rename
>> first does an unlink, which is not recorded in the btrfs_rename()
>> transaction and so is not logged in the log tree. Is my understanding
>> correct? If yes, my questions are as follows:
>
> Not familiar with the rename kernel code, so not much help for the
> rename operation.
>
>> 1. What does committing the whole FS mean?
>
> Committing the whole fs means a lot of things, but generally speaking,
> it makes the on-disk data inconsistent with each other.

                       ^consistent
Sorry for the typo.

Thanks,
Qu

> For the obvious part, it writes modified fs/subvolume trees to disk
> (with handling of tree operations so there are no half-modified
> trees). Also other trees like the extent tree (very hot, since every
> CoW will update it, and the most complicated one), and the csum tree
> if modified.
>
> After a transaction is committed, the on-disk btrfs will represent the
> state at the time the commit was called, and every tree should match
> each other. Besides this, after a transaction is committed, the
> generation of the fs gets increased and the modified tree blocks will
> have the same generation number.
>
>> Blktrace shows that there are 2 256KB writes, which are essentially
>> writes to the data of the root directory of the file system (which I
>> found out through btrfs-debug-tree).
>
> I'd say you didn't check the btrfs-debug-tree output carefully enough.
> I strongly recommend to do a vimdiff to see which trees are modified.
>
> At least the following trees are modified:
>
> 1) fs/subvolume tree
>    Rename modifies the DIR_INDEX/DIR_ITEM/INODE_REF at least, and
>    updates the inode time. So the fs/subvolume tree must be CoWed.
>
> 2) extent tree
>    CoW of the above metadata operations will definitely cause extent
>    allocation and freeing, so the extent tree will also get updated.
>
> 3) root tree
>    Both the extent tree and the fs/subvolume tree are modified, so
>    their root bytenrs need to be updated, which means the root tree
>    must be updated too.
>
> And finally, the superblocks.
>
> I just verified the behavior with an empty btrfs created on a 1G file,
> with only one file to rename. In that case (with 4K sectorsize and 16K
> nodesize), the total IO should be (3 * 16K) * 2 + 4K * 2 = 104K.
>
> "3"      = number of tree blocks modified
> "16K"    = nodesize
> 1st "*2" = DUP profile for metadata
> "4K"     = superblock size
> 2nd "*2" = 2 superblocks for a 1G fs
>
> If your extent/root/fs trees have a higher level, then more tree
> blocks need to be updated. And if your fs is very large, you may have
> 3 superblocks.
>
>> Is this equivalent to doing a shell sync, as the same block groups
>> are written during a shell sync too?
>
> For shell "sync" the difference is that "sync" will write all dirty
> data pages to disk and then commit the transaction, while only calling
> btrfs_commit_transaction() doesn't trigger dirty page writeback. So
> there is a difference.
>
> Furthermore, if there is nothing modified at all, sync will just skip
> the fs, so a btrfs_commit_transaction() call is not ensured if you
> call "sync".
>
>> Also, does it imply that all the metadata held by the log tree is now
>> checkpointed to the respective trees?
>
> The log tree part is a little tricky, as the log tree is not really a
> journal for btrfs. Btrfs uses CoW for metadata, so in theory (and in
> fact) btrfs doesn't need any journal. The log tree is mainly used to
> enhance btrfs fsync performance. You can totally disable the log tree
> with the notreelog mount option and btrfs will behave just fine.
>
> Furthermore, I'm not very familiar with the log tree; I need to check
> the code to see whether the log tree is used in rename, so I can't say
> much right now. But to make things easy, I strongly recommend to
> ignore the log tree for now.
>
>> 2. Why are there 2 complete writes to the data held by the root
>> directory and not just 1? These writes are 256KB each, which is the
>> size of the extent allocated to the root directory.
>
> Check my first calculation and verify the debug-tree output before and
> after the rename. I think there are some extra factors affecting the
> number, from the tree height to your fs tree organization.
>
>> 3. Why are the writes being done to the root directory of the file
>> system / subvolume and not just the parent directory where the unlink
>> happened?
>
> That's why I strongly recommend to understand the btrfs on-disk format
> first. A lot of things can be answered after understanding the on-disk
> layout, without asking anyone else.
>
> The short answer is, btrfs puts all its child dir/inode info into one
> tree for one subvolume. (And the term "root directory" here is a
> little confusing: are you talking about the fs tree root or the root
> tree?) Not the common one-tree-for-one-inode layout.
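Qu's 104K figure can be reproduced with a quick back-of-the-envelope calculation. Below is a minimal sketch of the arithmetic (the constants are taken from the message above; the helper name is made up for illustration):

```python
# Estimate the commit IO for a rename on a small btrfs, following the
# numbers in the message above: 3 modified tree blocks (fs/subvolume,
# extent, and root trees), DUP metadata, 2 superblocks on a 1G fs.
NODESIZE = 16 * 1024        # 16K nodesize
SUPERBLOCK = 4 * 1024       # 4K superblock
MODIFIED_TREE_BLOCKS = 3    # fs/subvolume, extent, root trees
DUP = 2                     # DUP metadata profile writes every block twice
NUM_SUPERBLOCKS = 2         # a 1G fs carries 2 superblock copies

def commit_io_bytes(tree_blocks, nodesize=NODESIZE, profile_copies=DUP,
                    superblocks=NUM_SUPERBLOCKS):
    """Total bytes written by one commit: metadata copies + superblocks."""
    return tree_blocks * nodesize * profile_copies + SUPERBLOCK * superblocks

print(commit_io_bytes(MODIFIED_TREE_BLOCKS) // 1024)  # -> 104 (KiB)
```

Deeper trees or a third superblock (on very large filesystems) simply add more terms to the same sum.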
Re: Regarding handling of file renames in Btrfs
On 2017-09-10 07:50, Rohan Kadekodi wrote:
> Hello,
>
> I was trying to understand how file renames are handled in Btrfs. I
> read the code documentation, but had a problem understanding a few
> things.
>
> During a file rename, btrfs_commit_transaction() is called, because
> Btrfs has to commit the whole FS before storing the information
> related to the renamed file. It has to commit the FS because a rename
> first does an unlink, which is not recorded in the btrfs_rename()
> transaction and so is not logged in the log tree. Is my understanding
> correct? If yes, my questions are as follows:

Not familiar with the rename kernel code, so not much help for the rename operation.

> 1. What does committing the whole FS mean?

Committing the whole fs means a lot of things, but generally speaking, it makes the on-disk data inconsistent with each other.

For the obvious part, it writes modified fs/subvolume trees to disk (with handling of tree operations so there are no half-modified trees). Also other trees like the extent tree (very hot, since every CoW will update it, and the most complicated one), and the csum tree if modified.

After a transaction is committed, the on-disk btrfs will represent the state at the time the commit was called, and every tree should match each other. Besides this, after a transaction is committed, the generation of the fs gets increased and the modified tree blocks will have the same generation number.

> Blktrace shows that there are 2 256KB writes, which are essentially
> writes to the data of the root directory of the file system (which I
> found out through btrfs-debug-tree).

I'd say you didn't check the btrfs-debug-tree output carefully enough. I strongly recommend to do a vimdiff to see which trees are modified.

At least the following trees are modified:

1) fs/subvolume tree
   Rename modifies the DIR_INDEX/DIR_ITEM/INODE_REF at least, and updates the inode time. So the fs/subvolume tree must be CoWed.

2) extent tree
   CoW of the above metadata operations will definitely cause extent allocation and freeing, so the extent tree will also get updated.

3) root tree
   Both the extent tree and the fs/subvolume tree are modified, so their root bytenrs need to be updated, which means the root tree must be updated too.

And finally, the superblocks.

I just verified the behavior with an empty btrfs created on a 1G file, with only one file to rename. In that case (with 4K sectorsize and 16K nodesize), the total IO should be (3 * 16K) * 2 + 4K * 2 = 104K.

"3"      = number of tree blocks modified
"16K"    = nodesize
1st "*2" = DUP profile for metadata
"4K"     = superblock size
2nd "*2" = 2 superblocks for a 1G fs

If your extent/root/fs trees have a higher level, then more tree blocks need to be updated. And if your fs is very large, you may have 3 superblocks.

> Is this equivalent to doing a shell sync, as the same block groups are
> written during a shell sync too?

For shell "sync" the difference is that "sync" will write all dirty data pages to disk and then commit the transaction, while only calling btrfs_commit_transaction() doesn't trigger dirty page writeback. So there is a difference.

Furthermore, if there is nothing modified at all, sync will just skip the fs, so a btrfs_commit_transaction() call is not ensured if you call "sync".

> Also, does it imply that all the metadata held by the log tree is now
> checkpointed to the respective trees?

The log tree part is a little tricky, as the log tree is not really a journal for btrfs. Btrfs uses CoW for metadata, so in theory (and in fact) btrfs doesn't need any journal. The log tree is mainly used to enhance btrfs fsync performance. You can totally disable the log tree with the notreelog mount option and btrfs will behave just fine.

Furthermore, I'm not very familiar with the log tree; I need to check the code to see whether the log tree is used in rename, so I can't say much right now. But to make things easy, I strongly recommend to ignore the log tree for now.

> 2. Why are there 2 complete writes to the data held by the root
> directory and not just 1? These writes are 256KB each, which is the
> size of the extent allocated to the root directory.

Check my first calculation and verify the debug-tree output before and after the rename. I think there are some extra factors affecting the number, from the tree height to your fs tree organization.

> 3. Why are the writes being done to the root directory of the file
> system / subvolume and not just the parent directory where the unlink
> happened?

That's why I strongly recommend to understand the btrfs on-disk format first. A lot of things can be answered after understanding the on-disk layout, without asking anyone else.

The short answer is, btrfs puts all its child dir/inode info into one tree for one subvolume. (And the term "root directory" here is a little confusing: are you talking about the fs tree root or the root tree?) Not the common one-tree-for-one-inode layout. So if you rename one file in a subvolume, the subvolume tree gets CoWed, which means from the
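The sync-versus-commit distinction above can be modeled with a toy sketch (hypothetical classes, not kernel code): sync(1) writes back dirty data pages and then commits the transaction, while a bare transaction commit flushes only metadata.

```python
# Toy model of the difference described above (not real kernel code):
# "sync" writes back dirty data pages first, then commits the
# transaction; btrfs_commit_transaction() alone flushes only metadata.
class ToyFs:
    def __init__(self):
        self.dirty_data_pages = set()
        self.dirty_metadata = set()
        self.on_disk = set()

    def write(self, page):
        # A buffered write dirties the data page and the inode metadata.
        self.dirty_data_pages.add(page)
        self.dirty_metadata.add("inode:" + page)

    def commit_transaction(self):
        # Metadata-only flush: data pages stay dirty in memory.
        self.on_disk |= self.dirty_metadata
        self.dirty_metadata.clear()

    def sync(self):
        # Data writeback first, then the transaction commit.
        self.on_disk |= self.dirty_data_pages
        self.dirty_data_pages.clear()
        self.commit_transaction()

fs = ToyFs()
fs.write("file1:page0")
fs.commit_transaction()
assert "file1:page0" not in fs.on_disk   # data page is still only in memory
fs.sync()
assert "file1:page0" in fs.on_disk       # sync wrote the data pages too
```

The other asymmetry from the message (sync may skip an unmodified fs entirely) is not modeled here; the sketch only shows the data-writeback difference.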
Re: Please help with exact actions for raid1 hot-swap
It doesn't need the replaced disk to be readable, right? Then what prevents the same procedure from working without a spare bay?
--
With Best Regards,
Marat Khalili

On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquist wrote:
> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.
>
>> --
>> With Best Regards,
>> Marat Khalili
>>
>> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili wrote:
>>> Dear list,
>>>
>>> I'm going to replace one hard drive (partition actually) of a btrfs
>>> raid1. Can you please spell out exactly what I need to do in order
>>> to get my filesystem working as RAID1 again after the replacement,
>>> exactly as it was before? I saw some bad examples of drive
>>> replacement on this list, so I'm afraid to just follow random
>>> instructions on the wiki, and putting this system out of action even
>>> temporarily would be very inconvenient.
>>>
>>> For this filesystem:
>>>
>>> $ sudo btrfs fi show /dev/sdb7
>>> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>>>         Total devices 2 FS bytes used 106.23GiB
>>>         devid 1 size 2.71TiB used 126.01GiB path /dev/sda7
>>>         devid 2 size 2.71TiB used 126.01GiB path /dev/sdb7
>>>
>>> $ grep /mnt/data /proc/mounts
>>> /dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>>>
>>> $ sudo btrfs fi df /mnt/data
>>> Data, RAID1: total=123.00GiB, used=104.57GiB
>>> System, RAID1: total=8.00MiB, used=48.00KiB
>>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> $ uname -a
>>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I've got this in dmesg:
>>>
>>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0
>>> [ +0.51] ata6.00: irq_stat 0x4008
>>> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED
>>> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in
>>>          res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error)
>>> [ +0.94] ata6.00: status: { DRDY ERR }
>>> [ +0.26] ata6.00: error: { UNC }
>>> [ +0.001195] ata6.00: configured for UDMA/133
>>> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor]
>>> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
>>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00
>>> [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
>>> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>> [ +0.77] ata6: EH complete
>>>
>>> There's still a 1 in the Current_Pending_Sector line of smartctl
>>> output as of now, so it probably won't heal by itself.
>>> --
>>> With Best Regards,
>>> Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair now runs in minutes instead of hours? aborting
On 2017-09-10 01:44, Marc MERLIN wrote:
> So, should I assume that btrfs progs git has some issue, since there
> is no plausible way that a check --repair should be faster than a
> regular check?

Yes, the assumption that repair should be no faster than an RO check is correct. Especially for a clean fs, repair should behave just the same as an RO check.

And I'll first submit a patch (or patches) to output the time consumed for each tree, so we have a clue about what is going wrong. (Digging through the code is just a little too boring for me)

Thanks,
Qu

> Thanks,
> Marc
>
> On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote:
>> On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
>>>> gargamel:~# btrfs fi df /mnt/btrfs_pool1
>>>> Data, single: total.60TiB, used.54TiB
>>>> System, DUP: total2.00MiB, used=1.19MiB
>>>> Metadata, DUP: totalX.00GiB, used.69GiB
>>>
>>> Wait for a minute.
>>> Does that .69GiB mean 706 MiB? Or did my email client/GMX screw up
>>> the format (again)?
>>> This output format must be changed, at least to 0.69 GiB, or 706 MiB.
>>
>> Email client problem. I see control characters in what you quoted.
>> Let's try again:
>>
>> gargamel:~# btrfs fi df /mnt/btrfs_pool1
>> Data, single: total=10.66TiB, used=10.60TiB        => 10TB
>> System, DUP: total=64.00MiB, used=1.20MiB          => 1.2MB
>> Metadata, DUP: total=57.50GiB, used=12.76GiB       => 13GB
>> GlobalReserve, single: total=512.00MiB, used=0.00B => 0
>>
>>> You mean lowmem is actually FASTER than original mode? That's very
>>> surprising.
>>
>> Correct, unless I add --repair, and then original mode is 2x faster
>> than lowmem.
>>
>>> Is there any special operation done for that btrfs? Like offline
>>> dedupe or tons of reflinks?
>>
>> In this case, no. Note that btrfs check used to take many hours
>> overnight until I did a git pull of btrfs progs and built the latest
>> from TOT.
>>
>>> BTW, how many subvolumes do you have in the fs?
>>
>> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
>> 91
>>
>> If I remove snapshots for btrfs send and historical 'backups':
>> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . \
>>   | grep -Ev '(hourly|daily|weekly|rw|ro)' | wc -l
>> 5
>
>>> This looks like a bug.
>>>
>>> My first guess is that it is related to the number of
>>> subvolumes/reflinks, but I'm not sure, since I don't have many
>>> real-world btrfs filesystems. I'll take some time to look into it.
>>>
>>> Thanks for the very interesting report,
>>
>> Thanks for having a look :)
>>
>> Marc
>> --
>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> Microsoft is to operating systems what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
Ok, mount with -o clear_cache, umount, and run fsck again just to make sure. Then if it comes out clean, mount with ref_verify again and wait for it to blow up again.

Thanks,

Josef

Sent from my iPhone

> On Sep 9, 2017, at 10:37 PM, Marc MERLIN wrote:
>
>> On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote:
>> Well that's odd, a block allocated on disk is in the free space
>> cache. Can I see the full output of the fsck? I want to make sure
>> it's actually getting to the part where it checks the free space
>> cache. If it does then I'll have to think of how to catch this kind
>> of bug, because you've got a weird one. Thanks,
>
> Well, btrfs check was clean before that, but now it returned this:
>
> gargamel:~# time btrfs check /dev/mapper/dshelf1
> Checking filesystem on /dev/mapper/dshelf1
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> checking extents
> checking free space cache
> Wanted bytes 16384, found 196608 for off 13282417049600
> Wanted bytes 536870912, found 196608 for off 13282417049600
> cache appears valid but isn't 13282417049600
> There is no free space entry for 13849889603584-13849889652736
> There is no free space entry for 13849889603584-13850426474496
> cache appears valid but isn't 13849889603584
> Wanted bytes 5832704, found 81920 for off 13870290698240
> Wanted bytes 536870912, found 81920 for off 13870290698240
> cache appears valid but isn't 13870290698240
> block group 13928272756736 has wrong amount of free space
> failed to load free space cache for block group 13928272756736
> Duplicate entries in free space cache
> failed to load free space cache for block group 13962095624192
> block group 14003434684416 has wrong amount of free space
> failed to load free space cache for block group 14003434684416
> block group 14470042615808 has wrong amount of free space
> failed to load free space cache for block group 14470042615808
> block group 14610702794752 has wrong amount of free space
> failed to load free space cache for block group 14610702794752
> block group 14612313407488 has wrong amount of free space
> failed to load free space cache for block group 14612313407488
> block group 14624661438464 has wrong amount of free space
> failed to load free space cache for block group 14624661438464
> block group 14648820629504 has wrong amount of free space
> failed to load free space cache for block group 14648820629504
> Wanted offset 14657410793472, found 14657410760704
> Wanted offset 14657410793472, found 14657410760704
> cache appears valid but isn't 14657410564096
> block group 15886844952576 has wrong amount of free space
> failed to load free space cache for block group 15886844952576
> There is no free space entry for 15905635434496-15905636499456
> There is no free space entry for 15905635434496-15906172305408
> cache appears valid but isn't 15905635434496
> block group 16542901207040 has wrong amount of free space
> failed to load free space cache for block group 16542901207040
> block group 16581019041792 has wrong amount of free space
> failed to load free space cache for block group 16581019041792
> block group 16616989392896 has wrong amount of free space
> failed to load free space cache for block group 16616989392896
> block group 16676582064128 has wrong amount of free space
> failed to load free space cache for block group 16676582064128
> block group 16697520029696 has wrong amount of free space
> failed to load free space cache for block group 16697520029696
> block group 16848380755968 has wrong amount of free space
> failed to load free space cache for block group 16848380755968
> ERROR: errors found in free space cache
> found 11732749766656 bytes used, error(s) found
> total csum bytes: 11441478452
> total tree bytes: 13793296384
> total fs tree bytes: 727580672
> total extent tree bytes: 483426304
> btree space waste bytes: 1194373662
> file data blocks allocated: 12133646495744
> referenced 12155707805696
>
> real    100m12.252s
> user    0m33.771s
> sys     1m11.220s
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote:
> Well that's odd, a block allocated on disk is in the free space cache.
> Can I see the full output of the fsck? I want to make sure it's
> actually getting to the part where it checks the free space cache. If
> it does then I'll have to think of how to catch this kind of bug,
> because you've got a weird one. Thanks,

Well, btrfs check was clean before that, but now it returned this:

gargamel:~# time btrfs check /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
Wanted bytes 16384, found 196608 for off 13282417049600
Wanted bytes 536870912, found 196608 for off 13282417049600
cache appears valid but isn't 13282417049600
There is no free space entry for 13849889603584-13849889652736
There is no free space entry for 13849889603584-13850426474496
cache appears valid but isn't 13849889603584
Wanted bytes 5832704, found 81920 for off 13870290698240
Wanted bytes 536870912, found 81920 for off 13870290698240
cache appears valid but isn't 13870290698240
block group 13928272756736 has wrong amount of free space
failed to load free space cache for block group 13928272756736
Duplicate entries in free space cache
failed to load free space cache for block group 13962095624192
block group 14003434684416 has wrong amount of free space
failed to load free space cache for block group 14003434684416
block group 14470042615808 has wrong amount of free space
failed to load free space cache for block group 14470042615808
block group 14610702794752 has wrong amount of free space
failed to load free space cache for block group 14610702794752
block group 14612313407488 has wrong amount of free space
failed to load free space cache for block group 14612313407488
block group 14624661438464 has wrong amount of free space
failed to load free space cache for block group 14624661438464
block group 14648820629504 has wrong amount of free space
failed to load free space cache for block group 14648820629504
Wanted offset 14657410793472, found 14657410760704
Wanted offset 14657410793472, found 14657410760704
cache appears valid but isn't 14657410564096
block group 15886844952576 has wrong amount of free space
failed to load free space cache for block group 15886844952576
There is no free space entry for 15905635434496-15905636499456
There is no free space entry for 15905635434496-15906172305408
cache appears valid but isn't 15905635434496
block group 16542901207040 has wrong amount of free space
failed to load free space cache for block group 16542901207040
block group 16581019041792 has wrong amount of free space
failed to load free space cache for block group 16581019041792
block group 16616989392896 has wrong amount of free space
failed to load free space cache for block group 16616989392896
block group 16676582064128 has wrong amount of free space
failed to load free space cache for block group 16676582064128
block group 16697520029696 has wrong amount of free space
failed to load free space cache for block group 16697520029696
block group 16848380755968 has wrong amount of free space
failed to load free space cache for block group 16848380755968
ERROR: errors found in free space cache
found 11732749766656 bytes used, error(s) found
total csum bytes: 11441478452
total tree bytes: 13793296384
total fs tree bytes: 727580672
total extent tree bytes: 483426304
btree space waste bytes: 1194373662
file data blocks allocated: 12133646495744
 referenced 12155707805696

real    100m12.252s
user    0m33.771s
sys     1m11.220s

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
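The "wrong amount of free space" lines in the fsck output above boil down to comparing the cached free-space total for a block group against what the allocation records imply. A toy sketch of that invariant (hypothetical function, not the actual btrfs-progs code):

```python
# Toy version of the free-space-cache consistency check behind the
# "wrong amount of free space" messages above (not the real fsck code):
# the free space recorded in the cache for a block group must equal the
# group's size minus the bytes recorded as allocated.
def check_block_group(bg_start, bg_length, used_bytes, cache_entries):
    """cache_entries: list of (offset, length) free-space entries."""
    cached_free = sum(length for _offset, length in cache_entries)
    expected_free = bg_length - used_bytes
    if cached_free != expected_free:
        return f"block group {bg_start} has wrong amount of free space"
    return None  # cache agrees with the allocation records

# A block group whose cache matches passes:
assert check_block_group(0, 1024, 768, [(768, 256)]) is None
# One whose cache lost an entry is reported:
assert check_block_group(4096, 1024, 768, []) is not None
```

The real checker also validates entry offsets and overlaps (hence the "Wanted offset ... found ..." and "Duplicate entries" messages); this sketch covers only the size invariant.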
Re: Regarding handling of file renames in Btrfs
Rohan Kadekodi posted on Sat, 09 Sep 2017 18:50:09 -0500 as excerpted:

> Hello,
>
> I was trying to understand how file renames are handled in Btrfs. I read
> the code documentation, but had a problem understanding a few things.
>
> During a file rename, btrfs_commit_transaction() is called which is
> because Btrfs has to commit the whole FS before storing the information
> related to the new renamed file. It has to commit the FS because a
> rename first does an unlink, which is not recorded in the btrfs_rename()
> transaction and so is not logged in the log tree. Is my understanding
> correct? If yes, my questions are as follows:

I'm not a dev, but am a btrfs user and list regular, and can try my hand at answering... and if I'm wrong, a dev's reply can correct my misconceptions as well. =:^)

> 1. What does committing the whole FS mean? Blktrace shows that there are
> 2 256KB writes, which are essentially writes to the data of the
> root directory of the file system (which I found out through
> btrfs-debug-tree). Is this equivalent to doing a shell sync, as the same
> block groups are written during a shell sync too? Also, does it imply
> that all the metadata held by the log tree is now checkpointed to the
> respective trees?

A btrfs commit is the equivalent of a *single* filesystem sync, yes. The difference compared to the sync(1) command is that sync applies to all filesystems of all types, not just a single btrfs filesystem. See also the btrfs filesystem sync command (btrfs-filesystem(8) manpage), which applies to a single btrfs, but also triggers deleted subvolume cleanup.

But these are not writes to the /data/ of the root directory. In btrfs, data and metadata are separated, and these are writes to the /metadata/ of the filesystem, including writing a new filesystem top-level (aka root) block and the superblock and its backups. Yes, the log is synced too.
But regarding the log: in btrfs, because btrfs is atomic cow-based (copy-on-write), at each commit the filesystem is designed to be entirely self-consistent, with the result being that most actions don't need to be and are not logged.

At a crash and later remount, the filesystem as of the last atomically-written root-block state will be mounted, and anything being written at the time of the crash will either have been entirely written and committed (the top-level root tree block will have been updated to reflect it), or that update will not have happened yet, so the state of the filesystem will be that of the last root tree block commit, with newer in-process actions lost.

The btrfs log is an exception, a compromise in the interest of fsync speed. The only thing it logs are fsyncs (filesyncs, as opposed to whole filesystem syncs) that would otherwise not return until the next commit (with commits on a 30-second timer by default), since the filesystem would otherwise be unable to guarantee that the fsync had been entirely written to permanent media and thus should survive a crash.

The log ensures the fsynced file's new data (if any) is written to its new location on the media (cow, so a new block location), updates the metadata (also cow, so written to a new location), then logs the metadata update so it can be committed at log replay if necessary, and returns. If a crash happens before the next full filesystem atomic commit, the fsync can be replayed from the log, thus satisfying the fsync guarantee without forcing a wait for a full atomic commit.

But once that full filesystem atomic commit happens (again, with a 30-second default timeout), all updates are now reflected in the new filesystem state as registered in the new root tree block, and the previous log is now dead/unreferenced on the media (because the new root block doesn't refer to it any longer, referring instead to a new log).

> 2. Why are there 2 complete writes to the data held by the root
> directory and not just 1? These writes are 256KB each, which is the
> size of the extent allocated to the root directory

I'm not sure on this one, hopefully a btrfs dev can clarify, but at a guess, you may be seeing writes to the superblock and its backup -- on a large enough filesystem there are two backups, but your filesystem may be small enough to have just one backup.

It's also possible you're seeing the new copy of the metadata tree being written out, then the root block and superblocks (and backups) being updated.

> 3. Why are the writes being done to the root directory of the file
> system / subvolume and not just the parent directory where the unlink
> happened?

Remember, everything's in trees, and updates are cowed, with updates at lower levels of the tree not reflected in the atomic state of the filesystem until they've recursed up the tree and a new root tree block is written, pointing at the new trees instead of the old ones, with the superblock and backups then updated to point at the new root tree block. So nothing's loc
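The recursive CoW update described above can be sketched with a toy tree (illustrative Python, not the real on-disk structures): changing a leaf copies every node on the path up to the root, the old root stays valid throughout, and the commit is the single atomic pointer flip to the new root.

```python
# Toy copy-on-write tree: modifying a leaf rewrites the whole path to
# the root; the old root (and everything under it) stays intact until
# the "superblock" pointer is flipped to the new root atomically.
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def cow_update(node, path, new_value):
    """Return a new root that shares unchanged subtrees with the old one."""
    if not path:
        return Node(new_value, node.children)   # copied leaf
    i = path[0]
    children = list(node.children)
    children[i] = cow_update(children[i], path[1:], new_value)
    return Node(node.value, children)           # copied node on the path

old_root = Node("root", [Node("dirA", [Node("file1")]), Node("dirB")])
new_root = cow_update(old_root, [0, 0], "file1-renamed")

assert old_root.children[0].children[0].value == "file1"         # old state intact
assert new_root.children[0].children[0].value == "file1-renamed"
assert new_root.children[1] is old_root.children[1]              # untouched subtree shared

superblock = {"root": old_root}   # a crash before the next line keeps the old state
superblock["root"] = new_root     # the atomic pointer flip = the commit
```

This is why a rename in a subvolume touches the subvolume tree root (and, above it, the root tree and superblock) rather than only the parent directory's node.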
Regarding handling of file renames in Btrfs
Hello,

I was trying to understand how file renames are handled in Btrfs. I read the code documentation, but had a problem understanding a few things.

During a file rename, btrfs_commit_transaction() is called, because Btrfs has to commit the whole FS before storing the information related to the renamed file. It has to commit the FS because a rename first does an unlink, which is not recorded in the btrfs_rename() transaction and so is not logged in the log tree. Is my understanding correct? If yes, my questions are as follows:

1. What does committing the whole FS mean? Blktrace shows that there are 2 256KB writes, which are essentially writes to the data of the root directory of the file system (which I found out through btrfs-debug-tree). Is this equivalent to doing a shell sync, as the same block groups are written during a shell sync too? Also, does it imply that all the metadata held by the log tree is now checkpointed to the respective trees?

2. Why are there 2 complete writes to the data held by the root directory and not just 1? These writes are 256KB each, which is the size of the extent allocated to the root directory.

3. Why are the writes being done to the root directory of the file system / subvolume and not just the parent directory where the unlink happened?

It would be great if I could get the answers to these questions.

Thanks,
Rohan
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
Well that's odd, a block allocated on disk is in the free space cache. Can I see the full output of the fsck? I want to make sure it's actually getting to the part where it checks the free space cache. If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one.

Thanks,

Josef

Sent from my iPhone

> On Sep 9, 2017, at 2:39 PM, Marc MERLIN wrote:
>
>> On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
>> Alright I just reworked the build tree ref stuff and tested it to make sure
>> it wasn’t going to give false positives again. Apparently I had only ever
>> used this with very basic existing fs’es and nothing super complicated, so
>> it was just broken for anything complex. I’ve pushed it to my tree, you can
>> just pull and build and try again. This time the stack traces will even
>> work! Thanks,
>
> Ok, so I found out that I just need to copy a bunch of data to the
> filesystem to trigger the bug.
>
> There you go:
> [318400.507972] re-allocated a block that still has references to it!
> [...]
Re: netapp-alike snapshots?
On Sat 2017-09-09 (22:43), Andrei Borzenkov wrote:
> > Your tool does not create .snapshot subdirectories in EVERY directory like
>
> Neither does NetApp. Those "directories" are magic handles that do not
> really exist.

I know. But symbolic links are the next close thing (I am not a kernel programmer).

> Apart from obvious problem with recursive directory traversal (NetApp
> .snapshot are not visible with normal directory list)

Yes, they are, at least sometimes, e.g. tar includes the snapshots.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<14c87878-a5a0-d7d3-4a76-c55812e75...@gmail.com>
Re: netapp-alike snapshots?
09.09.2017 16:44, Ulli Horlacher wrote:
> > Your tool does not create .snapshot subdirectories in EVERY directory like

Neither does NetApp. Those "directories" are magic handles that do not really exist.

> Netapp does.
> Example:
>
> framstag@fex:~: cd ~/Mail/.snapshot/
> framstag@fex:~/Mail/.snapshot: l
> lR-X - 2017-09-09 09:55 2017-09-09_.daily -> /local/home/.snapshot/2017-09-09_.daily/framstag/Mail

Apart from the obvious problem with recursive directory traversal (NetApp .snapshot are not visible with a normal directory list), those will also be captured in snapshots and cannot be removed. NetApp snapshots themselves do not expose .snapshot "directories".

> lR-X - 2017-09-09 14:00 2017-09-09_1400.hourly -> /local/home/.snapshot/2017-09-09_1400.hourly/framstag/Mail
> lR-X - 2017-09-09 15:00 2017-09-09_1500.hourly -> /local/home/.snapshot/2017-09-09_1500.hourly/framstag/Mail
> lR-X - 2017-09-09 15:18 2017-09-09_1518.single -> /local/home/.snapshot/2017-09-09_1518.single/framstag/Mail
> lR-X - 2017-09-09 15:20 2017-09-09_1520.single -> /local/home/.snapshot/2017-09-09_1520.single/framstag/Mail
> lR-X - 2017-09-09 15:22 2017-09-09_1522.single -> /local/home/.snapshot/2017-09-09_1522.single/framstag/Mail
>
> My users (and I) need snapshots in this way.
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
> Alright I just reworked the build tree ref stuff and tested it to make sure
> it wasn’t going to give false positives again. Apparently I had only ever
> used this with very basic existing fs’es and nothing super complicated, so it
> was just broken for anything complex. I’ve pushed it to my tree, you can
> just pull and build and try again. This time the stack traces will even
> work! Thanks,

Ok, so I found out that I just need to copy a bunch of data to the filesystem to trigger the bug.

There you go:

[318400.507972] re-allocated a block that still has references to it!
[318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
[318400.553751] Ref root 2, parent 0, owner 0, offset 0, num_refs 1
[318400.573208] Root entry 2, num_refs 1
[318400.585614] Root entry 7, num_refs 0
[318400.598028] Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
[318400.623774]  btrfs_alloc_tree_block+0x33e/0x3e1
[318400.639083]  __btrfs_cow_block+0xf3/0x420
[318400.652817]  btrfs_cow_block+0xcf/0x145
[318400.666024]  btrfs_search_slot+0x269/0x6de
[318400.680041]  btrfs_del_csums+0xac/0x2f9
[318400.693245]  __btrfs_free_extent+0x88b/0xa0b
[318400.707718]  __btrfs_run_delayed_refs+0xb4e/0xd20
[318400.723491]  btrfs_run_delayed_refs+0x77/0x1a1
[318400.738993]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318400.755994]  commit_cowonly_roots+0x1da/0x273
[318400.770673]  btrfs_commit_transaction+0x3dd/0x761
[318400.786397]  transaction_kthread+0xe2/0x178
[318400.800515]  kthread+0xfb/0x100
[318400.811487]  ret_from_fork+0x25/0x30
[318400.823748]  0x
[318400.957574] [ cut here ]
[318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
[318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
[318401.218357] snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
[318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
[318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[318401.454894] task: 948ef791e200 task.stack: b18a091ec000
[318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
[318401.490849] RSP: 0018:b18a091efd08 EFLAGS: 00010296
[318401.507751] RAX: 0026 RBX: 9488208be618 RCX:
[318401.530384] RDX: 948f1e295e01 RSI: 948f1e28dd58 RDI: 948f1e28dd58
[318401.553548] RBP: b18a091efd50 R08: 0003dc12ea8bcc57 R09: 948f1f50b868
[318401.576127] R10: 948b1f1cc460 R11: aef37285 R12: ffef
[318401.598717] R13: R14: 948edb7efd48 R15: 948cdbdeb000
[318401.621327] FS: () GS:948f1e28() knlGS:
[318401.646737] CS: 0010 DS: ES: CR0: 80050033
[318401.665149] CR2: f7f05001 CR3: 00061f587000 CR4: 001406e0
[318401.687684] Call Trace:
[318401.696148]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318401.712745]  ? btrfs_run_delayed_refs+0x127/0x1a1
[318401.727981]  commit_cowonly_roots+0x1da/0x273
[318401.742183]  btrfs_commit_transaction+0x3dd/0x761
[318401.757447]  transaction_kthread+0xe2/0x178
[318401.771158]  ? btrfs_cleanup_transaction+0x3c2/0x3c2
[318401.787169]  kthread+0xfb/0x100
[318401.797769]  ? init_completion+0x24/0x24
[318401.810718]  ret_from_fork+0x25/0x30
[318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16
Re: btrfs check --repair now runs in minutes instead of hours? aborting
So, should I assume that btrfs progs git has some issue, since there is no plausible way that a check --repair should be faster than a regular check?

Thanks,
Marc

On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote:
> On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
> > > gargamel:~# btrfs fi df /mnt/btrfs_pool1
> > > Data, single: total.60TiB, used.54TiB
> > > System, DUP: total2.00MiB, used=1.19MiB
> > > Metadata, DUP: totalX.00GiB, used.69GiB
> >
> > Wait for a minute.
> >
> > Is that .69GiB means 706 MiB? Or my email client/GMX screwed up the format (again)?
> > This output format must be changed, at least to 0.69 GiB, or 706 MiB.
>
> Email client problem. I see control characters in what you quoted.
>
> Let's try again
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=10.66TiB, used=10.60TiB        => 10TB
> System, DUP: total=64.00MiB, used=1.20MiB          => 1.2MB
> Metadata, DUP: total=57.50GiB, used=12.76GiB       => 13GB
> GlobalReserve, single: total=512.00MiB, used=0.00B => 0
>
> > You mean lowmem is actually FASTER than original mode?
> > That's very surprising.
>
> Correct, unless I add --repair and then original mode is 2x faster than lowmem.
>
> > Is there any special operation done for that btrfs?
> > Like offline dedupe or tons of reflinks?
>
> In this case, no.
> Note that btrfs check used to take many hours overnight until I did a
> git pull of btrfs progs and built the latest from TOT.
>
> > BTW, how many subvolumes do you have in the fs?
>
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
> 91
>
> If I remove snapshots for btrfs send and historical 'backups':
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | grep -Ev '(hourly|daily|weekly|rw|ro)' | wc -l
> 5
>
> > This looks like a bug. My first guess is related to number of
> > subvolumes/reflinks, but I'm not sure since I don't have many real-world
> > btrfs.
> >
> > I'll take sometime to look into it.
> > Thanks for the very interesting report,
>
> Thanks for having a look :)
>
> Marc

--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
[PATCH] btrfs: Clean up dead code in root-tree
The value of variable 'can_recover' is never used after being set, thus it should be removed.

Signed-off-by: Christos Gkekas
---
 fs/btrfs/root-tree.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 95bcc3c..3338407 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -226,10 +226,6 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info)
 	struct btrfs_root *root;
 	int err = 0;
 	int ret;
-	bool can_recover = true;
-
-	if (sb_rdonly(fs_info->sb))
-		can_recover = false;
 
 	path = btrfs_alloc_path();
 	if (!path)
-- 
2.7.4
Re: netapp-alike snapshots?
On Sat 2017-09-09 (06:36), Marc MERLIN wrote:
> > On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> > > With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> > > You can find these snapshots in every local directory (readonly).
> >
> > I have found none, so I have implemented it by myself:
> >
> > https://fex.rus.uni-stuttgart.de/snaprotate.html
>
> Not sure how you looked :)
> http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html
>
> Might not be exactly what you wanted, but been using it for 3 years.

Your tool does not create .snapshot subdirectories in EVERY directory like Netapp does. Example:

framstag@fex:~: cd ~/Mail/.snapshot/
framstag@fex:~/Mail/.snapshot: l
lR-X - 2017-09-09 09:55 2017-09-09_.daily -> /local/home/.snapshot/2017-09-09_.daily/framstag/Mail
lR-X - 2017-09-09 14:00 2017-09-09_1400.hourly -> /local/home/.snapshot/2017-09-09_1400.hourly/framstag/Mail
lR-X - 2017-09-09 15:00 2017-09-09_1500.hourly -> /local/home/.snapshot/2017-09-09_1500.hourly/framstag/Mail
lR-X - 2017-09-09 15:18 2017-09-09_1518.single -> /local/home/.snapshot/2017-09-09_1518.single/framstag/Mail
lR-X - 2017-09-09 15:20 2017-09-09_1520.single -> /local/home/.snapshot/2017-09-09_1520.single/framstag/Mail
lR-X - 2017-09-09 15:22 2017-09-09_1522.single -> /local/home/.snapshot/2017-09-09_1522.single/framstag/Mail

My users (and I) need snapshots in this way.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<20170909133612.7iqwr6cbjxzvf...@merlins.org>
Re: netapp-alike snapshots?
On Sat, Sep 09, 2017 at 03:26:14PM +0200, Ulli Horlacher wrote:
> On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> > With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> > You can find these snapshots in every local directory (readonly).
> > I would like to have something similar with btrfs.
> > Is there (where?) such a tool?
>
> I have found none, so I have implemented it by myself:
>
> https://fex.rus.uni-stuttgart.de/snaprotate.html

Not sure how you looked :)
https://www.google.com/search?q=btrfs+netapp+snapshot
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html

Might not be exactly what you wanted, but been using it for 3 years.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: netapp-alike snapshots?
On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> You can find these snapshots in every local directory (readonly).
> I would like to have something similar with btrfs.
> Is there (where?) such a tool?

I have found none, so I have implemented it by myself:

https://fex.rus.uni-stuttgart.de/snaprotate.html

In contrast to Netapp, with snaprotate the local host administrator can create a snapshot at any time or by cronjob. Example:

root@fex:~# snaprotate single 3 /local/home
Create a readonly snapshot of '/local/home' in '/local/home/.snapshot/2017-09-09_1518.single'
Delete subvolume '/local/home/.snapshot/2017-09-09_1255.single'

root@fex:~# snaprotate -l
/local/home/.snapshot/2017-09-08_.daily
/local/home/.snapshot/2017-09-09_.daily
/local/home/.snapshot/2017-09-09_1331.single
/local/home/.snapshot/2017-09-09_1332.single
/local/home/.snapshot/2017-09-09_1400.hourly
/local/home/.snapshot/2017-09-09_1500.hourly
/local/home/.snapshot/2017-09-09_1518.single

root@fex:~# crontab -l | grep snaprotate
0 * * * * /root/bin/snaprotate -q hourly 2 /local/home
0 0 * * * /root/bin/snaprotate -q daily 3 /local/home
0 0 * * 1 /root/bin/snaprotate -q weekly 1 /local/home

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<20170822132208.gd14...@rus.uni-stuttgart.de>
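The "keep the N newest snapshots per class" rotation described above is simple enough to sketch. This is an illustrative stand-in, not snaprotate's actual implementation: since the names begin with a YYYY-MM-DD_HHMM timestamp they sort lexically in date order, so rotation is a sort away. The demo directory and names below are invented.

```shell
# Rough sketch of the rotation idea (illustrative, not snaprotate's
# real code): keep only the $keep newest snapshots of a class.
rotate() {
    snapdir=$1; class=$2; keep=$3
    # Names sort chronologically because they start with a timestamp;
    # "head -n -K" (GNU) drops the last K lines, i.e. the newest K.
    ls -d "$snapdir"/*."$class" 2>/dev/null | sort | head -n -"$keep" |
    while IFS= read -r old; do
        echo "would delete: $old"   # for real: btrfs subvolume delete "$old"
    done
}

# Demo with plain directories standing in for snapshots:
demo=/tmp/snaprotate-demo
mkdir -p "$demo"/2017-09-07_1200.hourly \
         "$demo"/2017-09-08_1200.hourly \
         "$demo"/2017-09-09_1200.hourly
rotate "$demo" hourly 2
# prints: would delete: /tmp/snaprotate-demo/2017-09-07_1200.hourly
```

The real tool of course does more (read-only snapshots, classes, quiet mode); this only shows the pruning step.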
Re: Please help with exact actions for raid1 hot-swap
Patrik Lundquist posted on Sat, 09 Sep 2017 12:29:08 +0200 as excerpted:

> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.

I did this with my dozen-plus (but small) btrfs raid1s on ssd partitions several kernel cycles ago. It went very smoothly. =:^)

(TL;DR can stop there.)

I had actually been taking advantage of btrfs raid1's checksumming and scrub ability to continue running a failing ssd, with more and more sectors going bad and being replaced from spares, for quite some time after I'd have otherwise replaced it. Everything of value was backed up, and I was simply doing it for the experience with both btrfs raid1 scrubbing and continuing ssd sector failure.

But eventually the scrubs were finding and fixing errors every boot, especially when off for several hours, and further experience was of diminishing value while the hassle factor was building fast, so I attached the spare ssd, partitioned it up, did a final scrub on all the btrfs, and then one btrfs at a time btrfs replaced the devices from the old ssd's partitions to the new one's partitions.

Given that I was already used to running scrubs at every boot, the entirely uneventful replacements were actually somewhat anticlimactic, but that was a good thing! =:^)

Then more recently I bought a larger/newer pair of ssds (1 TB each, the old ones were quarter TB each) and converted my media partitions and secondary backups, which had still been on reiserfs on spinning rust, to btrfs raid1 on ssd as well, making me all-btrfs on all-ssd now, with everything but /boot and its backups on the other ssds being btrfs raid1, and /boot and its backups being btrfs dup. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: generic name for volume and subvolume root?
On 09/09/2017 01:06 PM, Hugo Mills wrote:
> On Sat, Sep 09, 2017 at 06:58:38PM +0800, Qu Wenruo wrote:
>> On 2017年09月09日 18:48, Ulli Horlacher wrote:
>>> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>>>> Is there a generic name for both volume and subvolume root?
>>>>
>>>> Nope, subvolume (including snapshot) is not distinguished by its
>>>> filename/path/directory name.
>>>>
>>>> And you can only do snapshot on subvolume (snapshot is one kind of
>>>> subvolume) boundary.
>>>
>>> So, I can name a btrfs root volume also btrfs subvolume?
>>
>> Yes, root volume is also a subvolume, so just call "btrfs root volume"
>> a "subvolume".
>
> I find it's best to avoid the word "root" entirely, as it's got
> several meanings, and it tends to get confusing in conversation.
> Instead, we have:
>
>  - "the top level" (subvolid=5)
>  - "/" (what you see at / in your running system)
>  - "/@" or similar names (the subvolume that's mounted at /)
>
>>> I am talking about documentation, not coding!
>>>
>>> I just want to use the correct terms.
>>
>> If you're referring to the term, I think subvolume is good enough.
>> Which represents your original term, "directories one can snapshot".
>>
>> For the whole btrfs "volume", I would just call it "filesystem" to
>> avoid the name "volume" or "subvolume" at all.
>
> Yes, it's a filesystem. (Although that does occasionally cause
> confusion between "the conceptual filesystem implemented by btrfs.ko"
> and "the concrete filesystem stored on /dev/sda1", but it's generally
> far less confusing than the overloading of "root").

Yes, because every subvolume is a filesystem root! :-D

https://i.imgur.com/2VzmC.gif

--
Hans van Kranenburg
Re: generic name for volume and subvolume root?
On Sat, Sep 09, 2017 at 06:58:38PM +0800, Qu Wenruo wrote:
> On 2017年09月09日 18:48, Ulli Horlacher wrote:
>> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>>> Is there a generic name for both volume and subvolume root?
>>>
>>> Nope, subvolume (including snapshot) is not distinguished by its
>>> filename/path/directory name.
>>>
>>> And you can only do snapshot on subvolume (snapshot is one kind of
>>> subvolume) boundary.
>>
>> So, I can name a btrfs root volume also btrfs subvolume?
>
> Yes, root volume is also a subvolume, so just call "btrfs root volume"
> a "subvolume".

I find it's best to avoid the word "root" entirely, as it's got several meanings, and it tends to get confusing in conversation. Instead, we have:

 - "the top level" (subvolid=5)
 - "/" (what you see at / in your running system)
 - "/@" or similar names (the subvolume that's mounted at /)

>> I am talking about documentation, not coding!
>>
>> I just want to use the correct terms.
>
> If you're referring to the term, I think subvolume is good enough.
> Which represents your original term, "directories one can snapshot".
>
> For the whole btrfs "volume", I would just call it "filesystem" to
> avoid the name "volume" or "subvolume" at all.

Yes, it's a filesystem. (Although that does occasionally cause confusion between "the conceptual filesystem implemented by btrfs.ko" and "the concrete filesystem stored on /dev/sda1", but it's generally far less confusing than the overloading of "root").

Hugo.

--
Hugo Mills             | Well, you don't get to be a kernel hacker simply by
hugo@... carfax.org.uk | looking good in Speedos.
http://carfax.org.uk/  | PGP: E2AB1DE4 | Rusty Russell
Re: generic name for volume and subvolume root?
On Sat, Sep 09, 2017 at 10:35:51AM +0200, Ulli Horlacher wrote:
> As I am writing some documentation about creating snapshots:
> Is there a generic name for both volume and subvolume root?
>
> Example:
>
> root@fex:~# btrfs subvol show /mnt
> ERROR: not a subvolume: /mnt
>
> root@fex:~# btrfs subvol show /mnt/test
> /mnt/test is toplevel subvolume
>
> root@fex:~# btrfs subvol show /mnt/test/data
> /mnt/test/data
>         Name: data
>         UUID: b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
> (...)
>
> root@fex:~# btrfs subvol show /mnt/test/data/sw
> ERROR: not a subvolume: /mnt/test/data/sw
>
> I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt
> and /mnt/test/data/sw
>
> Is there a simple name for directories I can snapshot?

Subvolume. If you can snapshot it, it's a subvolume. Some subvolumes are also snapshots. (And all snapshots are subvolumes).

The subvolume with ID 5 (or ID 0, which is an alias) is the "top level subvolume", and has the unique property that it can't be renamed, deleted or replaced, where all other subvolumes can be.

Hugo.

--
Hugo Mills             | Well, you don't get to be a kernel hacker simply by
hugo@... carfax.org.uk | looking good in Speedos.
http://carfax.org.uk/  | PGP: E2AB1DE4 | Rusty Russell
Re: generic name for volume and subvolume root?
On 2017年09月09日 18:48, Ulli Horlacher wrote:
> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>> Is there a generic name for both volume and subvolume root?
>>
>> Nope, subvolume (including snapshot) is not distinguished by its
>> filename/path/directory name.
>>
>> And you can only do snapshot on subvolume (snapshot is one kind of
>> subvolume) boundary.
>
> So, I can name a btrfs root volume also btrfs subvolume?

Yes, root volume is also a subvolume, so just call "btrfs root volume" a "subvolume".

> I am talking about documentation, not coding!
>
> I just want to use the correct terms.

If you're referring to the term, I think subvolume is good enough. Which represents your original term, "directories one can snapshot".

For the whole btrfs "volume", I would just call it "filesystem" to avoid the name "volume" or "subvolume" at all.

Thanks,
Qu
Re: generic name for volume and subvolume root?
On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
> > Is there a generic name for both volume and subvolume root?
>
> Nope, subvolume (including snapshot) is not distinguished by its
> filename/path/directory name.
>
> And you can only do snapshot on subvolume (snapshot is one kind of
> subvolume) boundary.

So, I can name a btrfs root volume also btrfs subvolume?

I am talking about documentation, not coding!

I just want to use the correct terms.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<48008a58-a82e-d9f7-327e-eeb905e18...@gmx.com>
Re: generic name for volume and subvolume root?
On 2017年09月09日 16:35, Ulli Horlacher wrote:
> As I am writing some documentation about creating snapshots:
> Is there a generic name for both volume and subvolume root?
>
> Example:
>
> root@fex:~# btrfs subvol show /mnt
> ERROR: not a subvolume: /mnt
>
> root@fex:~# btrfs subvol show /mnt/test
> /mnt/test is toplevel subvolume
>
> root@fex:~# btrfs subvol show /mnt/test/data
> /mnt/test/data
>         Name: data
>         UUID: b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
> (...)
>
> root@fex:~# btrfs subvol show /mnt/test/data/sw
> ERROR: not a subvolume: /mnt/test/data/sw
>
> I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt
> and /mnt/test/data/sw
>
> Is there a simple name for directories I can snapshot?

Nope, subvolume (including snapshot) is not distinguished by its filename/path/directory name.

And you can only do snapshot on subvolume (snapshot is one kind of subvolume) boundary.

For a user to determine where the subvolume boundaries are, one should first determine where the btrfs is mounted and then use "btrfs subvol show" to determine the boundaries.

Or, on a btrfs, test the directory inode number. A subvolume/snapshot in btrfs will always have the same inode number, 256, and regular files/directories/special files will not use that magic number.

Thanks,
Qu
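The inode-number test described above is easy to script. A minimal sketch (the `is_subvol` helper name is made up for illustration; `stat -c %i` is GNU coreutils syntax for printing a file's inode number):

```shell
# A btrfs subvolume root always has inode number 256
# (BTRFS_FIRST_FREE_OBJECTID); no ordinary directory does.
is_subvol() {
    test "$(stat -c %i "$1" 2>/dev/null)" = 256
}

# An ordinary directory reports some other inode number,
# so this should print "plain directory" on any filesystem:
mkdir -p /tmp/plain-dir
if is_subvol /tmp/plain-dir; then
    echo "subvolume"
else
    echo "plain directory"
fi
```

The nice property of this check over `btrfs subvol show` is that it needs no privileges and works from any language that can stat a path.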
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 12:05, Marat Khalili wrote:
> Forgot to add, I've got a spare empty bay if it can be useful here.

That makes it much easier, since you don't have to mount the filesystem degraded, with the risks that involves.

Add and partition the new disk, then run:

# btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data

Remove the old disk when it is done.
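The spare-bay flow above can be sketched as a small dry-run script. The new partition name /dev/sdc7 is an assumption (verify with lsblk first); every command is only printed, never executed:

```shell
#!/bin/sh
# Dry-run sketch of replacing /dev/sdb7 via a spare bay, as described above.
# /dev/sdc7 is a HYPOTHETICAL name for the new partition -- check lsblk.
run() { echo "# $*"; }   # swap 'echo' for real execution when you are sure

run btrfs replace start /dev/sdb7 /dev/sdc7 /mnt/data
run btrfs replace status /mnt/data   # poll until the copy reaches 100%
```

Note that replace swaps the target device into the old device's slot, so the RAID1 profile is unchanged and no balance should be needed afterwards.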
Re: Please help with exact actions for raid1 hot-swap
Forgot to add, I've got a spare empty bay if it can be useful here.

--

With Best Regards,
Marat Khalili

On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili wrote:
> [snip -- original message quoted in full]
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 09:46, Marat Khalili wrote:
> Dear list,
>
> I'm going to replace one hard drive (partition actually) of a btrfs raid1.
> [snip]

I recently replaced both disks in a two-disk Btrfs raid1 to increase capacity and took some notes.

Using systemd? systemd will automatically unmount a degraded volume and ruin your one chance to replace the disk, for as long as Btrfs has the bug where it records single chunks plus one disk missing and then refuses to mount degraded a second time. Comment out your mount in fstab and run "systemctl daemon-reload"; the mount file in /var/run/systemd/generator/ will be removed. (Is there a better way?)

Unmount the volume, then detach the failing disk:

# hdparm -Y /dev/sdb
# echo 1 > /sys/block/sdb/device/delete

Replace the disk, create partitions etc. You might have to restart smartd, if you are using it.

Make Btrfs forget the old device -- it will otherwise think the old disk is still there. (Is there a better way?)

# rmmod btrfs; modprobe btrfs
# btrfs device scan
# mount -o degraded /dev/sda7 /mnt/data
# btrfs device usage /mnt/data
# btrfs replace start 2 /dev/sdbX /mnt/data    (2 = devid of the missing disk, since its device node is gone)
# btrfs replace status /mnt/data

Convert any single or dup chunks back to raid1:

# btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data

Unmount, restore fstab, reload systemd again, mount.
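The whole degraded-mount procedure above can be collected into one dry-run script. Device names (sdb, sda7, sdbX, devid 2) are the ones from this thread and must be adjusted; every command is printed rather than executed, so nothing runs by accident:

```shell
#!/bin/sh
# Dry-run sketch of the degraded-mount replacement steps described above.
# Device names are taken from this thread -- verify them on your machine.
run() { echo "# $*"; }   # prints each step instead of executing it

run systemctl daemon-reload                    # after commenting the fstab entry
run umount /mnt/data
run hdparm -Y /dev/sdb                         # spin the failing disk down
run "echo 1 > /sys/block/sdb/device/delete"    # detach it from the SCSI layer
# ... physically swap the disk and partition it here ...
run "rmmod btrfs; modprobe btrfs"              # make btrfs forget the old device
run btrfs device scan
run mount -o degraded /dev/sda7 /mnt/data
run btrfs device usage /mnt/data
run btrfs replace start 2 /dev/sdbX /mnt/data  # 2 = devid of the missing disk
run btrfs replace status /mnt/data
run btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data
```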
generic name for volume and subvolume root?
As I am writing some documentation about creating snapshots: Is there a generic name for both a volume root and a subvolume root? Example:

root@fex:~# btrfs subvol show /mnt
ERROR: not a subvolume: /mnt
root@fex:~# btrfs subvol show /mnt/test
/mnt/test is toplevel subvolume
root@fex:~# btrfs subvol show /mnt/test/data
/mnt/test/data
        Name:  data
        UUID:  b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
(...)
root@fex:~# btrfs subvol show /mnt/test/data/sw
ERROR: not a subvolume: /mnt/test/data/sw

I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt or /mnt/test/data/sw. Is there a simple name for the directories I can snapshot?

--
Ullrich Horlacher           Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart      E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a             Tel: ++49-711-68565868
70569 Stuttgart (Germany)   WWW: http://www.tik.uni-stuttgart.de/
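Whatever the name, the directories in question are exactly the subvolume roots, and those can be detected without parsing `btrfs subvol show` output: on btrfs, the root directory of every subvolume (including the top-level one) has inode number 256. A small sketch under that assumption, using the paths from the example above (they are hypothetical on other machines; `stat -c` is GNU coreutils syntax):

```shell
#!/bin/sh
# Heuristic: a directory is a btrfs subvolume root (and thus snapshottable)
# iff its inode number is 256. Non-btrfs paths and plain directories fail.
is_subvol_root() {
    [ "$(stat -c %i "$1" 2>/dev/null)" = "256" ]
}

# Paths from the question above (hypothetical elsewhere):
for d in /mnt /mnt/test /mnt/test/data /mnt/test/data/sw; do
    if is_subvol_root "$d"; then
        echo "$d: snapshottable"
    else
        echo "$d: not a subvolume root"
    fi
done
```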
Please help with exact actions for raid1 hot-swap
Dear list,

I'm going to replace one hard drive (partition, actually) of a btrfs raid1. Can you please spell out exactly what I need to do in order to get my filesystem working as RAID1 again after the replacement, exactly as it was before? I have seen some bad examples of drive replacement on this list, so I'm afraid to just follow random instructions on the wiki, and putting this system out of action even temporarily would be very inconvenient.

For this filesystem:

$ sudo btrfs fi show /dev/sdb7
Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
        Total devices 2 FS bytes used 106.23GiB
        devid 1 size 2.71TiB used 126.01GiB path /dev/sda7
        devid 2 size 2.71TiB used 126.01GiB path /dev/sdb7
$ grep /mnt/data /proc/mounts
/dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
$ sudo btrfs fi df /mnt/data
Data, RAID1: total=123.00GiB, used=104.57GiB
System, RAID1: total=8.00MiB, used=48.00KiB
Metadata, RAID1: total=3.00GiB, used=1.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
$ uname -a
Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I've got this in dmesg:

[Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0
[  +0.51] ata6.00: irq_stat 0x4008
[  +0.29] ata6.00: failed command: READ FPDMA QUEUED
[  +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in
          res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error)
[  +0.94] ata6.00: status: { DRDY ERR }
[  +0.26] ata6.00: error: { UNC }
[  +0.001195] ata6.00: configured for UDMA/133
[  +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor]
[  +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
[  +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00
[  +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
[  +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
[  +0.77] ata6: EH complete

There's still a 1 in the Current_Pending_Sector line of smartctl output as of now, so it probably won't heal by itself.

--

With Best Regards,
Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
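The read errors reported above can also be cross-checked from btrfs's own side before replacing anything. A dry-run sketch of the usual inspection commands (printed only, never executed; the mount point and device are the ones from the report). On raid1 a scrub can often rewrite the bad copy from the good mirror, which may clear the pending sector, though with a failing disk replacement is still the safer course:

```shell
#!/bin/sh
# Dry-run: print inspection/repair commands for the error report above.
run() { echo "# $*"; }

run btrfs device stats /mnt/data    # per-device wr/rd/flush/corrupt counters
run smartctl -A /dev/sdb            # watch Current_Pending_Sector over time
run btrfs scrub start -B /mnt/data  # raid1: rewrites bad copies from the mirror
```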