Re: Regarding handling of file renames in Btrfs
On 2017-09-10 14:41, Qu Wenruo wrote:
> On 2017-09-10 07:50, Rohan Kadekodi wrote:
>> Hello,
>>
>> I was trying to understand how file renames are handled in Btrfs. I
>> read the code documentation, but had a problem understanding a few
>> things.
>>
>> During a file rename, btrfs_commit_transaction() is called, because
>> Btrfs has to commit the whole FS before storing the information
>> related to the renamed file. It has to commit the FS because a rename
>> first does an unlink, which is not recorded in the btrfs_rename()
>> transaction and so is not logged in the log tree. Is my understanding
>> correct? If yes, my questions are as follows:
>
> Not familiar with the rename kernel code, so not much help for the
> rename operation.
>
>> 1. What does committing the whole FS mean?
>
> Committing the whole fs means a lot of things, but generally speaking,
> it makes the on-disk data inconsistent with each other.

                       ^consistent
Sorry for the typo.

Thanks,
Qu

> For the obvious part, it writes modified fs/subvolume trees to disk
> (with handling of tree operations so there are no half-modified
> trees). Also other trees like the extent tree (very hot, since every
> CoW will update it, and the most complicated one), and the csum tree
> if modified.
>
> After a transaction is committed, the on-disk btrfs will represent the
> state at the time the commit was called, and every tree should match
> each other. Besides this, after a transaction is committed, the
> generation of the fs gets increased and the modified tree blocks will
> have the same generation number.
>
>> Blktrace shows that there are 2 256KB writes, which are essentially
>> writes to the data of the root directory of the file system (which I
>> found out through btrfs-debug-tree).
>
> I'd say you didn't check the btrfs-debug-tree output carefully enough.
> I strongly recommend to do a vimdiff to see which trees are modified.
>
> At least the following trees are modified:
>
> 1) fs/subvolume tree
>    Rename modifies the DIR_INDEX/DIR_ITEM/INODE_REF at least, and
>    updates the inode time. So the fs/subvolume tree must be CoWed.
>
> 2) extent tree
>    CoW of the above metadata operations will definitely cause extent
>    allocation and freeing, so the extent tree will also get updated.
>
> 3) root tree
>    Both the extent tree and the fs/subvolume tree are modified, so
>    their root bytenrs need to be updated, which means the root tree
>    must be updated too.
>
> And finally, the superblocks.
>
> I just verified the behavior with an empty btrfs created on a 1G file,
> with only one file to rename. In that case (with 4K sectorsize and 16K
> nodesize), the total IO should be (3 * 16K) * 2 + 4K * 2 = 104K.
>
> "3"      = number of tree blocks modified
> "16K"    = nodesize
> 1st "*2" = DUP profile for metadata
> "4K"     = superblock size
> 2nd "*2" = 2 superblocks for a 1G fs
>
> If your extent/root/fs trees have a higher level, then more tree
> blocks need to be updated. And if your fs is very large, you may have
> 3 superblocks.
>
>> Is this equivalent to doing a shell sync, as the same block groups
>> are written during a shell sync too?
>
> For shell "sync" the difference is that "sync" will write all dirty
> data pages to disk and then commit the transaction, while only calling
> btrfs_commit_transaction() doesn't trigger dirty page writeback. So
> there is a difference.
>
> Furthermore, if there is nothing modified at all, sync will just skip
> the fs, so a btrfs_commit_transaction() call is not ensured if you
> call "sync".
>
>> Also, does it imply that all the metadata held by the log tree is now
>> checkpointed to the respective trees?
>
> The log tree part is a little tricky, as the log tree is not really a
> journal for btrfs. Btrfs uses CoW for metadata, so in theory (and in
> fact) btrfs doesn't need any journal. The log tree is mainly used to
> enhance btrfs fsync performance. You can totally disable the log tree
> with the notreelog mount option and btrfs will behave just fine.
>
> Furthermore, I'm not very familiar with the log tree; I need to check
> the code to see whether the log tree is used in rename, so I can't say
> much right now. But to make things easy, I strongly recommend to
> ignore the log tree for now.
>
>> 2. Why are there 2 complete writes to the data held by the root
>> directory and not just 1? These writes are 256KB each, which is the
>> size of the extent allocated to the root directory.
>
> Check my first calculation and verify the debug-tree output before and
> after the rename. I think there are some extra factors affecting the
> number, from the tree height to your fs tree organization.
>
>> 3. Why are the writes being done to the root directory of the file
>> system / subvolume and not just the parent directory where the unlink
>> happened?
>
> That's why I strongly recommend to understand the btrfs on-disk format
> first. A lot of things can be answered after understanding the on-disk
> layout, without asking anyone else.
>
> The short answer is, btrfs puts all its child dir/inode info into one
> tree for one subvolume. (And the term "root directory" here is a
> little confusing: are you talking about the fs tree root or the root
> tree?) Not the common one-tree-for-one-inode layout.
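Qu's 104K figure can be reproduced with a quick back-of-the-envelope calculation. Below is a minimal sketch of the arithmetic (the constants are taken from the message above; the helper name is made up for illustration):

```python
# Estimate the commit IO for a rename on a small btrfs, following the
# numbers in the message above: 3 modified tree blocks (fs/subvolume,
# extent, and root trees), DUP metadata, 2 superblocks on a 1G fs.
NODESIZE = 16 * 1024        # 16K nodesize
SUPERBLOCK = 4 * 1024       # 4K superblock
MODIFIED_TREE_BLOCKS = 3    # fs/subvolume, extent, root trees
DUP = 2                     # DUP metadata profile writes every block twice
NUM_SUPERBLOCKS = 2         # a 1G fs carries 2 superblock copies

def commit_io_bytes(tree_blocks, nodesize=NODESIZE, profile_copies=DUP,
                    superblocks=NUM_SUPERBLOCKS):
    """Total bytes written by one commit: metadata copies + superblocks."""
    return tree_blocks * nodesize * profile_copies + SUPERBLOCK * superblocks

print(commit_io_bytes(MODIFIED_TREE_BLOCKS) // 1024)  # -> 104 (KiB)
```

Deeper trees or a third superblock (on very large filesystems) simply add more terms to the same sum.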
Re: Regarding handling of file renames in Btrfs
On 2017-09-10 07:50, Rohan Kadekodi wrote:
> Hello,
>
> I was trying to understand how file renames are handled in Btrfs. I
> read the code documentation, but had a problem understanding a few
> things.
>
> During a file rename, btrfs_commit_transaction() is called, because
> Btrfs has to commit the whole FS before storing the information
> related to the renamed file. It has to commit the FS because a rename
> first does an unlink, which is not recorded in the btrfs_rename()
> transaction and so is not logged in the log tree. Is my understanding
> correct? If yes, my questions are as follows:

Not familiar with the rename kernel code, so not much help for the rename operation.

> 1. What does committing the whole FS mean?

Committing the whole fs means a lot of things, but generally speaking, it makes the on-disk data inconsistent with each other.

For the obvious part, it writes modified fs/subvolume trees to disk (with handling of tree operations so there are no half-modified trees). Also other trees like the extent tree (very hot, since every CoW will update it, and the most complicated one), and the csum tree if modified.

After a transaction is committed, the on-disk btrfs will represent the state at the time the commit was called, and every tree should match each other. Besides this, after a transaction is committed, the generation of the fs gets increased and the modified tree blocks will have the same generation number.

> Blktrace shows that there are 2 256KB writes, which are essentially
> writes to the data of the root directory of the file system (which I
> found out through btrfs-debug-tree).

I'd say you didn't check the btrfs-debug-tree output carefully enough. I strongly recommend to do a vimdiff to see which trees are modified.

At least the following trees are modified:

1) fs/subvolume tree
   Rename modifies the DIR_INDEX/DIR_ITEM/INODE_REF at least, and updates the inode time. So the fs/subvolume tree must be CoWed.

2) extent tree
   CoW of the above metadata operations will definitely cause extent allocation and freeing, so the extent tree will also get updated.

3) root tree
   Both the extent tree and the fs/subvolume tree are modified, so their root bytenrs need to be updated, which means the root tree must be updated too.

And finally, the superblocks.

I just verified the behavior with an empty btrfs created on a 1G file, with only one file to rename. In that case (with 4K sectorsize and 16K nodesize), the total IO should be (3 * 16K) * 2 + 4K * 2 = 104K.

"3"      = number of tree blocks modified
"16K"    = nodesize
1st "*2" = DUP profile for metadata
"4K"     = superblock size
2nd "*2" = 2 superblocks for a 1G fs

If your extent/root/fs trees have a higher level, then more tree blocks need to be updated. And if your fs is very large, you may have 3 superblocks.

> Is this equivalent to doing a shell sync, as the same block groups are
> written during a shell sync too?

For shell "sync" the difference is that "sync" will write all dirty data pages to disk and then commit the transaction, while only calling btrfs_commit_transaction() doesn't trigger dirty page writeback. So there is a difference.

Furthermore, if there is nothing modified at all, sync will just skip the fs, so a btrfs_commit_transaction() call is not ensured if you call "sync".

> Also, does it imply that all the metadata held by the log tree is now
> checkpointed to the respective trees?

The log tree part is a little tricky, as the log tree is not really a journal for btrfs. Btrfs uses CoW for metadata, so in theory (and in fact) btrfs doesn't need any journal. The log tree is mainly used to enhance btrfs fsync performance. You can totally disable the log tree with the notreelog mount option and btrfs will behave just fine.

Furthermore, I'm not very familiar with the log tree; I need to check the code to see whether the log tree is used in rename, so I can't say much right now. But to make things easy, I strongly recommend to ignore the log tree for now.

> 2. Why are there 2 complete writes to the data held by the root
> directory and not just 1? These writes are 256KB each, which is the
> size of the extent allocated to the root directory.

Check my first calculation and verify the debug-tree output before and after the rename. I think there are some extra factors affecting the number, from the tree height to your fs tree organization.

> 3. Why are the writes being done to the root directory of the file
> system / subvolume and not just the parent directory where the unlink
> happened?

That's why I strongly recommend to understand the btrfs on-disk format first. A lot of things can be answered after understanding the on-disk layout, without asking anyone else.

The short answer is, btrfs puts all its child dir/inode info into one tree for one subvolume. (And the term "root directory" here is a little confusing: are you talking about the fs tree root or the root tree?) Not the common one-tree-for-one-inode layout. So if you rename one file in a subvolume, the subvolume tree gets CoWed, which means from the
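The sync-versus-commit distinction above can be modeled with a toy sketch (hypothetical classes, not kernel code): sync(1) writes back dirty data pages and then commits the transaction, while a bare transaction commit flushes only metadata.

```python
# Toy model of the difference described above (not real kernel code):
# "sync" writes back dirty data pages first, then commits the
# transaction; btrfs_commit_transaction() alone flushes only metadata.
class ToyFs:
    def __init__(self):
        self.dirty_data_pages = set()
        self.dirty_metadata = set()
        self.on_disk = set()

    def write(self, page):
        # A buffered write dirties the data page and the inode metadata.
        self.dirty_data_pages.add(page)
        self.dirty_metadata.add("inode:" + page)

    def commit_transaction(self):
        # Metadata-only flush: data pages stay dirty in memory.
        self.on_disk |= self.dirty_metadata
        self.dirty_metadata.clear()

    def sync(self):
        # Data writeback first, then the transaction commit.
        self.on_disk |= self.dirty_data_pages
        self.dirty_data_pages.clear()
        self.commit_transaction()

fs = ToyFs()
fs.write("file1:page0")
fs.commit_transaction()
assert "file1:page0" not in fs.on_disk   # data page is still only in memory
fs.sync()
assert "file1:page0" in fs.on_disk       # sync wrote the data pages too
```

The other asymmetry from the message (sync may skip an unmodified fs entirely) is not modeled here; the sketch only shows the data-writeback difference.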
Re: Please help with exact actions for raid1 hot-swap
It doesn't need the replaced disk to be readable, right? Then what prevents the same procedure from working without a spare bay?
--
With Best Regards,
Marat Khalili

On September 9, 2017 1:29:08 PM GMT+03:00, Patrik Lundquist wrote:
> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.
>
>> --
>> With Best Regards,
>> Marat Khalili
>>
>> On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili wrote:
>>> Dear list,
>>>
>>> I'm going to replace one hard drive (partition actually) of a btrfs
>>> raid1. Can you please spell out exactly what I need to do in order
>>> to get my filesystem working as RAID1 again after the replacement,
>>> exactly as it was before? I saw some bad examples of drive
>>> replacement on this list, so I'm afraid to just follow random
>>> instructions on the wiki, and putting this system out of action even
>>> temporarily would be very inconvenient.
>>>
>>> For this filesystem:
>>>
>>> $ sudo btrfs fi show /dev/sdb7
>>> Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
>>>         Total devices 2 FS bytes used 106.23GiB
>>>         devid 1 size 2.71TiB used 126.01GiB path /dev/sda7
>>>         devid 2 size 2.71TiB used 126.01GiB path /dev/sdb7
>>>
>>> $ grep /mnt/data /proc/mounts
>>> /dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
>>>
>>> $ sudo btrfs fi df /mnt/data
>>> Data, RAID1: total=123.00GiB, used=104.57GiB
>>> System, RAID1: total=8.00MiB, used=48.00KiB
>>> Metadata, RAID1: total=3.00GiB, used=1.67GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> $ uname -a
>>> Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> I've got this in dmesg:
>>>
>>> [Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0
>>> [ +0.51] ata6.00: irq_stat 0x4008
>>> [ +0.29] ata6.00: failed command: READ FPDMA QUEUED
>>> [ +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in
>>>          res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error)
>>> [ +0.94] ata6.00: status: { DRDY ERR }
>>> [ +0.26] ata6.00: error: { UNC }
>>> [ +0.001195] ata6.00: configured for UDMA/133
>>> [ +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
>>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor]
>>> [ +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
>>> [ +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00
>>> [ +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
>>> [ +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
>>> [ +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
>>> [ +0.77] ata6: EH complete
>>>
>>> There's still a 1 in the Current_Pending_Sector line of smartctl
>>> output as of now, so it probably won't heal by itself.
>>> --
>>> With Best Regards,
>>> Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs check --repair now runs in minutes instead of hours? aborting
On 2017-09-10 01:44, Marc MERLIN wrote:
> So, should I assume that btrfs progs git has some issue, since there
> is no plausible way that a check --repair should be faster than a
> regular check?

Yes, the assumption that repair should be no faster than an RO check is correct. Especially for a clean fs, repair should behave just the same as an RO check.

And I'll first submit a patch (or patches) to output the time consumed for each tree, so we have a clue about what is going wrong. (Digging through the code is just a little too boring for me)

Thanks,
Qu

> Thanks,
> Marc
>
> On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote:
>> On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
>>>> gargamel:~# btrfs fi df /mnt/btrfs_pool1
>>>> Data, single: total.60TiB, used.54TiB
>>>> System, DUP: total2.00MiB, used=1.19MiB
>>>> Metadata, DUP: totalX.00GiB, used.69GiB
>>>
>>> Wait for a minute.
>>> Does that .69GiB mean 706 MiB? Or did my email client/GMX screw up
>>> the format (again)?
>>> This output format must be changed, at least to 0.69 GiB, or 706 MiB.
>>
>> Email client problem. I see control characters in what you quoted.
>> Let's try again:
>>
>> gargamel:~# btrfs fi df /mnt/btrfs_pool1
>> Data, single: total=10.66TiB, used=10.60TiB        => 10TB
>> System, DUP: total=64.00MiB, used=1.20MiB          => 1.2MB
>> Metadata, DUP: total=57.50GiB, used=12.76GiB       => 13GB
>> GlobalReserve, single: total=512.00MiB, used=0.00B => 0
>>
>>> You mean lowmem is actually FASTER than original mode? That's very
>>> surprising.
>>
>> Correct, unless I add --repair, and then original mode is 2x faster
>> than lowmem.
>>
>>> Is there any special operation done for that btrfs? Like offline
>>> dedupe or tons of reflinks?
>>
>> In this case, no. Note that btrfs check used to take many hours
>> overnight until I did a git pull of btrfs progs and built the latest
>> from TOT.
>>
>>> BTW, how many subvolumes do you have in the fs?
>>
>> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
>> 91
>>
>> If I remove snapshots for btrfs send and historical 'backups':
>> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . \
>>   | grep -Ev '(hourly|daily|weekly|rw|ro)' | wc -l
>> 5
>
>>> This looks like a bug.
>>>
>>> My first guess is that it is related to the number of
>>> subvolumes/reflinks, but I'm not sure, since I don't have many
>>> real-world btrfs filesystems. I'll take some time to look into it.
>>>
>>> Thanks for the very interesting report,
>>
>> Thanks for having a look :)
>>
>> Marc
>> --
>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> Microsoft is to operating systems what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
Ok, mount with -o clear_cache, umount, and run fsck again just to make sure. Then if it comes out clean, mount with ref_verify again and wait for it to blow up again.

Thanks,

Josef

Sent from my iPhone

> On Sep 9, 2017, at 10:37 PM, Marc MERLIN wrote:
>
>> On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote:
>> Well that's odd, a block allocated on disk is in the free space
>> cache. Can I see the full output of the fsck? I want to make sure
>> it's actually getting to the part where it checks the free space
>> cache. If it does then I'll have to think of how to catch this kind
>> of bug, because you've got a weird one. Thanks,
>
> Well, btrfs check was clean before that, but now it returned this:
>
> gargamel:~# time btrfs check /dev/mapper/dshelf1
> Checking filesystem on /dev/mapper/dshelf1
> UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
> checking extents
> checking free space cache
> Wanted bytes 16384, found 196608 for off 13282417049600
> Wanted bytes 536870912, found 196608 for off 13282417049600
> cache appears valid but isn't 13282417049600
> There is no free space entry for 13849889603584-13849889652736
> There is no free space entry for 13849889603584-13850426474496
> cache appears valid but isn't 13849889603584
> Wanted bytes 5832704, found 81920 for off 13870290698240
> Wanted bytes 536870912, found 81920 for off 13870290698240
> cache appears valid but isn't 13870290698240
> block group 13928272756736 has wrong amount of free space
> failed to load free space cache for block group 13928272756736
> Duplicate entries in free space cache
> failed to load free space cache for block group 13962095624192
> block group 14003434684416 has wrong amount of free space
> failed to load free space cache for block group 14003434684416
> block group 14470042615808 has wrong amount of free space
> failed to load free space cache for block group 14470042615808
> block group 14610702794752 has wrong amount of free space
> failed to load free space cache for block group 14610702794752
> block group 14612313407488 has wrong amount of free space
> failed to load free space cache for block group 14612313407488
> block group 14624661438464 has wrong amount of free space
> failed to load free space cache for block group 14624661438464
> block group 14648820629504 has wrong amount of free space
> failed to load free space cache for block group 14648820629504
> Wanted offset 14657410793472, found 14657410760704
> Wanted offset 14657410793472, found 14657410760704
> cache appears valid but isn't 14657410564096
> block group 15886844952576 has wrong amount of free space
> failed to load free space cache for block group 15886844952576
> There is no free space entry for 15905635434496-15905636499456
> There is no free space entry for 15905635434496-15906172305408
> cache appears valid but isn't 15905635434496
> block group 16542901207040 has wrong amount of free space
> failed to load free space cache for block group 16542901207040
> block group 16581019041792 has wrong amount of free space
> failed to load free space cache for block group 16581019041792
> block group 16616989392896 has wrong amount of free space
> failed to load free space cache for block group 16616989392896
> block group 16676582064128 has wrong amount of free space
> failed to load free space cache for block group 16676582064128
> block group 16697520029696 has wrong amount of free space
> failed to load free space cache for block group 16697520029696
> block group 16848380755968 has wrong amount of free space
> failed to load free space cache for block group 16848380755968
> ERROR: errors found in free space cache
> found 11732749766656 bytes used, error(s) found
> total csum bytes: 11441478452
> total tree bytes: 13793296384
> total fs tree bytes: 727580672
> total extent tree bytes: 483426304
> btree space waste bytes: 1194373662
> file data blocks allocated: 12133646495744
> referenced 12155707805696
>
> real    100m12.252s
> user    0m33.771s
> sys     1m11.220s
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote:
> Well that's odd, a block allocated on disk is in the free space cache.
> Can I see the full output of the fsck? I want to make sure it's
> actually getting to the part where it checks the free space cache. If
> it does then I'll have to think of how to catch this kind of bug,
> because you've got a weird one. Thanks,

Well, btrfs check was clean before that, but now it returned this:

gargamel:~# time btrfs check /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
Wanted bytes 16384, found 196608 for off 13282417049600
Wanted bytes 536870912, found 196608 for off 13282417049600
cache appears valid but isn't 13282417049600
There is no free space entry for 13849889603584-13849889652736
There is no free space entry for 13849889603584-13850426474496
cache appears valid but isn't 13849889603584
Wanted bytes 5832704, found 81920 for off 13870290698240
Wanted bytes 536870912, found 81920 for off 13870290698240
cache appears valid but isn't 13870290698240
block group 13928272756736 has wrong amount of free space
failed to load free space cache for block group 13928272756736
Duplicate entries in free space cache
failed to load free space cache for block group 13962095624192
block group 14003434684416 has wrong amount of free space
failed to load free space cache for block group 14003434684416
block group 14470042615808 has wrong amount of free space
failed to load free space cache for block group 14470042615808
block group 14610702794752 has wrong amount of free space
failed to load free space cache for block group 14610702794752
block group 14612313407488 has wrong amount of free space
failed to load free space cache for block group 14612313407488
block group 14624661438464 has wrong amount of free space
failed to load free space cache for block group 14624661438464
block group 14648820629504 has wrong amount of free space
failed to load free space cache for block group 14648820629504
Wanted offset 14657410793472, found 14657410760704
Wanted offset 14657410793472, found 14657410760704
cache appears valid but isn't 14657410564096
block group 15886844952576 has wrong amount of free space
failed to load free space cache for block group 15886844952576
There is no free space entry for 15905635434496-15905636499456
There is no free space entry for 15905635434496-15906172305408
cache appears valid but isn't 15905635434496
block group 16542901207040 has wrong amount of free space
failed to load free space cache for block group 16542901207040
block group 16581019041792 has wrong amount of free space
failed to load free space cache for block group 16581019041792
block group 16616989392896 has wrong amount of free space
failed to load free space cache for block group 16616989392896
block group 16676582064128 has wrong amount of free space
failed to load free space cache for block group 16676582064128
block group 16697520029696 has wrong amount of free space
failed to load free space cache for block group 16697520029696
block group 16848380755968 has wrong amount of free space
failed to load free space cache for block group 16848380755968
ERROR: errors found in free space cache
found 11732749766656 bytes used, error(s) found
total csum bytes: 11441478452
total tree bytes: 13793296384
total fs tree bytes: 727580672
total extent tree bytes: 483426304
btree space waste bytes: 1194373662
file data blocks allocated: 12133646495744
 referenced 12155707805696

real    100m12.252s
user    0m33.771s
sys     1m11.220s

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
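The "wrong amount of free space" lines in the fsck output above boil down to comparing the cached free-space total for a block group against what the allocation records imply. A toy sketch of that invariant (hypothetical function, not the actual btrfs-progs code):

```python
# Toy version of the free-space-cache consistency check behind the
# "wrong amount of free space" messages above (not the real fsck code):
# the free space recorded in the cache for a block group must equal the
# group's size minus the bytes recorded as allocated.
def check_block_group(bg_start, bg_length, used_bytes, cache_entries):
    """cache_entries: list of (offset, length) free-space entries."""
    cached_free = sum(length for _offset, length in cache_entries)
    expected_free = bg_length - used_bytes
    if cached_free != expected_free:
        return f"block group {bg_start} has wrong amount of free space"
    return None  # cache agrees with the allocation records

# A block group whose cache matches passes:
assert check_block_group(0, 1024, 768, [(768, 256)]) is None
# One whose cache lost an entry is reported:
assert check_block_group(4096, 1024, 768, []) is not None
```

The real checker also validates entry offsets and overlaps (hence the "Wanted offset ... found ..." and "Duplicate entries" messages); this sketch covers only the size invariant.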
Re: Regarding handling of file renames in Btrfs
Rohan Kadekodi posted on Sat, 09 Sep 2017 18:50:09 -0500 as excerpted:

> Hello,
>
> I was trying to understand how file renames are handled in Btrfs. I read
> the code documentation, but had a problem understanding a few things.
>
> During a file rename, btrfs_commit_transaction() is called which is
> because Btrfs has to commit the whole FS before storing the information
> related to the new renamed file. It has to commit the FS because a
> rename first does an unlink, which is not recorded in the btrfs_rename()
> transaction and so is not logged in the log tree. Is my understanding
> correct? If yes, my questions are as follows:

I'm not a dev, but am a btrfs user and list regular, and can try my hand at answering... and if I'm wrong, a dev's reply can correct my misconceptions as well. =:^)

> 1. What does committing the whole FS mean? Blktrace shows that there are
> 2 256KB writes, which are essentially writes to the data of the
> root directory of the file system (which I found out through
> btrfs-debug-tree). Is this equivalent to doing a shell sync, as the same
> block groups are written during a shell sync too? Also, does it imply
> that all the metadata held by the log tree is now checkpointed to the
> respective trees?

A btrfs commit is the equivalent of a *single* filesystem sync, yes. The difference compared to the sync(1) command is that sync applies to all filesystems of all types, not just a single btrfs filesystem. See also the btrfs filesystem sync command (btrfs-filesystem(8) manpage), which applies to a single btrfs, but also triggers deleted subvolume cleanup.

But these are not writes to the /data/ of the root directory. In btrfs, data and metadata are separated, and these are writes to the /metadata/ of the filesystem, including writing a new filesystem top-level (aka root) block and the superblock and its backups. Yes, the log is synced too.
But regarding the log: in btrfs, because btrfs is atomic cow-based (copy-on-write), at each commit the filesystem is designed to be entirely self-consistent, with the result being that most actions don't need to be and are not logged.

At a crash and later remount, the filesystem as of the last atomically-written root-block state will be mounted, and anything being written at the time of the crash will either have been entirely written and committed (the top-level root tree block will have been updated to reflect it), or that update will not have happened yet, so the state of the filesystem will be that of the last root tree block commit, with newer in-process actions lost.

The btrfs log is an exception, a compromise in the interest of fsync speed. The only thing it logs are fsyncs (filesyncs, as opposed to whole filesystem syncs) that would otherwise not return until the next commit (with commits on a 30-second timer by default), since the filesystem would otherwise be unable to guarantee that the fsync had been entirely written to permanent media and thus should survive a crash.

The log ensures the fsynced file's new data (if any) is written to its new location on the media (cow, so a new block location), updates the metadata (also cow, so written to a new location), then logs the metadata update so it can be committed at log replay if necessary, and returns. If a crash happens before the next full filesystem atomic commit, the fsync can be replayed from the log, thus satisfying the fsync guarantee without forcing a wait for a full atomic commit.

But once that full filesystem atomic commit happens (again, with a 30-second default timeout), all updates are now reflected in the new filesystem state as registered in the new root tree block, and the previous log is now dead/unreferenced on the media (because the new root block doesn't refer to it any longer, referring instead to a new log).

> 2. Why are there 2 complete writes to the data held by the root
> directory and not just 1? These writes are 256KB each, which is the
> size of the extent allocated to the root directory

I'm not sure on this one, hopefully a btrfs dev can clarify, but at a guess, you may be seeing writes to the superblock and its backup -- on a large enough filesystem there are two backups, but your filesystem may be small enough to have just one backup.

It's also possible you're seeing the new copy of the metadata tree being written out, then the root block and superblocks (and backups) being updated.

> 3. Why are the writes being done to the root directory of the file
> system / subvolume and not just the parent directory where the unlink
> happened?

Remember, everything's in trees, and updates are cowed, with updates at lower levels of the tree not reflected in the atomic state of the filesystem until they've recursed up the tree and a new root tree block is written, pointing at the new trees instead of the old ones, with the superblock and backups then updated to point at the new root tree block. So nothing's loc
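The recursive CoW update described above can be sketched with a toy tree (illustrative Python, not the real on-disk structures): changing a leaf copies every node on the path up to the root, the old root stays valid throughout, and the commit is the single atomic pointer flip to the new root.

```python
# Toy copy-on-write tree: modifying a leaf rewrites the whole path to
# the root; the old root (and everything under it) stays intact until
# the "superblock" pointer is flipped to the new root atomically.
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def cow_update(node, path, new_value):
    """Return a new root that shares unchanged subtrees with the old one."""
    if not path:
        return Node(new_value, node.children)   # copied leaf
    i = path[0]
    children = list(node.children)
    children[i] = cow_update(children[i], path[1:], new_value)
    return Node(node.value, children)           # copied node on the path

old_root = Node("root", [Node("dirA", [Node("file1")]), Node("dirB")])
new_root = cow_update(old_root, [0, 0], "file1-renamed")

assert old_root.children[0].children[0].value == "file1"         # old state intact
assert new_root.children[0].children[0].value == "file1-renamed"
assert new_root.children[1] is old_root.children[1]              # untouched subtree shared

superblock = {"root": old_root}   # a crash before the next line keeps the old state
superblock["root"] = new_root     # the atomic pointer flip = the commit
```

This is why a rename in a subvolume touches the subvolume tree root (and, above it, the root tree and superblock) rather than only the parent directory's node.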
Regarding handling of file renames in Btrfs
Hello,

I was trying to understand how file renames are handled in Btrfs. I read the code documentation, but had a problem understanding a few things.

During a file rename, btrfs_commit_transaction() is called, because Btrfs has to commit the whole FS before storing the information related to the renamed file. It has to commit the FS because a rename first does an unlink, which is not recorded in the btrfs_rename() transaction and so is not logged in the log tree. Is my understanding correct? If yes, my questions are as follows:

1. What does committing the whole FS mean? Blktrace shows that there are 2 256KB writes, which are essentially writes to the data of the root directory of the file system (which I found out through btrfs-debug-tree). Is this equivalent to doing a shell sync, as the same block groups are written during a shell sync too? Also, does it imply that all the metadata held by the log tree is now checkpointed to the respective trees?

2. Why are there 2 complete writes to the data held by the root directory and not just 1? These writes are 256KB each, which is the size of the extent allocated to the root directory.

3. Why are the writes being done to the root directory of the file system / subvolume and not just the parent directory where the unlink happened?

It would be great if I could get the answers to these questions.

Thanks,
Rohan
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
Well that's odd, a block allocated on disk is in the free space cache. Can I see the full output of the fsck? I want to make sure it's actually getting to the part where it checks the free space cache. If it does then I'll have to think of how to catch this kind of bug, because you've got a weird one.

Thanks,

Josef

Sent from my iPhone

> On Sep 9, 2017, at 2:39 PM, Marc MERLIN wrote:
>
>> On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
>> Alright I just reworked the build tree ref stuff and tested it to make sure
>> it wasn’t going to give false positives again. Apparently I had only ever
>> used this with very basic existing fs’es and nothing super complicated, so
>> it was just broken for anything complex. I’ve pushed it to my tree, you can
>> just pull and build and try again. This time the stack traces will even
>> work! Thanks,
>
> Ok, so I found out that I just need to copy a bunch of data to the
> filesystem to trigger the bug.
>
> There you go:
> [318400.507972] re-allocated a block that still has references to it!
> [...]
Re: netapp-alike snapshots?
On Sat 2017-09-09 (22:43), Andrei Borzenkov wrote:
> > Your tool does not create .snapshot subdirectories in EVERY directory like
>
> Neither does NetApp. Those "directories" are magic handles that do not
> really exist.

I know. But symbolic links are the next close thing (I am not a kernel programmer).

> Apart from obvious problem with recursive directory traversal (NetApp
> .snapshot are not visible with normal directory list)

Yes, they are, at least sometimes, e.g. tar includes the snapshots.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<14c87878-a5a0-d7d3-4a76-c55812e75...@gmail.com>
Re: netapp-alike snapshots?
09.09.2017 16:44, Ulli Horlacher wrote:
> > Your tool does not create .snapshot subdirectories in EVERY directory like

Neither does NetApp. Those "directories" are magic handles that do not really exist.

> Netapp does.
> Example:
>
> framstag@fex:~: cd ~/Mail/.snapshot/
> framstag@fex:~/Mail/.snapshot: l
> lR-X - 2017-09-09 09:55 2017-09-09_.daily -> /local/home/.snapshot/2017-09-09_.daily/framstag/Mail

Apart from the obvious problem with recursive directory traversal (NetApp .snapshot are not visible with a normal directory list), those will also be captured in snapshots and cannot be removed. NetApp snapshots themselves do not expose .snapshot "directories".

> lR-X - 2017-09-09 14:00 2017-09-09_1400.hourly -> /local/home/.snapshot/2017-09-09_1400.hourly/framstag/Mail
> lR-X - 2017-09-09 15:00 2017-09-09_1500.hourly -> /local/home/.snapshot/2017-09-09_1500.hourly/framstag/Mail
> lR-X - 2017-09-09 15:18 2017-09-09_1518.single -> /local/home/.snapshot/2017-09-09_1518.single/framstag/Mail
> lR-X - 2017-09-09 15:20 2017-09-09_1520.single -> /local/home/.snapshot/2017-09-09_1520.single/framstag/Mail
> lR-X - 2017-09-09 15:22 2017-09-09_1522.single -> /local/home/.snapshot/2017-09-09_1522.single/framstag/Mail
>
> My users (and I) need snapshots in this way.
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
On Tue, Sep 05, 2017 at 06:19:25PM +0000, Josef Bacik wrote:
> Alright I just reworked the build tree ref stuff and tested it to make sure
> it wasn’t going to give false positives again. Apparently I had only ever
> used this with very basic existing fs’es and nothing super complicated, so it
> was just broken for anything complex. I’ve pushed it to my tree, you can
> just pull and build and try again. This time the stack traces will even
> work! Thanks,

Ok, so I found out that I just need to copy a bunch of data to the filesystem to trigger the bug.

There you go:

[318400.507972] re-allocated a block that still has references to it!
[318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, metadata 1, from disk 1
[318400.553751] Ref root 2, parent 0, owner 0, offset 0, num_refs 1
[318400.573208] Root entry 2, num_refs 1
[318400.585614] Root entry 7, num_refs 0
[318400.598028] Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 0, num_refs 1
[318400.623774]  btrfs_alloc_tree_block+0x33e/0x3e1
[318400.639083]  __btrfs_cow_block+0xf3/0x420
[318400.652817]  btrfs_cow_block+0xcf/0x145
[318400.666024]  btrfs_search_slot+0x269/0x6de
[318400.680041]  btrfs_del_csums+0xac/0x2f9
[318400.693245]  __btrfs_free_extent+0x88b/0xa0b
[318400.707718]  __btrfs_run_delayed_refs+0xb4e/0xd20
[318400.723491]  btrfs_run_delayed_refs+0x77/0x1a1
[318400.738993]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318400.755994]  commit_cowonly_roots+0x1da/0x273
[318400.770673]  btrfs_commit_transaction+0x3dd/0x761
[318400.786397]  transaction_kthread+0xe2/0x178
[318400.800515]  kthread+0xfb/0x100
[318400.811487]  ret_from_fork+0x25/0x30
[318400.823748]  0x
[318400.957574] [ cut here ]
[318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 btrfs_run_delayed_refs+0xa2/0x1a1
[318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
[318401.218357] snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas thermal fan [last unloaded: ftdi_sio]
[318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G U 4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
[318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[318401.454894] task: 948ef791e200 task.stack: b18a091ec000
[318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
[318401.490849] RSP: 0018:b18a091efd08 EFLAGS: 00010296
[318401.507751] RAX: 0026 RBX: 9488208be618 RCX:
[318401.530384] RDX: 948f1e295e01 RSI: 948f1e28dd58 RDI: 948f1e28dd58
[318401.553548] RBP: b18a091efd50 R08: 0003dc12ea8bcc57 R09: 948f1f50b868
[318401.576127] R10: 948b1f1cc460 R11: aef37285 R12: ffef
[318401.598717] R13: R14: 948edb7efd48 R15: 948cdbdeb000
[318401.621327] FS: () GS:948f1e28() knlGS:
[318401.646737] CS: 0010 DS: ES: CR0: 80050033
[318401.665149] CR2: f7f05001 CR3: 00061f587000 CR4: 001406e0
[318401.687684] Call Trace:
[318401.696148]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318401.712745]  ? btrfs_run_delayed_refs+0x127/0x1a1
[318401.727981]  commit_cowonly_roots+0x1da/0x273
[318401.742183]  btrfs_commit_transaction+0x3dd/0x761
[318401.757447]  transaction_kthread+0xe2/0x178
[318401.771158]  ? btrfs_cleanup_transaction+0x3c2/0x3c2
[318401.787169]  kthread+0xfb/0x100
[318401.797769]  ? init_completion+0x24/0x24
[318401.810718]  ret_from_fork+0x25/0x30
[318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 16
Re: btrfs check --repair now runs in minutes instead of hours? aborting
So, should I assume that btrfs progs git has some issue, since there is no plausible way that a check --repair should be faster than a regular check?

Thanks,
Marc

On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote:
> On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
> > > gargamel:~# btrfs fi df /mnt/btrfs_pool1
> > > Data, single: total.60TiB, used.54TiB
> > > System, DUP: total2.00MiB, used=1.19MiB
> > > Metadata, DUP: totalX.00GiB, used.69GiB
> >
> > Wait for a minute.
> >
> > Is that .69GiB means 706 MiB? Or my email client/GMX screwed up the format (again)?
> > This output format must be changed, at least to 0.69 GiB, or 706 MiB.
>
> Email client problem. I see control characters in what you quoted.
>
> Let's try again
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=10.66TiB, used=10.60TiB        => 10TB
> System, DUP: total=64.00MiB, used=1.20MiB          => 1.2MB
> Metadata, DUP: total=57.50GiB, used=12.76GiB       => 13GB
> GlobalReserve, single: total=512.00MiB, used=0.00B => 0
>
> > You mean lowmem is actually FASTER than original mode?
> > That's very surprising.
>
> Correct, unless I add --repair and then original mode is 2x faster than lowmem.
>
> > Is there any special operation done for that btrfs?
> > Like offline dedupe or tons of reflinks?
>
> In this case, no.
> Note that btrfs check used to take many hours overnight until I did a
> git pull of btrfs progs and built the latest from TOT.
>
> > BTW, how many subvolumes do you have in the fs?
>
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
> 91
>
> If I remove snapshots for btrfs send and historical 'backups':
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | grep -Ev '(hourly|daily|weekly|rw|ro)' | wc -l
> 5
>
> > This looks like a bug. My first guess is related to number of
> > subvolumes/reflinks, but I'm not sure since I don't have many real-world
> > btrfs.
> >
> > I'll take sometime to look into it.
> > Thanks for the very interesting report,
>
> Thanks for having a look :)
>
> Marc

--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
[PATCH] btrfs: Clean up dead code in root-tree
The value of variable 'can_recover' is never used after being set, thus it should be removed.

Signed-off-by: Christos Gkekas
---
 fs/btrfs/root-tree.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 95bcc3c..3338407 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -226,10 +226,6 @@ int btrfs_find_orphan_roots(struct btrfs_fs_info *fs_info)
 	struct btrfs_root *root;
 	int err = 0;
 	int ret;
-	bool can_recover = true;
-
-	if (sb_rdonly(fs_info->sb))
-		can_recover = false;
 
 	path = btrfs_alloc_path();
 	if (!path)
-- 
2.7.4
Re: netapp-alike snapshots?
On Sat 2017-09-09 (06:36), Marc MERLIN wrote:
> > On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> > > With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> > > You can find these snapshots in every local directory (readonly).
> >
> > I have found none, so I have implemented it by myself:
> >
> > https://fex.rus.uni-stuttgart.de/snaprotate.html
>
> Not sure how you looked :)
> http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html
>
> Might not be exactly what you wanted, but been using it for 3 years.

Your tool does not create .snapshot subdirectories in EVERY directory like Netapp does. Example:

framstag@fex:~: cd ~/Mail/.snapshot/
framstag@fex:~/Mail/.snapshot: l
lR-X - 2017-09-09 09:55 2017-09-09_.daily -> /local/home/.snapshot/2017-09-09_.daily/framstag/Mail
lR-X - 2017-09-09 14:00 2017-09-09_1400.hourly -> /local/home/.snapshot/2017-09-09_1400.hourly/framstag/Mail
lR-X - 2017-09-09 15:00 2017-09-09_1500.hourly -> /local/home/.snapshot/2017-09-09_1500.hourly/framstag/Mail
lR-X - 2017-09-09 15:18 2017-09-09_1518.single -> /local/home/.snapshot/2017-09-09_1518.single/framstag/Mail
lR-X - 2017-09-09 15:20 2017-09-09_1520.single -> /local/home/.snapshot/2017-09-09_1520.single/framstag/Mail
lR-X - 2017-09-09 15:22 2017-09-09_1522.single -> /local/home/.snapshot/2017-09-09_1522.single/framstag/Mail

My users (and I) need snapshots in this way.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<20170909133612.7iqwr6cbjxzvf...@merlins.org>
Re: netapp-alike snapshots?
On Sat, Sep 09, 2017 at 03:26:14PM +0200, Ulli Horlacher wrote:
> On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> > With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> > You can find these snapshots in every local directory (readonly).
> > I would like to have something similar with btrfs.
> > Is there (where?) such a tool?
>
> I have found none, so I have implemented it by myself:
>
> https://fex.rus.uni-stuttgart.de/snaprotate.html

Not sure how you looked :)
https://www.google.com/search?q=btrfs+netapp+snapshot
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html

Might not be exactly what you wanted, but been using it for 3 years.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: netapp-alike snapshots?
On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> You can find these snapshots in every local directory (readonly).
> I would like to have something similar with btrfs.
> Is there (where?) such a tool?

I have found none, so I have implemented it by myself:

https://fex.rus.uni-stuttgart.de/snaprotate.html

In contrast to Netapp, with snaprotate the local host administrator can create a snapshot at any time or by cronjob. Example:

root@fex:~# snaprotate single 3 /local/home
Create a readonly snapshot of '/local/home' in '/local/home/.snapshot/2017-09-09_1518.single'
Delete subvolume '/local/home/.snapshot/2017-09-09_1255.single'

root@fex:~# snaprotate -l
/local/home/.snapshot/2017-09-08_.daily
/local/home/.snapshot/2017-09-09_.daily
/local/home/.snapshot/2017-09-09_1331.single
/local/home/.snapshot/2017-09-09_1332.single
/local/home/.snapshot/2017-09-09_1400.hourly
/local/home/.snapshot/2017-09-09_1500.hourly
/local/home/.snapshot/2017-09-09_1518.single

root@fex:~# crontab -l | grep snaprotate
0 * * * * /root/bin/snaprotate -q hourly 2 /local/home
0 0 * * * /root/bin/snaprotate -q daily 3 /local/home
0 0 * * 1 /root/bin/snaprotate -q weekly 1 /local/home

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<20170822132208.gd14...@rus.uni-stuttgart.de>
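The "keep the N newest snapshots per class" rotation described above is simple enough to sketch. This is an illustrative stand-in, not snaprotate's actual implementation: since the names begin with a YYYY-MM-DD_HHMM timestamp they sort lexically in date order, so rotation is a sort away. The demo directory and names below are invented.

```shell
# Rough sketch of the rotation idea (illustrative, not snaprotate's
# real code): keep only the $keep newest snapshots of a class.
rotate() {
    snapdir=$1; class=$2; keep=$3
    # Names sort chronologically because they start with a timestamp;
    # "head -n -K" (GNU) drops the last K lines, i.e. the newest K.
    ls -d "$snapdir"/*."$class" 2>/dev/null | sort | head -n -"$keep" |
    while IFS= read -r old; do
        echo "would delete: $old"   # for real: btrfs subvolume delete "$old"
    done
}

# Demo with plain directories standing in for snapshots:
demo=/tmp/snaprotate-demo
mkdir -p "$demo"/2017-09-07_1200.hourly \
         "$demo"/2017-09-08_1200.hourly \
         "$demo"/2017-09-09_1200.hourly
rotate "$demo" hourly 2
# prints: would delete: /tmp/snaprotate-demo/2017-09-07_1200.hourly
```

The real tool of course does more (read-only snapshots, classes, quiet mode); this only shows the pruning step.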
Re: Please help with exact actions for raid1 hot-swap
Patrik Lundquist posted on Sat, 09 Sep 2017 12:29:08 +0200 as excerpted:

> On 9 September 2017 at 12:05, Marat Khalili wrote:
>> Forgot to add, I've got a spare empty bay if it can be useful here.
>
> That makes it much easier since you don't have to mount it degraded,
> with the risks involved.
>
> Add and partition the disk.
>
> # btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data
>
> Remove the old disk when it is done.

I did this with my dozen-plus (but small) btrfs raid1s on ssd partitions several kernel cycles ago. It went very smoothly. =:^)

(TL;DR can stop there.)

I had actually been taking advantage of btrfs raid1's checksumming and scrub ability to continue running a failing ssd, with more and more sectors going bad and being replaced from spares, for quite some time after I'd have otherwise replaced it. Everything of value was backed up, and I was simply doing it for the experience with both btrfs raid1 scrubbing and continuing ssd sector failure.

But eventually the scrubs were finding and fixing errors every boot, especially when off for several hours, and further experience was of diminishing value while the hassle factor was building fast, so I attached the spare ssd, partitioned it up, did a final scrub on all the btrfs, and then one btrfs at a time btrfs replaced the devices from the old ssd's partitions to the new one's partitions.

Given that I was already used to running scrubs at every boot, the entirely uneventful replacements were actually somewhat anticlimactic, but that was a good thing! =:^)

Then more recently I bought a larger/newer pair of ssds (1 TB each, the old ones were quarter TB each) and converted my media partitions and secondary backups, which had still been on reiserfs on spinning rust, to btrfs raid1 on ssd as well, making me all-btrfs on all-ssd now, with everything but /boot and its backups on the other ssds being btrfs raid1, and /boot and its backups being btrfs dup. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: generic name for volume and subvolume root?
On 09/09/2017 01:06 PM, Hugo Mills wrote:
> On Sat, Sep 09, 2017 at 06:58:38PM +0800, Qu Wenruo wrote:
>> On 2017年09月09日 18:48, Ulli Horlacher wrote:
>>> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>>>> Is there a generic name for both volume and subvolume root?
>>>>
>>>> Nope, subvolume (including snapshot) is not distinguished by its
>>>> filename/path/directory name.
>>>>
>>>> And you can only do snapshot on subvolume (snapshot is one kind of
>>>> subvolume) boundary.
>>>
>>> So, I can name a btrfs root volume also btrfs subvolume?
>>
>> Yes, root volume is also a subvolume, so just call "btrfs root volume"
>> a "subvolume".
>
> I find it's best to avoid the word "root" entirely, as it's got
> several meanings, and it tends to get confusing in conversation.
> Instead, we have:
>
>  - "the top level" (subvolid=5)
>  - "/" (what you see at / in your running system)
>  - "/@" or similar names (the subvolume that's mounted at /)
>
>>> I am talking about documentation, not coding!
>>>
>>> I just want to use the correct terms.
>>
>> If you're referring to the term, I think subvolume is good enough.
>> Which represents your original term, "directories one can snapshot".
>>
>> For the whole btrfs "volume", I would just call it "filesystem" to
>> avoid the name "volume" or "subvolume" at all.
>
> Yes, it's a filesystem. (Although that does occasionally cause
> confusion between "the conceptual filesystem implemented by btrfs.ko"
> and "the concrete filesystem stored on /dev/sda1", but it's generally
> far less confusing than the overloading of "root").

Yes, because every subvolume is a filesystem root! :-D

https://i.imgur.com/2VzmC.gif

--
Hans van Kranenburg
Re: generic name for volume and subvolume root?
On Sat, Sep 09, 2017 at 06:58:38PM +0800, Qu Wenruo wrote:
> On 2017年09月09日 18:48, Ulli Horlacher wrote:
>> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>>> Is there a generic name for both volume and subvolume root?
>>>
>>> Nope, subvolume (including snapshot) is not distinguished by its
>>> filename/path/directory name.
>>>
>>> And you can only do snapshot on subvolume (snapshot is one kind of
>>> subvolume) boundary.
>>
>> So, I can name a btrfs root volume also btrfs subvolume?
>
> Yes, root volume is also a subvolume, so just call "btrfs root volume"
> a "subvolume".

I find it's best to avoid the word "root" entirely, as it's got several meanings, and it tends to get confusing in conversation. Instead, we have:

 - "the top level" (subvolid=5)
 - "/" (what you see at / in your running system)
 - "/@" or similar names (the subvolume that's mounted at /)

>> I am talking about documentation, not coding!
>>
>> I just want to use the correct terms.
>
> If you're referring to the term, I think subvolume is good enough.
> Which represents your original term, "directories one can snapshot".
>
> For the whole btrfs "volume", I would just call it "filesystem" to
> avoid the name "volume" or "subvolume" at all.

Yes, it's a filesystem. (Although that does occasionally cause confusion between "the conceptual filesystem implemented by btrfs.ko" and "the concrete filesystem stored on /dev/sda1", but it's generally far less confusing than the overloading of "root").

Hugo.

--
Hugo Mills             | Well, you don't get to be a kernel hacker simply by
hugo@... carfax.org.uk | looking good in Speedos.
http://carfax.org.uk/  | PGP: E2AB1DE4 | Rusty Russell
Re: generic name for volume and subvolume root?
On Sat, Sep 09, 2017 at 10:35:51AM +0200, Ulli Horlacher wrote:
> As I am writing some documentation about creating snapshots:
> Is there a generic name for both volume and subvolume root?
>
> Example:
>
> root@fex:~# btrfs subvol show /mnt
> ERROR: not a subvolume: /mnt
>
> root@fex:~# btrfs subvol show /mnt/test
> /mnt/test is toplevel subvolume
>
> root@fex:~# btrfs subvol show /mnt/test/data
> /mnt/test/data
>         Name: data
>         UUID: b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
> (...)
>
> root@fex:~# btrfs subvol show /mnt/test/data/sw
> ERROR: not a subvolume: /mnt/test/data/sw
>
> I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt
> and /mnt/test/data/sw
>
> Is there a simple name for directories I can snapshot?

Subvolume. If you can snapshot it, it's a subvolume. Some subvolumes are also snapshots. (And all snapshots are subvolumes).

The subvolume with ID 5 (or ID 0, which is an alias) is the "top level subvolume", and has the unique property that it can't be renamed, deleted or replaced, where all other subvolumes can be.

Hugo.

--
Hugo Mills             | Well, you don't get to be a kernel hacker simply by
hugo@... carfax.org.uk | looking good in Speedos.
http://carfax.org.uk/  | PGP: E2AB1DE4 | Rusty Russell
Re: generic name for volume and subvolume root?
On 2017年09月09日 18:48, Ulli Horlacher wrote:
> On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
>>> Is there a generic name for both volume and subvolume root?
>>
>> Nope, subvolume (including snapshot) is not distinguished by its
>> filename/path/directory name.
>>
>> And you can only do snapshot on subvolume (snapshot is one kind of
>> subvolume) boundary.
>
> So, I can name a btrfs root volume also btrfs subvolume?

Yes, root volume is also a subvolume, so just call "btrfs root volume" a "subvolume".

> I am talking about documentation, not coding!
>
> I just want to use the correct terms.

If you're referring to the term, I think subvolume is good enough. Which represents your original term, "directories one can snapshot".

For the whole btrfs "volume", I would just call it "filesystem" to avoid the name "volume" or "subvolume" at all.

Thanks,
Qu
Re: generic name for volume and subvolume root?
On Sat 2017-09-09 (18:40), Qu Wenruo wrote:
> > Is there a generic name for both volume and subvolume root?
>
> Nope, subvolume (including snapshot) is not distinguished by its
> filename/path/directory name.
>
> And you can only do snapshot on subvolume (snapshot is one kind of
> subvolume) boundary.

So, I can name a btrfs root volume also btrfs subvolume?

I am talking about documentation, not coding!

I just want to use the correct terms.

--
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel: ++49-711-68565868
70569 Stuttgart (Germany)  WWW: http://www.tik.uni-stuttgart.de/
REF:<48008a58-a82e-d9f7-327e-eeb905e18...@gmx.com>
Re: generic name for volume and subvolume root?
On 2017年09月09日 16:35, Ulli Horlacher wrote:
> As I am writing some documentation about creating snapshots:
> Is there a generic name for both volume and subvolume root?
>
> Example:
>
> root@fex:~# btrfs subvol show /mnt
> ERROR: not a subvolume: /mnt
>
> root@fex:~# btrfs subvol show /mnt/test
> /mnt/test is toplevel subvolume
>
> root@fex:~# btrfs subvol show /mnt/test/data
> /mnt/test/data
>         Name: data
>         UUID: b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
> (...)
>
> root@fex:~# btrfs subvol show /mnt/test/data/sw
> ERROR: not a subvolume: /mnt/test/data/sw
>
> I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt
> and /mnt/test/data/sw
>
> Is there a simple name for directories I can snapshot?

Nope, subvolume (including snapshot) is not distinguished by its filename/path/directory name.

And you can only do snapshot on subvolume (snapshot is one kind of subvolume) boundary.

For a user to determine where the subvolume boundaries are, one should first determine where the btrfs is mounted and then use "btrfs subvol show" to determine the boundaries.

Or, on a btrfs, test the directory inode number. A subvolume/snapshot in btrfs will always have the same inode number, 256, and regular files/directories/special files will not use that magic number.

Thanks,
Qu
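The inode-number test described above is easy to script. A minimal sketch (the `is_subvol` helper name is made up for illustration; `stat -c %i` is GNU coreutils syntax for printing a file's inode number):

```shell
# A btrfs subvolume root always has inode number 256
# (BTRFS_FIRST_FREE_OBJECTID); no ordinary directory does.
is_subvol() {
    test "$(stat -c %i "$1" 2>/dev/null)" = 256
}

# An ordinary directory reports some other inode number,
# so this should print "plain directory" on any filesystem:
mkdir -p /tmp/plain-dir
if is_subvol /tmp/plain-dir; then
    echo "subvolume"
else
    echo "plain directory"
fi
```

The nice property of this check over `btrfs subvol show` is that it needs no privileges and works from any language that can stat a path.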
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 12:05, Marat Khalili wrote:
> Forgot to add, I've got a spare empty bay if it can be useful here.

That makes it much easier, since you don't have to mount the filesystem degraded, with the risks that involves.

Add and partition the new disk, then run:

# btrfs replace start /dev/sdb7 /dev/sdc(?)7 /mnt/data

Remove the old disk when it is done.
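The spare-bay flow above can be sketched as a small dry-run script. The new partition name /dev/sdc7 is an assumption (verify with lsblk first); every command is only printed, never executed:

```shell
#!/bin/sh
# Dry-run sketch of replacing /dev/sdb7 via a spare bay, as described above.
# /dev/sdc7 is a HYPOTHETICAL name for the new partition -- check lsblk.
run() { echo "# $*"; }   # swap 'echo' for real execution when you are sure

run btrfs replace start /dev/sdb7 /dev/sdc7 /mnt/data
run btrfs replace status /mnt/data   # poll until the copy reaches 100%
```

Note that replace swaps the target device into the old device's slot, so the RAID1 profile is unchanged and no balance should be needed afterwards.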
Re: Please help with exact actions for raid1 hot-swap
Forgot to add, I've got a spare empty bay if it can be useful here.

--

With Best Regards,
Marat Khalili

On September 9, 2017 10:46:10 AM GMT+03:00, Marat Khalili wrote:
> [snip -- original message quoted in full]
Re: Please help with exact actions for raid1 hot-swap
On 9 September 2017 at 09:46, Marat Khalili wrote:
> Dear list,
>
> I'm going to replace one hard drive (partition actually) of a btrfs raid1.
> [snip]

I recently replaced both disks in a two-disk Btrfs raid1 to increase capacity and took some notes.

Using systemd? systemd will automatically unmount a degraded volume and ruin your one chance to replace the disk, for as long as Btrfs has the bug where it records single chunks plus one disk missing and then refuses to mount degraded a second time. Comment out your mount in fstab and run "systemctl daemon-reload"; the mount file in /var/run/systemd/generator/ will be removed. (Is there a better way?)

Unmount the volume, then detach the failing disk:

# hdparm -Y /dev/sdb
# echo 1 > /sys/block/sdb/device/delete

Replace the disk, create partitions etc. You might have to restart smartd, if you are using it.

Make Btrfs forget the old device -- it will otherwise think the old disk is still there. (Is there a better way?)

# rmmod btrfs; modprobe btrfs
# btrfs device scan
# mount -o degraded /dev/sda7 /mnt/data
# btrfs device usage /mnt/data
# btrfs replace start 2 /dev/sdbX /mnt/data    (2 = devid of the missing disk, since its device node is gone)
# btrfs replace status /mnt/data

Convert any single or dup chunks back to raid1:

# btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data

Unmount, restore fstab, reload systemd again, mount.
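The whole degraded-mount procedure above can be collected into one dry-run script. Device names (sdb, sda7, sdbX, devid 2) are the ones from this thread and must be adjusted; every command is printed rather than executed, so nothing runs by accident:

```shell
#!/bin/sh
# Dry-run sketch of the degraded-mount replacement steps described above.
# Device names are taken from this thread -- verify them on your machine.
run() { echo "# $*"; }   # prints each step instead of executing it

run systemctl daemon-reload                    # after commenting the fstab entry
run umount /mnt/data
run hdparm -Y /dev/sdb                         # spin the failing disk down
run "echo 1 > /sys/block/sdb/device/delete"    # detach it from the SCSI layer
# ... physically swap the disk and partition it here ...
run "rmmod btrfs; modprobe btrfs"              # make btrfs forget the old device
run btrfs device scan
run mount -o degraded /dev/sda7 /mnt/data
run btrfs device usage /mnt/data
run btrfs replace start 2 /dev/sdbX /mnt/data  # 2 = devid of the missing disk
run btrfs replace status /mnt/data
run btrfs balance start -fv -dconvert=raid1,soft -mconvert=raid1,soft -sconvert=raid1,soft /mnt/data
```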
generic name for volume and subvolume root?
As I am writing some documentation about creating snapshots: Is there a generic name for both a volume root and a subvolume root? Example:

root@fex:~# btrfs subvol show /mnt
ERROR: not a subvolume: /mnt
root@fex:~# btrfs subvol show /mnt/test
/mnt/test is toplevel subvolume
root@fex:~# btrfs subvol show /mnt/test/data
/mnt/test/data
        Name:  data
        UUID:  b32a5949-dfd6-ef45-8616-34ae4cdf6fb8
(...)
root@fex:~# btrfs subvol show /mnt/test/data/sw
ERROR: not a subvolume: /mnt/test/data/sw

I can create snapshots of /mnt/test and /mnt/test/data, but not of /mnt or /mnt/test/data/sw. Is there a simple name for the directories I can snapshot?

--
Ullrich Horlacher           Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart      E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a             Tel: ++49-711-68565868
70569 Stuttgart (Germany)   WWW: http://www.tik.uni-stuttgart.de/
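Whatever the name, the directories in question are exactly the subvolume roots, and those can be detected without parsing `btrfs subvol show` output: on btrfs, the root directory of every subvolume (including the top-level one) has inode number 256. A small sketch under that assumption, using the paths from the example above (they are hypothetical on other machines; `stat -c` is GNU coreutils syntax):

```shell
#!/bin/sh
# Heuristic: a directory is a btrfs subvolume root (and thus snapshottable)
# iff its inode number is 256. Non-btrfs paths and plain directories fail.
is_subvol_root() {
    [ "$(stat -c %i "$1" 2>/dev/null)" = "256" ]
}

# Paths from the question above (hypothetical elsewhere):
for d in /mnt /mnt/test /mnt/test/data /mnt/test/data/sw; do
    if is_subvol_root "$d"; then
        echo "$d: snapshottable"
    else
        echo "$d: not a subvolume root"
    fi
done
```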
Please help with exact actions for raid1 hot-swap
Dear list,

I'm going to replace one hard drive (partition, actually) of a btrfs raid1. Can you please spell out exactly what I need to do in order to get my filesystem working as RAID1 again after the replacement, exactly as it was before? I have seen some bad examples of drive replacement on this list, so I'm afraid to just follow random instructions on the wiki, and putting this system out of action even temporarily would be very inconvenient.

For this filesystem:

$ sudo btrfs fi show /dev/sdb7
Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
        Total devices 2 FS bytes used 106.23GiB
        devid 1 size 2.71TiB used 126.01GiB path /dev/sda7
        devid 2 size 2.71TiB used 126.01GiB path /dev/sdb7
$ grep /mnt/data /proc/mounts
/dev/sda7 /mnt/data btrfs rw,noatime,space_cache,autodefrag,subvolid=5,subvol=/ 0 0
$ sudo btrfs fi df /mnt/data
Data, RAID1: total=123.00GiB, used=104.57GiB
System, RAID1: total=8.00MiB, used=48.00KiB
Metadata, RAID1: total=3.00GiB, used=1.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
$ uname -a
Linux host 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I've got this in dmesg:

[Sep 8 20:31] ata6.00: exception Emask 0x0 SAct 0x7ecaa5ef SErr 0x0 action 0x0
[  +0.51] ata6.00: irq_stat 0x4008
[  +0.29] ata6.00: failed command: READ FPDMA QUEUED
[  +0.38] ata6.00: cmd 60/70:18:50:6c:f3/00:00:79:00:00/40 tag 3 ncq 57344 in
          res 41/40:00:68:6c:f3/00:00:79:00:00/40 Emask 0x409 (media error)
[  +0.94] ata6.00: status: { DRDY ERR }
[  +0.26] ata6.00: error: { UNC }
[  +0.001195] ata6.00: configured for UDMA/133
[  +0.30] sd 6:0:0:0: [sdb] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  +0.05] sd 6:0:0:0: [sdb] tag#3 Sense Key : Medium Error [current] [descriptor]
[  +0.04] sd 6:0:0:0: [sdb] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
[  +0.05] sd 6:0:0:0: [sdb] tag#3 CDB: Read(16) 88 00 00 00 00 00 79 f3 6c 50 00 00 00 70 00 00
[  +0.03] blk_update_request: I/O error, dev sdb, sector 2045996136
[  +0.47] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
[  +0.62] BTRFS error (device sda7): bdev /dev/sdb7 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
[  +0.77] ata6: EH complete

There's still a 1 in the Current_Pending_Sector line of smartctl output as of now, so it probably won't heal by itself.

--

With Best Regards,
Marat Khalili
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
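The read errors reported above can also be cross-checked from btrfs's own side before replacing anything. A dry-run sketch of the usual inspection commands (printed only, never executed; the mount point and device are the ones from the report). On raid1 a scrub can often rewrite the bad copy from the good mirror, which may clear the pending sector, though with a failing disk replacement is still the safer course:

```shell
#!/bin/sh
# Dry-run: print inspection/repair commands for the error report above.
run() { echo "# $*"; }

run btrfs device stats /mnt/data    # per-device wr/rd/flush/corrupt counters
run smartctl -A /dev/sdb            # watch Current_Pending_Sector over time
run btrfs scrub start -B /mnt/data  # raid1: rewrites bad copies from the mirror
```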