Re: coredump in btrfsck
On Fri, Jan 03, 2014 at 05:14:56PM -0700, Chris Murphy wrote:
>
> On Jan 3, 2014, at 5:33 AM, Marc MERLIN wrote:
> >
> > Would it be possible for whoever maintains btrfs-tools to change both
> > the man page and the help included in the tool to clearly state that
> > running the fsck tool is unlikely to be the right course of action
> > and talk about btrfs-zero-log as well as mount -o recovery?
>
> The problem FAQ doesn't even mention btrfsck so I think people are just
> getting around that page or making assumptions.
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

It's easy to find btrfsck without the wiki, whether it's with dpkg -l,
rpm -ql, or command line completion.
My point is that as you said, it's most often not the command to use, it
can even do more damage than good, but neither its command line help, nor
the man page warn of anything dangerous or bad in using it.

Telling people they should have read a wiki instead of the canonical man
page isn't the right way to go longer term, nor how things are done on
linux usually.

> Should btrfs check (btrfsck without --repair) work similar to xfs_repair when
> the file system is not cleanly unmounted? If an XFS volume is not cleanly
> unmounted, running xfs_repair will instruct the user to first mount the
> volume so that the journal is replayed, then umount the volume, then run
> xfs_repair.

I don't know about what the actual tool does when it works, I've never
had it do anything useful for me, so I can't comment, except about the
fact that it should warn users about "I'm not the fsck you're used to or
are likely looking for"

> A possible variant of this for btrfs check: inform the user the first step in
> repairing a problem Btrfs volume is to use -o recovery, for more information
> see Btrfs FAQ for additional problem solving recommendations.

Yes, along with tweaking the man page to say the same.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transaction blocked for more than 120 seconds
On Fri, Jan 03, 2014 at 09:34:10PM +, Duncan wrote:
> > Thank you for that tip, I had been unaware of it 'till now.
> > This will make my virtualbox image directory much happier :)
>
> I think I said it, but it bears repeating. Once you set that attribute
> on the dir, you may want to move the files out of the dir (to another
> partition would make sure the data is actually moved) and back in, so
> they're effectively new files in the dir. Or use something like cat
> oldfile > newfile, so you know it's actually creating the new file, not
> reflinking. That'll ensure the NOCOW takes effect.

Yes, I got that.
That's why I ran btrfs defrag on the files after that (I explained why:
a copy would waste lots of snapshot space by replacing all the blocks
needlessly).

> > Unfortunately, on a 83GB vdi (virtualbox) file, with 3.12.5, it did a
> > lot of writing and chewed up my 4 CPUs. Then, it started to be hard to
> > move my mouse cursor and my procmeter graph was barely updating seconds.
> > Next, nothing updated on my X server anymore, not even seconds in time
> > widgets.
> >
> > But, I could still sometimes move my mouse cursor, and I could sometimes
> > see the HD light flicker a bit before going dead again. In other words,
> > the system wasn't fully deadlocked, but btrfs sure got into a state
> > where it was unable to finish the job, and took the kernel down with
> > it (64bit, 8GB of RAM).
> >
> > I waited 2H and it never came out of it, I had to power down the system
> > in the end. Note that this was on a top of the line 500MB/s write
> > Samsung Evo 840 SSD, not a slow HD.
>
> That was defrag (the command) or autodefrag (the mount option)? I'd
> guess defrag (the command).

defrag, the btrfs subcommand.

> That's fragmentation for you! What did/does filefrag have to say about
> that file? Were you the one that posted the 6-digit extents?

Nope, I never posted anything until now. Hopefully you agree that it's
not ok for btrfs/kernel to just kill my system for over 2H until I power
it off because of defragging one file. I did hit a severe performance bug,
if it's not a never-ending loop.

gandalfthegreat:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 156222 extents found

Considering how virtualbox works, that's hardly surprising.

> For something that bad, it might be faster to copy/move it off-device
> (expect it to take awhile) then move it back. That way you're only
> trying to read OR write on the device, not both, and the move elsewhere
> should defrag it quite a bit, effectively sequential write, then read and
> write on the move back.

Yes, I know how I can work around the problem (although I'll likely have
to delete all my historical snapshots to delete the old blocks, which I
don't love to do).

But doesn't it make sense to see why the kernel is near deadlocking on a
single file defrag first?

> But even that might be prohibitive. At some point, you may need to
> either simply give up on it (if you're lazy), or get down and dirty with
> the tracing/profiling, working with a dev to figure out where it's
> spending its time and hopefully get btrfs recoded to work a bit faster
> for that sort of thing.

I'm on my way to a linux conf where I'm speaking, so I have limited time
and can't crash my laptop, but I'm happy to type some commands and give
output.

> As I suggested above, you might try the old school method of defrag, move
> the file to a different device, then move it back. And if possible do it
> when nothing else is using the system. But it may simply be practically
> inaccessible with a current kernel, in which case you'd either have to
> work with the devs to optimize, or give it up as a lost cause. =:(

I can fix my problem. Actually virtualbox works fine with the fragmented
file, without even feeling slow, so really I don't need to fix it
urgently; I was just trying it out after your post.

> Then if the process completed successfully, you could cat the parts back
> together again... and the written parts would be basically sequential, so
> that should go MUCH faster! =:^)

All that noted, but I'm not desperate, just trying commands I hadn't
tried yet :)

Thanks for your replies,
Marc
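For a rough sense of scale, the filefrag figure quoted in this message can be turned into an average extent size. This is only a minimal Python sketch: it assumes the "83GB" image size means 83 GiB and that filefrag prints its usual "name: N extents found" summary line, and the helper names parse_filefrag and average_extent_kib are made up for illustration.

```python
import re


def parse_filefrag(line):
    """Parse a filefrag summary line into (filename, extent_count)."""
    match = re.match(r"^(.+): (\d+) extents? found$", line.strip())
    if match is None:
        raise ValueError("not a filefrag summary line: %r" % line)
    return match.group(1), int(match.group(2))


def average_extent_kib(file_size_bytes, extent_count):
    """Average extent size in KiB for a file split into extent_count extents."""
    return file_size_bytes / extent_count / 1024


name, extents = parse_filefrag("Win7.vdi: 156222 extents found")
size_bytes = 83 * 1024**3  # assumption: "83GB" read as 83 GiB
print("%s: %d extents, ~%.0f KiB per extent on average"
      % (name, extents, average_extent_kib(size_bytes, extents)))
```

Under those assumptions the image averages a bit over half a MiB per extent, which says nothing about why defrag stalls but does show how finely the file is chopped up.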
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:

> Seconded ;-) We're really focused on nailing down these problems instead
> of hiding behind the experimental flag. I know we won't be perfect
> overnight, but it's time to focus on production workloads.

Perhaps an option here is to remove the need to specify the degraded
flag: if the filesystem notices that it is mounting a RAID array and
would otherwise fail, it sets the degraded flag itself and carries on?

That way the fact it was degraded would be visible in /proc/mounts and
could be detected with health check scripts like NRPE for icinga/nagios.

Looking at the code this would be in read_one_dev() in fs/btrfs/volumes.c ?

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
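As an illustration of the health-check idea, here is a minimal Python sketch of what such a check could look like. It is not an existing NRPE plugin: the function names and the sample mount table are hypothetical, and it only detects filesystems whose mount options already include "degraded" (today that means an explicit -o degraded; the automatic flag is only a proposal in this thread). The 0/1 return codes follow the usual Nagios plugin convention of OK/WARNING.

```python
def degraded_btrfs_mounts(mounts_text):
    """Return mount points of btrfs filesystems mounted with 'degraded'."""
    degraded = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype == "btrfs" and "degraded" in options.split(","):
            degraded.append(mountpoint)
    return degraded


def nagios_status(mounts_text):
    """Map the check result to a (exit_code, message) pair."""
    found = degraded_btrfs_mounts(mounts_text)
    if found:
        return 1, "WARNING: degraded btrfs mounts: " + ", ".join(found)
    return 0, "OK: no degraded btrfs mounts"


# Hypothetical sample standing in for the contents of /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb /data btrfs rw,relatime,degraded,space_cache 0 0
/dev/sdd /backup btrfs rw,relatime,space_cache 0 0
"""
print(nagios_status(SAMPLE))
```

On a real system the text would come from reading /proc/mounts, and the exit code would be passed to the monitoring agent.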
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Jim Salter posted on Sat, 04 Jan 2014 16:22:53 -0500 as excerpted:

> On 01/04/2014 01:10 AM, Duncan wrote:
>> The example given in the OP was of a 4-device raid10, already the
>> minimum number to work undegraded, with one device dropped out, to
>> below the minimum required number to mount undegraded, so of /course/
>> it wouldn't mount without that option.
>
> The issue was not realizing that a degraded fault-tolerant array would
> refuse to mount without being passed an -o degraded option. Yes, it's on
> the wiki - but it's on the wiki under *replacing* a device, not in the
> FAQ, not in the head of the "multiple devices" section, etc; and no
> coherent message is thrown either on the console or in the kernel log
> when you do attempt to mount a degraded array without the correct
> argument.
>
> IMO that's a bug. =)

I'd agree: it's a usability bug, one of the many "it works, but it's not
easy to work with it" rough edges that still need smoothing out.

FWIW I'm seeing progress in that area, now. The rush of functional bugs
and fixes for them has finally slowed down to the point where there's
beginning to be time to focus on the usability and rough edges bugs. I
believe I saw a post in October or November from Chris Mason, where he
said yes, the maturing of btrfs has been predicted before, but it really
does seem like the functional bugs are slowing down to the point where
the usability bugs can finally be addressed, and 2014 really does look
like the year that btrfs will finally start shaping up into a mature
looking and acting filesystem, including in usability, etc.

And Chris mentioned the GSoC project that worked on one angle of this
specific issue, too. Getting that code integrated and having btrfs
finally be able to recognize a dropped and re-added device and
automatically trigger a resync... that'd be a pretty sweet improvement to
get. =:^) While they're working on that they may well take a look at at
least giving the admin more information on a degraded-needed mount
failure, too, tweaking the kernel log messages, etc, and possibly taking
a second look as to whether fully refusing to mount is the best behavior
then, or not.

Actually, I wonder... what about mounting in such a situation, but read-
only and refusing to go writable unless degraded is added too? That
would preserve the "first, do no harm, don't make the problem worse"
ideal, while mounting but read-only unless degraded is added with the rw,
wouldn't be /quite/ as drastic as refusing to mount entirely, unless
degraded is added.

I actually think that, plus some better logging saying hey, we don't have
enough devices to write with the requested raid level, so remount
rw,degraded, and either add another device or reconfigure the raid mode
to something suitable for the number of devices, would be the way to go.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Samuel posted on Sun, 05 Jan 2014 20:20:26 +1100 as excerpted:

> On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:
>
>> Seconded ;-) We're really focused on nailing down these problems
>> instead of hiding behind the experimental flag. I know we won't be
>> perfect overnight, but it's time to focus on production workloads.
>
> Perhaps an option here is to remove the need to specify the degraded
> flag but if the filesystem notice that it is mounting a RAID array and
> would otherwise fail it then sets the degraded flag itself and carries
> on?
>
> That way the fact it was degraded would be visible in /proc/mounts and
> could be detected with health check scripts like NRPE for icinga/nagios.
>
> Looking at the code this would be in read_one_dev() in
> fs/btrfs/volumes.c ?

The idea I came up with elsewhere was to mount read-only, with a dmesg to
the effect that the filesystem was configured for a raid-level that the
current number of devices couldn't support, so mount rw,degraded to
accept that temporarily and to make changes, either by adding a new
device to fill out the required number for the configured raid level, or
by reducing the configured raid level to match reality.

The read-only mount would be better than not mounting at all, while
preserving the "first, do no further harm" ideal, since mounted read-
only, the existing situation should at least remain stable. It would
also alert the admin to problems, with a reasonable log message saying
how to fix them, while letting the admin at least access the filesystem
in read-only mode, thereby giving him tools access to manage whatever
maintenance tasks are necessary, should it be the rootfs. The admin
could then take the action they deemed appropriate, whether that was
getting the data backed up, or mounting degraded,rw in order to either
add a device and bring it back to functional or to rebalance to a lower
data/metadata redundancy level due to lack of devices.

--
Duncan - List replies preferred. No HTML msgs.
Re: btrfsck does not fix
Hello,

What messages in dmesg do you get when you use recovery?

I'll find out, tomorrow (I can't access the disk just now).

Here it is:
[90098.989872] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 162460 /dev/sdc1

That's all. The same in the syslog.

Do you have further suggestions to fix the file-system?

Regards,
Hendrik
Help! - "btrfs device delete missing" running out of space
Hello,

I replaced a failed 500G disk in btrfs raid1 with 2 smaller ones - 200
and 250GB. I started the "btrfs device delete missing" command, which has
been running since Friday, and it still seems far from finished. It seems
to be doing something very strange: used space on the largest drive is
going down, while the smaller drives are filling up to the brim. I'm
afraid it will soon run out of space and I don't know what to do.

The current situation is:

Label: none  uuid: cff1e711-97fe-4c1c-b8a3-6184010b5027
	Total devices 4 FS bytes used 376.66GB
	devid    4 size 172.81GB used 167.00GB path /dev/dm-16
	devid    3 size 232.88GB used 227.00GB path /dev/dm-4
	devid    2 size 452.26GB used 327.01GB path /dev/dm-0
	*** Some devices missing

According to the btrfs FAQ and http://carfax.org.uk/btrfs-usage, there
should be around 404GB of usable space in this configuration. Assuming
even distribution, used space on each device should be accordingly: 160,
216, and 405. The current situation is FAR from that.

I'd be grateful for tips on what to do and how to get out of this
situation.

Regards
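The usable-space estimate can be sanity-checked with a few lines of Python. This is only a sketch of the usual two-copy rule of thumb, not the exact logic of the btrfs allocator or of the carfax.org.uk calculator: every chunk needs its two copies on different devices, so usable space is capped both by half the raw total and by what the devices other than the largest can mirror. The function name raid1_usable is hypothetical, and metadata overhead is ignored.

```python
def raid1_usable(sizes):
    """Rule-of-thumb usable space for two-copy raid1 across devices of
    the given sizes (same unit in, same unit out)."""
    total = sum(sizes)
    largest = max(sizes)
    # Each chunk is stored twice, on two different devices: we cannot
    # use more than half of the raw space, and the largest device can
    # only be mirrored by the combined space of all the others.
    return min(total / 2, total - largest)


sizes_gb = [172.81, 232.88, 452.26]   # device sizes from the listing above
used_gb = 376.66                      # "FS bytes used" from the listing above

usable = raid1_usable(sizes_gb)
print("usable: %.2f GB, data fits: %s" % (usable, used_gb <= usable))
```

With these three sizes the largest device is bigger than the other two combined, so the estimate is simply the sum of the two smaller devices (a little over 405GB, in line with the roughly 404GB quoted), and in the end state the two small devices would be almost completely full.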
Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
Fengguang,

Instead of rebooting, can you trigger a crash dump when this happens
and send us the backtrace (to start with)?

Kent,
Did you do any btrfs test with your changes?

Regards,
Muthu

On Sun, Jan 5, 2014 at 1:46 AM, Fengguang Wu wrote:
> Hi Muthu,
>
> On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
>> Looks like Kent missed the btrfs endio in the original commit. How
>> about this patch:
>>
>> ---
>>
>> In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
>> bi_remaining is accounted for correctly.
>>
>> Reported-by: fengguang...@intel.com
>> Cc: Kent Overstreet
>> CC: Jens Axboe
>> Signed-off-by: Muthukumar Ratty
>>
>>
>>  fs/btrfs/volumes.c |    6 +++++-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f2130de..edfed52 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
>>  		}
>>  		kfree(bbio);
>>
>> -		bio_endio(bio, err);
>> +		/*
>> +		 * Call endio_nodec on the restored bio so the bi_remaining is
>> +		 * accounted for correctly
>> +		 */
>> +		bio_endio_nodec(bio, err);
>>  	} else if (!is_orig_bio) {
>>  		bio_put(bio);
>>  	}
>
> Interestingly, the BUG message disappeared but it blocks the test run.
> In the end, the test watchdog reboots the machine with SysRq:
>
>     2014-01-04 23:13:02 mount -t btrfs /dev/vda /fs/vda
>     [   20.184264] btrfs: device fsid f0e06999-0518-47e0-a622-21b8749438be devid 1 transid 4 /dev/vda
>     [   20.186552] btrfs: disk space caching is enabled
>     [  131.360457] random: nonblocking pool is initialized
> ==> [ 1465.069342] SysRq : Emergency Sync
> ==> [ 1475.071055] SysRq : Resetting
>
> Attached is the full dmesg for a good run (v3.13-rc7) and a bad run
> (this patch).
>
> Thanks,
> Fengguang
Re: btrfsck does not fix
On Jan 4, 2014, at 2:21 PM, Hendrik Friedel wrote:

> Hi Chris,
>
>>> I ran btrfsck on my volume with the repair option. When I re-run it, I
>>> get the same errors as before.
>>
>> Did you try mounting with -o recovery first?
>> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
>
> No, I did not.
> In fact, I had visited the FAQ before, and my understanding was, that -o
> recovery was used/needed when mounting is impossible. This is not the case.
> In fact, the disk does work without obvious problems.

It mounts without errors? So why then btrfsck/btrfs repair? What
precipitated the repair?

If mount option -o recovery is used, dmesg should report 'btrfs: enabling
auto recovery', and I think you're right: if it's mounting OK then
recovery probably isn't applicable.

Can you just do a btrfs check and report the results? Repair can
sometimes make problems worse, it seems.

Chris Murphy
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 4, 2014, at 11:39 PM, Marc MERLIN wrote:
>
> Nope, I never posted anything until now. Hopefully you agree that it's
> not ok for btrfs/kernel to just kill my system for over 2H until I power
> it off because of defragging one file. I did hit a severe performance bug,
> if it's not a never-ending loop.
>
> gandalfthegreat:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
> Win7.vdi: 156222 extents found
>
> Considering how virtualbox works, that's hardly surprising.

I haven't read anything so far indicating defrag applies to the VM
container use case, rather nodatacow via xattr +C is the way to go. At
least for now.

>
> But doesn't it make sense to see why the kernel is near deadlocking on a
> single file defrag first?

It's better than a panic or corrupt data. So far the best combination
I've found, open to other suggestions though, is +C xattr on
/var/lib/libvirt/images, creating non-preallocated qcow2 files, and
snapshotting the qcow2 file with qemu-img. Granted, when sysroot is
snapshotted, I'm making btrfs snapshots of these qcow2 files.

Another option is to make /var/lib/libvirt/images a subvolume; then when
sysroot is snapshotted, /var/lib/libvirt/images is immune to being
snapshotted automatically with the parent subvolume. I'd have to
explicitly snapshot it. This may be a better way to go to avoid
accumulation of btrfs snapshots of qcow2 snapshot files.

This may already be a known problem, but it's worth doing sysrq+w, then
dmesg, and posting those results if you haven't already.

Chris Murphy
Re: how to properly mount an external usb hard drive & other questions
On 05.01.2014 18:43, Justus Seifert wrote:
> On 05.01.2014 05:34, dhan.war wrote:
>> hi all
>>
>> i am using up to date debian sid with xfce desktop environment. i am
>> using Linux 3.13-rc6-amd64 #1 SMP Debian 3.13~rc6-1~exp1 (2013-12-30)
>> x86_64 GNU/Linux from experimental.
>> i have installed usbmount to auto mount all the devices connected
>> through USB.
>> […]
>>
>> [e] what is the appropriate fstab entry for my device ? [ i don't want
>> to remove usbmount].
>
> /dev/sdc /path/to/your/favorite/mountpoint/that/has/to/exist/already
> btrfs compress,noauto 0 0

oh i forgot: if you want to mount it without su privileges you have to use:

/dev/sdc /path/to/your/favorite/mountpoint btrfs compress,noauto,users,user 0 0

also look into subvolume mounting with "subvol=myfirstsubvolume" in your
list of mount options, if you want to do cool stuff with subvolumes.
Re: how to properly mount an external usb hard drive & other questions
On 05.01.2014 05:34, dhan.war wrote:
> hi all
>
> i am using up to date debian sid with xfce desktop environment. i am
> using Linux 3.13-rc6-amd64 #1 SMP Debian 3.13~rc6-1~exp1 (2013-12-30)
> x86_64 GNU/Linux from experimental.
> i have installed usbmount to auto mount all the devices connected
> through USB.
>
> [cmd# 1] i have created btrfs partition on my external USB hard drive
> using the following command :
>
> # mkfs.btrfs -f -L btrfs -m single /dev/sdc
> Turning ON incompat feature 'extref': increased hardlink limit per file
> to 65536
> fs created label btrfs on /dev/sdc
>         nodesize 16384 leafsize 16384 sectorsize 4096 size 931.51GiB
> Btrfs v3.12
>
> [cmd# 2] my permissions of the device :
> # ls -l /dev/sdc
> brw-rw---- 1 root floppy 8, 32 Jan 5 09:47 /dev/sdc
>
> my questions :
> [a] does the partition created by me is appropriate ?

it seems ok

> [b] how do i specify lzo compression in fstab ? last time when i tried
> to create entry fstab it is complaining about the auto mounting of the
> device by automount.

if you don't want the partition to be mounted from the fstab during
boot then you should add "noauto" to the list of options in the
respective fstab line.

> [c] what compression method is used by btrfs by default for the
> partitions created using the command mentioned above. [ cmd# 1]

none. if you tell mount to use compression without specifying the algo,
it will use zlib (that's like gz). if you do not use the option
"compress" then it will not compress new files.

> [d] does the file permissions for my device are accurate ? [ cmd# 2]

i don't know. are you a member of the group floppy? what is the purpose
of the group floppy on your machine? what users are members of the group
floppy?

> [e] what is the appropriate fstab entry for my device ? [ i don't want
> to remove usbmount].

/dev/sdc /path/to/your/favorite/mountpoint/that/has/to/exist/already
btrfs compress,noauto 0 0

> [f] should i use single or dup for the device ?

maybe use single

>
> please provide suggestions for configuring my device appropriately.
> thank you for reading the message patiently.
>
> please alway cc me.
>
> regards,
> wardhan.

i tried to keep it short. feel free to ask for more.

cheers
justus
Re: btrfs-transaction blocked for more than 120 seconds
On 01/05/2014 12:09 PM, Chris Murphy wrote:
> I haven't read anything so far indicating defrag applies to the VM
> container use case, rather nodatacow via xattr +C is the way to go. At
> least for now.

Can you elaborate on the rationale behind database or VM binaries being
set nodatacow? I experimented with this*, and found no significant (to
me, anyway) performance enhancement with nodatacow on - maybe 10% at
best, and if I understand correctly, that implies losing the live
per-block checksumming of the data that's set nodatacow, meaning you
won't get automatic correction if you're on a redundant array.

All I've heard so far is "better performance" without any more detailed
explanation, and if the only benefit is an added MAYBE 10%ish
performance... I'd rather take the hit, personally.

* "experimented with this" == set up a Win2008R2 test VM and ran
HDTunePro for several runs on binaries stored with and without nodatacow
set, 5G of random and sequential read and write access per run.
Re: how to properly mount an external usb hard drive & other questions
On 01/05/2014 12:50 PM, Justus Seifert wrote:
> oh i forgot: if you want to mount it without su privileges you have to
> use:
>
> /dev/sdc /path/to/your/favorite/mountpoint btrfs compress,noauto,users,user 0 0

If you want LZO compression, as you specified:

/dev/sdc /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0

Better yet, if your btrfs is actually on /dev/sdc right now, let's get
that fstab entry mounting it by UUID instead.

ls -l /dev/disk/by-uuid | grep sdc
lrwxrwxrwx 1 root root 10 Jan 3 09:40 12345678-9abc0-1234-5678-9a0123456789 -> ../../sdc

So then:

# this is not a real UUID, you need to check /dev/disk/by-uuid on your machine for a real UUID
UUID=12345678-9abc0-1234-5678-9a0123456789 /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0

This is EXTRA important with a USB drive, since it's HIGHLY likely it
won't always be on the same physical devicename.
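Since fstab entries are easy to get subtly wrong (a missing filesystem-type field shifts every later field by one), a quick sanity check may help. What follows is only a Python sketch under stated assumptions: it handles a single non-comment entry, the helper names parse_fstab_line and fstab_problems are made up, and the two heuristics are illustrative rather than anything mount(8) itself enforces. The UUID is the placeholder from this thread, not a real one.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its whitespace-separated fields."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("fstab entries need at least 4 fields: %r" % line)
    return {
        "spec": fields[0],                 # device, UUID=..., LABEL=...
        "mountpoint": fields[1],
        "fstype": fields[2],
        "options": fields[3].split(","),
        # The dump and pass fields are optional and default to 0.
        "dump": int(fields[4]) if len(fields) > 4 else 0,
        "passno": int(fields[5]) if len(fields) > 5 else 0,
    }


def fstab_problems(line):
    """Return a list of heuristic warnings for one fstab entry."""
    entry = parse_fstab_line(line)
    problems = []
    if "," in entry["fstype"] or "=" in entry["fstype"]:
        problems.append("third field looks like mount options, "
                        "not a filesystem type")
    if entry["spec"].startswith("/dev/sd"):
        problems.append("device name may change between plugs; "
                        "consider UUID=")
    return problems


good = ("UUID=12345678-9abc0-1234-5678-9a0123456789 /path/to/mountpoint "
        "btrfs compress=lzo,noauto,users,user 0 0")
bad = "/dev/sdc /path/to/mountpoint compress=lzo,noauto,users,user 0 0"
print(fstab_problems(good))
print(fstab_problems(bad))
```

The first entry passes both checks; the second trips both, because the options string has landed in the filesystem-type position and the device is named by its /dev/sdX node.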
Re: how to properly mount an external usb hard drive & other questions
On 01/05/2014 01:02 PM, Jim Salter wrote:
> If you want LZO compression, as you specified:
>
> /dev/sdc /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0
>
> Better yet, if your btrfs is actually on /dev/sdc right now, let's get
> that fstab entry mounting it by UUID instead.
>
> ls -l /dev/disk/by-uuid | grep sdc
> lrwxrwxrwx 1 root root 10 Jan 3 09:40 12345678-9abc0-1234-5678-9a0123456789 -> ../../sdc
>
> So then:
>
> # this is not a real UUID, you need to check /dev/disk/by-uuid on your machine for a real UUID
> UUID=12345678-9abc0-1234-5678-9a0123456789 /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0
>
> This is EXTRA important with a USB drive, since it's HIGHLY likely it
> won't always be on the same physical devicename.

One other note: in this particular case, you might actually be better
served setting compression by mounting the drive normally, then:

cd /path/to/drive
chattr +c . ; chattr +c * ; chattr +c .[!.]*

This will set compression on by default for any future files stored on
that USB drive, *without* needing any special mount options. (The last
glob is .[!.]* rather than .* so that it picks up dotfiles without also
matching the parent directory "..".)

Why might this be a better idea? Well, if it's a USB drive, presumably
you might want to mount it on foreign systems from time to time. This
way, even if you mount the drive on a foreign system that doesn't know
anything about your preferences, it will see the +c on the root
directory of the drive, and store any new data on the drive compressed.

The only caveat: +c won't set the compression algorithm to LZO. It'll be
zlib (gzip-style compression), which is the default algorithm. (And, of
course, this won't compress any EXISTING data already stored there - only
NEW data written to it after you set the +c attribute.)
[PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl
The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit
and 64-bit systems. This means that it is impossible to use btrfs
receive on a system with a 64-bit kernel and 32-bit userspace, because
the structure size (and hence the ioctl number) is different.

This patch adds a compatibility structure and ioctl to deal with the
above case.

Signed-off-by: Hugo Mills
---
 fs/btrfs/ioctl.c | 95 +++-
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 21da576..e186439 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -57,6 +57,32 @@
 #include "send.h"
 #include "dev-replace.h"
 
+#ifdef CONFIG_64BIT
+/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
+ * structures are incorrect, as the timespec structure from userspace
+ * is 4 bytes too small. We define these alternatives here to teach
+ * the kernel about the 32-bit struct packing.
+ */
+struct btrfs_ioctl_timespec {
+	__u64 sec;
+	__u32 nsec;
+} __attribute__ ((__packed__));
+
+struct btrfs_ioctl_received_subvol_args {
+	char	uuid[BTRFS_UUID_SIZE];	/* in */
+	__u64	stransid;		/* in */
+	__u64	rtransid;		/* out */
+	struct btrfs_ioctl_timespec stime; /* in */
+	struct btrfs_ioctl_timespec rtime; /* out */
+	__u64	flags;			/* in */
+	__u64	reserved[16];		/* in */
+} __attribute__ ((__packed__));
+#endif
+
+#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
+				struct btrfs_ioctl_received_subvol_args_32)
+
+
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
 
@@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
 	return btrfs_qgroup_wait_for_completion(root->fs_info);
 }
 
+#ifdef CONFIG_64BIT
+static long btrfs_ioctl_set_received_subvol_32(struct file *file,
+						void __user *arg)
+{
+	struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL;
+	struct btrfs_ioctl_received_subvol_args *args64 = NULL;
+	int ret = 0;
+
+	args32 = memdup_user(arg, sizeof(*args32));
+	if (IS_ERR(args32)) {
+		ret = PTR_ERR(args32);
+		args32 = NULL;
+		goto out;
+	}
+
+	args64 = malloc(sizeof(*args64));
+	if (IS_ERR(args64)) {
+		ret = PTR_ERR(args64);
+		args64 = NULL;
+		goto out;
+	}
+
+	memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE);
+	args64->stransid = args32->stransid;
+	args64->rtransid = args32->rtransid;
+	args64->stime.sec = args32->stime.sec;
+	args64->stime.nsec = args32->stime.nsec;
+	args64->rtime.sec = args32->rtime.sec;
+	args64->rtime.nsec = args32->rtime.nsec;
+	args64->flags = args32->flags;
+
+	ret = _btrfs_ioctl_set_received_subvol(file, args64);
+
+out:
+	kfree(args32);
+	kfree(args64);
+	return ret;
+}
+#endif
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
 					    void __user *arg)
 {
 	struct btrfs_ioctl_received_subvol_args *sa = NULL;
+	int ret = 0;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa)) {
+		ret = PTR_ERR(sa);
+		sa = NULL;
+		goto out;
+	}
+
+	ret = _btrfs_ioctl_set_received_subvol(file, sa);
+
+out:
+	kfree(sa);
+	return ret;
+}
+
+static long _btrfs_ioctl_set_received_subvol(struct file *file,
+				struct btrfs_ioctl_received_subvol_args *sa)
+{
 	struct inode *inode = file_inode(file);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_root_item *root_item = &root->root_item;
@@ -4346,13 +4431,6 @@ static long btrfs_ioctl_set_received_subvol(struct file *file,
 		goto out;
 	}
 
-	sa = memdup_user(arg, sizeof(*sa));
-	if (IS_ERR(sa)) {
-		ret = PTR_ERR(sa);
-		sa = NULL;
-		goto out;
-	}
-
 	/*
 	 * 1 - root item
 	 * 2 - uuid items (received uuid + subvol uuid)
@@ -4411,7 +4489,6 @@ static long btrfs_ioctl_set_received_subvol(struct file *file,
 		ret = -EFAULT;
 
 out:
-	kfree(sa);
 	up_write(&root->fs_info->subvol_sem);
 	mnt_drop_write_file(file);
 	return ret;
@@ -4572,6 +4649,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 		return btrfs_ioctl_balance_progress(root, argp);
 	case BTRFS_IOC_SET_RECEIVED_SUBVOL:
 		return btrfs_ioctl_set_received_subvol(file, argp);
+	case BTRFS_IOC_SET_RECEIVED_SUBVOL_32:
+		return btrfs_ioctl_set_received_subvol_32(file, argp);
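The size mismatch the patch description talks about can be reproduced in userspace. What follows is only a minimal sketch, not code from the patch: the struct names (ts_natural, ts_packed, args_natural, args_packed) and the helper ioc_iowr() are hypothetical, UUID_SIZE is assumed to mirror BTRFS_UUID_SIZE (16), and ioc_iowr() assumes the asm-generic ioctl encoding (direction in bits 30-31, size in bits 16-29, type in bits 8-15, number in bits 0-7). The packed attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define UUID_SIZE 16 /* assumption: mirrors BTRFS_UUID_SIZE */

/* Natural layout: ABIs that align 64-bit integers to 8 bytes add 4 bytes
 * of tail padding here; ABIs that align them to 4 bytes do not. */
struct ts_natural {
	uint64_t sec;
	uint32_t nsec;
};

/* Packed layout: no padding, so the size is 12 bytes on every ABI. */
struct ts_packed {
	uint64_t sec;
	uint32_t nsec;
} __attribute__((__packed__));

/* Same field order as the ioctl argument, built on the natural timespec. */
struct args_natural {
	char uuid[UUID_SIZE];
	uint64_t stransid;
	uint64_t rtransid;
	struct ts_natural stime;
	struct ts_natural rtime;
	uint64_t flags;
	uint64_t reserved[16];
};

/* Same field order, built on the packed timespec. */
struct args_packed {
	char uuid[UUID_SIZE];
	uint64_t stransid;
	uint64_t rtransid;
	struct ts_packed stime;
	struct ts_packed rtime;
	uint64_t flags;
	uint64_t reserved[16];
} __attribute__((__packed__));

/* Hypothetical stand-in for _IOWR(type, nr, T): the argument size is part
 * of the request number, so two layouts give two different requests. */
uint32_t ioc_iowr(uint32_t type, uint32_t nr, size_t size)
{
	const uint32_t dir_read_write = 3u;

	assert(size < (1u << 14)); /* the size field is 14 bits wide */
	return (dir_read_write << 30) | ((uint32_t)size << 16) |
	       (type << 8) | nr;
}

/* Print both layouts and the request numbers derived from them. */
void print_layouts(void)
{
	printf("timespec: natural %zu, packed %zu\n",
	       sizeof(struct ts_natural), sizeof(struct ts_packed));
	printf("args:     natural %zu, packed %zu\n",
	       sizeof(struct args_natural), sizeof(struct args_packed));
	printf("request:  natural 0x%08x, packed 0x%08x\n",
	       (unsigned)ioc_iowr(0x94, 37, sizeof(struct args_natural)),
	       (unsigned)ioc_iowr(0x94, 37, sizeof(struct args_packed)));
}
```

Calling print_layouts() from a small main() shows the two sizes side by side. The type value 0x94 and number 37 are used purely for illustration; the point is only that a different sizeof() yields a different request number, which is why the kernel's switch statement never matches the request sent by a 32-bit binary.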
Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl
On Sun, Jan 05, 2014 at 05:55:27PM +, Hugo Mills wrote:
> The structure for BTRFS_SET_RECEIVED_IOCTL packs differently on 32-bit
> and 64-bit systems. This means that it is impossible to use btrfs
> receive on a system with a 64-bit kernel and 32-bit userspace, because
> the structure size (and hence the ioctl number) is different.
>
> This patch adds a compatibility structure and ioctl to deal with the
> above case.

Oops, forgot to mention -- this has been compile tested, but not
actually run yet. The machine in question is several miles away and is
a production machine (it's my work desktop, and I can't afford much
downtime on it).

Hugo.

> Signed-off-by: Hugo Mills
> ---
> fs/btrfs/ioctl.c | 95 +++-
> 1 file changed, 87 insertions(+), 8 deletions(-)
Re: Help! - "btrfs device delete missing" running out of space
Hello,

> distribution, used space on each device should be accordingly: 160,
> 216, and 405.

The last number should be 376, I copied the wrong one.

Anyway, I deleted as much data as possible, which probably won't help in
the end, but at the moment it's still going.

Meanwhile, I made a script to replicate this problem:
http://pastebin.com/W2c2pJYp

On kernel 3.12.6 the output is:
---
WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/loop1 id 2
fs created label (null) on /dev/loop0
	nodesize 4096 leafsize 4096 sectorsize 4096 size 40.00GB
Btrfs v0.20-rc1
17000+0 records in
17000+0 records out
17825792000 bytes (18 GB) copied, 533,571 s, 33,4 MB/s
Label: none  uuid: f8a01060-94c2-4665-b5ff-f134f9b6ad9b
	Total devices 2 FS bytes used 16.63GB
	devid 2 size 20.00GB used 18.01GB path /dev/loop1
	devid 1 size 20.00GB used 18.03GB path /dev/loop0

Btrfs v0.20-rc1
ERROR: error removing the device 'missing' - No space left on device
Label: none  uuid: f8a01060-94c2-4665-b5ff-f134f9b6ad9b
	Total devices 4 FS bytes used 16.62GB
	devid 4 size 10.00GB used 9.03GB path /dev/loop3
	devid 3 size 10.00GB used 9.25GB path /dev/loop2
	devid 1 size 20.00GB used 12.31GB path /dev/loop0
	*** Some devices missing

Btrfs v0.20-rc1
---

The "delete missing" logic is pretty much broken, at least in this case.
Instead of just replicating the data to the other drives, it moves some
of the data, which fills up the smaller drives, and then fails with a
"No space left on device" error.

Regards

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs-transaction blocked for more than 120 seconds
Jim Salter posted on Sun, 05 Jan 2014 12:54:44 -0500 as excerpted:

> On 01/05/2014 12:09 PM, Chris Murphy wrote:
>> I haven't read anything so far indicating defrag applies to the VM
>> container use case, rather nodatacow via xattr +C is the way to go. At
>> least for now.

Well, NOCOW from the get-go would certainly be better, but given that the
file is already there and heavily fragmented, my idea was to get it
defragmented and then set the +C, to prevent it reoccurring.

But I do very little snapshotting here, and as a result hadn't considered
the knock-on effect of 100K-plus extents in perhaps 1000 snapshots. I
guess that's what's killing the defrag, however it's initiated. The only
way to get rid of the problem, then, would be to move the file away and
then back, but doing so does still leave all those snapshots with the
crazy fragmentation, and to kill that would require either killing all
those snapshots, or setting them writable and doing the same move out,
move back, on each one! OUCH, but I guess that's why it just seems
impossible to deal with the fragmentation on these things, whether it's
autodefrag, or named file defrag, or doing the whole move out and back
thing, and then having to worry about all those snapshots.

Still, I'd guess ultimately it'll need to be done, whether it's a wipe
the filesystem and restore from backup or whatever.

> Can you elaborate on the rationale behind database or VM binaries being
> set nodatacow? I experimented with this*, and found no significant (to
> me, anyway) performance enhancement with nodatacow on - maybe 10% at
> best, and if I understand correctly, that implies losing the live
> per-block checksumming of the data that's set nodatacow, meaning you
> won't get automatic correction if you're on a redundant array.
>
> All I've heard so far is "better performance" without any more detailed
> explanation, and if the only benefit is an added MAYBE 10%ish
> performance... I'd rather take the hit, personally.
>
> * "experimented with this" == set up a Win2008R2 test VM and ran
> HDTunePro for several runs on binaries stored with and without nodatacow
> set, 5G of random and sequential read and write access per run.

Well, the problem isn't just performance, it's that in most such cases
the apps actually have their own data integrity checking and management,
and sometimes the app's integrity management and that of btrfs end up
fighting each other, destroying the data as a result.

In normal operation, everything's fine. But should the system crash at
the wrong moment, btrfs' atomic commit and data integrity mechanisms can
roll back to a slightly earlier version of the file.

Which is normally fine. But hardware is known to often lie about having
committed writes that may actually still only be in buffer. If the power
outage/crash occurs at the wrong moment, ordinary write-barrier ordering
guarantees may be invalid (particularly on large files with
finite-seek-speed devices), and the app's own integrity checksum may have
been updated before the data it was supposed to be a checksum on actually
got to disk. If btrfs ends up rolling back to that condition, btrfs will
likely consider the file fine, but the app's own integrity management
will consider it corrupted, which it actually is.

But if btrfs only stays out of the way, the application often can fix
whatever minor corruption it detects, doing its own roll-backs to an
earlier checkpoint, because it's /designed/ to be able to handle such
problems on filesystems that don't have integrity management.

So having btrfs trying to manage integrity too on such data where the app
already handles it is self-defeating, because neither knows about nor
considers what the other one is doing, and the two end up undoing each
other's careful work.

Again, this isn't something you'll see in normal operation, but several
people have reported exactly that sort of problem with the general
large-internally-written-file, application-self-managed-file-integrity
scenario.

In those cases, the best thing btrfs can do is simply get out of the way
and let the application handle its own integrity management, and the way
to tell btrfs to do that, as well as to do in-place rewrites instead of
COW-based rewrites, is with the NOCOW file attribute, chattr +C, and that
must be done before the file gets so fragmented (and multi-snapshotted
in its fragmented state) in the first place.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
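For anyone who wants to set the attribute from a program rather than with chattr +C, the steps above can be sketched in C. This is a minimal sketch under stated assumptions, not the implementation chattr uses: the function names with_nocow() and create_nocow_file() are hypothetical, and the code relies on the generic FS_IOC_GETFLAGS / FS_IOC_SETFLAGS ioctls and the FS_NOCOW_FL flag from <linux/fs.h>. As far as I know, btrfs only honours the flag on files that hold no data yet, so the sketch sets it right after creating the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* fallback for older headers */
#endif

/* Return the given inode flags with the NOCOW bit added. */
static int with_nocow(int flags)
{
	return flags | FS_NOCOW_FL;
}

/* Open (creating if needed) the file at path and request NOCOW on it.
 * Returns an open file descriptor on success, or -1 with errno set.
 * Filesystems that do not support the flag ioctls make this fail. */
int create_nocow_file(const char *path)
{
	int flags = 0;
	int new_flags;
	int fd = open(path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return -1;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		goto fail;

	new_flags = with_nocow(flags);
	if (ioctl(fd, FS_IOC_SETFLAGS, &new_flags) < 0)
		goto fail;

	return fd;

fail:
	{
		int saved = errno; /* keep the ioctl error across close() */
		close(fd);
		errno = saved;
	}
	return -1;
}
```

A caller would create the VM image or database file through create_nocow_file() first and only then write data into it. Setting +C on the containing directory with chattr, so that new files inherit it, achieves the same thing without any code.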
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 4, 2014, at 2:16 PM, Jim Salter wrote:

>
> On 01/04/2014 02:18 PM, Chris Murphy wrote:
>> I'm not sure what else you're referring to? (working on boot
>> environment of btrfs)
>
> Just the string of caveats regarding mounting at boot time - needing to
> monkeypatch 00_header to avoid the bogus sparse file error

I don't know what "bogus sparse file error" refers to. What version of
GRUB? I'm seeing Ubuntu 12.04 precise-updates listing GRUB 1.99 which is
rather old.

> (which, worse, tells you to press a key when pressing a key does
> nothing) followed by this, in my opinion completely unexpected,
> behavior when missing a disk in a fault-tolerant array, which also
> requires monkey-patching in fstab and now elsewhere in GRUB to avoid.

and…

> I'm aware it's not intended for production yet.

On the one hand you say you're aware, yet on the other hand you say the
missing disk behavior is completely unexpected.

Some parts of Btrfs, in certain contexts, are production ready. But the
developmental state of Btrfs places a burden on the user to know more
details about that state than he might otherwise be expected to know with
more stable/mature file systems.

My opinion is that it's inappropriate for degraded mounts to be made
automatic when there's no method of notifying user space of this state
change. Gnome-shell via udisks will inform users of a degraded md array.
Something equivalent to that is needed before Btrfs should enable a
scenario where a user boots a computer in degraded state without being
informed, as if there's nothing wrong at all. That's demonstrably far
worse than a "scary" boot failure, during which one copy of data is still
likely safe, unlike permitting uninformed degraded rw operation.

> However, it's just on the cusp, with distributions not only including
> it in their installers but a couple teetering on the fence with
> declaring it their next default FS (Oracle Unbreakable, OpenSuse, hell
> even RedHat was flirting with the idea) that it seems to me some extra
> testing with an eye towards production isn't a bad thing.

Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs
raid1 volume?

> That's why I'm here. Not to crap on anybody, but to get involved,
> hopefully helpfully.

I think you're better off using something more developmental. A fix
necessarily needs to exist there first, before it can trickle down to an
LTS release.

>
>> fs_passno is 1 which doesn't apply to Btrfs.
> Again, that's the distribution's default, so the argument should be
> with them, not me…

Yes so you'd want to file a bug? That's how you get involved.

> with that said, I'd respectfully argue that fs_passno 1 is correct for
> any root file system; if the file system itself declines to run an fsck
> that's up to the filesystem, but it's correct to specify fs_passno 1 if
> the filesystem is to be mounted as root in the first place.
>
> I'm open to hearing why that's a bad idea, if you have a specific
> reason?

It's a minor point, but it shows that fs_passno has become quaint, like
grandma's iron cozy. It's not applicable for either XFS or Btrfs. It's
arguably inapplicable for ext3/4 but its fsck program has an optimization
to skip fully checking the file system if the journal replay succeeds.
There is no unattended fsck for either XFS or Btrfs.

On systemd systems, it reads fstab, and if fs_passno is non-zero it
checks for the existence of /sbin/fsck. and if it doesn't exist, then it
doesn't run fsck for that entry. This topic was recently brought up and
is in the archives.

>> Well actually LVM thinp does have fast snapshots without requiring
>> preallocation, and uses COW.
>
> LVM's snapshots aren't very useful for me - there's a performance
> penalty while you have them in place, so they're best used as a
> transient use-then-immediately-delete feature, for instance for
> rsync'ing off a database binary. Until recently, there also wasn't a
> good way to roll back an LV to a snapshot, and even now, that can be
> pretty problematic.

This describes old LVM snapshots, not LVM thinp snapshots.

> Finally, there's no way to get a partial copy of an LV snapshot out of
> the snapshot and back into production, so if eg you have virtual
> machines of significant size, you could be looking at *hours* of file
> copy operations to restore an individual VM out of a snapshot (if you
> even have the drive space available for it), as compared to btrfs' cp
> --reflink=always operation, which allows you to do the same thing
> instantaneously.

LVM isn't a file system, so limitations compared to Btrfs are expected.

>
>> I'm not sure what you mean by self-correcting, but if the drive
>> reports a read error md, lvm, and Btrfs raid1+ all will get missing
>> data from mirror/parity reconstruction, and write corrected data back
>> to the bad sector.
>
> You're assuming that the drive will actually *report* a read error,
> which is frequen
Re: btrfs-transaction blocked for more than 120 seconds
On Dec 31, 2013, at 4:46 AM, Sulla wrote:

> Dear all!
>
> On my Ubuntu Server 13.10 I use a RAID5 blockdevice consisting of 3
> WD20EARS

Sulla, is this md raid5? If so, can you report the result from mdadm -D ,
I'm curious what the chunk size is. Thanks.

Chris Murphy
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 12:57 PM, Duncan <1i5t5.dun...@cox.net> wrote:

>
> But I do very little snapshotting here, and as a result hadn't
> considered the knockon effect of 100K-plus extents in perhaps 1000
> snapshots.

I wonder if this is an issue with snapshot aware defrag? Some problems
were fixed recently but I'm not sure of the status.

The OP's case involves Btrfs on LVM on (I think) md raid5. The mdadm
default chunk size is 512KB, which on a three-disk raid5 would be a 1MB
full stripe. There are some optimizations for non-full stripe reads and
writes for raid5 (not for raid6, so it takes a much bigger performance
hit) but nevertheless it might be a factor.

> I guess that's what's killing the defrag, however it's initiated. The
> only way to get rid of the problem, then, would be to move the file
> away and then back, but doing so does still leave all those snapshots
> with the crazy fragmentation, and to kill that would require either
> killing all those snapshots, or setting them writable and doing the
> same move out, move back, on each one! OUCH, but I guess that's why it
> just seems impossible to deal with the fragmentation on these things,
> whether it's autodefrag, or named file defrag, or doing the whole move
> out and back thing, and then having to worry about all those snapshots.

It's why in the short term I'm using +C from the get go. And if I had
more VM images and qcow2 snapshots, I would put them in a subvolume of
their own so that they aren't snapshotted along with rootfs. Using Btrfs
within the VM I still get the features I expect and the performance is
quite good.

Chris Murphy
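The full-stripe arithmetic mentioned above is easy to check. A minimal sketch, assuming the usual raid5/raid6 geometry in which every stripe carries one chunk per device and the parity chunks hold no data; the function names full_stripe_bytes() and needs_read_modify_write() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Data bytes covered by one full stripe: one chunk on every device,
 * minus the chunks that hold parity (1 for raid5, 2 for raid6). */
uint64_t full_stripe_bytes(uint64_t chunk_bytes, unsigned devices,
			   unsigned parity_devices)
{
	assert(devices > parity_devices);
	return chunk_bytes * (devices - parity_devices);
}

/* A write avoids the parity read-modify-write cycle only when it starts
 * on a full-stripe boundary and covers whole stripes. Returns 1 when the
 * write is partial and therefore needs read-modify-write, 0 otherwise. */
int needs_read_modify_write(uint64_t offset, uint64_t length,
			    uint64_t full_stripe)
{
	assert(full_stripe > 0 && length > 0);
	return (offset % full_stripe != 0) || (length % full_stripe != 0);
}
```

With three devices and one parity chunk per stripe, a 512KB chunk gives 512KB * 2 = 1MB of data per full stripe, and a 4KB write into such an array always counts as a partial-stripe write.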
Re: btrfs-transaction blocked for more than 120 seconds
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dear Chris!

Certainly: I have 3 HDDs, all of which WD20EARS. Originally I wanted to
let btrfs handle all 3 devices directly without making partitions, but
this was impossible, as at least /boot needed to be ext4, at least back
then when I set up the server. And back then btrfs also hadn't
raid5-like functionality, so I decided to put good old partitions and
md-Raids and LVM on them and use btrfs just as plain file-systems on the
partitions provided by LVM.

On the WD disks I thus created 2 partitions each, the first sdX1 being
~500MiB, the rest, 1.9995 TiB is one partition of, sdX2.

I built a Raid1 on the 3 small partitions sdX1 with ext4 for boot, each
disk is bootable with grub installed into the MBR.

I combined the 3 large partitions to a Raid5 of size 3,64TB:

/proc/mdstat reads:
md0 : active raid1 sda1[5] sdb1[4] sdc1[3]
      498676 blocks super 1.2 [3/3] [UUU]
md1 : active raid5 sda2[5] sdb2[4] sdc2[3]
      3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]

the information you requested:
# sudo mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Thu Jul 14 18:49:25 2011
     Raid Level : raid5
     Array Size : 3904907520 (3724.01 GiB 3998.63 GB)
  Used Dev Size : 1952453760 (1862.01 GiB 1999.31 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sun Jan 5 22:07:22 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 8K

           Name : freedom:1 (local to host freedom)
           UUID : 44b72520:a78af6f7:dba13fb3:2203127d
         Events : 576884

    Number   Major   Minor   RaidDevice State
       4       8       18        0      active sync   /dev/sdb2
       5       8        2        1      active sync   /dev/sda2
       3       8       34        2      active sync   /dev/sdc2

I use the Raid5 md1 as physical volume for LVM: pvdisplay gives:
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               MAIN
  PV Size               3.64 TiB / not usable 2.06 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              953346
  Free PE               6274
  Allocated PE          947072
  PV UUID               WcuEx8-ehJL-xHdf-ElwF-b9s3-dlmM-KZlDNG

I keep a reserve of 6274 4MiB blocks (=24GiB) in case one of the logical
volumes runs out of space...

I created the following logical volumes, named after their intended
mountpoints:
  --- Logical volume ---
  LV Path                /dev/MAIN/ROOT
  LV Name                ROOT
  VG Name                MAIN
  LV UUID                kURJks-xHox-73B5-n02x-eZfS-agDD-n1dtAm
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 1
  LV Size                19.31 GiB
  Current LE             4944
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

and similar:
  --- Logical volume ---
  LV Path /dev/MAIN/SWAP:  1.8GB
  LV Path /dev/MAIN/HOME:  18.6GB
  LV Path /dev/MAIN/TMP:   9.3 GB
  LV Path /dev/MAIN/DATA1  2.6 TB
  LV Path /dev/MAIN/DATA2: 0.9 TB

as filesystem I used btrfs during install from an ubuntu server, I don't
recall which, might have been 11.10 or 12.04 (?) for all logical
partitions except swap, of course,

any other information I can supply?

regards, Sulla

- --
Cogito cogito ergo cogito sum.
Ambrose Bierce
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLJy+8ACgkQR6b2EdogPFupxgCfeDRdeO+PYoQNIjtySAYEmSEr
PNoAoLPNcSqDHsDzM8pAuHlbva7j18MS
=XBOA
-----END PGP SIGNATURE-----
Re: btrfs-transaction blocked for more than 120 seconds
On Sun, 05 Jan 2014 08:42:46 -0500 Jim Salter wrote:

> On Jan 5, 2014 1:39 AM, Marc MERLIN wrote:
> >
> > On Fri, Jan 03, 2014 at 09:34:10PM +, Duncan wrote:
>
> > Yes, I got that. That why I ran btrfs defrag on the files after that
>
> Why are you trying to defrag an SSD? There's no seek penalty for
> moving between fragmented blocks, so defrag isn't really desirable in
> the first place.

[I normally try to reply directly to list but don't believe I've seen
this there yet, but got it direct-mailed so will reply-all in response.]

There's no seek penalty, so the overall problem is dramatically lessened,
as that's the significant part of it on spinning rust, correct, but...

SSDs do remain IOPS-bound, and tens or hundreds of thousands of extents
do exact an IOPS (as well as general extent bookkeeping) toll, too.
That's why I ended up enabling autodefrag here when I was first setting
up, even tho I'm on SSD. (Only after asking the list basically the same
question, what good is autodefrag on SSD, tho.)

Luckily I don't happen to deal with any of the
internal-write-in-huge-files scenarios, however, and I enabled
autodefrag to cover the internal-write-in-small-file scenarios BEFORE I
started putting any data on the filesystems at all, so I'm basically
covered here, without actually having to do chattr +C on anything.

> That doesn't change the fact that the described lockup sounds like a
> bug not a feature of course, but I think the answer to your personal
> issue on that particular machine is "don't defrag a solid state
> drive".

I now believe the lockup must be due to processing the hundreds of
thousands of extents on all those snapshots, too, in addition to doing it
on the main volume. I don't actually make very extensive use of
snapshots here anyway, so I didn't think about that aspect originally,
but that's gotta be what's throwing the real spanner in the works,
turning a possibly long but workable normal defrag (O(1)) into a lockup
scenario (O(n)) where virtually no progress is made as currently coded.

--
Duncan - No HTML messages please, as they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: btrfs-transaction blocked for more than 120 seconds
On 2014/01/05 11:17 PM, Sulla wrote:
> Certainly: I have 3 HDDs, all of which WD20EARS.

Maybe/maybe-not off-topic:
Poor hardware performance, though not necessarily the root cause, can be
a major factor with these errors.

WD Greens (Reds too, for that matter) have poor non-sequential
performance. As an educated guess, I'd say there's a 15% chance this is
a major factor in the problem and, perhaps, a 60% chance it is merely a
"small contributor" to the problem. Greens are aimed at consumers
wanting high capacity and a low price point. The result is poor
performance. See footnote * re my experience.

My general recommendation (use cases vary of course) is to install a
tiny SSD (60GB, for example) just for the OS. It is typically cheaper
than the larger drives and will be *much* faster. WD Greens and Reds
have good *sequential* throughput but comparatively abysmal random
throughput even in comparison to regular non-SSD consumer drives.

* I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a
single 250GB IDE disk for the OS. When the very old IDE disk inevitably
died, I decided to use a spare 1.5TB drive for the OS. Performance was
bad enough that I simply bought my first SSD the same week.

--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97
FILE_EXTENT_SAME changes mtime and ctime
Hello,

I am currently playing with snapshots and manual deduplication of files.
During these tests I noticed that the ctime and mtime of a file in the
snapshot change after deduplication with FILE_EXTENT_SAME. Does this
happen on purpose? Otherwise I would like to have ctime and mtime left
unmodified, because on a read-only snapshot I cannot change them back
after the ioctl call.

I attached a very basic patch, which illustrates my idea.

Thanks,
Gerhard

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..975d207 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -59,7 +59,7 @@
 #include "dev-replace.h"
 
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		       u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+		       u64 off, u64 olen, u64 olen_aligned, u64 destoff, int update_time);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2683,7 +2683,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, len, len, dst_loff, /* update time */ 0);
 
 out_unlock:
 	btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -2836,9 +2836,10 @@ out:
  * @olen_aligned: Block-aligned value of olen, extent_same uses
  *               identical values here
  * @destoff: Offset within @inode to start clone
+ * @update_time: Should we update ctime and mtime of @inode?
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		       u64 off, u64 olen, u64 olen_aligned, u64 destoff)
+		       u64 off, u64 olen, u64 olen_aligned, u64 destoff, int update_time)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_path *path = NULL;
@@ -3081,8 +3082,10 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 		btrfs_mark_buffer_dirty(leaf);
 		btrfs_release_path(path);
 
-		inode_inc_iversion(inode);
-		inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+		if (update_time) {
+			inode_inc_iversion(inode);
+			inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+		}
 
 		/*
 		 * we round up to the block size at eof when
@@ -3227,7 +3230,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 	lock_extent_range(src, off, len);
 
-	ret = btrfs_clone(src, inode, off, olen, len, destoff);
+	ret = btrfs_clone(src, inode, off, olen, len, destoff, /* update time */ 1);
 
 	unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
 out_unlock:
Re: btrfs-transaction blocked for more than 120 seconds
On Mon, 06 Jan 2014 00:36:22 +0200 Brendan Hide wrote:

> I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a
> single 250GB IDE disk for the OS. When the very old IDE disk inevitably
> died, I decided to use a spare 1.5TB drive for the OS. Performance was
> bad enough that I simply bought my first SSD the same week.

Did you align your partitions to accommodate the 4K sectors of the EARS?

--
With respect,
Roman
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 2:17 PM, Sulla wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Dear Chris!
>
> Certainly: I have 3 HDDs, all of which WD20EARS.

These drives don't have a configurable SCT ERC, so you need to modify the
SCSI block layer timeout:

echo 120 >/sys/block/sdX/device/timeout

You also need to schedule regular scrubs at the md level as well.

echo check > /sys/block/mdX/md/sync_action
cat /sys/block/mdX/md/mismatch_cnt

More info about this is in man 4 md, and on the linux-raid list.

>
> 3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]

OK so 8KB chunk, 16KB full stripe, so that doesn't apply to what I was
thinking might be the case. The workload is presumably small file sizes,
like a mail server?

> any other information I can supply?

I'm not a developer, I don't know if this problem is known or maybe fixed
in a newer kernel than 3.11.0 - which has been around for 5-6 months. I
think the main suggestion is to try a newer kernel, granted with the
configuration of md, lvm, and btrfs you have three layers that will
likely have kernel changes. I'd make sure you have backups. While this
layout is valid and should work, it's also probably less common and
therefore less tested.

Usually, in the case of blocked tasks, devs want to see sysrq+w issued.
The setup is dmesg -n7, and enable sysrq functions. Then reproduce the
block, and during the block issue w to the sysrq trigger, then capture
the dmesg contents and post the block and any other nearby btrfs
messages.

https://www.kernel.org/doc/Documentation/sysrq.txt

Chris Murphy
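For anyone who would rather apply these settings from a small boot-time helper than from shell one-liners, the echo commands above can be sketched in C. This is only a sketch under assumptions: the function names write_setting(), set_scsi_timeout() and start_md_check() are hypothetical, the sysfs paths are the ones quoted in the message (with mismatch_cnt living next to sync_action in the md directory), and it has to run as root to have any effect.

```c
#include <assert.h>
#include <stdio.h>

/* Write value followed by a newline to path, like `echo value > path`.
 * Returns 0 on success, -1 on any failure. */
int write_setting(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", value) >= 0;
	if (fclose(f) != 0) /* sysfs reports write errors at flush time */
		ok = 0;
	return ok ? 0 : -1;
}

/* Raise the SCSI command timeout for one disk, e.g. "sda". */
int set_scsi_timeout(const char *disk, unsigned seconds)
{
	char path[128];
	char value[16];
	int n = snprintf(path, sizeof(path),
			 "/sys/block/%s/device/timeout", disk);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1; /* device name too long for the buffer */
	snprintf(value, sizeof(value), "%u", seconds);
	return write_setting(path, value);
}

/* Ask md to start a scrub ("check") on one array, e.g. "md1". */
int start_md_check(const char *array)
{
	char path[128];
	int n = snprintf(path, sizeof(path),
			 "/sys/block/%s/md/sync_action", array);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return write_setting(path, "check");
}
```

Neither setting survives a reboot, so the timeout would need to be reapplied for each member disk at every boot, and the check would typically be triggered from a periodic job.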
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 4:48 PM, Chris Murphy wrote:

> Usually in case of blocking devs want to see sysrq+w issued. The setup
> is dmesg -n7, and enable sysrq functions. Then reproduce the block, and
> during the block issue w to the sysrq trigger, then capture dmesg
> contents and post the block and any other nearby btrfs messages.
>
> https://www.kernel.org/doc/Documentation/sysrq.txt

Also, this thread is pretty cluttered with other conversations by now so
I think you're best off starting a new thread with this information,
maybe a title of "PROBLEM: btrfs on LVM on md raid, blocking > 120
seconds"

Since it's almost inevitable you'd be asked to test with a newer kernel
anyway, you might as well go to 3.13-rc7 and see if you can reproduce.
If it is reproducible, be specific with the problem report by following
this template:
https://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html

Chris Murphy
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 3:36 PM, Brendan Hide wrote:

> WD Greens (Reds too, for that matter) have poor non-sequential performance.
> An educated guess I'd say there's a 15% chance this is a major factor to the
> problem and, perhaps, a 60% chance it is merely a "small contributor" to the
> problem. Greens are aimed at consumers wanting high capacity and a low
> pricepoint. The result is poor performance. See footnote * re my experience.
>
> My general recommendation (use cases vary of course) is to install a tiny SSD
> (60GB, for example) just for the OS. It is typically cheaper than the larger
> drives and will be *much* faster. WD Greens and Reds have good *sequential*
> throughput but comparatively abysmal random throughput even in comparison to
> regular non-SSD consumer drives.

Another thing with md raid and parallel file systems that's been an issue is cqf. On the XFS list cqf is approximately in the realm of persona non grata. It might be worth Sulla also setting elevator=deadline and see if simply different scheduling is a work around, not that it's OK to get blocks with cqf. But it might be worth a shot as a more conservative approach to upgrading the kernel from 3.11.0.

> I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a single
> 250GB IDE disk for the OS. When the very old IDE disk inevitably died, I
> decided to use a spare 1.5TB drive for the OS. Performance was bad enough
> that I simply bought my first SSD the same week.

Yeah for what it's worth, the current WD Green PDF says these drives are not to be used in RAID at all. Not 0, 1, 5 or 6. Even Caviar Black is proscribed from use in RAID environments using multibay chassis, as in, no warranty. It's desktop raid0 and raid1 only, and arguably the lack of configurable SCT ERC makes it not ideal even for raid1.

Anyway, Sulla, how about putting up a smartctl -x for each drive? Curious if there are any bad sectors that have developed, and it may be worth filtering all of /var/log/messages for the word "reset" to see if you find any of these drives ever being reset by the kernel and if so, post the full output of that.

Chris Murphy
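As a runtime alternative to booting with elevator=deadline, the I/O scheduler can usually be inspected and switched per device through sysfs. The following is a minimal sketch under stated assumptions: it assumes the kernel exposes queue/scheduler for each block device and has the deadline scheduler built in, and the function names and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and SYSFS_ROOT are illustrative assumptions.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the currently active scheduler for one device. The kernel lists
# all available schedulers and marks the active one with [brackets].
# Usage: active_scheduler sda
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$SYSFS_ROOT/block/$1/queue/scheduler"
}

# Select the deadline scheduler for every device named on the command line.
# Needs root on a real system, and the setting is lost at reboot.
# Usage: set_deadline sda sdb sdc
set_deadline() {
    for dev in "$@"; do
        echo deadline > "$SYSFS_ROOT/block/$dev/queue/scheduler"
    done
}
```

Only the member disks of the md array would need changing, and switching back is a matter of writing the previous scheduler name to the same file.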
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 5:15 PM, Chris Murphy wrote:

> Another thing with md raid and parallel file systems that's been an issue is
> cqf.

Oops, CFQ!

Chris Murphy
Re: btrfs-transaction blocked for more than 120 seconds
Thanks Chris!

Thanks for your support.

>> echo 120 >/sys/block/sdX/device/timeout

timeout is 30 for my HDDs.

I'm well aware that the WD green HDDs are not the perfect ones for servers, but they were cheaper - and quieter - than the black ones for servers. I'll get the red ones next, though. ;-)

>> You also need to schedule regular scrubs at the md level as well.

Ubuntu does that once a month.

>> cat /sys/block/mdX/mismatch_cnt

This resides in /sys/devices/virtual/block/md1/md/mismatch_cnt on my machine. The count is zero.

>> The workload is presumably small file sizes, like a mail server?

Yes. It serves as a mailserver (maildir-format), but also as a samba file server with quite big files...

btrfs ran fine for more than a year, so I'm not sure how reproducible the problem is...

I don't really wish to install or compile custom kernels, to be honest. Not sure how problematic they might be during the next do-release-upgrade...

Sulla

--
Russian Roulette is not the same without a gun and baby when it's love, if it's not rough, it isn't fun, fun.
Lady GaGa, "Pokerface"
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 5:25 PM, Sulla wrote:

> Thanks Chris!
>
> Thanks for your support.
>
>>> echo 120 >/sys/block/sdX/device/timeout
> timeout is 30 for my HDDs.

I don't think those drives support a configurable time out; the Green hasn't supported it in years. Where are you getting this information? What do you get for 'smartctl -l scterc /dev/sdX'?

> I don't really wish to install or compile custom kernels, to be honest.

If the problem is reproducible, then that's the fastest way to find out if it's been fixed or not. In this case 3.11 is EOL already, no more updates.

Chris Murphy
Re: [PATCH v2 07/11] btrfs: Add noinode_cache mount option.
On Fri, 3 Jan 2014 18:52:44 +0100, David Sterba wrote:
> On Fri, Jan 03, 2014 at 02:10:30PM +0800, Qu Wenruo wrote:
>> Add noinode_cache mount option to disable inode map cache with
>> remount option.
>
> This looks almost safe, there's a sync_filesystem called before the
> filesystem's remount handler, the transaction gets committed and flushes
> all the data related to inode_cache.
>
> The caching thread keeps running, which is not a serious problem as
> it'll finish at umount time, only consuming resources.
>
> There's a window between sync_filesystem and successful remount when the
> INODE_MAP_CACHE bit is set and the cache could be used to get a free
> ino, then the INODE_MAP_CACHE is cleared but the ino cache remains and is
> not synced back to disk, which is normally done from transaction commit
> via btrfs_unpin_free_ino.
>
> I haven't looked if something else blocks that to happen.
>
> I'd leave this patch out for now, it probably needs more code updates
> than just unsetting the bit.
>
> david

Thanks for pointing out the hidden problem.

I'll check the related source again to keep this behavior safe or add new code.

So in the next patchset, the inode map cache option will not be included and will be separated into a new patch.

Qu
Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.
On Fri, 03 Jan 2014 11:52:07 -0600, Eric Sandeen wrote:
> On 1/3/14, 12:10 AM, Qu Wenruo wrote:
>> Some options should be paired to support triggering different functions
>> when remounting.
>> This patchset add these missing pairing mount options.
>
> I think this really would benefit from a regression test which
> ensures that every remount transition works properly...
>
> Thanks,
> -Eric

An xfstests test case for remounting is under development and will be submitted soon (for both generic and btrfs mount options).

As far as I have tested, no problem occurs in my test environment, but since the IO pressure there is low, a heavier test case is still needed.

Qu

>> changelog:
>> v1: Initial commit with only barrier option
>> v2: Add other missing pairing options
>>
>> Qu Wenruo (11):
>>   btrfs: Add "barrier" option to support "-o remount,barrier"
>>   btrfs: Add noautodefrag mount option.
>>   btrfs: Add nocheck_int mount option.
>>   btrfs: Add nodiscard mount option.
>>   btrfs: Add noenospc_debug mount option.
>>   btrfs: Add noflushoncommit mount option.
>>   btrfs: Add noinode_cache mount option.
>>   btrfs: Add acl mount option.
>>   btrfs: Add datacow mount option.
>>   btrfs: Add datasum mount option.
>>   btrfs: Add treelog mount option.
>>
>>  Documentation/filesystems/btrfs.txt | 56 ++--
>>  fs/btrfs/super.c                    | 74 -
>>  2 files changed, 110 insertions(+), 20 deletions(-)
Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.
On Fri, 3 Jan 2014 18:58:28 +0100, David Sterba wrote:
> On Fri, Jan 03, 2014 at 02:10:23PM +0800, Qu Wenruo wrote:
>> Some options should be paired to support triggering different functions
>> when remounting.
>> This patchset add these missing pairing mount options.
>
> Thanks!
>
>>   btrfs: Add nocheck_int mount option.
>>   btrfs: Add noinode_cache mount option.
>
> Commented separately, imho not to be merged in current state.
>
>>   btrfs: Add "barrier" option to support "-o remount,barrier"
>>   btrfs: Add noautodefrag mount option.
>>   btrfs: Add nodiscard mount option.
>>   btrfs: Add noenospc_debug mount option.
>>   btrfs: Add noflushoncommit mount option.
>>   btrfs: Add acl mount option.
>>   btrfs: Add datacow mount option.
>>   btrfs: Add datasum mount option.
>>   btrfs: Add treelog mount option.
>
> All ok.
>
> Reviewed-by: David Sterba

Thanks for your comments.

nocheck_int and noinode_cache will be removed in the next version, and noinode_cache will be resent as an independent patch after more investigation and tests.

A remounting test case will also be added to xfstests soon.

Qu
Re: [PATCH v2 03/11] btrfs: Add nocheck_int mount option.
On Fri, 3 Jan 2014 18:13:08 +0100, David Sterba wrote:
> On Fri, Jan 03, 2014 at 02:10:26PM +0800, Qu Wenruo wrote:
>> Add nocheck_int mount option to disable integrity check with
>> remount option.
>>
>> +  nocheck_int disables all the debug options above.
>
> I think this option is not needed, the integrity checker is a
> development functionality and used by people who know what they're
> doing. Besides this would need to clean up all the data structures that
> the checker uses (see eg. btrfsic_unmount that's called only if the
> mount option is used).
>
> I see little benefit compared to the amount of work to make sure that
> disabling the checker functionality in the middle works properly.
>
> david

That's right, since most people won't enable integrity check until checking the all-yes config or running xfstests, it's better not to add this option.

Qu
Re: btrfs-transaction blocked for more than 120 seconds
On Jan 5, 2014, at 6:29 PM, Sulla wrote:

> Hi Chris!
>
> # sudo smartctl -l scterc /dev/sda
> tells me
> SCT Error Recovery Control command not supported
>
> you're right. the /sys/block/sdX/device/timeout file probably is useless then.

OK there's some confusion. /sys/block/sdX/device/timeout is the SCSI block layer timeout - linux itself has a timeout for each command issued to a block device, and will reset the link upon timeout being reached. So writing 120 to this will cause linux to wait for up to 120 seconds for the drive to respond. This is necessary because if there's a bad sector, the drive must report a read error in order for the md driver to reconstruct that data from parity. This is needed both for effective scrubs, and recovery on read error in normal operation.

It is not a persistent setting so you'll want to create a start up script for it.

Chris Murphy
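A start up script of the kind suggested here could look roughly like the following. This is a minimal sketch, not a distribution-ready init script: the function name set_scsi_timeout, the SYSFS_ROOT override and the drive names in the usage line are all hypothetical, and where the script gets hooked in (for example an rc.local-style file) depends on the distribution.

```shell
#!/bin/sh
# Sketch only: set_scsi_timeout and SYSFS_ROOT are illustrative assumptions.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Write the same command timeout (in seconds) to each listed drive.
# Drives whose timeout file is missing or not writable are skipped
# with a message on stderr instead of aborting the whole run.
# Usage: set_scsi_timeout 120 sda sdb sdc   (needs root on a real system)
set_scsi_timeout() {
    secs="$1"
    shift
    for dev in "$@"; do
        f="$SYSFS_ROOT/block/$dev/device/timeout"
        if [ -w "$f" ]; then
            echo "$secs" > "$f"
        else
            echo "skipping $dev: $f not writable" >&2
        fi
    done
}
```

Listing the drives explicitly, rather than looping over every sd* device, keeps the longer timeout limited to the array members that lack SCT ERC.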
[PATCH v3 0/9] btrfs: Add missing pairing mount options.
Some options should be paired to support triggering different functions
when remounting.
This patchset adds these missing pairing mount options except noinode_cache,
which may need more investigation to ensure the safety and will be sent as
an independent patch.

Qu Wenruo (9):
  btrfs: Add "barrier" option to support "-o remount,barrier"
  btrfs: Add noautodefrag mount option.
  btrfs: Add nodiscard mount option.
  btrfs: Add noenospc_debug mount option.
  btrfs: Add noflushoncommit mount option.
  btrfs: Add acl mount option.
  btrfs: Add datacow mount option.
  btrfs: Add datasum mount option.
  btrfs: Add treelog mount option.

 Documentation/filesystems/btrfs.txt | 47 +++
 fs/btrfs/super.c                    | 55 -
 2 files changed, 84 insertions(+), 18 deletions(-)

--
1.8.5.2
[PATCH v3 1/9] btrfs: Add "barrier" option to support "-o remount,barrier"
Btrfs can be remounted without barrier, but there is no "barrier" option
so nobody can remount btrfs back with barrier on. Only umount and
mount again can re-enable barrier.(Quite awkward)

Also the mount options in the document is also changed slightly for the
further pairing options changes.

Reported-by: Daniel Blueman
Signed-off-by: Qu Wenruo
Signed-off-by: Mike Fleetwood
Reviewed-by: David Sterba
---
Changelog:
v1: Add barrier option
v2: Document style change
v3: Small description change
---
 Documentation/filesystems/btrfs.txt | 13 +++--
 fs/btrfs/super.c                    |  8 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 5dd282d..ce487a2 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -38,7 +38,7 @@ Mount Options
 =============
 
 When mounting a btrfs filesystem, the following option are accepted.
-Unless otherwise specified, all options default to off.
+Options with (*) are default options and will not show in the mount options.
 
   alloc_start=
 	Debugging option to force all block allocations above a certain
@@ -138,12 +138,13 @@ Unless otherwise specified, all options default to off.
 	Disable support for Posix Access Control Lists (ACLs). See the
 	acl(5) manual page for more information about ACLs.
 
+  barrier(*)
   nobarrier
-	Disables the use of block layer write barriers. Write barriers ensure
-	that certain IOs make it through the device cache and are on persistent
-	storage. If used on a device with a volatile (non-battery-backed)
-	write-back cache, this option will lead to filesystem corruption on a
-	system crash or power loss.
+	Enable/disable the use of block layer write barriers. Write barriers
+	ensure that certain IOs make it through the device cache and are on
+	persistent storage. If disabled on a device with a volatile
+	(non-battery-backed) write-back cache, nobarrier option will lead to
+	filesystem corruption on a system crash or power loss.
 
   nodatacow
 	Disable data copy-on-write for newly created files. Implies nodatasum,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e9c13fb..fe9d8a6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval,
+	Opt_commit_interval, Opt_barrier,
 	Opt_err,
 };
 
@@ -335,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_nodatasum, "nodatasum"},
 	{Opt_nodatacow, "nodatacow"},
 	{Opt_nobarrier, "nobarrier"},
+	{Opt_barrier, "barrier"},
 	{Opt_max_inline, "max_inline=%s"},
 	{Opt_alloc_start, "alloc_start=%s"},
 	{Opt_thread_pool, "thread_pool=%d"},
@@ -494,6 +495,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_clear_opt(info->mount_opt, SSD);
 			btrfs_clear_opt(info->mount_opt, SSD_SPREAD);
 			break;
+		case Opt_barrier:
+			if (btrfs_test_opt(root, NOBARRIER))
+				btrfs_info(root->fs_info, "turning on barriers");
+			btrfs_clear_opt(info->mount_opt, NOBARRIER);
+			break;
 		case Opt_nobarrier:
 			btrfs_info(root->fs_info, "turning off barriers");
 			btrfs_set_opt(info->mount_opt, NOBARRIER);
--
1.8.5.2
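With this patch applied, barriers could be turned back on without unmounting, and the result can be checked from userspace. The sketch below assumes a kernel carrying the patch: because default options are not shown, nobarrier should disappear from the mount options after a remount with barrier. The helper name mount_has_option and the /mnt/data mount point are hypothetical.

```shell
#!/bin/sh
# Sketch only: mount_has_option and /mnt/data are illustrative assumptions.

# Succeed if <option> appears in the option list of <mountpoint> in a
# /proc/mounts-style file (fields: device mountpoint fstype options ...).
# Usage: mount_has_option /proc/mounts /mnt/data nobarrier
mount_has_option() {
    awk -v mp="$2" -v opt="$3" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt)
                    found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example session (needs root and a patched kernel; not executed here):
#   mount -o remount,barrier /mnt/data
#   mount_has_option /proc/mounts /mnt/data nobarrier || echo "barriers on"
```

Matching on whole comma-separated entries avoids mistaking nobarrier for barrier, which a plain substring search would do.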
[PATCH v3 5/9] btrfs: Add noflushoncommit mount option.
Add noflushoncommit mount option to disable flush on commit with
remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add noflushoncommit option
v3: None
---
 Documentation/filesystems/btrfs.txt | 1 +
 fs/btrfs/super.c                    | 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 13a7cac..303b49c 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -117,6 +117,7 @@ Options with (*) are default options and will not show in the mount options.
 	"bug" - BUG() on a fatal error. This is the default.
 	"panic" - panic() on a fatal error.
 
+  noflushoncommit(*)
   flushoncommit
 	The 'flushoncommit' mount option forces any data dirtied by a write in a
 	prior transaction to commit as part of the current commit. This makes
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index acf3e7d..b2c752e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug,
+	Opt_noenospc_debug, Opt_noflushoncommit,
 	Opt_err,
 };
 
@@ -350,6 +350,7 @@ static match_table_t tokens = {
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_flushoncommit, "flushoncommit"},
+	{Opt_noflushoncommit, "noflushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
@@ -562,6 +563,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "turning on flush-on-commit");
 			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
 			break;
+		case Opt_noflushoncommit:
+			if (btrfs_test_opt(root, FLUSHONCOMMIT))
+				btrfs_info(root->fs_info, "turning off flush-on-commit");
+			btrfs_clear_opt(info->mount_opt, FLUSHONCOMMIT);
+			break;
 		case Opt_ratio:
 			ret = match_int(&args[0], &intarg);
 			if (ret) {
--
1.8.5.2
[PATCH v3 4/9] btrfs: Add noenospc_debug mount option.
Add noenospc_debug mount option to disable ENOSPC debug with
remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add noenospc_debug option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c                    | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 7254cf5..13a7cac 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -108,8 +108,9 @@ Options with (*) are default options and will not show in the mount options.
 	performance impact. (The fstrim command is also available to
 	initiate batch trims from userspace).
 
+  noenospc_debug(*)
   enospc_debug
-	Debugging option to be more verbose in some ENOSPC conditions.
+	Disable/enable debugging option to be more verbose in some ENOSPC conditions.
 
   fatal_errors=
 	Action to take when encountering a fatal error:
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8731ee6..acf3e7d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,6 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
+	Opt_noenospc_debug,
 	Opt_err,
 };
 
@@ -356,6 +357,7 @@ static match_table_t tokens = {
 	{Opt_clear_cache, "clear_cache"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
 	{Opt_enospc_debug, "enospc_debug"},
+	{Opt_noenospc_debug, "noenospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
 	{Opt_nodefrag, "noautodefrag"},
@@ -603,6 +605,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_enospc_debug:
 			btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
 			break;
+		case Opt_noenospc_debug:
+			btrfs_clear_opt(info->mount_opt, ENOSPC_DEBUG);
+			break;
 		case Opt_defrag:
 			btrfs_info(root->fs_info, "enabling auto defrag");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
--
1.8.5.2
[PATCH v3 2/9] btrfs: Add noautodefrag mount option.
Btrfs has autodefrag mount option but no pairing noautodefrag option,
which makes it impossible to disable autodefrag without umount.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add noautodefrag option
v3: None
---
 Documentation/filesystems/btrfs.txt | 8 +---
 fs/btrfs/super.c                    | 8 +++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index ce487a2..e87609a 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -46,10 +46,12 @@ Options with (*) are default options and will not show in the mount options.
 	bytes, optionally with a K, M, or G suffix, case insensitive.
 	Default is 1MB.
 
+  noautodefrag(*)
   autodefrag
-	Detect small random writes into files and queue them up for the
-	defrag process. Works best for small files; Not well suited for
-	large database workloads.
+	Disable/enable auto defragmentation.
+	Auto defragmentation detects small random writes into files and queue
+	them up for the defrag process. Works best for small files;
+	Not well suited for large database workloads.
 
   check_int
   check_int_data
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fe9d8a6..c65f696 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval, Opt_barrier,
+	Opt_commit_interval, Opt_barrier, Opt_nodefrag,
 	Opt_err,
 };
 
@@ -357,6 +357,7 @@ static match_table_t tokens = {
 	{Opt_enospc_debug, "enospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
+	{Opt_nodefrag, "noautodefrag"},
 	{Opt_inode_cache, "inode_cache"},
 	{Opt_no_space_cache, "nospace_cache"},
 	{Opt_recovery, "recovery"},
@@ -602,6 +603,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "enabling auto defrag");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
+		case Opt_nodefrag:
+			if (btrfs_test_opt(root, AUTO_DEFRAG))
+				btrfs_info(root->fs_info, "disabling auto defrag");
+			btrfs_clear_opt(info->mount_opt, AUTO_DEFRAG);
+			break;
 		case Opt_recovery:
 			btrfs_info(root->fs_info, "enabling auto recovery");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
--
1.8.5.2
[PATCH v3 3/9] btrfs: Add nodiscard mount option.
Add nodiscard mount option to disable discard with remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add nodiscard option
v3: None
---
 Documentation/filesystems/btrfs.txt | 7 +--
 fs/btrfs/super.c                    | 6 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index e87609a..7254cf5 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -98,9 +98,12 @@ Options with (*) are default options and will not show in the mount options.
 	can be avoided. Especially useful when trying to mount a multi-device
 	setup as root. May be specified multiple times for multiple devices.
 
+  nodiscard(*)
   discard
-	Issue frequent commands to let the block device reclaim space freed by
-	the filesystem. This is useful for SSD devices, thinly provisioned
+	Disable/enable discard mount option.
+	Discard issues frequent commands to let the block device reclaim space
+	freed by the filesystem.
+	This is useful for SSD devices, thinly provisioned
 	LUNs and virtual machine images, but may have a significant
 	performance impact. (The fstrim command is also available to
 	initiate batch trims from userspace).
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c65f696..8731ee6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval, Opt_barrier, Opt_nodefrag,
+	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_err,
 };
 
@@ -351,6 +351,7 @@ static match_table_t tokens = {
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
 	{Opt_discard, "discard"},
+	{Opt_nodiscard, "nodiscard"},
 	{Opt_space_cache, "space_cache"},
 	{Opt_clear_cache, "clear_cache"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
@@ -575,6 +576,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_discard:
 			btrfs_set_opt(info->mount_opt, DISCARD);
 			break;
+		case Opt_nodiscard:
+			btrfs_clear_opt(info->mount_opt, DISCARD);
+			break;
 		case Opt_space_cache:
 			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
 			break;
--
1.8.5.2
[PATCH v3 7/9] btrfs: Add datacow mount option.
Add datacow mount option to enable copy-on-write with
remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: add datacow mount option
v3: None
---
 Documentation/filesystems/btrfs.txt | 5 +++--
 fs/btrfs/super.c                    | 8 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 79c08f3..bbd1f0f 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -154,9 +154,10 @@ Options with (*) are default options and will not show in the mount options.
 	(non-battery-backed) write-back cache, nobarrier option will lead to
 	filesystem corruption on a system crash or power loss.
 
+  datacow(*)
   nodatacow
-	Disable data copy-on-write for newly created files. Implies nodatasum,
-	and disables all compression.
+	Enable/disable data copy-on-write for newly created files.
+	Nodatacow implies nodatasum, and disables all compression.
 
   nodatasum
 	Disable data checksumming for newly created files.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3d743cf..1bf9202 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl,
+	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
 	Opt_err,
 };
 
@@ -335,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_device, "device=%s"},
 	{Opt_nodatasum, "nodatasum"},
 	{Opt_nodatacow, "nodatacow"},
+	{Opt_datacow, "datacow"},
 	{Opt_nobarrier, "nobarrier"},
 	{Opt_barrier, "barrier"},
 	{Opt_max_inline, "max_inline=%s"},
@@ -446,6 +447,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_set_opt(info->mount_opt, NODATACOW);
 			btrfs_set_opt(info->mount_opt, NODATASUM);
 			break;
+		case Opt_datacow:
+			if (btrfs_test_opt(root, NODATACOW))
+				btrfs_info(root->fs_info, "setting datacow");
+			btrfs_clear_opt(info->mount_opt, NODATACOW);
+			break;
 		case Opt_compress_force:
 		case Opt_compress_force_type:
 			compress_force = true;
--
1.8.5.2
[PATCH v3 8/9] btrfs: Add datasum mount option.
Add the datasum mount option so that data checksumming can be enabled
with the remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add datasum option
v3: None
---
 Documentation/filesystems/btrfs.txt |  4 +++-
 fs/btrfs/super.c                    | 10 ++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index bbd1f0f..e05c6ae 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -159,8 +159,10 @@ Options with (*) are default options and will not show in the mount options.
 	Enable/disable data copy-on-write for newly created files.
 	Nodatacow implies nodatasum, and disables all compression.
 
+  datasum(*)
   nodatasum
-	Disable data checksumming for newly created files.
+	Enable/disable data checksumming for newly created files.
+	Datasum implies datacow.
 
   notreelog
 	Disable the tree logging used for fsync and O_SYNC writes.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1bf9202..fa74252 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -325,6 +325,7 @@ enum {
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
+	Opt_datasum,
 	Opt_err,
 };
 
@@ -334,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_subvolid, "subvolid=%s"},
 	{Opt_device, "device=%s"},
 	{Opt_nodatasum, "nodatasum"},
+	{Opt_datasum, "datasum"},
 	{Opt_nodatacow, "nodatacow"},
 	{Opt_datacow, "datacow"},
 	{Opt_nobarrier, "nobarrier"},
@@ -434,6 +436,14 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "setting nodatasum");
 			btrfs_set_opt(info->mount_opt, NODATASUM);
 			break;
+		case Opt_datasum:
+			if (btrfs_test_opt(root, NODATACOW))
+				btrfs_info(root->fs_info, "setting datasum, datacow enabled");
+			else
+				btrfs_info(root->fs_info, "setting datasum");
+			btrfs_clear_opt(info->mount_opt, NODATACOW);
+			btrfs_clear_opt(info->mount_opt, NODATASUM);
+			break;
 		case Opt_nodatacow:
 			if (!btrfs_test_opt(root, COMPRESS) ||
 				!btrfs_test_opt(root, FORCE_COMPRESS)) {
-- 
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
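For readers following the flag logic in 8/9, here is a minimal userspace sketch of how the two "no" bits interact. It is not kernel code: the SKETCH_* bit values and the sketch_* function names are hypothetical stand-ins for the BTRFS_MOUNT_* constants and the btrfs_set_opt()/btrfs_clear_opt() macros, and the nodatacow behaviour is taken only from the documentation sentence "Nodatacow implies nodatasum".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real BTRFS_MOUNT_* constants are defined by the kernel. */
#define SKETCH_NODATASUM (1UL << 0)
#define SKETCH_NODATACOW (1UL << 1)

/* "nodatasum": turn data checksumming off. */
void sketch_apply_nodatasum(unsigned long *mount_opt)
{
	assert(mount_opt != NULL);
	*mount_opt |= SKETCH_NODATASUM;
}

/* "nodatacow": per the documentation text, nodatacow implies nodatasum. */
void sketch_apply_nodatacow(unsigned long *mount_opt)
{
	assert(mount_opt != NULL);
	*mount_opt |= SKETCH_NODATACOW | SKETCH_NODATASUM;
}

/*
 * "datasum": datasum implies datacow, so both "no" bits are cleared,
 * mirroring the two btrfs_clear_opt() calls in the Opt_datasum case.
 */
void sketch_apply_datasum(unsigned long *mount_opt)
{
	assert(mount_opt != NULL);
	*mount_opt &= ~(SKETCH_NODATACOW | SKETCH_NODATASUM);
}

/* Nonzero when data checksumming is active, i.e. NODATASUM is not set. */
int sketch_datasum_enabled(unsigned long mount_opt)
{
	return (mount_opt & SKETCH_NODATASUM) == 0;
}
```

Starting from a nodatacow mount, applying datasum in this sketch leaves neither bit set, which is the situation the "setting datasum, datacow enabled" message reports.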
[PATCH v3 9/9] btrfs: Add treelog mount option.
Add the treelog mount option so that the tree log can be enabled with
the remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: Add treelog option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c                    | 8 +++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index e05c6ae..d11cc2f 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -164,8 +164,9 @@ Options with (*) are default options and will not show in the mount options.
 	Enable/disable data checksumming for newly created files.
 	Datasum implies datacow.
 
+  treelog(*)
   notreelog
-	Disable the tree logging used for fsync and O_SYNC writes.
+	Enable/disable the tree logging used for fsync and O_SYNC writes.
 
   recovery
 	Enable autorecovery attempts if a bad tree root is found at mount time.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fa74252..d353b9e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -325,7 +325,7 @@ enum {
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
-	Opt_datasum,
+	Opt_datasum, Opt_treelog,
 	Opt_err,
 };
 
@@ -353,6 +353,7 @@ static match_table_t tokens = {
 	{Opt_acl, "acl"},
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
+	{Opt_treelog, "treelog"},
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_noflushoncommit, "noflushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
@@ -579,6 +580,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "disabling tree log");
 			btrfs_set_opt(info->mount_opt, NOTREELOG);
 			break;
+		case Opt_treelog:
+			if (btrfs_test_opt(root, NOTREELOG))
+				btrfs_info(root->fs_info, "enabling tree log");
+			btrfs_clear_opt(info->mount_opt, NOTREELOG);
+			break;
 		case Opt_flushoncommit:
 			btrfs_info(root->fs_info, "turning on flush-on-commit");
 			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
-- 
1.8.5.2
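The token table plus switch pattern that 9/9 extends can be illustrated with a small userspace sketch. Everything named sketch_* or SKETCH_* below is hypothetical: the lookup is a plain strcmp() stand-in for the kernel's match_token(), it does not handle "%s" or "%d" patterns, and the bit value is made up. It only tries to show why a later "treelog" in the option string undoes an earlier "notreelog".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit value; the kernel defines the real BTRFS_MOUNT_NOTREELOG. */
#define SKETCH_NOTREELOG (1UL << 0)

enum sketch_token { SKETCH_OPT_NOTREELOG, SKETCH_OPT_TREELOG, SKETCH_OPT_ERR };

struct sketch_match {
	enum sketch_token token;
	const char *pattern;
};

/* Same shape as the tokens[] table in the patch, without %s/%d patterns. */
static const struct sketch_match sketch_tokens[] = {
	{ SKETCH_OPT_NOTREELOG, "notreelog" },
	{ SKETCH_OPT_TREELOG, "treelog" },
	{ SKETCH_OPT_ERR, NULL },
};

/* Exact-match lookup; a simplified stand-in for the kernel's match_token(). */
static enum sketch_token sketch_match_token(const char *word)
{
	const struct sketch_match *m;

	for (m = sketch_tokens; m->pattern != NULL; m++)
		if (strcmp(word, m->pattern) == 0)
			return m->token;
	return SKETCH_OPT_ERR;
}

/*
 * Apply a comma-separated option string to *opt, left to right.
 * The string is modified in place by strtok(). Returns 0 on success
 * and -1 on the first unknown option.
 */
int sketch_parse_options(char *options, unsigned long *opt)
{
	char *word;

	assert(options != NULL && opt != NULL);
	for (word = strtok(options, ","); word != NULL; word = strtok(NULL, ",")) {
		switch (sketch_match_token(word)) {
		case SKETCH_OPT_NOTREELOG:
			*opt |= SKETCH_NOTREELOG;
			break;
		case SKETCH_OPT_TREELOG:
			*opt &= ~SKETCH_NOTREELOG;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Because options are applied in order, passing "notreelog,treelog" to sketch_parse_options() leaves the bit clear, while "treelog,notreelog" leaves it set.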
[PATCH v3 6/9] btrfs: Add acl mount option.
Add the acl mount option so that ACL support can be enabled with the
remount option.

Signed-off-by: Qu Wenruo
Reviewed-by: David Sterba
---
Changelog:
v2: add acl option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c                    | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 303b49c..79c08f3 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -141,8 +141,9 @@ Options with (*) are default options and will not show in the mount options.
 	Specify that 1 metadata chunk should be allocated after every <value>
 	data chunks.  Off by default.
 
+  acl(*)
   noacl
-	Disable support for Posix Access Control Lists (ACLs).  See the
+	Enable/disable support for Posix Access Control Lists (ACLs).  See the
 	acl(5) manual page for more information about ACLs.
 
   barrier(*)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b2c752e..3d743cf 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug, Opt_noflushoncommit,
+	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl,
 	Opt_err,
 };
 
@@ -347,6 +347,7 @@ static match_table_t tokens = {
 	{Opt_ssd, "ssd"},
 	{Opt_ssd_spread, "ssd_spread"},
 	{Opt_nossd, "nossd"},
+	{Opt_acl, "acl"},
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_flushoncommit, "flushoncommit"},
@@ -552,6 +553,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				goto out;
 			}
 			break;
+		case Opt_acl:
+			root->fs_info->sb->s_flags |= MS_POSIXACL;
+			break;
 		case Opt_noacl:
 			root->fs_info->sb->s_flags &= ~MS_POSIXACL;
 			break;
-- 
1.8.5.2
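Unlike the other options in this series, acl/noacl in 6/9 toggles a bit in the superblock's s_flags rather than in mount_opt. A minimal userspace sketch of that toggle follows, assuming made-up SKETCH_MS_* values and a hypothetical sketch_super_block type in place of the kernel's MS_* constants and struct super_block. The point it illustrates is that the OR and AND-NOT pair leaves unrelated flag bits alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values standing in for the kernel's MS_RDONLY and MS_POSIXACL. */
#define SKETCH_MS_RDONLY   (1UL << 0)
#define SKETCH_MS_POSIXACL (1UL << 16)

/* Minimal stand-in for the s_flags member of struct super_block. */
struct sketch_super_block {
	unsigned long s_flags;
};

/* enable != 0 models "acl"; enable == 0 models "noacl". */
void sketch_set_acl(struct sketch_super_block *sb, int enable)
{
	assert(sb != NULL);
	if (enable)
		sb->s_flags |= SKETCH_MS_POSIXACL;   /* set only the ACL bit */
	else
		sb->s_flags &= ~SKETCH_MS_POSIXACL;  /* clear only the ACL bit */
}
```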