Re: coredump in btrfsck

2014-01-05 Thread Marc MERLIN
On Fri, Jan 03, 2014 at 05:14:56PM -0700, Chris Murphy wrote:
 
 On Jan 3, 2014, at 5:33 AM, Marc MERLIN m...@merlins.org wrote:
  
  Would it be possible for whoever maintains btrfs-tools to change both
  the man page and the help included in the tool to clearly state that
  running the fsck tool is unlikely to be the right course of action
  and talk about btrfs-zero-log as well as mount -o recovery?
 
 The problem FAQ doesn't even mention btrfsck so I think people are just 
 getting around that page or making assumptions.
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

It's easy to find btrfsck without the wiki, whether it's with dpkg -l,
rpm -ql, or command line completion.
My point is that as you said, it's most often not the command to use, it
can even do more damage than good, but neither its command line help,
nor the man page warn of anything dangerous or bad in using it.

Telling people they should have read a wiki instead of the canonical man
page isn't the right way to go longer term, nor how things are done on
linux usually.
 
 Should btrfs check (btrfsck without --repair) work similar to xfs_repair when 
 the file system is not cleanly unmounted? If an XFS volume is not cleanly 
 unmounted, running xfs_repair will instruct the user to first mount the 
 volume so that the journal is replayed, then umount the volume, then run 
 xfs_repair.

I don't know what the actual tool does when it works; I've never had it
do anything useful for me, so I can't comment, except to say that it
should warn users with something like "I'm not the fsck you're used to
or are likely looking for".

 A possible variant of this for btrfs check: inform the user that the first 
 step in repairing a problem Btrfs volume is to use -o recovery, and point to 
 the Btrfs FAQ URL for additional problem-solving recommendations.

Yes, along with tweaking the man page to say the same.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Marc MERLIN
On Fri, Jan 03, 2014 at 09:34:10PM +0000, Duncan wrote:
  Thank you for that tip, I had been unaware of it 'till now.
  This will make my virtualbox image directory much happier :)
 
 I think I said it, but it bears repeating.  Once you set that attribute 
 on the dir, you may want to move the files out of the dir (to another 
 partition would make sure the data is actually moved) and back in, so 
 they're effectively new files in the dir.  Or use something like cat 
 oldfile > newfile, so you know it's actually creating the new file, not 
 reflinking.  That'll ensure the NOCOW takes effect.
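
The attribute only affects files created after it is set, which is why the rewrite step matters. A minimal sketch of that sequence, assuming a btrfs directory: the helper name `recreate_in_dir` and the paths are made up for illustration, and `chattr +C` is expected to fail on filesystems without NOCOW support, so the sketch only warns in that case.

```sh
#!/bin/sh
# Sketch: rewrite a file so that it is created fresh inside a directory
# that already carries the NOCOW attribute (chattr +C).
# recreate_in_dir is a hypothetical helper, not a btrfs-progs command.
recreate_in_dir() {
    src=$1
    dir=$2
    dst="$dir/$(basename "$src")"

    # Ask for NOCOW on the directory; files created in it afterwards
    # inherit the attribute. Elsewhere than btrfs this fails, so only warn.
    chattr +C "$dir" 2>/dev/null ||
        echo "warning: could not set +C on $dir" >&2

    # cat with a redirect writes a brand-new file, so the data is really
    # rewritten rather than shared with the old extents.
    cat "$src" > "$dst.new" && mv "$dst.new" "$dst"
}
```

Something like `recreate_in_dir Win7.vdi /path/to/nocow-dir` would then leave a freshly written copy in the target directory; the old file (and any snapshots holding its blocks) still has to be removed separately to free the space.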

Yes, I got that. That's why I ran btrfs defrag on the files after that (I
explained why: a copy would waste lots of snapshot space by replacing all
the blocks needlessly).
 
  Unfortunately, on a 83GB vdi (virtualbox) file, with 3.12.5, it did a
  lot of writing and chewed up my 4 CPUs. Then, it started to be hard to
  move my mouse cursor and my procmeter graph was barely updating seconds.
  Next, nothing updated on my X server anymore, not even seconds in time
  widgets.
  
  But, I could still sometimes move my mouse cursor, and I could sometimes
  see the HD light flicker a bit before going dead again. In other words,
  the system wasn't fully deadlocked, but btrfs sure got into a state
  where it was unable to finish the job, and took the kernel down with
  it (64bit, 8GB of RAM).
  
  I waited 2H and it never came out of it, I had to power down the system
  in the end.  Note that this was on a top of the line 500MB/s write
  Samsung Evo 840 SSD, not a slow HD.
 
 That was defrag (the command) or autodefrag (the mount option)?  I'd 
 guess defrag (the command).

defrag, the btrfs subcommand.

 That's fragmentation for you!  What did/does filefrag have to say about 
 that file?  Were you the one that posted the 6-digit extents?

Nope, I never posted anything until now. Hopefully you agree that it's
not ok for btrfs/kernel to just kill my system for over 2H, until I power
it off, because of defragging one file. I did hit a severe performance bug,
if it's not a never-ending loop.

gandalfthegreat:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi 
Win7.vdi: 156222 extents found

Considering how virtualbox works, that's hardly surprising.
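
For anyone wanting to track that number over time, the count can be pulled out of the filefrag line with a small filter. This is only a sketch: `extent_count` is a made-up helper name, and it assumes the one-line "N extents found" summary format shown above.

```sh
#!/bin/sh
# Sketch: extract the extent count from one line of filefrag output,
# e.g. "Win7.vdi: 156222 extents found".
# extent_count is a hypothetical helper name.
extent_count() {
    # Keep only the number between the last ": " and " extent(s) found".
    printf '%s\n' "$1" |
        sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found.*$/\1/p'
}

extent_count "Win7.vdi: 156222 extents found"   # prints 156222
```

In practice it would be fed live output, for example `extent_count "$(filefrag Win7.vdi)"`.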

 For something that bad, it might be faster to copy/move it off-device 
 (expect it to take awhile) then move it back.  That way you're only 
 trying to read OR write on the device, not both, and the move elsewhere 
 should defrag it quite a bit, effectively sequential write, then read and 
 write on the move back.

Yes, I know how I can work around the problem (although I'll likely have
to delete all my historical snapshots to delete the old blocks, which I
don't love to do).
But doesn't it make sense to see why the kernel is near deadlocking on a
single file defrag first?

 But even that might be prohibitive.  At some point, you may need to 
 either simply give up on it (if you're lazy), or get down and dirty with 
 the tracing/profiling, working with a dev to figure out where it's 
 spending its time and hopefully get btrfs recoded to work a bit faster 
 for that sort of thing.

I'm on my way to a linux conf where I'm speaking, so I have limited time
and can't crash my laptop, but I'm happy to type some commands and give
output.

 As I suggested above, you might try the old school method of defrag, move 
 the file to a different device, then move it back.  And if possible do it 
 when nothing else is using the system.  But it may simply be practically 
 inaccessible with a current kernel, in which case you'd either have to 
 work with the devs to optimize, or give it up as a lost cause. =:(
 
I can fix my problem, actually virtualbox works fine with the fragmented
file, without even feeling slow, so really I don't need to fix it
urgently, I was just trying it out after your post.
 
 Then if the process completed successfully, you could cat the parts back 
 together again... and the written parts would be basically sequential, so 
 that should go MUCH faster! =:^)

All that noted, but I'm not desperate, just trying commands I hadn't
tried yet :)

Thanks for your replies,
Marc


Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT

2014-01-05 Thread Chris Samuel
On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:

 Seconded ;)  We're really focused on nailing down these problems instead
 of hiding behind the experimental flag.  I know we won't be perfect
 overnight, but it's time to focus on production workloads.

Perhaps an option here is to remove the need to specify the degraded flag: 
if the filesystem notices that it is mounting a RAID array and would otherwise 
fail, it sets the degraded flag itself and carries on?

That way the fact it was degraded would be visible in /proc/mounts and could 
be detected with health check scripts like NRPE for icinga/nagios.
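
A health check along those lines could be sketched as follows. It assumes the degraded option appears in the fourth field of /proc/mounts, and it uses the usual Nagios plugin convention of exit status 2 for CRITICAL; both function names are made up for illustration.

```sh
#!/bin/sh
# Sketch: report btrfs filesystems that are mounted with "degraded".
# Reads /proc/mounts by default; a different file can be passed for testing.
degraded_btrfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Fields: device, mount point, fs type, options, dump, pass.
    awk '$3 == "btrfs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "degraded") { print $2; break }
    }' "$mounts_file"
}

# Nagios-style wrapper: exit status 0 means OK, 2 means CRITICAL.
check_btrfs_degraded() {
    found=$(degraded_btrfs_mounts "$1")
    if [ -n "$found" ]; then
        echo "CRITICAL: degraded btrfs mounted at:" $found
        return 2
    fi
    echo "OK: no degraded btrfs mounts"
    return 0
}
```

Calling `check_btrfs_degraded` with no argument inspects the live system; an NRPE command definition would simply run the script and use its exit status.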

Looking at the code this would be in read_one_dev() in fs/btrfs/volumes.c ?

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP




Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT

2014-01-05 Thread Duncan
Jim Salter posted on Sat, 04 Jan 2014 16:22:53 -0500 as excerpted:


 On 01/04/2014 01:10 AM, Duncan wrote:
 The example given in the OP was of a 4-device raid10, already the
 minimum number to work undegraded, with one device dropped out, to
 below the minimum required number to mount undegraded, so of /course/
 it wouldn't mount without that option.
 
 The issue was not realizing that a degraded fault-tolerant array would
 refuse to mount without being passed an -o degraded option. Yes, it's on
 the wiki - but it's on the wiki under *replacing* a device, not in the
 FAQ, not in the head of the multiple devices section, etc; and no
 coherent message is thrown either on the console or in the kernel log
 when you do attempt to mount a degraded array without the correct
 argument.
 
 IMO that's a bug. =)

I'd agree: it's a usability bug, one of the many "it works, but it's not 
easy to work with" rough-edge bugs that still need smoothing out.

FWIW I'm seeing progress in that area, now.  The rush of functional bugs 
and fixes for them has finally slowed down to the point where there's 
beginning to be time to focus on the usability and rough edges bugs.  I 
believe I saw a post in October or November from Chris Mason, where he 
said yes, the maturing of btrfs has been predicted before, but it really 
does seem like the functional bugs are slowing down to the point where 
the usability bugs can finally be addressed, and 2014 really does look 
like the year that btrfs will finally start shaping up into a mature 
looking and acting filesystem, including in usability, etc.

And Chris mentioned the GSoC project that worked on one angle of this 
specific issue, too.  Getting that code integrated and having btrfs 
finally be able to recognize a dropped and re-added device and 
automatically trigger a resync... that'd be a pretty sweet improvement to 
get. =:^)  While they're working on that they may well take a look at at 
least giving the admin more information on a degraded-needed mount 
failure, too, tweaking the kernel log messages, etc, and possibly taking 
a second look as to whether fully refusing to mount is the best behavior 
then, or not.

Actually, I wonder... what about mounting in such a situation, but read-only, 
and refusing to go writable unless degraded is added too?  That would 
preserve the "first, do no harm; don't make the problem worse" ideal, while 
mounting read-only unless degraded is added with the rw wouldn't be /quite/ 
as drastic as refusing to mount entirely unless degraded is added.  I 
actually think that, plus some better logging saying "hey, we don't have 
enough devices to write with the requested raid level, so remount 
rw,degraded, and either add another device or reconfigure the raid mode to 
something suitable for the number of devices", would be the better approach.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT

2014-01-05 Thread Duncan
Chris Samuel posted on Sun, 05 Jan 2014 20:20:26 +1100 as excerpted:

 On Sat, 4 Jan 2014 02:56:39 PM Chris Mason wrote:
 
 Seconded ;)  We're really focused on nailing down these problems
 instead of hiding behind the experimental flag.  I know we won't be
 perfect overnight, but it's time to focus on production workloads.
 
 Perhaps an option here is to remove the need to specify the degraded
 flag: if the filesystem notices that it is mounting a RAID array and
 would otherwise fail, it sets the degraded flag itself and carries
 on?
 
 That way the fact it was degraded would be visible in /proc/mounts and
 could be detected with health check scripts like NRPE for icinga/nagios.
 
 Looking at the code this would be in read_one_dev() in
 fs/btrfs/volumes.c ?

The idea I came up with elsewhere was to mount read-only, with a dmesg to the 
effect that the filesystem was configured for a raid level that the 
current number of devices couldn't support, so mount rw,degraded to 
accept that temporarily and to make changes, either by adding a new 
device to fill out the required number for the configured raid level, or 
by reducing the configured raid level to match reality.

The read-only mount would be better than not mounting at all, while 
preserving the "first, do no further harm" ideal, since mounted read-only, 
the existing situation should at least remain stable.  It would 
also alert the admin to problems, with a reasonable log message saying 
how to fix them, while letting the admin at least access the filesystem 
in read-only mode, thereby giving them tools access to manage whatever 
maintenance tasks are necessary, should it be the rootfs.  The admin 
could then take the action they deemed appropriate, whether that was 
getting the data backed up, or mounting degraded,rw in order to either 
add a device and bring it back to functional or to rebalance to a lower 
data/metadata redundancy level due to lack of devices.




Re: btrfsck does not fix

2014-01-05 Thread Hendrik Friedel

Hello,



  What messages in dmesg do you get when you use recovery?

 I'll find out, tomorrow (I can't access the disk just now).


Here it is:
[90098.989872] btrfs: device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2 transid 162460 /dev/sdc1


That's all. The same in the syslog.

Do you have further suggestions to fix the file-system?

Regards,
Hendrik


Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748

2014-01-05 Thread Muthu Kumar
Fengguang,
Instead of rebooting, can you trigger a crash dump when this happens
and send us the backtrace (to start with)?

Kent,
Did you do any btrfs test with your changes?

Regards,
Muthu

On Sun, Jan 5, 2014 at 1:46 AM, Fengguang Wu fengguang...@intel.com wrote:
 Hi Muthu,

 On Fri, Jan 03, 2014 at 11:51:31AM -0800, Muthu Kumar wrote:
 Looks like Kent missed the btrfs endio in the original commit. How
 about this patch:

 -

 In btrfs_end_bio, call bio_endio_nodec on the restored bio so the
 bi_remaining is accounted for correctly.

 Reported-by: fengguang...@intel.com
 Cc: Kent Overstreet k...@daterainc.com
 CC: Jens Axboe ax...@kernel.dk
 Signed-off-by: Muthukumar Ratty mut...@gmail.com
 

  fs/btrfs/volumes.c |6 +-
  1 files changed, 5 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index f2130de..edfed52 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
 }
 kfree(bbio);

 -   bio_endio(bio, err);
 +/*
 + * Call endio_nodec on the restored bio so the bi_remaining is
 + * accounted for correctly
 + */
 +   bio_endio_nodec(bio, err);
 } else if (!is_orig_bio) {
 bio_put(bio);
 }

 Interestingly, the BUG message disappeared but it blocks the test run.
 In the end, the test watchdog reboots the machine with SysRq:

 2014-01-04 23:13:02 mount -t btrfs /dev/vda /fs/vda
 [   20.184264] btrfs: device fsid f0e06999-0518-47e0-a622-21b8749438be devid 1 transid 4 /dev/vda
 [   20.186552] btrfs: disk space caching is enabled
 [  131.360457] random: nonblocking pool is initialized
 == [ 1465.069342] SysRq : Emergency Sync
 == [ 1475.071055] SysRq : Resetting

 Attached is the full dmesg for a good run (v3.13-rc7) and a bad run
 (this patch).

 Thanks,
 Fengguang


Re: btrfsck does not fix

2014-01-05 Thread Chris Murphy

On Jan 4, 2014, at 2:21 PM, Hendrik Friedel hend...@friedels.name wrote:

 Hi Chris,
 
 
  I ran btrfsck on my volume with the repair option. When I re-run it, I 
  get the same errors as before.
 
 Did you try mounting with -o recovery first?
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
 
 No, I did not.
 In fact, I had visited the FAQ before, and my understanding was, that -o 
 recovery was used/needed when mounting is impossible. This is not the case. 
 In fact, the disk does work without obvious problems.

It mounts without errors? So why then btrfsck/btrfs repair? What precipitated 
the repair?

If mount option -o recovery is used, dmesg should report 'btrfs: enabling auto 
recovery', and I think you're right: if it's mounting OK then recovery 
probably isn't applicable. Can you just do a btrfs check <dev> and report the 
results? Repair can sometimes make problems worse, it seems.


Chris Murphy



Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 4, 2014, at 11:39 PM, Marc MERLIN m...@merlins.org wrote:

 
 Nope, I never posted anything until now. Hopefully you agree that it's
 not ok for btrfs/kernel to just kill my system for over 2H, until I power
 it off, because of defragging one file. I did hit a severe performance bug,
 if it's not a never-ending loop.
 
 gandalfthegreat:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi 
 Win7.vdi: 156222 extents found
 
 Considering how virtualbox works, that's hardly surprising.

I haven't read anything so far indicating defrag applies to the VM container 
use case, rather nodatacow via xattr +C is the way to go. At least for now.

 
 But doesn't it make sense to see why the kernel is near deadlocking on a
 single file defrag first?

It's better than a panic or corrupt data. So far the best combination I've 
found, open to other suggestions though, is +C xattr on 
/var/lib/libvirt/images, creating non-preallocated qcow2 files, and 
snapshotting the qcow2 file with qemu-img. Granted, when sysroot is 
snapshotted, I'm making btrfs snapshots of these qcow2 files. Another option 
is to make /var/lib/libvirt/images a subvolume; then when sysroot is 
snapshotted, /var/lib/libvirt/images is immune to being snapshotted 
automatically with the parent subvolume. I'd have to explicitly snapshot it. 
This may be a better way to go to avoid accumulation of btrfs snapshots of 
qcow2 snapshot files.

This may already be a known problem but it's worth sysrq+w, and then dmesg and 
posting those results if you haven't already.


Chris Murphy



Re: how to properly mount an external usb hard drive other questions

2014-01-05 Thread Justus Seifert
On 05.01.2014 18:43, Justus Seifert wrote:
 On 05.01.2014 05:34, dhan.war wrote:
 hi all

 i am using up to date debian sid with xfce desktop environment. i am
 using Linux 3.13-rc6-amd64 #1 SMP Debian 3.13~rc6-1~exp1 (2013-12-30)
 x86_64 GNU/Linux from experimental.
 i have installed usbmount to auto mount all the devices connected
 through USB.
 […]

 [e] what is the appropriate fstab entry for my device ? [ i don't want
 to remove usbmount].
 
 /dev/sdc /path/to/your/favorite/mountpoint/that/has/to/exist/already btrfs compress,noauto 0 0

oh i forgot: if you want to mount it without su privileges you have to use:
/dev/sdc /path/to/your/favorite/mountpoint btrfs compress,noauto,users,user 0 0

also look into subvolume mounting with subvol=myfirstsubvolume in your
list of mount options, if you want to do cool stuff with subvolumes.




Re: how to properly mount an external usb hard drive other questions

2014-01-05 Thread Justus Seifert
On 05.01.2014 05:34, dhan.war wrote:
 hi all
 
 i am using up to date debian sid with xfce desktop environment. i am
 using Linux 3.13-rc6-amd64 #1 SMP Debian 3.13~rc6-1~exp1 (2013-12-30)
 x86_64 GNU/Linux from experimental.
 i have installed usbmount to auto mount all the devices connected
 through USB.
 
 [cmd# 1] i have created btrfs partition on my external USB hard drive
 using the following command :
 
 # mkfs.btrfs -f -L btrfs -m single /dev/sdc
 Turning ON incompat feature 'extref': increased hardlink limit per file
 to 65536
 fs created label btrfs on /dev/sdc
 nodesize 16384 leafsize 16384 sectorsize 4096 size 931.51GiB
 Btrfs v3.12
 
 [cmd# 2] my permissions of the device :
 # ls -l /dev/sdc
 brw-rw---- 1 root floppy 8, 32 Jan  5 09:47 /dev/sdc
 
 my questions :
 [a] does the partition created by me is appropriate ?

it seems ok

 [b] how do i specify lzo compression in fstab ? last time when i tried
 to create entry fstab it is complaining about the auto mounting of the
 device by automount.

if you don't want the partition to be mounted from fstab during boot
then you should add noauto to the list of options in the respective
fstab line.

 [c] what compression method is used by btrfs by default for the
 partitions created using the command mentioned above. [ cmd# 1]

none.  if you tell mount to use compression without specifying the
algo, it will use zlib (that's like gz).  if you do not use the
compress option then it will not compress new files.
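
To make that rule concrete, here is a small sketch that reports which algorithm a given option string asks for. It is only an illustration: `compress_algo` is a made-up helper, and it assumes the bare `compress` (or `compress-force`) option means zlib, as described above.

```sh
#!/bin/sh
# Sketch: report the compression algorithm requested by a btrfs mount
# option string. compress_algo is a hypothetical helper name.
# The body runs in a subshell so the IFS and "set -f" changes stay local.
compress_algo() (
    algo=none
    IFS=,
    set -f                      # no pathname expansion while splitting
    for opt in $1; do
        case $opt in
            compress|compress-force)
                algo=zlib ;;            # bare option: assumed zlib default
            compress=*|compress-force=*)
                algo=${opt#*=} ;;       # explicit algorithm after "="
        esac
    done
    printf '%s\n' "$algo"
)

compress_algo "compress,noauto"                  # prints zlib
compress_algo "compress=lzo,noauto,users,user"   # prints lzo
compress_algo "noauto"                           # prints none
```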

 [d] does the file permissions for my device are accurate ? [ cmd# 2]

i don't know.  are you a member of the group floppy?  what is the purpose
of the group floppy on your machine?  what users are members of the
group floppy?

 [e] what is the appropriate fstab entry for my device ? [ i don't want
 to remove usbmount].

/dev/sdc /path/to/your/favorite/mountpoint/that/has/to/exist/already btrfs compress,noauto 0 0

 [f] should i use single or dup for the device ?

maybe use single

 
 please provide suggestions for configuring my device appropriately.
 thank you for reading the message patiently.
 
 please always cc me.
 
 regards,
 wardhan.


i tried to keep it short.  feel free to ask for more.

cheers
justus


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Jim Salter


On 01/05/2014 12:09 PM, Chris Murphy wrote:
 I haven't read anything so far indicating defrag applies to the VM 
 container use case, rather nodatacow via xattr +C is the way to go. At 
 least for now. 


Can you elaborate on the rationale behind database or VM binaries being 
set nodatacow? I experimented with this*, and found no significant (to 
me, anyway) performance enhancement with nodatacow on - maybe 10% at 
best, and if I understand correctly, that implies losing the live 
per-block checksumming of the data that's set nodatacow, meaning you 
won't get automatic correction if you're on a redundant array.


All I've heard so far is "better performance" without any more detailed 
explanation, and if the only benefit is an added MAYBE 10%ish 
performance... I'd rather take the hit, personally.


* experimented with this == set up a Win2008R2 test VM and ran 
HDTunePro for several runs on binaries stored with and without nodatacow 
set, 5G of random and sequential read and write access per run.



Re: how to properly mount an external usb hard drive other questions

2014-01-05 Thread Jim Salter


On 01/05/2014 12:50 PM, Justus Seifert wrote:
 oh i forgot: if you want to mount it without su privileges you have to 
 use:

 /dev/sdc /path/to/your/favorite/mountpoint btrfs compress,noauto,users,user 0 0

If you want LZO compression, as you specified:

  /dev/sdc /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0

Better yet, if your btrfs is actually on /dev/sdc right now, let's get 
that fstab entry mounting it by UUID instead.


  ls -l /dev/disk/by-uuid | grep sdc
  lrwxrwxrwx 1 root root 10 Jan  3 09:40 12345678-9abc-1234-5678-9a0123456789 -> ../../sdc


So then:

  # this is not a real UUID, you need to check /dev/disk/by-uuid on your
  # machine for a real UUID
  UUID=12345678-9abc-1234-5678-9a0123456789 /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0


This is EXTRA important with a USB drive, since it's HIGHLY likely it 
won't always be on the same physical devicename.
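
Putting those pieces together, the entry can be generated rather than typed by hand. A minimal sketch, with a made-up helper name `fstab_line`; it assumes the filesystem type field should be btrfs and reuses the options from the example above as a default.

```sh
#!/bin/sh
# Sketch: print an fstab entry that mounts a btrfs filesystem by UUID.
# fstab_line is a hypothetical helper; it only prints, it does not edit
# /etc/fstab.
fstab_line() {
    uuid=$1
    mountpoint=$2
    options=${3:-compress=lzo,noauto,users,user}
    # Fields: device, mount point, fs type, options, dump, pass.
    printf 'UUID=%s %s btrfs %s 0 0\n' "$uuid" "$mountpoint" "$options"
}

# Placeholder UUID, as in the example above.
fstab_line 12345678-9abc-1234-5678-9a0123456789 /path/to/mountpoint
```

On a real system the UUID argument could come from `blkid -s UUID -o value /dev/sdc`, and the printed line would be reviewed before being appended to /etc/fstab.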



Re: how to properly mount an external usb hard drive other questions

2014-01-05 Thread Jim Salter

On 01/05/2014 01:02 PM, Jim Salter wrote:

 If you want LZO compression, as you specified:

   /dev/sdc /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0

 Better yet, if your btrfs is actually on /dev/sdc right now, let's get 
 that fstab entry mounting it by UUID instead.

   ls -l /dev/disk/by-uuid | grep sdc
   lrwxrwxrwx 1 root root 10 Jan  3 09:40 12345678-9abc-1234-5678-9a0123456789 -> ../../sdc

 So then:

   # this is not a real UUID, you need to check /dev/disk/by-uuid on your
   # machine for a real UUID
   UUID=12345678-9abc-1234-5678-9a0123456789 /path/to/mountpoint btrfs compress=lzo,noauto,users,user 0 0

 This is EXTRA important with a USB drive, since it's HIGHLY likely it 
 won't always be on the same physical devicename.


One other note: in this particular case, you might actually be better 
served setting compression by mounting the drive normally, then:


 cd /path/to/drive
 chattr +c . ; chattr +c * ; chattr +c .[!.]*

This will set compression on by default for any future files stored on 
that USB drive, *without* needing any special mount options.


Why might this be a better idea? Well, if it's a USB drive, presumably 
you might want to mount it on foreign systems from time to time. This 
way, even if you mount the drive on a foreign system that doesn't know 
anything about your preferences, it will see the +c on the root 
directory of the drive, and store any new data on the drive compressed.


The only caveat: +c won't set the compression algorithm to LZO. It'll be 
zlib (gzip-style), which is the default algorithm. (And, of course, this 
won't compress any EXISTING data already stored there - only NEW data 
written to it after you set the +c attribute.)



[PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl

2014-01-05 Thread Hugo Mills
The structure for BTRFS_IOC_SET_RECEIVED_SUBVOL packs differently on 32-bit
and 64-bit systems. This means that it is impossible to use btrfs
receive on a system with a 64-bit kernel and 32-bit userspace, because
the structure size (and hence the ioctl number) is different.

This patch adds a compatibility structure and ioctl to deal with the
above case.
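
The link between structure size and ioctl number can be checked with plain shell arithmetic. This is a sketch under stated assumptions: it follows the generic Linux _IOC layout (direction in bits 30-31, size from bit 16, type from bit 8, number in the low byte), uses 0x94 for BTRFS_IOCTL_MAGIC, and takes the two structure sizes (200 and 192 bytes) from a hand calculation for x86-64 and 32-bit x86 alignment rules rather than from the patch itself. The helper name `iowr` is made up.

```sh
#!/bin/sh
# Sketch: how the argument size feeds into a Linux ioctl request number.
# Generic _IOC layout: dir << 30 | size << 16 | type << 8 | nr.
# The sizes used below are worked out by hand, not taken from the patch.
iowr() {
    # $1 = magic ("type"), $2 = command number, $3 = sizeof(argument struct)
    # 3 is the read+write direction used by _IOWR.
    printf '0x%08x\n' $(( (3 << 30) | ($3 << 16) | ($1 << 8) | $2 ))
}

iowr 0x94 37 200   # 64-bit layout, timespec padded to 16 bytes: 0xc0c89425
iowr 0x94 37 192   # 32-bit x86 layout, timespec is 12 bytes:    0xc0c09425
```

The two results differ only in the size bits, which is why a 64-bit kernel does not recognise the request issued by a 32-bit userspace.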

Signed-off-by: Hugo Mills h...@carfax.org.uk
---
 fs/btrfs/ioctl.c | 95 +++-
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 21da576..e186439 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -57,6 +57,32 @@
 #include "send.h"
 #include "dev-replace.h"
 
+#ifdef CONFIG_64BIT
+/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
+ * structures are incorrect, as the timespec structure from userspace
+ * is 4 bytes too small. We define these alternatives here to teach
+ * the kernel about the 32-bit struct packing.
+ */
+struct btrfs_ioctl_timespec_32 {
+   __u64 sec;
+   __u32 nsec;
+} __attribute__ ((__packed__));
+
+struct btrfs_ioctl_received_subvol_args_32 {
+   char    uuid[BTRFS_UUID_SIZE];  /* in */
+   __u64   stransid;   /* in */
+   __u64   rtransid;   /* out */
+   struct btrfs_ioctl_timespec_32 stime; /* in */
+   struct btrfs_ioctl_timespec_32 rtime; /* out */
+   __u64   flags;  /* in */
+   __u64   reserved[16];   /* in */
+} __attribute__ ((__packed__));
+#endif
+
+#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
+   struct btrfs_ioctl_received_subvol_args_32)
+
+
 static int btrfs_clone(struct inode *src, struct inode *inode,
   u64 off, u64 olen, u64 olen_aligned, u64 destoff);
 
@@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct file *file, void __user *arg)
return btrfs_qgroup_wait_for_completion(root->fs_info);
 }
 
+#ifdef CONFIG_64BIT
+static long btrfs_ioctl_set_received_subvol_32(struct file *file,
+   void __user *arg)
+{
+   struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL;
+   struct btrfs_ioctl_received_subvol_args *args64 = NULL;
+   int ret = 0;
+
+   args32 = memdup_user(arg, sizeof(*args32));
+   if (IS_ERR(args32)) {
+   ret = PTR_ERR(args32);
+   args32 = NULL;
+   goto out;
+   }
+
+   args64 = kmalloc(sizeof(*args64), GFP_NOFS);
+   if (IS_ERR(args64)) {
+   ret = PTR_ERR(args64);
+   args64 = NULL;
+   goto out;
+   }
+
+   memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE);
+   args64->stransid = args32->stransid;
+   args64->rtransid = args32->rtransid;
+   args64->stime.sec = args32->stime.sec;
+   args64->stime.nsec = args32->stime.nsec;
+   args64->rtime.sec = args32->rtime.sec;
+   args64->rtime.nsec = args32->rtime.nsec;
+   args64->flags = args32->flags;
+
+   ret = _btrfs_ioctl_set_received_subvol(file, args64);
+
+out:
+   kfree(args32);
+   kfree(args64);
+   return ret;
+}
+#endif
+
 static long btrfs_ioctl_set_received_subvol(struct file *file,
void __user *arg)
 {
struct btrfs_ioctl_received_subvol_args *sa = NULL;
+   int ret = 0;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa)) {
+   ret = PTR_ERR(sa);
+   sa = NULL;
+   goto out;
+   }
+
+   ret = _btrfs_ioctl_set_received_subvol(file, sa);
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long _btrfs_ioctl_set_received_subvol(struct file *file,
+   struct btrfs_ioctl_received_subvol_args *sa)
+{
struct inode *inode = file_inode(file);
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_root_item *root_item = &root->root_item;
@@ -4346,13 +4431,6 @@ static long btrfs_ioctl_set_received_subvol(struct file *file,
goto out;
}
 
-   sa = memdup_user(arg, sizeof(*sa));
-   if (IS_ERR(sa)) {
-   ret = PTR_ERR(sa);
-   sa = NULL;
-   goto out;
-   }
-
/*
 * 1 - root item
 * 2 - uuid items (received uuid + subvol uuid)
@@ -4411,7 +4489,6 @@ static long btrfs_ioctl_set_received_subvol(struct file *file,
ret = -EFAULT;
 
 out:
-   kfree(sa);
up_write(&root->fs_info->subvol_sem);
mnt_drop_write_file(file);
return ret;
@@ -4572,6 +4649,8 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_balance_progress(root, argp);
case BTRFS_IOC_SET_RECEIVED_SUBVOL:
return btrfs_ioctl_set_received_subvol(file, argp);
+   case BTRFS_IOC_SET_RECEIVED_SUBVOL_32:
+   return btrfs_ioctl_set_received_subvol_32(file, argp);
case 

Re: [PATCH] btrfs: Fix 32/64-bit problem with BTRFS_SET_RECEIVED_SUBVOL ioctl

2014-01-05 Thread Hugo Mills
On Sun, Jan 05, 2014 at 05:55:27PM +0000, Hugo Mills wrote:
 The structure for BTRFS_IOC_SET_RECEIVED_SUBVOL packs differently on 32-bit
 and 64-bit systems. This means that it is impossible to use btrfs
 receive on a system with a 64-bit kernel and 32-bit userspace, because
 the structure size (and hence the ioctl number) is different.
 
 This patch adds a compatibility structure and ioctl to deal with the
 above case.
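
To make the size mismatch concrete, here is a rough back-of-the-envelope sketch in shell arithmetic. It is not part of the patch. It assumes that i386 userspace aligns __u64 to 4 bytes (so the timespec is 12 bytes) while x86-64 aligns it to 8 bytes (16 bytes with padding), that BTRFS_UUID_SIZE is 16, that BTRFS_IOCTL_MAGIC is 0x94, and that _IOWR uses the usual asm-generic bit layout. The function names are made up for illustration.

```sh
# Size of btrfs_ioctl_received_subvol_args for a given timespec size.
# Fields: uuid[16] + stransid (8) + rtransid (8) + 2 * timespec
#         + flags (8) + reserved[16] (16 * 8 bytes).
received_subvol_args_size() {
    timespec_size=$1
    echo $(( 16 + 8 + 8 + 2 * timespec_size + 8 + 16 * 8 ))
}

# _IOWR(0x94, 37, size) under the asm-generic layout (an assumption):
# direction (read|write = 3) in bits 30-31, size in bits 16-29,
# magic in bits 8-15, command number in bits 0-7.
set_received_subvol_ioctl_nr() {
    size=$1
    printf '0x%X\n' $(( (3 << 30) | (size << 16) | (0x94 << 8) | 37 ))
}

size64=$(received_subvol_args_size 16)   # __u64 + __u32 + 4 bytes padding
size32=$(received_subvol_args_size 12)   # __u64 + __u32, no padding
echo "64-bit: $size64 bytes, ioctl $(set_received_subvol_ioctl_nr "$size64")"
echo "32-bit: $size32 bytes, ioctl $(set_received_subvol_ioctl_nr "$size32")"
```

Under these assumptions the two layouts differ by 8 bytes (4 per timespec), and because the size is encoded in the ioctl number, the number a 32-bit userspace sends is not the one a 64-bit kernel matches against.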

   Oops, forgot to mention -- this has been compile tested, but not
actually run yet. The machine in question is several miles away and is
a production machine (it's my work desktop, and I can't afford much
downtime on it).

   Hugo.

 Signed-off-by: Hugo Mills h...@carfax.org.uk
 ---
  fs/btrfs/ioctl.c | 95 
 +++-
  1 file changed, 87 insertions(+), 8 deletions(-)
 
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index 21da576..e186439 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -57,6 +57,32 @@
  #include "send.h"
  #include "dev-replace.h"
  
 +#ifdef CONFIG_64BIT
 +/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
 + * structures are incorrect, as the timespec structure from userspace
 + * is 4 bytes too small. We define these alternatives here to teach
 + * the kernel about the 32-bit struct packing.
 + */
 +struct btrfs_ioctl_timespec_32 {
 + __u64 sec;
 + __u32 nsec;
 +} __attribute__ ((__packed__));
 +
 +struct btrfs_ioctl_received_subvol_args_32 {
 + char    uuid[BTRFS_UUID_SIZE];  /* in */
 + __u64   stransid;   /* in */
 + __u64   rtransid;   /* out */
 + struct btrfs_ioctl_timespec_32 stime; /* in */
 + struct btrfs_ioctl_timespec_32 rtime; /* out */
 + __u64   flags;  /* in */
 + __u64   reserved[16];   /* in */
 +} __attribute__ ((__packed__));
 +#endif
 +
 +#define BTRFS_IOC_SET_RECEIVED_SUBVOL_32 _IOWR(BTRFS_IOCTL_MAGIC, 37, \
 + struct btrfs_ioctl_received_subvol_args_32)
 +
 +
  static int btrfs_clone(struct inode *src, struct inode *inode,
  u64 off, u64 olen, u64 olen_aligned, u64 destoff);
  
 @@ -4313,10 +4339,69 @@ static long btrfs_ioctl_quota_rescan_wait(struct file 
 *file, void __user *arg)
   return btrfs_qgroup_wait_for_completion(root->fs_info);
  }
  
 +#ifdef CONFIG_64BIT
 +static long btrfs_ioctl_set_received_subvol_32(struct file *file,
 + void __user *arg)
 +{
 + struct btrfs_ioctl_received_subvol_args_32 *args32 = NULL;
 + struct btrfs_ioctl_received_subvol_args *args64 = NULL;
 + int ret = 0;
 +
 + args32 = memdup_user(arg, sizeof(*args32));
 + if (IS_ERR(args32)) {
 + ret = PTR_ERR(args32);
 + args32 = NULL;
 + goto out;
 + }
 +
 + args64 = kmalloc(sizeof(*args64), GFP_NOFS);
 + /* kmalloc() returns NULL on failure, not an ERR_PTR */
 + if (!args64) {
 + ret = -ENOMEM;
 + goto out;
 + }
 +
 + memcpy(args64->uuid, args32->uuid, BTRFS_UUID_SIZE);
 + args64->stransid = args32->stransid;
 + args64->rtransid = args32->rtransid;
 + args64->stime.sec = args32->stime.sec;
 + args64->stime.nsec = args32->stime.nsec;
 + args64->rtime.sec = args32->rtime.sec;
 + args64->rtime.nsec = args32->rtime.nsec;
 + args64->flags = args32->flags;
 +
 + ret = _btrfs_ioctl_set_received_subvol(file, args64);
 +
 +out:
 + kfree(args32);
 + kfree(args64);
 + return ret;
 +}
 +#endif
 +
  static long btrfs_ioctl_set_received_subvol(struct file *file,
   void __user *arg)
  {
   struct btrfs_ioctl_received_subvol_args *sa = NULL;
 + int ret = 0;
 +
 + sa = memdup_user(arg, sizeof(*sa));
 + if (IS_ERR(sa)) {
 + ret = PTR_ERR(sa);
 + sa = NULL;
 + goto out;
 + }
 +
 + ret = _btrfs_ioctl_set_received_subvol(file, sa);
 +
 +out:
 + kfree(sa);
 + return ret;
 +}
 +
 +static long _btrfs_ioctl_set_received_subvol(struct file *file,
 + struct btrfs_ioctl_received_subvol_args *sa)
 +{
   struct inode *inode = file_inode(file);
   struct btrfs_root *root = BTRFS_I(inode)->root;
   struct btrfs_root_item *root_item = &root->root_item;
 @@ -4346,13 +4431,6 @@ static long btrfs_ioctl_set_received_subvol(struct 
 file *file,
   goto out;
   }
  
 - sa = memdup_user(arg, sizeof(*sa));
 - if (IS_ERR(sa)) {
 - ret = PTR_ERR(sa);
 - sa = NULL;
 - goto out;
 - }
 -
   /*
* 1 - root item
* 2 - uuid items (received uuid + subvol uuid)
 @@ -4411,7 +4489,6 @@ static long btrfs_ioctl_set_received_subvol(struct file 
 *file,
   ret = -EFAULT;
  
  out:
 - kfree(sa);
   up_write(&root->fs_info->subvol_sem);
   mnt_drop_write_file(file);
   return ret;
 @@ -4572,6 +4649,8 @@ long btrfs_ioctl(struct file *file, unsigned int
 

Re: Help! - btrfs device delete missing running out of space

2014-01-05 Thread Piotr Pawłow
Hello,
 distribution, used space on each device should be accordingly: 160, 
 216, and 405.

The last number should be 376, I copied the wrong one. Anyway, I deleted 
as much data as possible, which probably won't help in the end, but at 
the moment it's still going. Meanwhile, I made a script to replicate this 
problem:

http://pastebin.com/W2c2pJYp

On kernel 3.12.6 the output is:

---

WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

adding device /dev/loop1 id 2
fs created label (null) on /dev/loop0
nodesize 4096 leafsize 4096 sectorsize 4096 size 40.00GB
Btrfs v0.20-rc1
17000+0 records in
17000+0 records out
17825792000 bytes (18 GB) copied, 533,571 s, 33,4 MB/s
Label: none  uuid: f8a01060-94c2-4665-b5ff-f134f9b6ad9b
Total devices 2 FS bytes used 16.63GB
devid2 size 20.00GB used 18.01GB path /dev/loop1
devid1 size 20.00GB used 18.03GB path /dev/loop0

Btrfs v0.20-rc1
ERROR: error removing the device 'missing' - No space left on device
Label: none  uuid: f8a01060-94c2-4665-b5ff-f134f9b6ad9b
Total devices 4 FS bytes used 16.62GB
devid4 size 10.00GB used 9.03GB path /dev/loop3
devid3 size 10.00GB used 9.25GB path /dev/loop2
devid1 size 20.00GB used 12.31GB path /dev/loop0
*** Some devices missing

Btrfs v0.20-rc1

---

The "delete missing" logic is pretty much broken, at least in this case. 
Instead of just replicating the data to other drives, it moves some of 
the data, which fills up the smaller drives, and then fails with a 
"No space left on device" error.

Regards

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Duncan
Jim Salter posted on Sun, 05 Jan 2014 12:54:44 -0500 as excerpted:


 On 01/05/2014 12:09 PM, Chris Murphy wrote:
 I haven't read anything so far indicating defrag applies to the VM
 container use case, rather nodatacow via xattr +C is the way to go. At
 least for now.

Well, NOCOW from the get-go would certainly be better, but given that the 
file is already there and heavily fragmented, my idea was to get it 
defragmented and then set the +C, to prevent it reoccurring.

But I do very little snapshotting here, and as a result hadn't considered 
the knockon effect of 100K-plus extents in perhaps 1000 snapshots.  I 
guess that's what's killing the defrag, however it's initiated.  The only 
way to get rid of the problem, then, would be to move the file away and 
then back, but doing so does still leave all those snapshots with the 
crazy fragmentation, and to kill that would require either killing all 
those snapshots, or setting them writable and doing the same move out, 
move back, on each one!  OUCH, but I guess that's why it just seems 
impossible to deal with the fragmentation on these things, whether it's 
autodefrag, or named file defrag, or doing the whole move out and back 
thing, and then having to worry about all those snapshots.

Still, I'd guess ultimately it'll need done, whether it's a wipe the 
filesystem and restore from backup or whatever.

 Can you elaborate on the rationale behind database or VM binaries being
 set nodatacow? I experimented with this*, and found no significant (to
 me,
 anyway) performance enhancement with nodatacow on - maybe 10% at best,
 and if I understand correctly, that implies losing the live per-block
 checksumming of the data that's set nodatacow, meaning you won't get
 automatic correction if you're on a redundant array.
 
 All I've heard so far is better performance without any more detailed
 explanation, and if the only benefit is an added MAYBE 10%ish
 performance... I'd rather take the hit, personally.
 
 * experimented with this == set up a Win2008R2 test VM and ran
 HDTunePro for several runs on binaries stored with and without nodatacow
 set, 5G of random and sequential read and write access per run.

Well, the problem isn't just performance, it's that in most such cases 
the apps actually have their own data integrity checking and management, 
and sometimes the app's integrity management and that of btrfs end up 
fighting each other, destroying the data as a result.

In normal operation, everything's fine.  But should the system crash at 
the wrong moment, btrfs' atomic commit and data integrity mechanisms can 
roll back to a slightly earlier version of the file.

Which is normally fine.  But hardware is known to often lie about having 
committed writes that may actually still only be in buffer.  If the power 
outage/crash occurs at the wrong moment, ordinary write-barrier ordering 
guarantees may be invalid (particularly on large files with 
finite-seek-speed devices), and the app's own integrity checksum may have 
been updated before the data it was supposed to be a checksum on actually 
got to disk.  If btrfs ends up rolling back to that condition, btrfs will 
likely consider the file fine, but the app's own integrity management 
will consider it corrupted, which it actually is.

But if btrfs only stays out of the way, the application often can fix 
whatever minor corruption it detects, doing its own roll-backs to an 
earlier checkpoint, because it's /designed/ to be able to handle such 
problems on filesystems that don't have integrity management.

So having btrfs trying to manage integrity too on such data where the app 
already handles it is self-defeating, because neither knows about nor 
considers what the other one is doing, and the two end up undoing each 
other's careful work.

Again, this isn't something you'll see in normal operation, but several 
people have reported exactly that sort of problem with the general large-
internally-written-file, application-self-managed-file-integrity, 
scenario.  In those cases, the best thing btrfs can do is simply get out 
of the way and let the application handle its own integrity management, 
and the way to tell btrfs to do that, as well as to do in-place rewrites 
instead of COW-based rewrites, is with the NOCOW file attribute, chattr +C, 
and that must be done before the file gets so fragmented (and 
multi-snapshotted in its fragmented state) in the first place.
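
As a rough sketch of the "set +C before the data is written" approach: per chattr(1), new files created in a directory that carries the C attribute inherit it, so one way is to prepare such a directory and make a fresh copy of the file inside it. The function names and the DRY_RUN switch below are made up for illustration, and existing snapshots still hold the old fragmented extents, as noted above.

```sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Recreate SRC inside NOCOW_DIR so the new copy is NOCOW from its first write.
recreate_nocow() {
    src=$1
    nocow_dir=$2
    dst=$nocow_dir/$(basename "$src")
    run mkdir -p "$nocow_dir" || return 1
    # New files created under this directory inherit the C attribute.
    run chattr +C "$nocow_dir" || return 1
    # dd writes a new file, so the data lands in a file that is already NOCOW.
    run dd if="$src" of="$dst" bs=1M || return 1
    # Show the attributes of the copy; a 'C' should be listed on btrfs.
    run lsattr "$dst"
}
```

Something like `DRY_RUN=1 recreate_nocow /path/to/vm.img /path/to/nocow` prints the steps without touching anything; the paths are placeholders.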


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT

2014-01-05 Thread Chris Murphy

On Jan 4, 2014, at 2:16 PM, Jim Salter j...@jrs-s.net wrote:

 
 On 01/04/2014 02:18 PM, Chris Murphy wrote:
 I'm not sure what else you're referring to?(working on boot environment of 
 btrfs)
 
 Just the string of caveats regarding mounting at boot time - needing to 
 monkeypatch 00_header to avoid the bogus sparse file error

I don't know what "bogus sparse file error" refers to. What version of GRUB? 
I'm seeing Ubuntu 12.04 precise-updates listing GRUB 1.99, which is rather old.


 (which, worse, tells you to press a key when pressing a key does nothing) 
 followed by this, in my opinion completely unexpected, behavior when missing 
 a disk in a fault-tolerant array, which also requires monkey-patching in 
 fstab and now elsewhere in GRUB to avoid.

and…

 I'm aware it's not intended for production yet.

On the one hand you say you're aware, yet on the other hand you say the missing 
disk behavior is completely unexpected.

Some parts of Btrfs, in certain contexts, are production ready. But the 
developmental state of Btrfs places a burden on the user to know more details 
about that state than he might otherwise be expected to know with more 
stable/mature file systems.

My opinion is that it's inappropriate for degraded mounts to be made automatic 
when there's no method of notifying user space of this state change. 
Gnome-shell via udisks will inform users of a degraded md array. Something 
equivalent to that is needed before Btrfs should enable a scenario where a user 
boots a computer in degraded state without being informed as if there's nothing 
wrong at all. That's demonstrably far worse than scary boot failure, during 
which one copy of data is still likely safe, unlike permitting uninformed 
degraded rw operation.



 However, it's just on the cusp, with distributions not only including it in 
 their installers but a couple teetering on the fence with declaring it their 
 next default FS (Oracle Unbreakable, OpenSuse, hell even RedHat was flirting 
 with the idea) that it seems to me some extra testing with an eye towards 
 production isn't a bad thing.

Does the Ubuntu 12.04 LTS installer let you create sysroot on a Btrfs raid1 
volume?

 That's why I'm here. Not to crap on anybody, but to get involved, hopefully 
 helpfully.

I think you're better off using something more developmental, it necessarily 
needs to exist in the first place there, before it can trickle down to an LTS 
release.

 
 fs_passno is 1 which doesn't apply to Btrfs.
 Again, that's the distribution's default, so the argument should be with 
 them, not me…

Yes so you'd want to file a bug? That's how you get involved.

 with that said, I'd respectfully argue that fs_passno 1 is correct for any 
 root file system; if the file system itself declines to run an fsck that's up 
 to the filesystem, but it's correct to specify fs_passno 1 if the filesystem 
 is to be mounted as root in the first place.
 
 I'm open to hearing why that's a bad idea, if you have a specific reason?

It's a minor point, but it shows that fs_passno has become quaint, like 
grandma's iron cozy. It's not applicable for either XFS or Btrfs. It's arguably 
inapplicable for ext3/4 but its fsck program has an optimization to skip fully 
checking the file system if the journal replay succeeds. There is no unattended 
fsck for either XFS or Btrfs.

On systemd systems, systemd reads fstab, and if fs_passno is non-zero it checks 
for the existence of /sbin/fsck.<fstype>; if that doesn't exist, it doesn't run 
fsck for that entry. This topic was recently brought up and is in the archives.
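
The rule just described can be sketched in a few lines of shell. This only mirrors the description in this message, not systemd's actual code; the function names and the SBIN_DIR override are made up for illustration.

```sh
# Decide whether an fstab entry would be fsck'd under the rule described:
# passno must be non-zero and a helper /sbin/fsck.<fstype> must exist.
# SBIN_DIR is only a testing hook here, not something systemd reads.
would_fsck() {
    fstype=$1
    passno=$2
    sbin_dir=${SBIN_DIR:-/sbin}
    [ "$passno" != "0" ] && [ -e "$sbin_dir/fsck.$fstype" ]
}

# Walk an fstab-format file and report the decision for each entry.
report_fstab() {
    while read -r dev mnt fstype opts dump passno; do
        # Skip blank lines and comments.
        case $dev in ''|'#'*) continue ;; esac
        if would_fsck "$fstype" "${passno:-0}"; then
            echo "$mnt ($fstype): fsck would run"
        else
            echo "$mnt ($fstype): fsck skipped"
        fi
    done < "${1:-/etc/fstab}"
}
```

Running `report_fstab` with no argument reads /etc/fstab, which makes it easy to see which entries with a non-zero fs_passno have no matching helper.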


 Well actually LVM thinp does have fast snapshots without requiring 
 preallocation, and uses COW.
 
 LVM's snapshots aren't very useful for me - there's a performance penalty 
 while you have them in place, so they're best used as a transient 
 use-then-immediately-delete feature, for instance for rsync'ing off a 
 database binary. Until recently, there also wasn't a good way to roll back an 
 LV to a snapshot, and even now, that can be pretty problematic.

This describes old LVM snapshots, not LVM thinp snapshots.

 Finally, there's no way to get a partial copy of an LV snapshot out of the 
 snapshot and back into production, so if eg you have virtual machines of 
 significant size, you could be looking at *hours* of file copy operations to 
 restore an individual VM out of a snapshot (if you even have the drive space 
 available for it), as compared to btrfs' cp --reflink=always operation, which 
 allows you to do the same thing instantaneously.

LVM isn't a file system, so limitations compared to Btrfs are expected.

 
 I'm not sure what you mean by self-correcting, but if the drive reports a 
 read error md, lvm, and Btrfs raid1+ all will get missing data from 
 mirror/parity reconstruction, and write corrected data back to the bad 
 sector.
 
 You're assuming that the drive will actually *report* a read error, which is 
 frequently not the case.

This is discussed in 

Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Dec 31, 2013, at 4:46 AM, Sulla su...@gmx.at wrote:

 Dear all!
 
 On my Ubuntu Server 13.10 I use a RAID5 blockdevice consisting of 3 WD20EARS

Sulla, is this md raid5? If so, can you report the result from mdadm -D 
<mddevice>? I'm curious what the chunk size is. Thanks.

Chris Murphy


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 12:57 PM, Duncan 1i5t5.dun...@cox.net wrote:

 
 But I do very little snapshotting here, and as a result hadn't considered 
 the knockon effect of 100K-plus extents in perhaps 1000 snapshots.

I wonder if this is an issue with snapshot aware defrag? Some problems were 
fixed recently but I'm not sure of the status.

The OP's case involves Btrfs on LVM on (I think) md raid5. The mdadm default 
chunk size is 512KB, which on a three-disk raid5 would be a 1MB full stripe. 
There are some optimizations for non-full stripe reads and writes for raid5 
(not for raid6, so it takes a much bigger performance hit) but nevertheless it 
might be a factor.
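
The full-stripe arithmetic is simple enough to sketch: raid5 spends one disk's worth of each stripe on parity, so the data portion of a full stripe is the chunk size times the number of disks minus one. The three-disk count is taken from the OP's description; the function name is made up for illustration.

```sh
# Data size of one full raid5 stripe, in KiB:
# one chunk per data disk, with one disk's share of the stripe used for parity.
raid5_full_stripe_kib() {
    chunk_kib=$1
    ndisks=$2
    echo $(( chunk_kib * (ndisks - 1) ))
}

raid5_full_stripe_kib 512 3   # 512KB chunk on three disks
```

The real chunk size of an array is shown in the "Chunk Size" line of mdadm -D.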

  I 
 guess that's what's killing the defrag, however it's initiated.  The only 
 way to get rid of the problem, then, would be to move the file away and 
 then back, but doing so does still leave all those snapshots with the 
 crazy fragmentation, and to kill that would require either killing all 
 those snapshots, or setting them writable and doing the same move out, 
 move back, on each one!  OUCH, but I guess that's why it just seems 
 impossible to deal with the fragmentation on these things, whether it's 
 autodefrag, or named file defrag, or doing the whole move out and back 
 thing, and then having to worry about all those snapshots.

It's why in the short term I'm using +C from the get go. And if I had more VM 
images and qcow2 snapshots, I would put them in a subvolume of their own so 
that they aren't snapshotted along with rootfs. Using Btrfs within the VM I 
still get the features I expect and the performance is quite good.


Chris Murphy



Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Sulla
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Dear Chris!

Certainly: I have 3 HDDs, all of which WD20EARS. Originally I wanted to
let btrfs handle all 3 devices directly without making partitions, but
this was impossible, as at least /boot needed to be ext4, at least back
then when I set up the server. And back then btrfs also hadn't raid5-like
functionality, so I decided to put good old partitions and md-Raids and
LVM on them and use btrfs just as plain file-systems on the partitions
provided by LVM.

On the WD disks I thus created 2 partitions each, the first, sdX1, being
~500MiB, and the rest, about 1.9995 TB, being one partition, sdX2.

I built a Raid1 on the 3 small partitions sdX1 with ext4 for boot, each
disk is bootable with grub installed into the MBR.

I combined the 3 large partitions to a Raid5 of size 3,64TB:

/proc/mdstat reads:
md0 : active raid1 sda1[5] sdb1[4] sdc1[3]
  498676 blocks super 1.2 [3/3] [UUU]
md1 : active raid5 sda2[5] sdb2[4] sdc2[3]
  3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]

the information you requested:
# sudo mdadm -D /dev/md1
/dev/md1:
Version : 1.2
  Creation Time : Thu Jul 14 18:49:25 2011
 Raid Level : raid5
 Array Size : 3904907520 (3724.01 GiB 3998.63 GB)
  Used Dev Size : 1952453760 (1862.01 GiB 1999.31 GB)
   Raid Devices : 3
  Total Devices : 3
Persistence : Superblock is persistent
Update Time : Sun Jan  5 22:07:22 2014
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
 Layout : left-symmetric
 Chunk Size : 8K
   Name : freedom:1  (local to host freedom)
   UUID : 44b72520:a78af6f7:dba13fb3:2203127d
 Events : 576884
    Number   Major   Minor   RaidDevice State
       4       8       18        0      active sync   /dev/sdb2
       5       8        2        1      active sync   /dev/sda2
       3       8       34        2      active sync   /dev/sdc2



I use the Raid5 md1 as physical volume for LVM: pvdisplay gives:
  --- Physical volume ---
  PV Name   /dev/md1
  VG Name   MAIN
  PV Size   3.64 TiB / not usable 2.06 MiB
  Allocatable   yes
  PE Size   4.00 MiB
  Total PE  953346
  Free PE   6274
  Allocated PE  947072
  PV UUID   WcuEx8-ehJL-xHdf-ElwF-b9s3-dlmM-KZlDNG

I keep a reserve of 6274 4MiB blocks (=24GiB) in case one of the logical
volumes runs out of space...

I created the following logical volumes, named after their intended
mountpoints:
  --- Logical volume ---
  LV Path/dev/MAIN/ROOT
  LV NameROOT
  VG NameMAIN
  LV UUIDkURJks-xHox-73B5-n02x-eZfS-agDD-n1dtAm
  LV Write Accessread/write
  LV Creation host, time ,
  LV Status  available
  # open 1
  LV Size19.31 GiB
  Current LE 4944
  Segments   2
  Allocation inherit
  Read ahead sectors auto
  - currently set to 256
  Block device   252:0

and similar:
  --- Logical volume ---
  LV Path/dev/MAIN/SWAP: 1.8GB
  LV Path/dev/MAIN/HOME: 18.6GB
  LV Path/dev/MAIN/TMP: 9.3 GB
  LV Path/dev/MAIN/DATA1 2.6 TB
  LV Path/dev/MAIN/DATA2: 0.9 TB


As filesystem I used btrfs, set up during install from an Ubuntu server
release, I don't recall which, might have been 11.10 or 12.04 (?), for all
logical volumes except swap, of course.

any other information I can supply?
regards, Sulla

- -- 
Cogito cogito ergo cogito sum.
   Ambrose Bierce

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.21 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLJy+8ACgkQR6b2EdogPFupxgCfeDRdeO+PYoQNIjtySAYEmSEr
PNoAoLPNcSqDHsDzM8pAuHlbva7j18MS
=XBOA
-END PGP SIGNATURE-


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Duncan
On Sun, 05 Jan 2014 08:42:46 -0500
Jim Salter j...@jrs-s.net wrote:

 On Jan 5, 2014 1:39 AM, Marc MERLIN m...@merlins.org wrote:
 
  On Fri, Jan 03, 2014 at 09:34:10PM +, Duncan wrote: 
  Yes, I got that. That why I ran btrfs defrag on the files after that
 
 Why are you trying to defrag an SSD? There's no seek penalty for
 moving between fragmented blocks, so defrag isn't really desirable in
 the first place.

[I normally try to reply directly to list but don't believe I've seen
this there yet, but got it direct-mailed so will reply-all in response.]

There's no seek penalty so the overall problem is dramatically lessened
as that's the significant part of it on spinning rust, correct, but...

SSDs do remain IOPS-bound, and tens or hundreds of thousands of extents
do exact an IOPS (as well as general extent bookkeeping) toll, too.

That's why I ended up enabling autodefrag here when I was first setting
up, even tho I'm on SSD.  (Only after asking the list basically the same
question, what good is autodefrag on SSD, tho.)

Luckily I don't happen to deal with any of the
internal-write-in-huge-files scenarios, however, and I enabled
autodefrag to cover the internal-write-in-small-file scenarios BEFORE I
started putting any data on the filesystems at all,  so I'm basically
covered, here, without actually having to do chattr +C on anything.

 That doesn't change the fact that the described lockup sounds like a
 bug not a feature of course, but I think the answer to your personal
 issue on that particular machine is don't defrag a solid state
 drive.

I now believe the lockup must be due to processing the hundreds of
thousands of extents on all those snapshots, too, in addition to doing
it on the main volume.  I don't actually make very extensive use of
snapshots here anyway, so I didn't think about that aspect originally,
but that's gotta be what's throwing the real spanner in the works,
turning a possibly long but workable normal defrag (O(1)) into a lockup
scenario (O(n)) where virtually no progress is made as currently
coded.

-- 
Duncan - No HTML messages please, as they are filtered as spam.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Brendan Hide

On 2014/01/05 11:17 PM, Sulla wrote:

Certainly: I have 3 HDDs, all of which WD20EARS.

Maybe/maybe-not off-topic:
Poor hardware performance, though not necessarily the root cause, can be 
a major factor with these errors.


WD Greens (Reds too, for that matter) have poor non-sequential 
performance. As an educated guess, I'd say there's a 15% chance this is a 
major factor in the problem and, perhaps, a 60% chance it is merely a 
small contributor to the problem. Greens are aimed at consumers 
wanting high capacity and a low price point. The result is poor 
performance. See footnote * re my experience.


My general recommendation (use cases vary of course) is to install a 
tiny SSD (60GB, for example) just for the OS. It is typically cheaper 
than the larger drives and will be *much* faster. WD Greens and Reds 
have good *sequential* throughput but comparatively abysmal random 
throughput even in comparison to regular non-SSD consumer drives.


*
I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a 
single 250GB IDE disk for the OS. When the very old IDE disk inevitably 
died, I decided to use a spare 1.5TB drive for the OS. Performance was 
bad enough that I simply bought my first SSD the same week.


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



FILE_EXTENT_SAME changes mtime and ctime

2014-01-05 Thread Gerhard Heift
Hello,

I am currently playing with snapshots and manual deduplication of
files. During these tests I noticed that ctime and mtime change in
the snapshot after deduplication with FILE_EXTENT_SAME. Does this
happen on purpose? Otherwise I would like to have ctime and mtime
left unmodified, because on a read-only snapshot I cannot change them
back after the ioctl call.
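
For anyone wanting to reproduce the observation, a small sketch that compares a file's mtime and ctime before and after running some command. It relies on GNU stat; the helper name is made up for illustration, and the command passed in is whatever tool issues the FILE_EXTENT_SAME ioctl (none is assumed here).

```sh
# Run a command and report whether FILE's mtime or ctime changed across it.
# With GNU stat, %y is the mtime and %z is the ctime.
times_changed() {
    file=$1
    shift
    before=$(stat -c '%y|%z' "$file") || return 2
    "$@"
    after=$(stat -c '%y|%z' "$file") || return 2
    if [ "$before" = "$after" ]; then
        echo "unchanged: $file"
    else
        echo "changed: $file"
        echo "  before: $before"
        echo "  after:  $after"
    fi
}
```

Usage is `times_changed FILE COMMAND [ARGS...]`, with FILE being the file inside the snapshot.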

I attached a very basic patch, which illustrates my idea.

Thanks,
  Gerhard
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..975d207 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -59,7 +59,7 @@
 #include "dev-replace.h"
 
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		   u64 off, u64 olen, u64 olen_aligned, u64 destoff);
+		   u64 off, u64 olen, u64 olen_aligned, u64 destoff, int update_time);
 
 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -2683,7 +2683,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 len,
 
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
 	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+		ret = btrfs_clone(src, dst, loff, len, len, dst_loff, /* update time */ 0);
 
 out_unlock:
 	btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -2836,9 +2836,10 @@ out:
  * @olen_aligned: Block-aligned value of olen, extent_same uses
  *   identical values here
  * @destoff: Offset within @inode to start clone
+ * @update_time: Should we update ctime and mtime of @inode?
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
-		   u64 off, u64 olen, u64 olen_aligned, u64 destoff)
+		   u64 off, u64 olen, u64 olen_aligned, u64 destoff, int update_time)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_path *path = NULL;
@@ -3081,8 +3082,10 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
 			btrfs_mark_buffer_dirty(leaf);
 			btrfs_release_path(path);
 
-			inode_inc_iversion(inode);
-			inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+			if (update_time) {
+				inode_inc_iversion(inode);
+				inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+			}
 
 			/*
 			 * we round up to the block size at eof when
@@ -3227,7 +3230,7 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
 
 	lock_extent_range(src, off, len);
 
-	ret = btrfs_clone(src, inode, off, olen, len, destoff);
+	ret = btrfs_clone(src, inode, off, olen, len, destoff, /* update time */ 1);
 
 	unlock_extent(&BTRFS_I(src)->io_tree, off, off + len - 1);
 out_unlock:


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Roman Mamedov
On Mon, 06 Jan 2014 00:36:22 +0200
Brendan Hide bren...@swiftspirit.co.za wrote:

 I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a 
 single 250GB IDE disk for the OS. When the very old IDE disk inevitably 
 died, I decided to use a spare 1.5TB drive for the OS. Performance was 
 bad enough that I simply bought my first SSD the same week.

Did you align your partitions to accommodate for the 4K sector of the EARS?
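
A quick way to check, sketched under the assumption that the drive presents 512-byte logical sectors on top of 4K physical ones, so a partition is aligned when its start sector (as reported in sysfs, in 512-byte units) is a multiple of 8. The function names and the SYSFS override are made up for illustration.

```sh
# A start sector in 512-byte units is 4 KiB aligned when divisible by 8.
is_4k_aligned() {
    start_sector=$1
    [ $(( start_sector % 8 )) -eq 0 ]
}

# Report alignment for every partition of a disk, e.g. "sda".
# SYSFS is only a testing hook so this can run without real disks.
check_disk_alignment() {
    disk=$1
    for f in "${SYSFS:-/sys}/block/$disk/$disk"*/start; do
        [ -r "$f" ] || continue
        part=$(basename "$(dirname "$f")")
        if is_4k_aligned "$(cat "$f")"; then
            echo "$part: aligned"
        else
            echo "$part: NOT aligned"
        fi
    done
}
```

A partition starting at the old DOS default of sector 63 fails this check, while one starting at sector 2048 passes.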

-- 
With respect,
Roman




Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 2:17 PM, Sulla su...@gmx.at wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Dear Chris!
 
 Certainly: I have 3 HDDs, all of which WD20EARS.

These drives don't have a configurable SCT ERC, so you need to modify the SCSI 
block layer timeout:

echo 120 > /sys/block/sdX/device/timeout

You also need to schedule regular scrubs at the md level.

echo check > /sys/block/mdX/md/sync_action
cat /sys/block/mdX/md/mismatch_cnt

More info about this is in man 4 md, and on the linux-raid list.
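
The timeout and scrub steps could be wrapped roughly as follows, for example to call from a boot script and a cron job (sysfs settings do not survive a reboot). This is only a sketch: the function names and the SYSFS override are made up for illustration, the mismatch counter is read from the md/mismatch_cnt location documented in md(4), and writing to the real /sys needs root.

```sh
# Raise the SCSI command timeout (in seconds) for each listed disk.
set_disk_timeouts() {
    seconds=$1
    shift
    for disk in "$@"; do
        echo "$seconds" > "${SYSFS:-/sys}/block/$disk/device/timeout" || return 1
    done
}

# Ask md to start a "check" pass over the given array, e.g. "md1".
start_md_check() {
    md=$1
    echo check > "${SYSFS:-/sys}/block/$md/md/sync_action"
}

# Print the mismatch counter; read it after the check has finished.
md_mismatch_count() {
    cat "${SYSFS:-/sys}/block/$1/md/mismatch_cnt"
}
```

For the array in this thread that would be something like `set_disk_timeouts 120 sda sdb sdc` followed by `start_md_check md1`, and later `md_mismatch_count md1`.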

 
  3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]

OK so 8KB chunk, 16KB full stripe, so that doesn't apply to what I was thinking 
might be the case. The workload is presumably small file sizes, like a mail 
server?


 any other information I can supply?

I'm not a developer, I don't know if this problem is known or maybe fixed in a 
newer kernel than 3.11.0 - which has been around for 5-6 months. I think the 
main suggestion is to try a newer kernel, granted with the configuration of md, 
lvm, and btrfs you have three layers that will likely have kernel changes. I'd 
make sure you have backups. While this layout is valid and should work, it's 
also probably less common and therefore less tested.

Usually, in case of blocking, devs want to see sysrq+w issued. The setup is 
dmesg -n7, and enabling sysrq functions. Then reproduce the block, and during 
the block issue w to the sysrq trigger, then capture the dmesg contents and post 
the block and any other nearby btrfs messages.

https://www.kernel.org/doc/Documentation/sysrq.txt
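
Spelled out as commands, the sequence looks roughly like this. It is a sketch only: the function names, the DRY_RUN switch and the default output file name are made up for illustration, and the real thing needs root.

```sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# One-time setup: raise the console log level and enable sysrq functions.
prepare_sysrq() {
    run dmesg -n 7 || return 1
    run sh -c 'echo 1 > /proc/sys/kernel/sysrq'
}

# Run this while the hang is happening: trigger sysrq-w, then save
# the kernel log to a file for posting.
dump_blocked_tasks() {
    out=${1:-blocked-tasks.txt}
    run sh -c 'echo w > /proc/sysrq-trigger' || return 1
    run sh -c "dmesg > \"$out\""
}
```

Call `prepare_sysrq` ahead of time and `dump_blocked_tasks` during the block; with DRY_RUN=1 both just print what they would do.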


Chris Murphy


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 4:48 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Jan 5, 2014, at 2:17 PM, Sulla su...@gmx.at wrote:
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Dear Chris!
 
 Certainly: I have 3 HDDs, all of which WD20EARS.
 
 These drives don't have a configurable SCT ERC, so you need to modify the 
 SCSI block layer timeout:
 
 echo 120 > /sys/block/sdX/device/timeout
 
 You also need to schedule regular scrubs at the md level as well.
 
 echo check > /sys/block/mdX/md/sync_action
 cat /sys/block/mdX/md/mismatch_cnt
 
 More info about this is in man 4 md, and on the linux-raid list.
 
 
 3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]
 
 OK so 8KB chunk, 16KB full stripe, so that doesn't apply to what I was 
 thinking might be the case. The workload is presumably small file sizes, like 
 a mail server?
 
 
 any other information I can supply?
 
 I'm not a developer, I don't know if this problem is known or maybe fixed in 
 a newer kernel than 3.11.0 - which has been around for 5-6 months. I think 
 the main suggestion is to try a newer kernel, granted with the configuration 
 of md, lvm, and btrfs you have three layers that will likely have kernel 
 changes. I'd make sure you have backups. While this layout is valid and 
 should work, it's also probably less common and therefore less tested.
 
 Usually in case of blocking devs want to see sysrq+w issued. The setup is 
 dmesg -n7, and enable sysrq functions. Then reproduce the block, and during 
 the block issue w to the sysrq trigger, then capture dmesg contents and post 
 the block and any other nearby btrfs messages.
 
 https://www.kernel.org/doc/Documentation/sysrq.txt

Also, this thread is pretty cluttered with other conversations by now, so I 
think you're best off starting a new thread with this information, maybe with a 
title of "PROBLEM: btrfs on LVM on md raid, blocking > 120 seconds".

Since it's almost inevitable you'd be asked to test with a newer kernel anyway, 
you might as well go to 3.13-rc7 and see if you can reproduce it. If it is 
reproducible, be specific in the problem report by following this template:

https://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html



Chris Murphy


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 3:36 PM, Brendan Hide bren...@swiftspirit.co.za wrote:

 WD Greens (Reds too, for that matter) have poor non-sequential performance. 
 An educated guess I'd say there's a 15% chance this is a major factor to the 
 problem and, perhaps, a 60% chance it is merely a small contributor to the 
 problem. Greens are aimed at consumers wanting high capacity and a low 
 pricepoint. The result is poor performance. See footnote * re my experience.
 
 My general recommendation (use cases vary of course) is to install a tiny SSD 
 (60GB, for example) just for the OS. It is typically cheaper than the larger 
 drives and will be *much* faster. WD Greens and Reds have good *sequential* 
 throughput but comparatively abysmal random throughput even in comparison to 
 regular non-SSD consumer drives.


Another thing with md raid and parallel file systems that's been an issue is 
cqf. On the XFS list cqf is approximately in the realm of persona non grata. It 
might be worth Sulla also setting elevator=deadline to see if simply different 
scheduling is a workaround, not that it's OK to get blocks with cqf. But it 
might be worth a shot as a more conservative approach than upgrading the kernel 
from 3.11.0.
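
elevator=deadline is a boot parameter; the scheduler can also be inspected and switched per device at run time through sysfs. A minimal sketch follows, in which the function names and the optional sysfs-root argument (for dry runs) are illustrative assumptions. On newer multi-queue kernels the available names differ (for example mq-deadline), so check what the scheduler file lists first.

```shell
# Print the active I/O scheduler for a device: the bracketed name in a
# line such as "noop deadline [cfq]".
#   $1  device name, e.g. sda
#   $2  hypothetical sysfs root override for dry runs (default /sys)
current_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "${2:-/sys}/block/$1/queue/scheduler"
}

# Switch a device to another scheduler at run time (needs root).
#   $1  device name, $2 scheduler name, $3 optional sysfs root override
set_scheduler() {
    echo "$2" > "${3:-/sys}/block/$1/queue/scheduler"
}
```

For example, `current_scheduler sda` followed by `set_scheduler sda deadline` for each member disk of the md array. The setting does not survive a reboot.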


 I had 8x 1.5TB WD1500EARS drives in an mdRAID5 array. With it I had a single 
 250GB IDE disk for the OS. When the very old IDE disk inevitably died, I 
 decided to use a spare 1.5TB drive for the OS. Performance was bad enough 
 that I simply bought my first SSD the same week.

Yeah for what it's worth, the current WD Green PDF says these drives are not to 
be used in RAID at all. Not 0, 1, 5 or 6.  Even Caviar Black is proscribed from 
use in RAID environments using multibay chassis, as in, no warranty. It's 
desktop raid0 and raid1 only, and arguably the lack of configurable SCT ERC 
makes it not ideal even for raid1.

Anyway, Sulla, how about putting up a smartctl -x for each drive? Curious if 
there are any bad sectors that have developed, and may be worth filtering all 
/var/log/messages for the word reset and see if you find any of these drives 
ever being reset by the kernel and if so, post the full output of that.
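
A rough sketch of gathering that information. The function names and output file names are assumptions for illustration, and the log location varies by distribution (Ubuntu, for instance, typically uses /var/log/syslog or /var/log/kern.log instead).

```shell
# Print every log line that mentions a reset (case-insensitive).
#   $1  log file to search (default /var/log/messages)
find_resets() {
    grep -i 'reset' "${1:-/var/log/messages}"
}

# Save the full SMART report of each listed drive to smart-<name>.txt.
# Needs root and the smartmontools package.
collect_smart() {
    for dev in "$@"; do
        smartctl -x "$dev" > "smart-$(basename "$dev").txt"
    done
}
```

For example: `collect_smart /dev/sda /dev/sdb /dev/sdc` and `find_resets /var/log/messages`.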


Chris Murphy


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 5:15 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Jan 5, 2014, at 3:36 PM, Brendan Hide bren...@swiftspirit.co.za wrote:
 
 WD Greens (Reds too, for that matter) have poor non-sequential performance. 
 An educated guess I'd say there's a 15% chance this is a major factor to the 
 problem and, perhaps, a 60% chance it is merely a small contributor to the 
 problem. Greens are aimed at consumers wanting high capacity and a low 
 pricepoint. The result is poor performance. See footnote * re my experience.
 
 My general recommendation (use cases vary of course) is to install a tiny 
 SSD (60GB, for example) just for the OS. It is typically cheaper than the 
 larger drives and will be *much* faster. WD Greens and Reds have good 
 *sequential* throughput but comparatively abysmal random throughput even in 
 comparison to regular non-SSD consumer drives.
 
 
 Another thing with md raid and parallel file systems that's been an issue is 
 cqf.

Oops, CFQ!

Chris Murphy



Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Sulla
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Thanks Chris!

Thanks for your support.

 echo 120 > /sys/block/sdX/device/timeout
timeout is 30 for my HDDs. I'm well aware that the WD green HDDs are not
the perfect ones for servers, but they were cheaper - and quieter - than
the black ones for servers. I'll get the red ones next, though. ;-)

 You also need to schedule regular scrubs at the md level as well.

Ubuntu does that once a month.

 cat /sys/block/mdX/mismatch_cnt
this resides in cat /sys/devices/virtual/block/md1/md/mismatch_cnt on my
machine.
the count is zero.

 The workload is presumably small file sizes, like a mail server?
Yes. It serves as a mailserver (maildir-format), but also as a samba file
server with quite big files...

btrfs ran fine for more than a year, so I'm not sure how reproducible the
problem is...

I don't really wish to install or compile custom kernels, to be honest.
Not sure how problematic they might be during the next do-release-upgrade...

Sulla


- -- 
Russian Roulette is not the same without a gun
and baby when it's love, if it's not rough, it isn't fun, fun.
   Lady GaGa, Pokerface


-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.21 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLJ+A8ACgkQR6b2EdogPFuFwwCffSjZpDJvIj70Ag+CPbClCVuc
viEAnjqnxcEdhKR2Gq84eGYEXfjfb23F
=pmTS
-END PGP SIGNATURE-


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 5:25 PM, Sulla su...@gmx.at wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Thanks Chris!
 
 Thanks for your support.
 
 echo 120 > /sys/block/sdX/device/timeout
 timeout is 30 for my HDDs.

I don't think those drives support a configurable timeout; the Green hasn't 
supported it in years. Where are you getting this information? What do you get 
for 'smartctl -l scterc /dev/sdX'?



 I don't really wish to install or compile custom kernels, to be honest.

If the problem is reproducible, then that's the fastest way to find out if it's 
been fixed or not. In this case 3.11 is EOL already, no more updates.


Chris Murphy


Re: [PATCH v2 07/11] btrfs: Add noinode_cache mount option.

2014-01-05 Thread Qu Wenruo

On Fri, 3 Jan 2014 18:52:44 +0100, David Sterba wrote:

On Fri, Jan 03, 2014 at 02:10:30PM +0800, Qu Wenruo wrote:

Add noinode_cache mount option to disable inode map cache with
remount option.

This looks almost safe, there's a sync_filesystem called before the
filesystem's remount handler, the transaction gets committed and flushes
all the data related to inode_cache.

The caching thread keeps running, which is not a serious problem as
it'll finish at umount time, only consuming resources.

There's a window between sync_filesystem and successful remount when the
INODE_MAP_CACHE bit is set and the cache could be used to get a free ino.
The INODE_MAP_CACHE bit is then cleared, but the ino cache remains and is not
synced back to disk (the sync is normally done from transaction commit via
btrfs_unpin_free_ino). I haven't checked whether something else prevents that
from happening.

I'd leave this patch out for now, it probably needs more code updates
than just unsetting the bit.

david


Thanks for pointing out the hidden problem.

I'll check the related source again to keep this behavior safe, or
add new code if needed.

So in the next patchset, the inode map cache option will not be included
and will be separated into a new patch.

Qu


Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.

2014-01-05 Thread Qu Wenruo

On Fri, 03 Jan 2014 11:52:07 -0600, Eric Sandeen wrote:

On 1/3/14, 12:10 AM, Qu Wenruo wrote:

Some options should be paired to support triggering different functions
when remounting.

This patchset adds these missing pairing mount options.

I think this really would benefit from a regression test which
ensures that every remount transition works properly...

Thanks,
-Eric

An xfstests test case for remounting is under development and will be
submitted soon (for both generic and btrfs mount options).

As far as I have tested, no problem occurs in my test environment, but since
the IO pressure is low, a heavier test case is still needed.

Qu



changelog:
v1: Initial commit with only barrier option
v2: Add other missing pairing options

Qu Wenruo (11):
   btrfs: Add barrier option to support -o remount,barrier
   btrfs: Add noautodefrag mount option.
   btrfs: Add nocheck_int mount option.
   btrfs: Add nodiscard mount option.
   btrfs: Add noenospc_debug mount option.
   btrfs: Add noflushoncommit mount option.
   btrfs: Add noinode_cache mount option.
   btrfs: Add acl mount option.
   btrfs: Add datacow mount option.
   btrfs: Add datasum mount option.
   btrfs: Add treelog mount option.

  Documentation/filesystems/btrfs.txt | 56 ++--
  fs/btrfs/super.c| 74 -
  2 files changed, 110 insertions(+), 20 deletions(-)




Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.

2014-01-05 Thread Qu Wenruo

On Fri, 3 Jan 2014 18:58:28 +0100, David Sterba wrote:

On Fri, Jan 03, 2014 at 02:10:23PM +0800, Qu Wenruo wrote:

Some options should be paired to support triggering different functions
when remounting.
This patchset adds these missing pairing mount options.

Thanks!


   btrfs: Add nocheck_int mount option.
   btrfs: Add noinode_cache mount option.

Commented separately, imho not to be merged in current state.


   btrfs: Add barrier option to support -o remount,barrier
   btrfs: Add noautodefrag mount option.
   btrfs: Add nodiscard mount option.
   btrfs: Add noenospc_debug mount option.
   btrfs: Add noflushoncommit mount option.
   btrfs: Add acl mount option.
   btrfs: Add datacow mount option.
   btrfs: Add datasum mount option.
   btrfs: Add treelog mount option.

All ok.

Reviewed-by: David Sterba dste...@suse.cz


Thanks for your comments.
nocheck_int and noinode_cache will be removed in the next version,
and noinode_cache will be resent as an independent patch after more
investigation and tests.

A remounting test case will also be added to xfstests soon.

Qu


Re: [PATCH v2 03/11] btrfs: Add nocheck_int mount option.

2014-01-05 Thread Qu Wenruo

On Fri, 3 Jan 2014 18:13:08 +0100, David Sterba wrote:

On Fri, Jan 03, 2014 at 02:10:26PM +0800, Qu Wenruo wrote:

Add nocheck_int mount option to disable integrity check with
remount option.

+   nocheck_int disables all the debug options above.

I think this option is not needed, the integrity checker is a
development functionality and used by people who know what they're
doing. Besides this would need to clean up all the data structures that
the checker uses (see eg. btrfsic_unmount that's called only if the
mount option is used). I see little benefit compared to the amount of
work to make sure that disabling the checker functionality in the middle
works properly.

david


That's right, since most people won't enable integrity check
until checking the all-yes config or running xfstests,
it's better not to add this option.

Qu


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Chris Murphy

On Jan 5, 2014, at 6:29 PM, Sulla su...@gmx.at wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Hi Chris!
 
 # sudo smartctl -l scterc /dev/sda
 tells me
 SCT Error Recovery Control command not supported
 
 you're right. the /sys/block/sdX/device/timeout file probably is useless then.

OK there's some confusion. /sys/block/sdX/device/timeout is the SCSI block 
layer timeout - linux itself has a timeout for each command issued to a block 
device, and will reset the link upon timeout being reached. So writing 120 to 
this will cause linux to wait for up to 120 seconds for the drive to respond. 
This is necessary because if there's a bad sector, the drive must report a read 
error in order for the md driver to reconstruct that data from parity. This is 
needed both for effective scrubs and for recovery on read error in normal 
operation. It is not a persistent setting so you'll want to create a start up 
script for it.
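
A start-up script for this could look roughly like the sketch below. The function name and the optional sysfs-root argument (for dry runs against a scratch directory) are assumptions for illustration, and where to hook it in at boot depends on the distribution.

```shell
# Set the SCSI command timeout on every sd* block device.
#   $1  timeout in seconds
#   $2  hypothetical sysfs root override for dry runs (default /sys)
set_scsi_timeouts() {
    seconds="$1"
    sysfs="${2:-/sys}"
    for f in "$sysfs"/block/sd*/device/timeout; do
        # Skip the unexpanded pattern when no sd* device exists.
        [ -e "$f" ] || continue
        echo "$seconds" > "$f" || return 1
    done
}
```

Calling `set_scsi_timeouts 120` as root at boot applies the value to all sd* devices; restrict the glob if only the md member disks should be changed.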


Chris Murphy



[PATCH v3 0/9] btrfs: Add missing pairing mount options.

2014-01-05 Thread Qu Wenruo
Some options should be paired to support triggering different functions
when remounting.

This patchset adds these missing pairing mount options except noinode_cache,
which may need more investigation to ensure its safety and will be sent as
an independent patch.
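
To illustrate what "pairing" means in practice, here is a small sketch that maps each option in this series to its counterpart and prints the remount command that would flip it back. The helper names are hypothetical and not part of the patchset; the option names are the ones listed below, and the counterpart options are only accepted by a kernel with these patches applied.

```shell
# Print the counterpart of a mount option from this series.
paired_option() {
    case "$1" in
        barrier|autodefrag|discard|enospc_debug|flushoncommit|acl|datacow|datasum|treelog)
            echo "no$1" ;;
        nobarrier|noautodefrag|nodiscard|noenospc_debug|noflushoncommit|noacl|nodatacow|nodatasum|notreelog)
            echo "${1#no}" ;;
        *)
            echo "unknown option: $1" >&2
            return 1 ;;
    esac
}

# Print (rather than run) the remount command that reverts an option.
#   $1  option currently in effect, $2 mount point
remount_command() {
    opt=$(paired_option "$1") || return 1
    echo "mount -o remount,$opt $2"
}
```

For example, `remount_command nobarrier /mnt` prints `mount -o remount,barrier /mnt`, which is the case patch 1/9 makes possible.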

Qu Wenruo (9):
  btrfs: Add barrier option to support -o remount,barrier
  btrfs: Add noautodefrag mount option.
  btrfs: Add nodiscard mount option.
  btrfs: Add noenospc_debug mount option.
  btrfs: Add noflushoncommit mount option.
  btrfs: Add acl mount option.
  btrfs: Add datacow mount option.
  btrfs: Add datasum mount option.
  btrfs: Add treelog mount option.

 Documentation/filesystems/btrfs.txt | 47 +++
 fs/btrfs/super.c| 55 -
 2 files changed, 84 insertions(+), 18 deletions(-)

-- 
1.8.5.2



[PATCH v3 1/9] btrfs: Add barrier option to support -o remount,barrier

2014-01-05 Thread Qu Wenruo
Btrfs can be remounted without barrier, but there is no barrier option
so nobody can remount btrfs back with barrier on. Only umount and
mount again can re-enable barrier. (Quite awkward)

Also, the mount options in the document are changed slightly for the
further pairing option changes.

Reported-by: Daniel Blueman dan...@quora.org
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Signed-off-by: Mike Fleetwood mike.fleetw...@googlemail.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v1: Add barrier option
v2: Document style change
v3: Small description change
---
 Documentation/filesystems/btrfs.txt | 13 +++--
 fs/btrfs/super.c|  8 +++-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index 5dd282d..ce487a2 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -38,7 +38,7 @@ Mount Options
 =
 
 When mounting a btrfs filesystem, the following option are accepted.
-Unless otherwise specified, all options default to off.
+Options with (*) are default options and will not show in the mount options.
 
   alloc_start=bytes
Debugging option to force all block allocations above a certain
@@ -138,12 +138,13 @@ Unless otherwise specified, all options default to off.
Disable support for Posix Access Control Lists (ACLs).  See the
acl(5) manual page for more information about ACLs.
 
+  barrier(*)
   nobarrier
-Disables the use of block layer write barriers.  Write barriers ensure
-   that certain IOs make it through the device cache and are on persistent
-   storage.  If used on a device with a volatile (non-battery-backed)
-   write-back cache, this option will lead to filesystem corruption on a
-   system crash or power loss.
+Enable/disable the use of block layer write barriers.  Write barriers
+   ensure that certain IOs make it through the device cache and are on
+   persistent storage. If disabled on a device with a volatile
+   (non-battery-backed) write-back cache, nobarrier option will lead to
+   filesystem corruption on a system crash or power loss.
 
   nodatacow
Disable data copy-on-write for newly created files.  Implies nodatasum,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index e9c13fb..fe9d8a6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval,
+	Opt_commit_interval, Opt_barrier,
 	Opt_err,
 };
 
@@ -335,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_nodatasum, "nodatasum"},
 	{Opt_nodatacow, "nodatacow"},
 	{Opt_nobarrier, "nobarrier"},
+	{Opt_barrier, "barrier"},
 	{Opt_max_inline, "max_inline=%s"},
 	{Opt_alloc_start, "alloc_start=%s"},
 	{Opt_thread_pool, "thread_pool=%d"},
@@ -494,6 +495,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_clear_opt(info->mount_opt, SSD);
 			btrfs_clear_opt(info->mount_opt, SSD_SPREAD);
 			break;
+		case Opt_barrier:
+			if (btrfs_test_opt(root, NOBARRIER))
+				btrfs_info(root->fs_info, "turning on barriers");
+			btrfs_clear_opt(info->mount_opt, NOBARRIER);
+			break;
 		case Opt_nobarrier:
 			btrfs_info(root->fs_info, "turning off barriers");
 			btrfs_set_opt(info->mount_opt, NOBARRIER);
-- 
1.8.5.2



[PATCH v3 5/9] btrfs: Add noflushoncommit mount option.

2014-01-05 Thread Qu Wenruo
Add noflushoncommit mount option to disable flush on commit with
remount option.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v2: Add noflushoncommit option
v3: None
---
 Documentation/filesystems/btrfs.txt | 1 +
 fs/btrfs/super.c| 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index 13a7cac..303b49c 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -117,6 +117,7 @@ Options with (*) are default options and will not show in 
the mount options.
  bug - BUG() on a fatal error.  This is the default.
  panic - panic() on a fatal error.
 
+  noflushoncommit(*)
   flushoncommit
The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit.  This makes
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index acf3e7d..b2c752e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug,
+	Opt_noenospc_debug, Opt_noflushoncommit,
 	Opt_err,
 };
 
@@ -350,6 +350,7 @@ static match_table_t tokens = {
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_flushoncommit, "flushoncommit"},
+	{Opt_noflushoncommit, "noflushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
@@ -562,6 +563,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "turning on flush-on-commit");
 			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
 			break;
+		case Opt_noflushoncommit:
+			if (btrfs_test_opt(root, FLUSHONCOMMIT))
+				btrfs_info(root->fs_info, "turning off flush-on-commit");
+			btrfs_clear_opt(info->mount_opt, FLUSHONCOMMIT);
+			break;
 		case Opt_ratio:
 			ret = match_int(&args[0], &intarg);
 			if (ret) {
-- 
1.8.5.2



[PATCH v3 4/9] btrfs: Add noenospc_debug mount option.

2014-01-05 Thread Qu Wenruo
Add noenospc_debug mount option to disable ENOSPC debug with
remount option.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v2: Add noenospc_debug option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c| 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index 7254cf5..13a7cac 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -108,8 +108,9 @@ Options with (*) are default options and will not show in 
the mount options.
performance impact.  (The fstrim command is also available to
initiate batch trims from userspace).
 
+  noenospc_debug(*)
   enospc_debug
-   Debugging option to be more verbose in some ENOSPC conditions.
+   Disable/enable debugging option to be more verbose in some ENOSPC 
conditions.
 
   fatal_errors=action
Action to take when encountering a fatal error: 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8731ee6..acf3e7d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,6 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
+	Opt_noenospc_debug,
 	Opt_err,
 };
 
@@ -356,6 +357,7 @@ static match_table_t tokens = {
 	{Opt_clear_cache, "clear_cache"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
 	{Opt_enospc_debug, "enospc_debug"},
+	{Opt_noenospc_debug, "noenospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
 	{Opt_nodefrag, "noautodefrag"},
@@ -603,6 +605,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_enospc_debug:
 			btrfs_set_opt(info->mount_opt, ENOSPC_DEBUG);
 			break;
+		case Opt_noenospc_debug:
+			btrfs_clear_opt(info->mount_opt, ENOSPC_DEBUG);
+			break;
 		case Opt_defrag:
 			btrfs_info(root->fs_info, "enabling auto defrag");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
-- 
1.8.5.2



[PATCH v3 2/9] btrfs: Add noautodefrag mount option.

2014-01-05 Thread Qu Wenruo
Btrfs has autodefrag mount option but no pairing noautodefrag option,
which makes it impossible to disable autodefrag without umount.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v2: Add noautodefrag option
v3: None
---
 Documentation/filesystems/btrfs.txt | 8 +---
 fs/btrfs/super.c| 8 +++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index ce487a2..e87609a 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -46,10 +46,12 @@ Options with (*) are default options and will not show in 
the mount options.
bytes, optionally with a K, M, or G suffix, case insensitive.
Default is 1MB.
 
+  noautodefrag(*)
   autodefrag
-   Detect small random writes into files and queue them up for the
-   defrag process.  Works best for small files; Not well suited for
-   large database workloads.
+   Disable/enable auto defragmentation.
+   Auto defragmentation detects small random writes into files and queue
+   them up for the defrag process.  Works best for small files;
+   Not well suited for large database workloads.
 
   check_int
   check_int_data
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fe9d8a6..c65f696 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval, Opt_barrier,
+	Opt_commit_interval, Opt_barrier, Opt_nodefrag,
 	Opt_err,
 };
 
@@ -357,6 +357,7 @@ static match_table_t tokens = {
 	{Opt_enospc_debug, "enospc_debug"},
 	{Opt_subvolrootid, "subvolrootid=%d"},
 	{Opt_defrag, "autodefrag"},
+	{Opt_nodefrag, "noautodefrag"},
 	{Opt_inode_cache, "inode_cache"},
 	{Opt_no_space_cache, "nospace_cache"},
 	{Opt_recovery, "recovery"},
@@ -602,6 +603,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "enabling auto defrag");
 			btrfs_set_opt(info->mount_opt, AUTO_DEFRAG);
 			break;
+		case Opt_nodefrag:
+			if (btrfs_test_opt(root, AUTO_DEFRAG))
+				btrfs_info(root->fs_info, "disabling auto defrag");
+			btrfs_clear_opt(info->mount_opt, AUTO_DEFRAG);
+			break;
 		case Opt_recovery:
 			btrfs_info(root->fs_info, "enabling auto recovery");
 			btrfs_set_opt(info->mount_opt, RECOVERY);
-- 
1.8.5.2



[PATCH v3 3/9] btrfs: Add nodiscard mount option.

2014-01-05 Thread Qu Wenruo
Add nodiscard mount option to disable discard with remount option.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v2: Add nodiscard option
v3: None
---
 Documentation/filesystems/btrfs.txt | 7 +--
 fs/btrfs/super.c| 6 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index e87609a..7254cf5 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -98,9 +98,12 @@ Options with (*) are default options and will not show in 
the mount options.
can be avoided.  Especially useful when trying to mount a multi-device
setup as root.  May be specified multiple times for multiple devices.
 
+  nodiscard(*)
   discard
-   Issue frequent commands to let the block device reclaim space freed by
-   the filesystem.  This is useful for SSD devices, thinly provisioned
+   Disable/enable discard mount option.
+   Discard issues frequent commands to let the block device reclaim space
+   freed by the filesystem.
+   This is useful for SSD devices, thinly provisioned
LUNs and virtual machine images, but may have a significant
performance impact.  (The fstrim command is also available to
initiate batch trims from userspace).
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c65f696..8731ee6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -323,7 +323,7 @@ enum {
 	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
-	Opt_commit_interval, Opt_barrier, Opt_nodefrag,
+	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_err,
 };
 
@@ -351,6 +351,7 @@ static match_table_t tokens = {
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
 	{Opt_discard, "discard"},
+	{Opt_nodiscard, "nodiscard"},
 	{Opt_space_cache, "space_cache"},
 	{Opt_clear_cache, "clear_cache"},
 	{Opt_user_subvol_rm_allowed, "user_subvol_rm_allowed"},
@@ -575,6 +576,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_discard:
 			btrfs_set_opt(info->mount_opt, DISCARD);
 			break;
+		case Opt_nodiscard:
+			btrfs_clear_opt(info->mount_opt, DISCARD);
+			break;
 		case Opt_space_cache:
 			btrfs_set_opt(info->mount_opt, SPACE_CACHE);
 			break;
-- 
1.8.5.2



[PATCH v3 7/9] btrfs: Add datacow mount option.

2014-01-05 Thread Qu Wenruo
Add datacow mount option to enable copy-on-write with
remount option.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
Reviewed-by: David Sterba dste...@suse.cz

---
Changelog:
v2: add datacow mount option
v3: None
---
 Documentation/filesystems/btrfs.txt | 5 +++--
 fs/btrfs/super.c| 8 +++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt 
b/Documentation/filesystems/btrfs.txt
index 79c08f3..bbd1f0f 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -154,9 +154,10 @@ Options with (*) are default options and will not show in 
the mount options.
(non-battery-backed) write-back cache, nobarrier option will lead to
filesystem corruption on a system crash or power loss.
 
+  datacow(*)
   nodatacow
-   Disable data copy-on-write for newly created files.  Implies nodatasum,
-   and disables all compression.
+   Enable/disable data copy-on-write for newly created files.
+   Nodatacow implies nodatasum, and disables all compression.
 
   nodatasum
Disable data checksumming for newly created files.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3d743cf..1bf9202 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl,
+	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
 	Opt_err,
 };
 
@@ -335,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_device, "device=%s"},
 	{Opt_nodatasum, "nodatasum"},
 	{Opt_nodatacow, "nodatacow"},
+	{Opt_datacow, "datacow"},
 	{Opt_nobarrier, "nobarrier"},
 	{Opt_barrier, "barrier"},
 	{Opt_max_inline, "max_inline=%s"},
@@ -446,6 +447,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_set_opt(info->mount_opt, NODATACOW);
 			btrfs_set_opt(info->mount_opt, NODATASUM);
 			break;
+		case Opt_datacow:
+			if (btrfs_test_opt(root, NODATACOW))
+				btrfs_info(root->fs_info, "setting datacow");
+			btrfs_clear_opt(info->mount_opt, NODATACOW);
+			break;
 		case Opt_compress_force:
 		case Opt_compress_force_type:
 			compress_force = true;
-- 
1.8.5.2
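
For readers who want to try the set/clear behaviour outside the kernel, here
is a minimal user-space sketch of how a "datacow" token can undo an earlier
"nodatacow" when options are applied left to right, as on a remount. The bit
values and the helpers apply_option() and apply_options() are hypothetical
stand-ins, not the kernel's BTRFS_MOUNT_* constants or btrfs_parse_options(),
and the compression handling of the real nodatacow case is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values, not the kernel's BTRFS_MOUNT_* constants. */
#define OPT_NODATACOW (1UL << 0)
#define OPT_NODATASUM (1UL << 1)

/* Apply one option token to the mask; returns 0, or -1 if unknown. */
static int apply_option(unsigned long *opts, const char *tok)
{
	assert(opts != NULL && tok != NULL);

	if (strcmp(tok, "nodatacow") == 0) {
		/* nodatacow implies nodatasum */
		*opts |= OPT_NODATACOW | OPT_NODATASUM;
	} else if (strcmp(tok, "datacow") == 0) {
		/* Only report a change if nodatacow was actually set. */
		if (*opts & OPT_NODATACOW)
			printf("setting datacow\n");
		/* Clears NODATACOW only; NODATASUM is left as it was. */
		*opts &= ~OPT_NODATACOW;
	} else {
		return -1;
	}
	return 0;
}

/* Apply a comma-separated option string, left to right. */
static int apply_options(unsigned long *opts, const char *list)
{
	char buf[128];
	char *tok;

	if (strlen(list) >= sizeof(buf))
		return -1;
	strcpy(buf, list);

	for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
		if (apply_option(opts, tok) != 0)
			return -1;
	return 0;
}
```

Note that in this sketch, as in the hunk above, "datacow" does not touch the
NODATASUM bit, so checksumming stays off until it is re-enabled separately.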



[PATCH v3 8/9] btrfs: Add datasum mount option.

2014-01-05 Thread Qu Wenruo
Add a datasum mount option so that data checksumming can be
re-enabled via remount.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
Reviewed-by: David Sterba <dste...@suse.cz>

---
Changelog:
v2: Add datasum option
v3: None
---
 Documentation/filesystems/btrfs.txt |  4 +++-
 fs/btrfs/super.c                    | 10 ++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index bbd1f0f..e05c6ae 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -159,8 +159,10 @@ Options with (*) are default options and will not show in the mount options.
 	Enable/disable data copy-on-write for newly created files.
 	Nodatacow implies nodatasum, and disables all compression.
 
+  datasum(*)
   nodatasum
-	Disable data checksumming for newly created files.
+	Enable/disable data checksumming for newly created files.
+	Datasum implies datacow.
 
   notreelog
 	Disable the tree logging used for fsync and O_SYNC writes.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1bf9202..fa74252 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -325,6 +325,7 @@ enum {
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
+	Opt_datasum,
 	Opt_err,
 };
 
@@ -334,6 +335,7 @@ static match_table_t tokens = {
 	{Opt_subvolid, "subvolid=%s"},
 	{Opt_device, "device=%s"},
 	{Opt_nodatasum, "nodatasum"},
+	{Opt_datasum, "datasum"},
 	{Opt_nodatacow, "nodatacow"},
 	{Opt_datacow, "datacow"},
 	{Opt_nobarrier, "nobarrier"},
@@ -434,6 +436,14 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "setting nodatasum");
 			btrfs_set_opt(info->mount_opt, NODATASUM);
 			break;
+		case Opt_datasum:
+			if (btrfs_test_opt(root, NODATACOW))
+				btrfs_info(root->fs_info, "setting datasum, datacow enabled");
+			else
+				btrfs_info(root->fs_info, "setting datasum");
+			btrfs_clear_opt(info->mount_opt, NODATACOW);
+			btrfs_clear_opt(info->mount_opt, NODATASUM);
+			break;
 		case Opt_nodatacow:
 			if (!btrfs_test_opt(root, COMPRESS) ||
 			    !btrfs_test_opt(root, FORCE_COMPRESS)) {
-- 
1.8.5.2



[PATCH v3 9/9] btrfs: Add treelog mount option.

2014-01-05 Thread Qu Wenruo
Add a treelog mount option so that the tree log can be
re-enabled via remount.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
Reviewed-by: David Sterba <dste...@suse.cz>

---
Changelog:
v2: Add treelog option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c                    | 8 +++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index e05c6ae..d11cc2f 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -164,8 +164,9 @@ Options with (*) are default options and will not show in the mount options.
 	Enable/disable data checksumming for newly created files.
 	Datasum implies datacow.
 
+  treelog(*)
   notreelog
-	Disable the tree logging used for fsync and O_SYNC writes.
+	Enable/disable the tree logging used for fsync and O_SYNC writes.
 
   recovery
 	Enable autorecovery attempts if a bad tree root is found at mount time.
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index fa74252..d353b9e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -325,7 +325,7 @@ enum {
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
-	Opt_datasum,
+	Opt_datasum, Opt_treelog,
 	Opt_err,
 };
 
@@ -353,6 +353,7 @@ static match_table_t tokens = {
 	{Opt_acl, "acl"},
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
+	{Opt_treelog, "treelog"},
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_noflushoncommit, "noflushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
@@ -579,6 +580,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_info(root->fs_info, "disabling tree log");
 			btrfs_set_opt(info->mount_opt, NOTREELOG);
 			break;
+		case Opt_treelog:
+			if (btrfs_test_opt(root, NOTREELOG))
+				btrfs_info(root->fs_info, "enabling tree log");
+			btrfs_clear_opt(info->mount_opt, NOTREELOG);
+			break;
 		case Opt_flushoncommit:
 			btrfs_info(root->fs_info, "turning on flush-on-commit");
 			btrfs_set_opt(info->mount_opt, FLUSHONCOMMIT);
-- 
1.8.5.2
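
Each patch in this series adds one entry to the tokens table and one case to
the switch. As a rough illustration of that table-driven shape, the sketch
below maps option strings to enum values with an exact string comparison. It
is an assumption-laden simplification: the kernel's match_token() also handles
patterns such as "%s" and "%d", which this hypothetical lookup_option() does
not, and the enum and table names here are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of option ids, named after the ones in the hunk. */
enum opt_id {
	OPT_NOTREELOG,
	OPT_TREELOG,
	OPT_FLUSHONCOMMIT,
	OPT_NOFLUSHONCOMMIT,
	OPT_ERR,
};

struct opt_token {
	enum opt_id id;
	const char *pattern;
};

/* Each "no" option sits next to its positive counterpart. */
static const struct opt_token tokens[] = {
	{ OPT_NOTREELOG, "notreelog" },
	{ OPT_TREELOG, "treelog" },
	{ OPT_FLUSHONCOMMIT, "flushoncommit" },
	{ OPT_NOFLUSHONCOMMIT, "noflushoncommit" },
	{ OPT_ERR, NULL },	/* sentinel terminates the table */
};

/* Return the id for an exact match, or OPT_ERR if nothing matches. */
static enum opt_id lookup_option(const char *s)
{
	const struct opt_token *t;

	assert(s != NULL);
	for (t = tokens; t->pattern != NULL; t++)
		if (strcmp(s, t->pattern) == 0)
			return t->id;
	return OPT_ERR;
}
```

Because the comparison is exact, "treelog" and "notreelog" never shadow each
other regardless of their order in the table.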



[PATCH v3 6/9] btrfs: Add acl mount option.

2014-01-05 Thread Qu Wenruo
Add an acl mount option so that ACL support can be re-enabled via remount.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
Reviewed-by: David Sterba <dste...@suse.cz>

---
Changelog:
v2: add acl option
v3: None
---
 Documentation/filesystems/btrfs.txt | 3 ++-
 fs/btrfs/super.c                    | 6 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 303b49c..79c08f3 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -141,8 +141,9 @@ Options with (*) are default options and will not show in the mount options.
 	Specify that 1 metadata chunk should be allocated after every <value>
 	data chunks.  Off by default.
 
+  acl(*)
   noacl
-	Disable support for Posix Access Control Lists (ACLs).  See the
+	Enable/disable support for Posix Access Control Lists (ACLs).  See the
 	acl(5) manual page for more information about ACLs.
 
   barrier(*)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b2c752e..3d743cf 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -324,7 +324,7 @@ enum {
 	Opt_check_integrity, Opt_check_integrity_including_extent_data,
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
-	Opt_noenospc_debug, Opt_noflushoncommit,
+	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl,
 	Opt_err,
 };
 
@@ -347,6 +347,7 @@ static match_table_t tokens = {
 	{Opt_ssd, "ssd"},
 	{Opt_ssd_spread, "ssd_spread"},
 	{Opt_nossd, "nossd"},
+	{Opt_acl, "acl"},
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_flushoncommit, "flushoncommit"},
@@ -552,6 +553,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				goto out;
 			}
 			break;
+		case Opt_acl:
+			root->fs_info->sb->s_flags |= MS_POSIXACL;
+			break;
 		case Opt_noacl:
 			root->fs_info->sb->s_flags &= ~MS_POSIXACL;
 			break;
-- 
1.8.5.2
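
Unlike the other options in this series, acl/noacl toggles a bit in the
superblock flags word rather than in the btrfs mount_opt mask. A small
user-space sketch of that single-bit toggle follows. The flag values and the
set_acl() helper are hypothetical and chosen for illustration; they are not
the kernel's MS_* constants. The point of the sketch is that enabling uses
OR and disabling uses AND with the complement, so every other bit in the
word is preserved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration, not the kernel's MS_*. */
#define FLAG_RDONLY   (1UL << 0)
#define FLAG_POSIXACL (1UL << 16)

/* Set or clear the ACL bit, leaving every other flag untouched. */
static void set_acl(unsigned long *s_flags, int enable)
{
	assert(s_flags != NULL);

	if (enable)
		*s_flags |= FLAG_POSIXACL;	/* "acl" */
	else
		*s_flags &= ~FLAG_POSIXACL;	/* "noacl" */
}
```

Calling set_acl() twice with the same argument is harmless, which matches how
a remount may repeat an option that is already in effect.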
