[PATCH] btrfs-progs: Fix an extent buffer leak in count_csum_range().

2015-01-04 Thread Qu Wenruo
Commit f495a2ac6611 (btrfs-progs: fsck: remove unfriendly BUG_ON()
for searching tree failure) leaks a large number of extent buffers when
btrfsck hits csum mismatches.

The leak is caused by a misplaced btrfs_release_path(): the error return
path skips it. Move the call before the error check.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index d2d218a..5b644cf 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1186,9 +1186,9 @@ static int count_csum_range(struct btrfs_root *root, u64 start,
 		path.slots[0]++;
 	}
 out:
+	btrfs_release_path(&path);
 	if (ret < 0)
 		return ret;
-	btrfs_release_path(&path);
 	return 0;
 }
 
-- 
2.2.1
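
The shape of the bug can be sketched outside btrfs-progs. The following is
a toy model in Python, not the btrfs-progs code: ToyPath, count_leaky and
count_fixed are made-up names, and a plain counter stands in for extent
buffer references. It only shows why releasing after the error check leaks
on the error path.

```python
class ToyPath:
    """Hypothetical stand-in for a btrfs_path holding buffer references."""

    def __init__(self):
        self.held = 0

    def search(self):
        self.held += 1      # a lookup pins one buffer

    def release(self):
        self.held = 0       # drop every pinned buffer


def count_leaky(path, ret):
    """Pre-patch ordering: the error return skips release()."""
    path.search()
    if ret < 0:
        return ret
    path.release()
    return 0


def count_fixed(path, ret):
    """Post-patch ordering: release() runs before any return."""
    path.search()
    path.release()
    if ret < 0:
        return ret
    return 0


leaky, fixed = ToyPath(), ToyPath()
count_leaky(leaky, -5)
count_fixed(fixed, -5)
print(leaky.held, fixed.held)   # buffers still pinned after an error
```

With an error return, the leaky ordering leaves one buffer pinned while the
fixed ordering leaves none.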

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Uncorrectable errors on RAID-1?

2015-01-04 Thread Chris Murphy
On Sun, Jan 4, 2015 at 9:18 PM, Phillip Susi <ps...@ubuntu.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 01/03/2015 12:31 AM, Chris Murphy wrote:
>
> > This is architecture astronaut territory.
> >
> > The system only has a terrible response for two reasons: 1. The
> > user spec'd the wrong hardware for the use case; 2. The distro
> > isn't automatically leveraging existing ways to mitigate that user
> > mistake by changing either SCT ERC on the drives, or the SCSI
> > command timer for each block device.
>
> No, it has terrible response because the kernel either waits an
> unreasonable time or fails the drive and kicks it out of the array
> instead of trying to repair it.

It's a default that works for more use cases than not. The kernel
isn't dynamically self-configuring, and it isn't even the kernel's job
to take the first step which is to enable and correctly set SCT ERC on
each drive.

I think assuming a large pile of causes for a drive freezing on a
command be treated as read errors (after the link reset) is a bad
idea. But since it's your idea, and I'm not a kernel developer, you
should propose it on linux-raid@ instead of arguing with me.


> Blaming the user for not buying
> better hardware is not an appropriate response for the kernel failing
> so badly to handle commonly available hardware that doesn't behave in
> the most ideal way.

Hi, I'm a good and knowledgeable sysadmin. I buy hardware that's
explicitly stated in the company's marketing data sheet as being
incompatible with my use case. This is someone else's fault.

Sounds like buck passing.

> > Now, even though that solution *might* mean long recoveries on
> > occasion, it's still better than link reset behavior which is what
> > we have today because it causes the underlying problem to be fixed
> > by md/dm/Btrfs once the read error is reported. But no distro has
> > implemented this $500 man hour solution. Instead you're suggesting
> > a $500,000 fix that will take hundreds of man hours and end user
> > testing to find all the edge cases. It's like, seriously, WTF?
>
> Seriously?  Treating a timeout the same way you treat an unrecoverable
> media error is no herculean task.

So you keep saying.

But the best practice is already known and tested, and can be done with a
startup script. Yet no distro does this for the user, even though it's
much, much simpler than what you're proposing, and actually fixes both
sources of the problem.
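
As a rough illustration of what such a startup script would do, here is a
sketch in Python that only builds the commands and does not run them. The
7-second ERC value, the 180-second fallback timer, the device names and the
helper name erc_commands are assumptions for illustration, not values taken
from this thread. smartctl's "-l scterc,READ,WRITE" takes tenths of a
second, and /sys/block/<dev>/device/timeout is in seconds.

```python
def erc_commands(dev, supports_scterc, erc_deciseconds=70, fallback_timeout_s=180):
    """Return the shell commands a boot script might run for one drive.

    dev is a kernel block device name such as "sda" (hypothetical here).
    """
    if supports_scterc:
        # Cap the drive's internal error recovery so it reports a read
        # error before the kernel's command timer expires.
        value = f"scterc,{erc_deciseconds},{erc_deciseconds}"
        return [f"smartctl -l {value} /dev/{dev}"]
    # No SCT ERC: raise the SCSI command timer instead, so the drive's
    # long recovery can finish without a link reset.
    return [f"echo {fallback_timeout_s} > /sys/block/{dev}/device/timeout"]


for cmd in erc_commands("sda", True) + erc_commands("sdb", False):
    print(cmd)
```

Either branch gets a read error reported to md/dm/Btrfs instead of a link
reset, which is the point being argued here.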

That it is, in your opinion, an imperfect fix is not relevant. It's
still better behavior than what we have today, and yet still no distro
does this, thereby tacitly preferring the status quo. And if the current
behavior is good enough that no one has taken action to automatically
implement the known best-practice workaround of the day, why
should kernel developers give two shits about this idea? Sounds like
more buck passing.



> > http://www.seagate.com/files/www-content/support-content/documentation/product-manuals/en-us/Enterprise/Savvio/Savvio%2015K.3/100629381e.pdf
> >
> > That's a high end SAS drive. It's default is to retry up to 20
> > times, which takes ~1.4 seconds, per sector. But also note how it
> > says
>
> 20 retries on a 15,000 rpm drive only takes 80 milliseconds, not 1.4
> seconds.  15,000 rpm / 60 seconds per minute = 250 rotations/retries
> per second.

The PDF contains a table saying 20 retries takes 1.4 seconds. I didn't
compute this number myself, it's in the bloody manufacturer's own
documentation. Obviously the ECC is doing things that take more than
one revolution of the spindle.
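
The two figures can be put side by side with a little arithmetic. The
sketch below is plain Python with a made-up helper name, revolution_ms. It
computes the one-revolution-per-retry lower bound and compares it with the
1.4 second figure quoted from the manual. It assumes nothing about what the
firmware actually does during a retry.

```python
def revolution_ms(rpm):
    """Time for one platter revolution, in milliseconds."""
    return 60_000 / rpm


retries = 20
lower_bound_ms = retries * revolution_ms(15_000)   # one revolution per retry
documented_ms = 1400                               # figure quoted from the manual

print(lower_bound_ms)                  # 80.0
print(documented_ms / retries)         # 70.0 ms per retry
print(documented_ms / lower_bound_ms)  # 17.5 revolutions' worth of time per retry
```

So the documented figure is consistent with each retry spending many
revolutions' worth of time, not just one.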


> > Maybe you'd prefer seeing these big, cheap, green drives have
> > shorter ERC times, with a commensurate reality check with their
> > unrecoverable error rate, which right now is already two orders
> > magnitude higher than enterprise SAS drives. So what if this means
> > that rate is 3 or 4 orders magnitude higher?
>
> 20 retries vs. 200 retries does not reduce the URE rate by orders of
> magnitude; more like 1% *maybe*.  200 vs 2000 makes no measurable
> difference at all.

I see, well I guess you prefer believing in fraud and conspiracy
theories, by multiple companies, to screw users over, while they admit
the incompatibility of the intended use case on their data sheets.


-- 
Chris Murphy


Re: ignoring bad blocks

2015-01-04 Thread Marc MERLIN
On Sun, Jan 04, 2015 at 01:45:41AM -0700, Chris Murphy wrote:
> On Sat, Jan 3, 2015 at 10:40 PM, Dyweni - BTRFS <y4bwxfpc4...@dyweni.com> wrote:
> > Hi All,
> >
> > Can BTRFS ignore bad blocks as they are discovered?
> >
> > I want to try BTRFS on some older drives, but they all have a few bad
> > blocks.
>
> Not currently, and I don't see it in the project ideas list. Right now
> on Btrfs you will just get write errors, but I'm uncertain if it just
> tries a new sector and continues on (indirectly not use the bad sector
> but also not keeping track of it either)? The unreliable disk features
> are still project ideas.

Bad block lists are a thing of the past. As you hinted, drives automatically
remap bad blocks so that the filesystem doesn't have to deal with them.

If you have a questionable drive, you can indeed simply dd zeros over it
before you use it with btrfs.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: ignoring bad blocks

2015-01-04 Thread Chris Murphy
On Sat, Jan 3, 2015 at 10:40 PM, Dyweni - BTRFS <y4bwxfpc4...@dyweni.com> wrote:
> Hi All,
>
> Can BTRFS ignore bad blocks as they are discovered?
>
> I want to try BTRFS on some older drives, but they all have a few bad
> blocks.

Not currently, and I don't see it in the project ideas list. Right now
on Btrfs you will just get write errors. I'm not sure whether it then
tries a new sector and continues (indirectly avoiding the bad sector
without keeping track of it). The unreliable disk features
are still project ideas.

If the drives no longer have reserve sectors, then technically they're
toast. That's indicated by write failures in dmesg. Two workarounds:
use ext4 with mkfs.ext4 -c, which builds a bad blocks list and then
won't use those sectors; mdadm 3.1+ also has an option to build a bad
blocks list, but I don't know if raid0 or linear/concat are
supported:
http://thread.gmane.org/gmane.linux.raid/34883

If you haven't tried it, badblocks -wvs will (destructively) write
over the entire block device, and the drive firmware should detect
persistent write failures automatically and remap the LBA to a reserve
sector, removing the bad sector from use. This is transparent to
everything outside the drive. Once reserve sectors are depleted then
the drive will report write failure.



-- 
Chris Murphy


Re: possible bug in balance

2015-01-04 Thread Erkki Seppala
lu...@plaintext.sk writes:

Hello,

> luvar@blackdawn:~$ sudo time btrfs balance start -dconvert=raid1 -dusage=20 /home/luvar/programs/
>
> Am I doing something forbidden (I have not see any structure where
> raid type is stored per file/subvolume item), or I just hit some
> problem? What should I try?

btrfs doesn't yet support per-subvolume RAID levels. I'm not sure how
it should behave with your command line. It probably tries to
rebalance the whole filesystem.

> Than I wanted to convert to raid1 also some data (with balance
> filter) and try if there is some speedup when reading files
> (starting programs)...

Though I can already tell that no, there won't be a speedup, as btrfs
scheduler chooses the device to access by using the process id as a
seed. Therefore a single thread is never able to use 100% RAID1 input
capability. Perhaps in future there will be more sophisticated
schedulers. You may try to use MD raid1 for extra speed, but you would
lose the automatic error recovery of btrfs (but you would still notice
if data gets corrupted).
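
A toy model of that selection rule, assuming the mirror index is simply the
reader's process ID modulo the number of copies. That is my reading of the
description above, not a copy of the kernel code, and pick_mirror is a
made-up name:

```python
def pick_mirror(pid, num_copies=2):
    """Choose which RAID1 copy a reader uses, keyed only on its PID."""
    return pid % num_copies


# One process always lands on the same device, so its reads never
# spread across both mirrors.
print({pick_mirror(4242) for _ in range(1000)})

# Many processes with different PIDs do spread out.
print(sorted({pick_mirror(pid) for pid in range(4242, 4250)}))
```

Under this assumption a single-threaded reader is pinned to one device, so
only workloads with several processes use both mirrors.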

> [ 8159.300427] attempt to access beyond end of device
> [ 8159.300434] sdb2: rw=1041, want=480110048, limit=473956352
> [ 8159.300440] btrfs: bdev /dev/sdb2 errs: wr 638628, rd 65867, flush 0, corrupt 0, gen 0

I have noticed that 'attempt to access beyond end of device' typically
indicates (with other file systems, I haven't seen that with btrfs)
that the partition table and the filesystem size don't
match. Typically such a situation could occur when one modifies
partition table after creating the file system, though I'm sure there
are other ways to get into such a situation. You may find the
filesystem size with btrfs filesystem show and partition sizes with
cat /proc/partitions (multiply by block size = 1024 bytes).
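
The unit conversions are easy to get wrong, so here is a small Python
sketch of the comparison. It assumes, as far as I know correctly, that
want= and limit= in the kernel message are counted in 512-byte sectors,
while the #blocks column of /proc/partitions is in 1024-byte blocks. The
helper names are made up for the example.

```python
SECTOR = 512          # unit of want= and limit= in the kernel message
PROC_BLOCK = 1024     # unit of the #blocks column in /proc/partitions


def partition_bytes(proc_blocks):
    """Partition size in bytes from a /proc/partitions #blocks value."""
    return proc_blocks * PROC_BLOCK


def overshoot_bytes(want_sectors, limit_sectors):
    """How far past the end of the device the request reached."""
    return (want_sectors - limit_sectors) * SECTOR


want, limit = 480110048, 473956352      # values from the quoted log
print(limit * SECTOR)                   # device size in bytes
print(overshoot_bytes(want, limit))     # bytes past the end
print(partition_bytes(limit // 2) == limit * SECTOR)
```

If the size reported by btrfs filesystem show is larger than the device
size computed this way, the filesystem believes the partition is bigger
than it really is.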

Should the partition sizes and filesystem sizes match, I would be
quite certain this would indeed be a btrfs bug. But,

> root@blackdawn:/home/luvar# uname -a
> Linux blackdawn 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> root@blackdawn:/home/luvar# btrfs v
> Btrfs v0.20-rc1-189-g704a08c

should this turn out to be a bug, I'm certain that trying a more recent
kernel version is a terrific idea :). 3.18.x, or 3.17.y where y > 2 (I
think those were the two versions that were bad in the 3.17
series). They won't have support for RAID1 on a single subvolume either,
though, as far as I know.

Remember backups :).

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/



Re: [PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr

2015-01-04 Thread Erkki Seppala
Lutz Vieweg <l...@5t9.de> writes:

> Maybe chattr +C could print a warning if a file
> to change attributes for is > 0 bytes long?

This may only affect btrfs. The old ext2? ext3? compression patches
were able to compress pre-existing files. I don't know how other
filesystems behave in this regard.

-- 
  _
 / __// /__   __   http://www.modeemi.fi/~flux/\   \
/ /_ / // // /\ \/ /\  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi  \/



Re: [PATCH] Fixing quota error when removing files from a limit exceeded subvols

2015-01-04 Thread Dongsheng Yang
On Sat, Jan 3, 2015 at 10:29 PM, Khaled Ahmed khaled@gmail.com wrote:
 Hi Yang,

 This is how to reproduce the bug,

 [root@algodev ~]# uname -r
 3.18.0+

 [root@algodev ~]# btrfs version
 Btrfs v3.18-2-g6938452-dirty

 [root@algodev ~]# btrfs quota enable LOOP/
 [root@algodev ~]# btrfs qgroup show  LOOP/
 qgroupid rfer  excl
 -------- ----  ----
 0/5      16384 16384

 [root@algodev ~]# btrfs subvol create LOOP/subvol1
 Create subvolume 'LOOP/subvol1'

 [root@algodev ~]# btrfs qgroup limit 1g LOOP/subvol1/

 [root@algodev ~]# btrfs qgroup show  LOOP/
 qgroupid rfer  excl
 -------- ----  ----
 0/5      16384 16384
 0/258    16384 16384

 [root@algodev ~]# dd if=/dev/zero of=LOOP/subvol1/bigfile
 dd: writing to ‘LOOP/subvol1/bigfile’: Disk quota exceeded
 2097018+0 records in
 2097017+0 records out
 1073672704 bytes (1.1 GB) copied, 10.0759 s, 107 MB/s

 [root@algodev ~]# rm -f LOOP/subvol1/bigfile
 rm: cannot remove ‘LOOP/subvol1/bigfile’: Disk quota exceeded

Hi Ahmed,
Okay, thanks for your example.

a). I guess your problem is getting an EDQUOT when removing a file.
That happens because we need to reserve some metadata in the transaction of btrfs_unlink().

b). I think your patch will not solve this problem. The root cause is
that quota in btrfs currently accounts data and metadata together.

c). I admit that getting an EDQUOT is strange when you are not writing anything and
are only removing a file. I have a plan in my TODO list to make qgroup
limit and account the size in three modes: data, metadata, and both.
Then, in this case,
if you only limit the size of data, you will not get an EDQUOT any more.
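
To make point a) concrete, here is a toy Python version of the limit check
from the patch under discussion (reserved + rfer + num_bytes compared with
max_rfer). It is a simplified model and not the kernel function: the name
qgroup_reserve, the 16384-byte node size and the "exactly at the limit"
starting state are assumptions for illustration.

```python
import errno


def qgroup_reserve(reserved, rfer, num_bytes, max_rfer):
    """Toy version of the limit check: data and metadata share one counter."""
    if reserved + rfer + num_bytes > max_rfer:
        return -errno.EDQUOT
    return 0


GiB = 1 << 30
nodesize = 16384                      # assumed metadata node size

# Suppose the subvolume sits exactly at its 1 GiB limit after the dd run.
# Even the small metadata reservation that an unlink needs is refused.
print(qgroup_reserve(0, GiB, nodesize, GiB) == -errno.EDQUOT)

# With room left under the limit, the same reservation succeeds.
print(qgroup_reserve(0, GiB - nodesize, nodesize, GiB) == 0)
```

Because metadata is charged against the same counter as data, a full
qgroup rejects the reservation that would be needed to free space.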

Thanx
Yang
 [root@algodev ~]#

 Best Regards,
 ~Khaled Ahmed


 On Jan 3, 2015, at 4:09 AM, Dongsheng Yang dongsheng081...@gmail.com wrote:

 Hi Khaled,

 Could you give use more description about the problem this patch
 is trying to solve? Maybe an example will help a lot to understand it.

 Thanx

 On Fri, Jan 2, 2015 at 7:48 AM, Khaled Ahmed khaled@gmail.com wrote:
 Signed-off-by: Khaled Ahmed khaled@gmail.com
 ---
 fs/btrfs/qgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
 index 48b60db..b85200d 100644
 --- a/fs/btrfs/qgroup.c
 +++ b/fs/btrfs/qgroup.c
 @@ -2408,14 +2408,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)

 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
 		    qg->reserved + (s64)qg->rfer + num_bytes >
 -		    qg->max_rfer) {
 +		    qg->max_rfer - 1 ) {
 			ret = -EDQUOT;
 			goto out;
 		}

 		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
 		    qg->reserved + (s64)qg->excl + num_bytes >
 -		    qg->max_excl) {
 +		    qg->max_excl - 1) {
 			ret = -EDQUOT;
 			goto out;
 		}
 --
 2.1.0




Re: [PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr

2015-01-04 Thread Martin Steigerwald
On Sunday, 4 January 2015, 12:40:59, Erkki Seppala wrote:
> Lutz Vieweg <l...@5t9.de> writes:
>
> > Maybe chattr +C could print a warning if a file
> > to change attributes for is > 0 bytes long?
>
> This may only affect btrfs. The old ext2? ext3? compression patches
> were able to compress pre-existing files. I don't know how other
> filesystems behave in this regard.

+C is no-cow, -c is compression.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: Debian/Jessie 3.16.7-ckt2-1 kernel error

2015-01-04 Thread Satoru Takeuchi

Hi Petr,

On 2014/12/28 0:36, Petr Janecek wrote:

Hello Satoru and all,

   that Oct. report was the only time I've experienced the error, so I
don't have much to add. I can try to answer your questions:


Here are my questions.

1. Is your system btrfs scrub clean?


   yes,


2. Is this message shown every boot time?


   no, I have seen them only during one boot


3. Is this message shown only in boot?


   As in my Oct. email
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39721
I've seen a similar one after creating a subvolume on a new fs.
But it was during the same boot.


4. When this message is started to be shown?
5. Do you have any trouble, change your operation or configuration
just before the answer of Q4 ?


   a disk was added to the fs and balance has been run. The balance
crashed, as in https://bugzilla.kernel.org/show_bug.cgi?id=64961
(probably unrelated).  After reboot, I've seen the messages.


Additional questions.
Q5. Could you give me your kernel configuration?
 At least, could you tell me whether your kernel
 enabled CONFIG_PREEMPT or not?


   CONFIG_PREEMPT_VOLUNTARY=y


Q6. If the answer of Q1 is correct, please give me the
 file system image which can be captured by the following command.


   Sorry, the fs's are long gone. I continued to run similar workloads on
that test box, but these errors never appeared again.


Thank you for the information.

Further investigation of this problem looks hard for now.
Please send us the information mentioned above if this problem
happens again.

Thanks,
Satoru




Regards,

Petr





Re: [PATCH v3] btrfs-progs: Documentation: add T/P/E description for resize cmd

2015-01-04 Thread Gui Hecheng
On Fri, 2015-01-02 at 17:21 +0100, David Sterba wrote:
 On Fri, Jan 02, 2015 at 05:12:04PM +0100, David Sterba wrote:
  On Thu, Jan 01, 2015 at 08:27:55PM -0700, Chris Murphy wrote:
   Small problem with the rendering of this commit
   d4ef1a06f8be623ae94e4d498c306e8dd1605bef, when I use 'man btrfs
   filesystem' the above portion looks like this:
   
'K', 'M', 'G', 'T', 'P', or 'E\',
   
   I'm not sure why there's a trailing slash after the E.
  
  Me neither, but it looks like a bug in the asciidoc processing.
 
 Seems that only the first ' has to be quoted, and consumes the next
 unquoted ' as a pair, so with the last \' the next one is missing and
 is printed verbatim:
 
 Fixed by:
 
 -units designators: \'K\', \'M\', \'G\', \'T\', \'P\', or \'E\', which represent
 +units designators: \'K', \'M', \'G', \'T', \'P', or \'E', which represent
 

Oh, sorry, I missed this problem, thanks for fixing it.

-Gui



Re: [PATCH v2] xfstests: btrfs: fix up 001.out

2015-01-04 Thread Eryu Guan
On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:
 The subvol delete output has changed with btrfs-progs

It would be better to say which btrfs-progs version changed the
output.

 -Delete subvolume 'SCRATCH_MNT/snap'
 +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'
 
 so fix 001 failing.
 
 Signed-off-by: Anand Jain anand.j...@oracle.com
 
 v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and
 commit update

I think a better way to fix this is to update the
_filter_btrfs_subvol_delete filter.

Right now the filter deletes the message about the transaction commit:

sed -e "/Transaction commit: none (default)/d"

Adding another -e to the sed command to delete the "(no-commit): " part is enough.
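
The intended behavior of the extended filter can be sketched in Python.
This is a model of the idea only, not the xfstests helper itself:
filter_subvol_delete is a made-up name and the sample strings are the two
output variants quoted in this thread.

```python
def filter_subvol_delete(text):
    """Normalize 'btrfs subvolume delete' output across btrfs-progs versions."""
    kept = []
    for line in text.splitlines():
        if "Transaction commit: none (default)" in line:
            continue                       # what the existing sed -e ...d does
        # The proposed extra expression: drop the "(no-commit): " marker.
        kept.append(line.replace("(no-commit): ", ""))
    return "\n".join(kept)


old = "Delete subvolume 'SCRATCH_MNT/snap'"
new = "Delete subvolume (no-commit): 'SCRATCH_MNT/snap'"
print(filter_subvol_delete(old) == filter_subvol_delete(new))
```

With both variants reduced to the same line, 001.out would not need to
change when the tool's wording does.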

Thanks,
Eryu
 ---
  tests/btrfs/001 | 2 +-
  tests/btrfs/001.out | 1 -
  2 files changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/tests/btrfs/001 b/tests/btrfs/001
 index 8258d06..a7747c8 100755
 --- a/tests/btrfs/001
 +++ b/tests/btrfs/001
 @@ -99,7 +99,7 @@ echo Listing subvolumes
  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'
  
  # Delete the snapshot
 -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete
 +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
  echo List root dir
  ls $SCRATCH_MNT
  _scratch_remount
 diff --git a/tests/btrfs/001.out b/tests/btrfs/001.out
 index c782bde..43e8c56 100644
 --- a/tests/btrfs/001.out
 +++ b/tests/btrfs/001.out
 @@ -33,7 +33,6 @@ subvol
  Listing subvolumes
  snap
  subvol
 -Delete subvolume 'SCRATCH_MNT/snap'
  List root dir
  subvol
  List root dir
 -- 
 2.0.0.153.g79d
 
 --
 To unsubscribe from this list: send the line unsubscribe fstests in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Uncorrectable errors on RAID-1?

2015-01-04 Thread Phillip Susi
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/03/2015 12:31 AM, Chris Murphy wrote:
> It's not a made to order hard drive industry. Maybe one day you'll
> be able to 3D print your own with its own specs.

And wookies did not live on endor.  What's your point?

> Sticking fingers in your ears doesn't change the fact there's a
> measurable difference in support requirements.

Sure, just don't misrepresent one requirement for another.  Just
because I don't care about a warranty from the hardware manufacturer
does not mean I have no right to expect the kernel to perform
*reasonably* on that hardware.

> This is architecture astronaut territory.
>
> The system only has a terrible response for two reasons: 1. The
> user spec'd the wrong hardware for the use case; 2. The distro
> isn't automatically leveraging existing ways to mitigate that user
> mistake by changing either SCT ERC on the drives, or the SCSI
> command timer for each block device.

No, it has terrible response because the kernel either waits an
unreasonable time or fails the drive and kicks it out of the array
instead of trying to repair it.  Blaming the user for not buying
better hardware is not an appropriate response for the kernel failing
so badly to handle commonly available hardware that doesn't behave in
the most ideal way.

> Now, even though that solution *might* mean long recoveries on
> occasion, it's still better than link reset behavior which is what
> we have today because it causes the underlying problem to be fixed
> by md/dm/Btrfs once the read error is reported. But no distro has
> implemented this $500 man hour solution. Instead you're suggesting
> a $500,000 fix that will take hundreds of man hours and end user
> testing to find all the edge cases. It's like, seriously, WTF?

Seriously?  Treating a timeout the same way you treat an unrecoverable
media error is no herculean task.

> Ok well I think that's hubris unless you're a hard drive engineer.
> You're referring to how drives behaved over a decade ago, when bad
> sectors were persistent rather than remapped, and we had to scan
> the drive at format time to build a map so the bad ones wouldn't be
> used by the filesystem.

Remapping has nothing to do with it: we are talking about *read*
errors, which do not trigger a remap.

> http://www.seagate.com/files/www-content/support-content/documentation/product-manuals/en-us/Enterprise/Savvio/Savvio%2015K.3/100629381e.pdf
>
> That's a high end SAS drive. It's default is to retry up to 20
> times, which takes ~1.4 seconds, per sector. But also note how it
> says

20 retries on a 15,000 rpm drive only takes 80 milliseconds, not 1.4
seconds.  15,000 rpm / 60 seconds per minute = 250 rotations/retries
per second.

> Maybe you'd prefer seeing these big, cheap, green drives have
> shorter ERC times, with a commensurate reality check with their
> unrecoverable error rate, which right now is already two orders
> magnitude higher than enterprise SAS drives. So what if this means
> that rate is 3 or 4 orders magnitude higher?

20 retries vs. 200 retries does not reduce the URE rate by orders of
magnitude; more like 1% *maybe*.  200 vs 2000 makes no measurable
difference at all.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJUqhCxAAoJENRVrw2cjl5RhDYH/RLbHXEPyjK4j6u33ElOyS5S
W5/nfiT1ZZjVAFxJwD0y/gt2L61hB1PQdlUjBm2NayExfCXn3sEuccAxvjMDrvsL
dFJOV8G/7GBbUfsD0uBustG5639QGc30bRzuiw/URT77zNf+T6+5SmTPSC3Oaj3j
fCcDdiKCwNcYiUF3/Q3gdh4XVI8wgoABHC2S/GqvRB+FmmqD6Yt6yG50TG5sPBzq
zSUSxWjOPwVinZOlPfCUCFr3buw+yzg5fclcvaNRStJM38gtK0UGgeIHFgCViHtN
0xNRCKWMu3XkfjfOI/cYVor79K4sQlz9K83Ja/UAMrOtopdlKjn9N04oIiPdsbg=
=u/i9
-----END PGP SIGNATURE-----


[PATCH v3 0/3] Btrfs: Enhancement for qgroup.

2015-01-04 Thread Dongsheng Yang
Hi Josef and others,

This patch set is about enhancing qgroup.

[1/3]: fix a bug about qgroup leak when we exceed the quota limit.
It is reviewed by Josef.
[2/3]: introduce a new accounter in qgroup to close a window where
a user can exceed the qgroup limit. It looks good to Josef.
[3/3]: a new patch to fix a bug reported by Satoru.

BTW, I have some other plans for qgroup in my TODO list:

Kernel:
a). adjust the accounters in the parent qgroup when we move
the child qgroup.
Currently, when we move a qgroup, the parent qgroup
is not updated at the same time. This causes some wrong
numbers in qgroup.

b). add an ioctl to show the qgroup info.
The command btrfs qgroup show displays the qgroup info
read from the qgroup tree. But some information in memory
is not yet synced to the device, so it can show outdated
numbers.

c). limit and account size in 3 modes: data, metadata and both.
qgroup accounts the size of data and metadata
together, but to a user, the data size is the most useful.

d). remove a subvolume's related qgroup when the subvolume is deleted and
there is no other reference to it.

user-tool:
a). Add the units B/K/M/G to btrfs qgroup show.
b). get the information via ioctl rather than reading it from the
btree. The old way will be kept as a fallback for compatibility.

Any comments and suggestions are welcome. :)

Yang

Dongsheng Yang (3):
  Btrfs: qgroup: free reserved in exceeding quota.
  Btrfs: qgroup: Introduce a may_use to account
    space_info->bytes_may_use.
  Btrfs: qgroup, Account data space in more proper timings.

 fs/btrfs/extent-tree.c | 41 +++---
 fs/btrfs/file.c        |  9 ---
 fs/btrfs/inode.c       | 18 -
 fs/btrfs/qgroup.c      | 68 +++---
 fs/btrfs/qgroup.h      |  4 +++
 5 files changed, 117 insertions(+), 23 deletions(-)

-- 
1.8.4.2



[PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings.

2015-01-04 Thread Dongsheng Yang
Currently, in data writing, ->reserved is accounted in
fill_delalloc(), but ->may_use is released in clear_bit_hook(),
which is called by btrfs_finish_ordered_io(). That's too late:
between fill_delalloc() and btrfs_finish_ordered_io(),
the data is accounted twice by qgroup. This causes some
unexpected -EDQUOT.

Example:
# btrfs quota enable /root/btrfs-auto-test/
# btrfs subvolume create /root/btrfs-auto-test//sub
Create subvolume '/root/btrfs-auto-test/sub'
# btrfs qgroup limit 1G /root/btrfs-auto-test//sub
# dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=150
dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
681353+0 records in
681352+0 records out
697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s
We get -EDQUOT at only 698 MB, although the limit is 1G.

This patch moves the btrfs_qgroup_reserve/free() calls for data from
btrfs_delalloc_reserve/release_metadata() to btrfs_check_data_free_space()
and btrfs_free_reserved_data_space(). The accounter in qgroup is then
updated at the same time as the accounter in space_info.
This removes the unexpected -EDQUOT.

Reported-by: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>
Signed-off-by: Dongsheng Yang <yangds.f...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 16 +---
 fs/btrfs/file.c        |  9 -
 2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d1a7ce0..67c2e28 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3774,12 +3774,16 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
+	ret = btrfs_qgroup_reserve(root, bytes);
+	if (ret)
+		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
+out:
 	spin_unlock(&data_sinfo->lock);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -3796,6 +3800,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	data_sinfo = root->fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	WARN_ON(data_sinfo->bytes_may_use < bytes);
+	btrfs_qgroup_free(root, bytes);
 	data_sinfo->bytes_may_use -= bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 0);
@@ -5191,8 +5196,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, num_bytes +
-					   nr_extents * root->nodesize);
+		ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
 		if (ret)
 			goto out_fail;
 	}
@@ -5200,8 +5204,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
 	if (unlikely(ret)) {
 		if (root->fs_info->quota_enabled)
-			btrfs_qgroup_free(root, num_bytes +
-						nr_extents * root->nodesize);
+			btrfs_qgroup_free(root, nr_extents * root->nodesize);
 		goto out_fail;
 	}
 
@@ -5319,8 +5322,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 	trace_btrfs_space_reservation(root->fs_info, "delalloc",
 				      btrfs_ino(inode), to_free, 0);
 	if (root->fs_info->quota_enabled) {
-		btrfs_qgroup_free(root, num_bytes +
-					dropped * root->nodesize);
+		btrfs_qgroup_free(root, dropped * root->nodesize);
 	}
 
 	btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e409025..0ab1333 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2527,7 +2527,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
-	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u64 cur_offset;
 	u64 last_byte;
 	u64 alloc_start;
@@ -2555,11 +2554,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start);
 	if (ret)
 		return ret;
-	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, alloc_end - alloc_start);
-		if (ret)
-			goto out_reserve_fail;
-	}
 
 	mutex_lock(&inode->i_mutex);
ret = inode_newsize_ok(inode, 

Re: [PATCH v2] xfstests: btrfs: fix up 001.out

2015-01-04 Thread Anand Jain



On 01/05/2015 11:25 AM, Eryu Guan wrote:

On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:

The subvol delete output has changed with btrfs-progs


Better to point out that since which btrfs-progs version the output
changed.


 The fix here is neutral to the output string change, so it does not matter.


 -Delete subvolume 'SCRATCH_MNT/snap'
 +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'

so fix 001 failing.

Signed-off-by: Anand Jain anand.j...@oracle.com

v2: Thanks to Filipe for mentioning that we now have _run_btrfs_util_prog,
 and commit message update


I think a better way to fix this is to update the
_filter_btrfs_subvol_delete filter

Right now the filter deletes the message about the transaction commit:

sed -e "/Transaction commit: none (default)/d"

Just adding another -e to sed to delete the "(no-commit): " part is fine.


 in this case checking for the output string was fundamentally wrong
for a long time.


Thanks, Anand


Thanks,
Eryu

---
  tests/btrfs/001 | 2 +-
  tests/btrfs/001.out | 1 -
  2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/btrfs/001 b/tests/btrfs/001
index 8258d06..a7747c8 100755
--- a/tests/btrfs/001
+++ b/tests/btrfs/001
@@ -99,7 +99,7 @@ echo Listing subvolumes
  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'

  # Delete the snapshot
-$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete
+_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
  echo List root dir
  ls $SCRATCH_MNT
  _scratch_remount
diff --git a/tests/btrfs/001.out b/tests/btrfs/001.out
index c782bde..43e8c56 100644
--- a/tests/btrfs/001.out
+++ b/tests/btrfs/001.out
@@ -33,7 +33,6 @@ subvol
  Listing subvolumes
  snap
  subvol
-Delete subvolume 'SCRATCH_MNT/snap'
  List root dir
  subvol
  List root dir
--
2.0.0.153.g79d

--
To unsubscribe from this list: send the line unsubscribe fstests in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota.

2015-01-04 Thread Dongsheng Yang
When we exceed the quota limit while writing, we free some
reserved extents that need to be dropped, but we do not free
the corresponding accounting in the qgroup. This means that each
time we exceed the quota while writing, some space is left in
qg->reserved that can no longer be used. If this continues,
all of the space will eventually be eaten up.

Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
Reviewed-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a80b971..88b4e32 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5275,8 +5275,11 @@ out_fail:
to_free = 0;
}
spin_unlock(&BTRFS_I(inode)->lock);
-   if (dropped)
+   if (dropped) {
+   if (root->fs_info->quota_enabled)
+   btrfs_qgroup_free(root, dropped * root->nodesize);
to_free += btrfs_calc_trans_metadata_size(root, dropped);
+   }
 
if (to_free) {
btrfs_block_rsv_release(root, block_rsv, to_free);
-- 
1.8.4.2



[PATCH] btrfs-progs: Allow debug-tree to be executed on regular file.

2015-01-04 Thread Qu Wenruo
Commit 1bad43fbe002 (btrfs-progs: refine btrfs-debug-tree error
prompt when a mount point given)
added a check to btrfs-debug-tree that restricts it to block devices only,
but the command can also be used on a regular file, so add
regular file support to the check.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-debug-tree.c |  5 +++--
 utils.c            | 21 +++++++++++++++++----
 utils.h            |  1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index 9cdb35f..0815fe1 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -180,8 +180,9 @@ int main(int ac, char **av)
print_usage();
 
ret = check_arg_type(av[optind]);
-   if (ret != BTRFS_ARG_BLKDEV) {
-   fprintf(stderr, "'%s' is not a block device\n", av[optind]);
+   if (ret != BTRFS_ARG_BLKDEV && ret != BTRFS_ARG_REG) {
+   fprintf(stderr, "'%s' is not a block device or regular file\n",
+   av[optind]);
exit(1);
}
 
diff --git a/utils.c b/utils.c
index af0a8fe..3ca8229 100644
--- a/utils.c
+++ b/utils.c
@@ -854,13 +854,23 @@ int is_mount_point(const char *path)
return ret;
 }
 
+static int is_reg_file(const char *path)
+{
+   struct stat statbuf;
+
+   if (stat(path, &statbuf) < 0)
+   return -errno;
+   return S_ISREG(statbuf.st_mode);
+}
+
 /*
  * This function checks if the given input parameter is
  * an uuid or a path
- * return -1: some error in the given input
- * return 0: unknow input
- * return 1: given input is uuid
- * return 2: given input is path
+ * return <0 : some error in the given input
+ * return BTRFS_ARG_UNKNOWN:   unknown input
+ * return BTRFS_ARG_UUID:  given input is uuid
+ * return BTRFS_ARG_MNTPOINT:  given input is path
+ * return BTRFS_ARG_REG:   given input is regular file
  */
 int check_arg_type(const char *input)
 {
@@ -877,6 +887,9 @@ int check_arg_type(const char *input)
if (is_mount_point(path) == 1)
return BTRFS_ARG_MNTPOINT;
 
+   if (is_reg_file(path))
+   return BTRFS_ARG_REG;
+
return BTRFS_ARG_UNKNOWN;
}
 
diff --git a/utils.h b/utils.h
index 3950491..142f3f9 100644
--- a/utils.h
+++ b/utils.h
@@ -35,6 +35,7 @@
 #define BTRFS_ARG_MNTPOINT 1
 #define BTRFS_ARG_UUID 2
 #define BTRFS_ARG_BLKDEV   3
+#define BTRFS_ARG_REG  4
 
 #define BTRFS_UUID_UNPARSED_SIZE   37
 
-- 
2.2.1
