read-only for no good reason on 4.9.30

2017-09-03 Thread Russell Coker
I have a system with less than 50% disk space used.  It just started rejecting 
writes due to lack of disk space.  I ran "btrfs balance" and then it started 
working correctly again.  It seems that a btrfs filesystem, if left alone, will 
eventually get fragmented enough that it rejects writes (I've had similar 
issues with other systems running BTRFS with other kernel versions).

Is this a known issue?

Is there any good way of recognising when it's likely to happen?  Is there 
anything I can do other than rewriting a medium size file to determine when 
it's happened?

# uname -a 
Linux trex 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux
# df -h / 
Filesystem  Size  Used Avail Use% Mounted on 
/dev/sdc        239G  113G  126G  48% /
# btrfs fi df / 
Data, RAID1: total=117.00GiB, used=111.81GiB 
System, RAID1: total=32.00MiB, used=48.00KiB 
Metadata, RAID1: total=1.00GiB, used=516.00MiB 
GlobalReserve, single: total=246.59MiB, used=0.00B
# btrfs dev usa / 
/dev/sdc, ID: 1 
  Device size:   238.47GiB 
  Device slack:  0.00B 
  Data,RAID1:    117.00GiB 
  Metadata,RAID1:  1.00GiB 
  System,RAID1:   32.00MiB 
  Unallocated:   120.44GiB 

/dev/sdd, ID: 2 
  Device size:   238.47GiB 
  Device slack:  0.00B 
  Data,RAID1:    117.00GiB 
  Metadata,RAID1:  1.00GiB 
  System,RAID1:   32.00MiB 
  Unallocated:   120.44GiB
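
The `btrfs fi df` figures above can be read with a little arithmetic: how full the already-allocated chunks are, and how much space they hold that is allocated but unused. The helpers below are a hypothetical sketch, not part of btrfs-progs, and nothing in this message establishes that these numbers predict the write failures described. A commonly cited cause of btrfs ENOSPC is running out of unallocated space for new chunks, which the 120.44 GiB still unallocated on each device does not obviously fit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fraction of the allocated chunks of one type that
 * actually holds data, from the "total" and "used" figures printed by
 * `btrfs fi df`. */
double chunk_fill_ratio(double total_gib, double used_gib)
{
    assert(total_gib > 0.0 && used_gib >= 0.0);
    return used_gib / total_gib;
}

/* Space already allocated to chunks of this type but not yet used. */
double chunk_slack_gib(double total_gib, double used_gib)
{
    return total_gib - used_gib;
}

/* Print both figures for one line of `btrfs fi df` output. */
void report_chunk_usage(const char *type, double total_gib, double used_gib)
{
    printf("%-9s %6.2f%% of allocated space used, %.2f GiB allocated but free\n",
           type, 100.0 * chunk_fill_ratio(total_gib, used_gib),
           chunk_slack_gib(total_gib, used_gib));
}
```

For the Data line this gives roughly 95.6% of the allocated data chunks in use, with about 5.19 GiB allocated but free; the Metadata line (516 MiB of 1 GiB) is about 50.4% full.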

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: inspect-internal rootid: Allow a file to be specified

2017-09-03 Thread Misono, Tomohiro
Since cmd_inspect_rootid() calls btrfs_open_dir(), it rejects a file
being specified. But as the documentation says, a file should be supported.

This patch introduces btrfs_open_file_or_dir(), which is a counterpart
of btrfs_open_dir(), to safely check and open a btrfs file or directory.
The original btrfs_open_dir() code is moved to btrfs_open() and shared
by both functions.

Signed-off-by: Tomohiro Misono 
---
 cmds-inspect.c |  2 +-
 utils.c        | 16 +++++++++++++---
 utils.h        |  2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/cmds-inspect.c b/cmds-inspect.c
index d1a3a0e..885f3ab 100644
--- a/cmds-inspect.c
+++ b/cmds-inspect.c
@@ -318,7 +318,7 @@ static int cmd_inspect_rootid(int argc, char **argv)
if (check_argc_exact(argc - optind, 1))
usage(cmd_inspect_rootid_usage);
 
-   fd = btrfs_open_dir(argv[optind], , 1);
+   fd = btrfs_open_file_or_dir(argv[optind], , 1);
if (fd < 0) {
ret = -ENOENT;
goto out;
diff --git a/utils.c b/utils.c
index bb04913..9db39eb 100644
--- a/utils.c
+++ b/utils.c
@@ -568,9 +568,9 @@ int open_path_or_dev_mnt(const char *path, DIR **dirstream, int verbose)
 /*
  * Do the following checks before calling open_file_or_dir():
  * 1: path is in a btrfs filesystem
- * 2: path is a directory
+ * 2: path is a directory if dir_only is 1
  */
-int btrfs_open_dir(const char *path, DIR **dirstream, int verbose)
+int btrfs_open(const char *path, DIR **dirstream, int verbose, int dir_only)
 {
struct statfs stfs;
struct stat st;
@@ -593,7 +593,7 @@ int btrfs_open_dir(const char *path, DIR **dirstream, int verbose)
return -1;
}
 
-   if (!S_ISDIR(st.st_mode)) {
+   if (dir_only && !S_ISDIR(st.st_mode)) {
error_on(verbose, "not a directory: %s", path);
return -3;
}
@@ -607,6 +607,16 @@ int btrfs_open_dir(const char *path, DIR **dirstream, int verbose)
return ret;
 }
 
+int btrfs_open_dir(const char *path, DIR **dirstream, int verbose)
+{
+   return btrfs_open(path, dirstream, verbose, 1);
+}
+
+int btrfs_open_file_or_dir(const char *path, DIR **dirstream, int verbose)
+{
+   return btrfs_open(path, dirstream, verbose, 0);
+}
+
 /* checks if a device is a loop device */
 static int is_loop_device (const char* device) {
struct stat statbuf;
diff --git a/utils.h b/utils.h
index 091f8fa..d28a05a 100644
--- a/utils.h
+++ b/utils.h
@@ -108,7 +108,9 @@ int is_block_device(const char *file);
 int is_mount_point(const char *file);
 int check_arg_type(const char *input);
 int open_path_or_dev_mnt(const char *path, DIR **dirstream, int verbose);
+int btrfs_open(const char *path, DIR **dirstream, int verbose, int dir_only);
 int btrfs_open_dir(const char *path, DIR **dirstream, int verbose);
+int btrfs_open_file_or_dir(const char *path, DIR **dirstream, int verbose);
 u64 btrfs_device_size(int fd, struct stat *st);
 /* Helper to always get proper size of the destination string */
 #define strncpy_null(dest, src) __strncpy_null(dest, src, sizeof(dest))
-- 
2.9.5



Re: block group 11778977169408 has wrong amount of free space

2017-09-03 Thread Qu Wenruo



On 2017-09-04 09:51, Christoph Anton Mitterer wrote:

> Did another mount with clear_cache,rw (cause it was ro before)... now I
> get even more errors:
> # btrfs check  /dev/mapper/data-a2 ; echo $?
> Checking filesystem on /dev/mapper/data-a2
> UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
> checking extents
> checking free space cache
> block group 9857516175360 has wrong amount of free space
> failed to load free space cache for block group 9857516175360
> block group 11778977169408 has wrong amount of free space
> failed to load free space cache for block group 11778977169408
> checking fs roots
> checking csums
> checking root refs
> found 4404625330176 bytes used, no error found
> total csum bytes: 4293007908
> total tree bytes: 7511883776
> total fs tree bytes: 1856258048
> total extent tree bytes: 1097842688
> btree space waste bytes: 887738230
> file data blocks allocated: 4397113446400
>  referenced 4515055595520
> 0
>
> what the???



IIRC, for the v1 space cache, clear_cache only clears the cache of
block groups that get modified.

That's why we have btrfs check --clear-space-cache v1, which wipes out
the whole v1 space cache.


Thanks,
Qu


[PATCH 1/2] btrfs-progs: mkfs: Fix wrong file type for dir items and indexes when specifying root directory

2017-09-03 Thread Qu Wenruo
[Bug]
If mkfs.btrfs is used with the "-r" parameter and the specified directory
contains fifo/socket/char/block special files, the created btrfs can't pass fsck:

--
checking fs roots
unresolved ref dir 241158 index 3 namelen 9 name S.dirmngr filetype 0 
errors 80, filetype mismatch
ERROR: errors found in fs roots
--

[Reason]
Btrfs dir items/indexes record the inode type, but "-r" only handles
directories, regular files and soft links, so such special files are
recorded with the regular file type, which causes the problem.

[Fix]
Add the missing types to add_directory_items(), so that the result of
"mkfs.btrfs -r" can pass fsck.

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index afd68bc5..84ff300b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -435,6 +435,14 @@ static int add_directory_items(struct btrfs_trans_handle *trans,
filetype = BTRFS_FT_REG_FILE;
if (S_ISLNK(st->st_mode))
filetype = BTRFS_FT_SYMLINK;
+   if (S_ISSOCK(st->st_mode))
+   filetype = BTRFS_FT_SOCK;
+   if (S_ISCHR(st->st_mode))
+   filetype = BTRFS_FT_CHRDEV;
+   if (S_ISBLK(st->st_mode))
+   filetype = BTRFS_FT_BLKDEV;
+   if (S_ISFIFO(st->st_mode))
+   filetype = BTRFS_FT_FIFO;
 
ret = btrfs_insert_dir_item(trans, root, name, name_len,
parent_inum, ,
-- 
2.14.1



[PATCH 2/2] btrfs-progs: test/mkfs: Add test case for rootdir parameter

2017-09-03 Thread Qu Wenruo
Add a test case which checks whether the -r|--rootdir option can handle
softlink/char/block/fifo files.

Signed-off-by: Qu Wenruo 
---
 .../009-special-files-for-rootdir/test.sh  | 40 ++
 1 file changed, 40 insertions(+)
 create mode 100755 tests/mkfs-tests/009-special-files-for-rootdir/test.sh

diff --git a/tests/mkfs-tests/009-special-files-for-rootdir/test.sh b/tests/mkfs-tests/009-special-files-for-rootdir/test.sh
new file mode 100755
index ..bc5297d0
--- /dev/null
+++ b/tests/mkfs-tests/009-special-files-for-rootdir/test.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+# Check if --rootdir can handle special files (socket/fifo/char/block) correctly
+#
+# --rootdir had a problem of filling dir items/indexes with wrong type
+# and caused btrfs check to report such error
+
+source "$TOP/tests/common"
+
+check_prereq mkfs.btrfs
+check_prereq btrfs
+
+setup_root_helper  # For mknod
+prepare_test_dev 128M
+
+# mknod can create FIFO/CHAR/BLOCK file but not SOCK.
+# No neat tool to create socket file, unless using python or similar.
+# So no SOCK is tested here
+check_global_prereq mknod
+
+# Also check regular file
+check_global_prereq dd
+
+# And dir
+check_global_prereq mkdir
+
+tmp="/tmp/btrfs_selftest_$$"
+
+run_check mkdir $tmp
+run_check mkdir $tmp/dir
+run_check mkdir -p $tmp/dir/in/dir
+run_check mknod $tmp/fifo p
+run_check $SUDO_HELPER mknod $tmp/char c 1 1
+run_check $SUDO_HELPER mknod $tmp/block b 1 1
+run_check dd if=/dev/zero bs=1M count=1 of=$tmp/regular
+
+run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f -r "$tmp" $TEST_DEV
+
+rm "$tmp" -rf
+
+run_check $SUDO_HELPER "$TOP/btrfs" check $TEST_DEV
-- 
2.14.1
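
The test's comments note that there is no neat command-line tool for creating a socket file, so BTRFS_FT_SOCK goes untested. In C, binding an AF_UNIX socket to a filesystem path creates such a node; `make_socket_file()` and `is_socket_file()` below are a hypothetical sketch (not part of the patch) of how a small helper could produce one for the test directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a socket file at `path` by binding an
 * AF_UNIX socket to it. The node stays on disk after close(); remove it
 * with unlink(). Returns 0 on success, -1 on error. */
int make_socket_file(const char *path)
{
    struct sockaddr_un addr;
    int fd, ret;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1; /* path does not fit in sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    ret = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    return ret == 0 ? 0 : -1;
}

/* Returns 1 if `path` exists and is a socket, 0 otherwise. */
int is_socket_file(const char *path)
{
    struct stat st;

    return lstat(path, &st) == 0 && S_ISSOCK(st.st_mode);
}
```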



Re: How to disable/revoke 'compression'?

2017-09-03 Thread Qu Wenruo



On 2017-09-04 08:14, Adam Borowski wrote:
> On Mon, Sep 04, 2017 at 07:55:27AM +0800, Qu Wenruo wrote:
>> On 2017-09-04 02:06, Adam Borowski wrote:
>>> I've once written a tool which does this, but 1. it's extremely slow, 2.
>>> insane, 3. so insane a certain member of this list would kill me had I
>>> distributed the tool.  Thus, I'd need to rewrite it first...
>>
>> AFAIK the only method to determine the compression ratio is to check the
>> EXTENT_DATA key and its corresponding file_extent_item structure.
>> (Which I assume is how Adam is doing it)
>>
>> That structure records its on-disk data size and in-memory data size.
>> (All rounded up to sectorsize, which is 4K in most cases)
>> So in theory it's possible to determine the compression ratio.
>>
>> The only method I can think of (maybe I forgot some methods?) is to use
>> an offline tool (btrfs-debug-tree) to check that.
>> FS APIs like fiemap don't even report the on-disk data size, so we
>> can't use them.
>
> BTRFS_IOC_TREE_SEARCH_V2 returns all we want to know; its only downside is
> being root only.

Just forgot that.

>> But the problem is more complicated, especially when compressed CoW is
>> involved.
>>
>> For example, there is an extent (A) which represents the data for inode 258,
>> range [0,128k).
>> Its on-disk size is just 4K.
>>
>> And when we write the range [32K, 64K), which gets CoWed and compressed,
>> resulting in a new file extent (B) for inode 258, range [32K, 64K), with an
>> on-disk size of 4K as an example.
>>
>> Then the file extent layout for 258 will be:
>> [0,32k):  range [0,32K) of uncompressed Extent A
>> [32k, 64k): range [0,32k) of uncompressed Extent B
>> [64k, 128k): range [64k, 128K) of uncompressed Extent A.
>>
>> And the on-disk extent size is 4K (compressed Extent A) + 4K (compressed
>> Extent B) = 8K.
>>
>> Before the write, the compression ratio is 4K/128K = 3.125%
>> After the write, the compression ratio is 8K/128K = 6.25%
>
> There's no real meaningful way to speak about compression ratio of a partial
> extent.  Thus, I decided to, for every extent, take compressed:uncompressed
> sizes of the whole extent, no matter whether the file uses only a few bytes
> of that extent or references it a thousand times.

Very clever move.

>> Not to mention that it's possible to have uncompressed file extents.
>
> Yeah, the tool gives a report like:
> all   74%  9.2M/  13M
> lzo   68%  7.1M/  11M
> none 100%  2.1M/ 2.1M
> as you typically have a mix of compressible and uncompressible data.

Looks quite nice!

Thanks,
Qu

> Meow!




Re: block group 11778977169408 has wrong amount of free space

2017-09-03 Thread Christoph Anton Mitterer
Did another mount with clear_cache,rw (cause it was ro before)... now I
get even more errors:
# btrfs check  /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 9857516175360 has wrong amount of free space
failed to load free space cache for block group 9857516175360
block group 11778977169408 has wrong amount of free space
failed to load free space cache for block group 11778977169408
checking fs roots
checking csums
checking root refs
found 4404625330176 bytes used, no error found
total csum bytes: 4293007908
total tree bytes: 7511883776
total fs tree bytes: 1856258048
total extent tree bytes: 1097842688
btree space waste bytes: 887738230
file data blocks allocated: 4397113446400
 referenced 4515055595520
0

what the???



Re: block group 11778977169408 has wrong amount of free space

2017-09-03 Thread Christoph Anton Mitterer
Just checked, and mounting with clear_cache, and then re-fscking
doesn't even fix the problem...

Output stays the same.

Cheers,
Chris.



Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Josef Bacik
Ok this output looked fishy and so I went and tested it on my box again.  It 
looks like I wasn't testing modifying a snapshot with an existing fs so I never 
saw these errors, but I see them as well.  I definitely fucked the building of 
the initial ref tree.  It's too late tonight for me to rework it and have it 
working for you, but I should be able to get it into shape in the morning.  
I'll let you know when I have something useful to test, sorry about the mess,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 4:21 PM, Marc MERLIN  wrote:
> 
>> On Sun, Sep 03, 2017 at 05:33:33PM +, Josef Bacik wrote:
>> Alright pushed, sorry about that.
> 
> I'm reasonably sure I'm running the new code, but still got this:
> [ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
> [ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
> [ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
> [ 2104.433888]   Root entry 418, num_refs 1
> [ 2104.446648]   Root entry 69869, num_refs 0
> [ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
> [ 2104.496244]   No Stacktrace
> 
> Now, in the background I had a monthly md check of the underlying device
> (mdadm raid 5), and got some of those. Obviously that's not good, and 
> I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
> which drive has the corrupted data.
> Does that sound right?
> 
> Now, the good news is that btrfs on top does have checksums, so running a
> scrub should hopefully find those corrupted blocks if they happen to be in
> use by the filesystem (maybe they are free).
> But as a reminder, this whole thread started with my FS maybe not being in
> a good state, but both check --repair and scrub returning clean. Maybe I'll
> use the opportunity to re-run a check --repair and a scrub after that to
> see what state things are in.
> 
> md6: mismatch sector in range 3581539536-3581539544
> md6: mismatch sector in range 3581539544-3581539552
> md6: mismatch sector in range 3581539552-3581539560
> md6: mismatch sector in range 3581539560-3581539568  
> md6: mismatch sector in range 3581543792-3581543800
> md6: mismatch sector in range 3581543800-3581543808
> md6: mismatch sector in range 3581543808-3581543816
> md6: mismatch sector in range 3581543816-3581543824
> md6: mismatch sector in range 3581544112-3581544120
> md6: mismatch sector in range 3581544120-3581544128
> 
> As for your patch, no idea why it's not giving me a stacktrace, sorry :-/
> 
> Git log of my tree does show:
> commit aa162d2908bd7452805ea812b7550232b0b6ed53
> Author: Josef Bacik 
> Date:   Sun Sep 3 13:32:17 2017 -0400
> 
>Btrfs: use be->metadata just in case
> 
>I suspect we're not getting the owner in some cases, so we want to just
>use the known value.
> 
>Signed-off-by: Josef Bacik 
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>   what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


block group 11778977169408 has wrong amount of free space

2017-09-03 Thread Christoph Anton Mitterer
Hey.

Just got the following:
$ uname -a
Linux heisenberg 4.12.0-1-amd64 #1 SMP Debian 4.12.6-1 (2017-08-12) x86_64 GNU/Linux

$ btrfs version
btrfs-progs v4.12

on a filesystem:

# btrfs check  /dev/mapper/data-a2 ; echo $?
Checking filesystem on /dev/mapper/data-a2
UUID: f8acb432-7604-46ba-b3ad-0abe8e92c4db
checking extents
checking free space cache
block group 11778977169408 has wrong amount of free space
failed to load free space cache for block group 11778977169408
checking fs roots
checking csums
checking root refs
found 4404625739776 bytes used, no error found
total csum bytes: 4293007908
total tree bytes: 7511900160
total fs tree bytes: 1856258048
total extent tree bytes: 1097859072
btree space waste bytes: 887753954
file data blocks allocated: 4397113839616
 referenced 4515055988736
0

Any idea what could cause these free space issues and how to clean them
up? I thought this should work with recent kernels. Could that mean
some data will be corrupted when I do e.g. mount with clear_cache?

Interestingly, $? is still 0... even though errors were found.
And kernel log shows nothing.
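
Since the exit status stays 0 here even though the free space cache is reported as bad, a wrapper that captures `btrfs check` output (for example through popen()) would have to inspect the text itself. A minimal sketch, assuming the message wording shown above stays stable across btrfs-progs versions; `count_free_space_warnings()` is a hypothetical helper:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count occurrences of the free space cache warning
 * in captured `btrfs check` output. In the run above the exit status is
 * still 0, so the text has to be inspected separately. */
int count_free_space_warnings(const char *output)
{
    const char *needle = "has wrong amount of free space";
    size_t len = strlen(needle);
    const char *p = output;
    int count = 0;

    assert(output != NULL);
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += len; /* continue searching after this match */
    }
    return count;
}
```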


Cheers,
Chris.



Re: How to disable/revoke 'compression'?

2017-09-03 Thread Adam Borowski
On Mon, Sep 04, 2017 at 07:55:27AM +0800, Qu Wenruo wrote:
> On 2017-09-04 02:06, Adam Borowski wrote:
> > I've once written a tool which does this, but 1. it's extremely slow, 2.
> > insane, 3. so insane a certain member of this list would kill me had I
> > distributed the tool.  Thus, I'd need to rewrite it first...
> 
> AFAIK the only method to determine the compression ratio is to check the
> EXTENT_DATA key and its corresponding file_extent_item structure.
> (Which I assume is how Adam is doing it)
> 
> That structure records its on-disk data size and in-memory data size.
> (All rounded up to sectorsize, which is 4K in most cases)
> So in theory it's possible to determine the compression ratio.
> 
> The only method I can think of (maybe I forgot some methods?) is to use
> an offline tool (btrfs-debug-tree) to check that.
> FS APIs like fiemap don't even report the on-disk data size, so we
> can't use them.

BTRFS_IOC_TREE_SEARCH_V2 returns all we want to know; its only downside is
being root only.

> But the problem is more complicated, especially when compressed CoW is
> involved.
> 
> For example, there is an extent (A) which represents the data for inode 258,
> range [0,128k).
> Its on-disk size is just 4K.
> 
> And when we write the range [32K, 64K), which gets CoWed and compressed,
> resulting in a new file extent (B) for inode 258, range [32K, 64K), with an
> on-disk size of 4K as an example.
> 
> Then the file extent layout for 258 will be:
> [0,32k):  range [0,32K) of uncompressed Extent A
> [32k, 64k): range [0,32k) of uncompressed Extent B
> [64k, 128k): range [64k, 128K) of uncompressed Extent A.
> 
> And the on-disk extent size is 4K (compressed Extent A) + 4K (compressed
> Extent B) = 8K.
> 
> Before the write, the compression ratio is 4K/128K = 3.125%
> After the write, the compression ratio is 8K/128K = 6.25%

There's no real meaningful way to speak about compression ratio of a partial
extent.  Thus, I decided to, for every extent, take compressed:uncompressed
sizes of the whole extent, no matter whether the file uses only a few bytes
of that extent or references it a thousand times.

> Not to mention that it's possible to have uncompressed file extents.

Yeah, the tool gives a report like:
all   74%  9.2M/  13M
lzo   68%  7.1M/  11M
none 100%  2.1M/ 2.1M
as you typically have a mix of compressible and uncompressible data.
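
Adam's whole-extent approach, and why it differs from the per-file figure in Qu's example, can be shown with a little arithmetic. The sketch below is not Adam's tool: `struct extent_ref` is a simplified, hypothetical stand-in whose field names are borrowed from `struct btrfs_file_extent_item`, and each distinct extent (keyed by `disk_bytenr`) is counted once no matter how many file ranges reference it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the file extent item fields that matter here. */
struct extent_ref {
    uint64_t disk_bytenr;    /* identifies the on-disk extent */
    uint64_t disk_num_bytes; /* compressed (on-disk) size of the extent */
    uint64_t ram_bytes;      /* uncompressed size of the whole extent */
};

/* Whole-extent ratio: on-disk bytes divided by uncompressed bytes, summed
 * over distinct extents. A quadratic duplicate check keeps the sketch
 * short; a real tool would use a hash set or sort by disk_bytenr. */
double whole_extent_ratio(const struct extent_ref *refs, size_t n)
{
    uint64_t disk = 0, ram = 0;

    for (size_t i = 0; i < n; i++) {
        int seen = 0;

        for (size_t j = 0; j < i; j++) {
            if (refs[j].disk_bytenr == refs[i].disk_bytenr) {
                seen = 1;
                break;
            }
        }
        if (seen)
            continue; /* extent already counted once */
        disk += refs[i].disk_num_bytes;
        ram += refs[i].ram_bytes;
    }
    assert(ram > 0);
    return (double)disk / (double)ram;
}
```

With Qu's numbers, the file before the write references only Extent A (4K on disk, 128K uncompressed), giving 4K/128K = 3.125%. After the write it references A twice and B once (4K on disk, 32K uncompressed); the whole-extent ratio is 8K/160K = 5%, whereas Qu's 6.25% divides the same 8K by the 128K file size.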


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 


Re: How to disable/revoke 'compression'?

2017-09-03 Thread Qu Wenruo



On 2017-09-04 02:06, Adam Borowski wrote:
> On Sun, Sep 03, 2017 at 07:32:01PM +0200, Cloud Admin wrote:
>> Hi,
>> I used the mount option 'compression' on some mounted sub volumes. How
>> can I revoke the compression? I mean, remove the option and get all
>> data uncompressed on this volume.
>> Is it enough to remount the sub volume without this option? Or is it
>> necessary to do some additional step (balancing?) to get all stored data
>> uncompressed?
>
> If you set it via mount option, removing the option is enough to disable
> compression for _new_ files.  Other ways are chattr +c and btrfs-property,
> but if you haven't heard about those you almost surely don't have such
> attributes set.
>
> After remounting, you may uncompress existing files.  Balancing won't do
> this as it moves extents around without looking inside; defrag, on the
> other hand, rewrites extents, so as a side effect it applies the new
> [non]compression settings.  Thus: 「btrfs fi defrag -r /path/to/filesystem」.
>
>> Besides that, is it possible to find out the real and compressed size
>> of a file, or for example the ratio?
>
> Currently not.
>
> I've once written a tool which does this, but 1. it's extremely slow, 2.
> insane, 3. so insane a certain member of this list would kill me had I
> distributed the tool.  Thus, I'd need to rewrite it first...


AFAIK the only method to determine the compression ratio is to check the 
EXTENT_DATA key and its corresponding file_extent_item structure.
(Which I assume is how Adam is doing it)

That structure records its on-disk data size and in-memory data size.
(All rounded up to sectorsize, which is 4K in most cases)

So in theory it's possible to determine the compression ratio.

The only method I can think of (maybe I forgot some methods?) is to use 
an offline tool (btrfs-debug-tree) to check that.
FS APIs like fiemap don't even report the on-disk data size, so we
can't use them.


But the problem is more complicated, especially when compressed CoW is 
involved.

For example, there is an extent (A) which represents the data for inode 
258, range [0,128k).

Its on-disk size is just 4K.

And when we write the range [32K, 64K), which gets CoWed and compressed, 
resulting in a new file extent (B) for inode 258, range [32K, 64K), with
an on-disk size of 4K as an example.

Then the file extent layout for 258 will be:
[0,32k):  range [0,32K) of uncompressed Extent A
[32k, 64k): range [0,32k) of uncompressed Extent B
[64k, 128k): range [64k, 128K) of uncompressed Extent A.

And the on-disk extent size is 4K (compressed Extent A) + 4K (compressed 
Extent B) = 8K.

Before the write, the compression ratio is 4K/128K = 3.125%
After the write, the compression ratio is 8K/128K = 6.25%

Not to mention that it's possible to have uncompressed file extents.

So it's complicated even if we're just using an offline tool to determine
the compression ratio of a btrfs compressed file.


Thanks,
Qu




> Meow!




Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 05:33:33PM +, Josef Bacik wrote:
> Alright pushed, sorry about that.
 
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 0, from disk 1
[ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888]   Root entry 418, num_refs 1
[ 2104.446648]   Root entry 69869, num_refs 0
[ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244]   No Stacktrace

Now, in the background I had a monthly md check of the underlying device
(mdadm raid 5), and got some of those. Obviously that's not good, and 
I'm assuming that md raid5 may not have a checksum on blocks, so it won't know
which drive has the corrupted data.
Does that sound right?

Now, the good news is that btrfs on top does have checksums, so running a
scrub should hopefully find those corrupted blocks if they happen to be in
use by the filesystem (maybe they are free).
But as a reminder, this whole thread started with my FS maybe not being in
a good state, but both check --repair and scrub returning clean. Maybe I'll
use the opportunity to re-run a check --repair and a scrub after that to
see what state things are in.

md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568  
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128
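
Marc's reasoning about md RAID 5 can be illustrated with a toy model: XOR parity reveals that a stripe is inconsistent, but not which block is wrong, while a stored per-block checksum (btrfs uses crc32c) points at the bad block. This is a hypothetical sketch with made-up block sizes and a toy FNV-1a hash standing in for crc32c; md's actual check code is more involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 4  /* data blocks per stripe in this toy model */
#define BLK   16 /* bytes per block in this toy model */

/* RAID 5 parity: XOR of all data blocks in the stripe. */
void compute_parity(uint8_t data[NDATA][BLK], uint8_t parity[BLK])
{
    memset(parity, 0, BLK);
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < BLK; i++)
            parity[i] ^= data[d][i];
}

/* All a parity check can say: is the stripe consistent or not. */
int stripe_consistent(uint8_t data[NDATA][BLK], const uint8_t parity[BLK])
{
    uint8_t p[BLK];

    compute_parity(data, p);
    return memcmp(p, parity, BLK) == 0;
}

/* Toy per-block checksum (FNV-1a), standing in for crc32c. */
uint32_t block_csum(const uint8_t *b)
{
    uint32_t h = 2166136261u;

    for (int i = 0; i < BLK; i++) {
        h ^= b[i];
        h *= 16777619u;
    }
    return h;
}

/* With a stored checksum per block, the bad block can be named.
 * Returns its index, or -1 if every block matches. */
int find_bad_block(uint8_t data[NDATA][BLK], const uint32_t csums[NDATA])
{
    for (int d = 0; d < NDATA; d++)
        if (block_csum(data[d]) != csums[d])
            return d;
    return -1;
}
```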

As for your patch, no idea why it's not giving me a stacktrace, sorry :-/

Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik 
Date:   Sun Sep 3 13:32:17 2017 -0400

Btrfs: use be->metadata just in case

I suspect we're not getting the owner in some cases, so we want to just
use the known value.

Signed-off-by: Josef Bacik 

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: speed up big btrfs volumes with ssds

2017-09-03 Thread Peter Grandi
> [ ... ] - needed volume size is 60TB

I wonder how long that takes to 'scrub', 'balance', 'check',
'subvolume delete', 'find', etc.

> [ ... ] 4x HW Raid 5 with 1GB controller memory of 4TB 3,5"
> devices and using btrfs as raid 0 for data and metadata on top
> of those 4 raid 5. [ ... ]  the write speed is not as good as
> i would like - especially for random 8k-16k I/O. [ ... ]

Also I noticed that the rain is wet and cold - especially if one
walks around for a few hours in a t-shirt, shorts and sandals.
:-)

> My current idea is to use a pcie flash card with bcache on top
> of each raid 5. Is this something which makes sense to speed
> up the write speed.

Well, 'bcache' in the role of write buffer allegedly helps turn
unaligned writes into aligned writes, so it might help, but I
wonder how effective that will be in this case; plus, it won't
turn low random-IOPS-per-TB 4TB devices into high ones. Anyhow,
if they are battery-backed, the 1GB of HW HBA cache/buffer should
do exactly that, except that again in this case that is rather
optimistic.

But this reminds me of the common story: "Doctor, if I stab
repeatedly my hand with a fork it hurts a lot, how to fix that?"
"Don't do it".
:-)

PS Random writes of 8-16KiB over 60TB might seem like storing
small records/images in small files. That would be "brave".
On a 60TB RAID50 of 20x 4TB disk drives that might mean around
5-10MB/s of random small writes, including both data and
metadata.
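
Peter doesn't show how he arrived at 5-10MB/s, but a back-of-envelope estimate of the same order is easy to reproduce. The per-drive figure (roughly 100-150 random IOPS for a 3.5" 7200 rpm disk) and the RAID 5 read-modify-write penalty of 4 I/Os per small write are assumptions of this sketch, not numbers from the thread, and controller write-back caching is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Back-of-envelope random-write throughput for parity RAID.
 * All inputs are caller-supplied assumptions:
 *   drives         - total spindles across all arrays
 *   iops_per_drive - random IOPS one drive sustains
 *   write_penalty  - backend I/Os per small frontend write
 *                    (4 for RAID 5 read-modify-write)
 *   io_bytes       - size of each random write
 * Returns bytes per second, ignoring controller caching. */
uint64_t random_write_bytes_per_sec(unsigned drives, unsigned iops_per_drive,
                                    unsigned write_penalty, uint64_t io_bytes)
{
    assert(write_penalty > 0);
    uint64_t frontend_iops = (uint64_t)drives * iops_per_drive / write_penalty;
    return frontend_iops * io_bytes;
}
```

With 20 drives at 150 IOPS this gives 750 random writes per second, i.e. about 6.1 MB/s at 8 KiB and 12.3 MB/s at 16 KiB; at 100 IOPS per drive the range drops to about 4.1-8.2 MB/s. Since btrfs CoW also writes metadata for each data write, the data-only figure would likely be lower still.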


Re: How to disable/revoke 'compression'?

2017-09-03 Thread Hans van Kranenburg
On 09/03/2017 08:06 PM, Adam Borowski wrote:
> On Sun, Sep 03, 2017 at 07:32:01PM +0200, Cloud Admin wrote:
>> Hi,
>> I used the mount option 'compression' on some mounted sub volumes. How
>> can I revoke the compression? I mean, remove the option and get all
>> data uncompressed on this volume.
>> Is it enough to remount the sub volume without this option? Or is it
>> necessary to do some additional step (balancing?) to get all stored data
>> uncompressed?
> 
> If you set it via mount option, removing the option is enough to disable
> compression for _new_ files.  Other ways are chattr +c and btrfs-property,
> but if you haven't heard about those you almost surely don't have such
> attributes set.
> 
> After remounting, you may uncompress existing files.  Balancing won't do
> this as it moves extents around without looking inside; defrag, on the
> other hand, rewrites extents, so as a side effect it applies the new
> [non]compression settings.  Thus: 「btrfs fi defrag -r /path/to/filesystem」.
> 
>> Besides that, is it possible to find out the real and compressed size
>> of a file, or for example the ratio?
> 
> Currently not.
> 
> I've once written a tool which does this, but 1. it's extremely slow, 2.
> insane, 3. so insane a certain member of this list would kill me had I
> distributed the tool.  Thus, I'd need to rewrite it first...

Heh, I wouldn't do that, since I need you to do my Debian uploads. :D

But it would certainly help to be a bit less stubborn only wanting to
code in the language that matches your country code. :O

Or maybe I can help a bit, since it sounds like a nice one for the
coding examples in the lib. ;] Days are getting shorter again, so the
amount of indoor coding activity will hopefully increase a bit again soon.

-- 
Hans van Kranenburg


speed up big btrfs volumes with ssds

2017-09-03 Thread Stefan Priebe - Profihost AG
Hello,

i'm trying to speed up big btrfs volumes.

Some facts:
- Kernel will be 4.13-rc7
- needed volume size is 60TB

Currently, without any SSDs, I get the best speed with:
- 4x HW RAID 5 of 4TB 3.5" devices, with 1GB controller memory

and btrfs as RAID 0 for data and metadata on top of those 4 RAID 5 arrays.

I can live with a data loss every now and then ;-), so a RAID 0 on
top of the 4x RAID 5 is acceptable for me.
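The message does not say how many disks each RAID 5 array holds. As a rough capacity check, assuming four equal arrays, decimal TB, and no filesystem overhead, a RAID 5 of n disks stores (n - 1) disks' worth of data and RAID 0 on top simply adds the arrays together. The Python sketch below (function names are hypothetical, for illustration only) finds the smallest per-array disk count that reaches 60TB:

```python
import math


def min_disks_per_raid5(needed_tb: float, arrays: int, disk_tb: float) -> int:
    """Smallest disk count per RAID 5 array so that RAID 0 across `arrays`
    equal arrays provides at least `needed_tb` of usable space.

    Assumption: a RAID 5 of n disks stores (n - 1) disks' worth of data,
    and RAID 0 on top adds the arrays together with no extra redundancy.
    """
    per_array_tb = needed_tb / arrays
    data_disks = math.ceil(per_array_tb / disk_tb)
    return max(3, data_disks + 1)  # +1 for parity; RAID 5 needs >= 3 disks


def usable_tb(arrays: int, disks_per_array: int, disk_tb: float) -> float:
    """Usable capacity of RAID 0 over `arrays` RAID 5 arrays (no fs overhead)."""
    return arrays * (disks_per_array - 1) * disk_tb


n = min_disks_per_raid5(60, 4, 4)
print(n, usable_tb(4, n, 4))  # 5 64
```

Under these assumptions, five 4TB disks per array (20 in total) give 64TB; four per array would give only 48TB.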

Currently, the write speed is not as good as I would like, especially
for random 8k-16k I/O.

My current idea is to use a PCIe flash card with bcache on top of each
RAID 5.

Does this make sense as a way to speed up writes?

Greets,
Stefan


Re: How to disable/revoke 'compression'?

2017-09-03 Thread Adam Borowski
On Sun, Sep 03, 2017 at 07:32:01PM +0200, Cloud Admin wrote:
> Hi,
> I used the mount option 'compression' on some mounted subvolumes. How
> can I revoke the compression, i.e. remove the option and get all
> data on this volume uncompressed?
> Is it enough to remount the subvolume without this option? Or is it
> necessary to do some additional step (balancing?) to get all stored data
> uncompressed?

If you set it via mount option, removing the option is enough to disable
compression for _new_ writes.  Other ways to enable compression are chattr +c
and btrfs-property, but if you haven't heard of those you almost surely don't
have such attributes set.

After remounting, you may uncompress existing files.  Balancing won't do
this, as it moves extents around without looking inside; defrag, on the other
hand, rewrites extents, so as a side effect it applies the new [non]compression
settings.  Thus: 「btrfs fi defrag -r /path/to/filesystem」.

> Besides that, is it possible to find out the real and compressed size
> of a file, or the compression ratio?

Currently not.

I once wrote a tool which does this, but 1. it's extremely slow, 2. it's
insane, 3. it's so insane that a certain member of this list would kill me had
I distributed it.  So I'd need to rewrite it first...


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 


How to disable/revoke 'compression'?

2017-09-03 Thread Cloud Admin
Hi,
I used the mount option 'compression' on some mounted subvolumes. How
can I revoke the compression, i.e. remove the option and get all
data on this volume uncompressed?
Is it enough to remount the subvolume without this option? Or is it
necessary to do some additional step (balancing?) to get all stored data
uncompressed? Besides that, is it possible to find out the real
and compressed size of a file, or the compression ratio?
Bye
   Frank


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Josef Bacik
Alright, pushed. Sorry about that.

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:42 AM, Marc MERLIN  wrote:
> 
>> On Sun, Sep 03, 2017 at 02:38:57PM +, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be 
>> difficult ;).  Thanks,
> 
> Right, except that I thought I did:
> 
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>   what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Josef Bacik
Jesus Christ, I misspelled it. I'll fix it up when I get home.  Thanks,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:42 AM, Marc MERLIN  wrote:
> 
>> On Sun, Sep 03, 2017 at 02:38:57PM +, Josef Bacik wrote:
>> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be 
>> difficult ;).  Thanks,
> 
> Right, except that I thought I did:
> 
> saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_STACKTRACE=y
> CONFIG_USER_STACKTRACE_SUPPORT=y
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>   what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 02:38:57PM +, Josef Bacik wrote:
> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be 
> difficult ;).  Thanks,
 
Right, except that I thought I did:

saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
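Since grep matches substrings, the output above lists every option whose name contains STACKTRACE, not just CONFIG_STACKTRACE itself. To test one option by its exact name, here is a minimal Python sketch (parse_kconfig is a hypothetical helper, not part of the kernel's tooling). It relies on the usual .config convention that enabled options appear as NAME=value and disabled ones as "# NAME is not set"; the sample text is copied from the grep output above.

```python
from typing import Dict


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse kernel .config lines of the form NAME=value into a dict.

    Comment lines (including '# CONFIG_FOO is not set') and blank lines
    are skipped, so a disabled option is simply absent from the result.
    """
    opts: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            opts[name] = value
    return opts


sample = """\
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y
"""

opts = parse_kconfig(sample)
print(opts.get("CONFIG_STACKTRACE") == "y")  # True
```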

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Josef Bacik
Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be 
difficult ;).  Thanks,

Josef

Sent from my iPhone

> On Sep 3, 2017, at 10:31 AM, Marc MERLIN  wrote:
> 
>> On Sun, Sep 03, 2017 at 03:26:34AM +, Josef Bacik wrote:
>> I was looking through the code for other ways to cut down memory usage when
>> I noticed we only catch improper re-allocations, not the addition of another
>> ref for metadata, which is what I suspect your problem is.  I added another
>> patch and pushed it out, sorry for the churn.
> 
> Installed.
> 
> For now, I've seen this once, but otherwise no issues:
> Dropping a ref for a root that doesn't have a ref on the block
> Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
>  Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
>  Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
>  Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
>  Root entry 418, num_refs 1
>  Root entry 69809, num_refs 0
>  Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
>  No stacktrace support
>  Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
>  No stacktrace support
> 
> 
> I'm assuming this was done by your patch?
> Should I worry about 'No stacktrace support'?
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>   what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 03:26:34AM +, Josef Bacik wrote:
> I was looking through the code for other ways to cut down memory usage when I
> noticed we only catch improper re-allocations, not the addition of another ref
> for metadata, which is what I suspect your problem is.  I added another patch
> and pushed it out, sorry for the churn.

Installed.

For now, I've seen this once, but otherwise no issues:
Dropping a ref for a root that doesn't have a ref on the block
Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
  Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
  Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
  Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
  Root entry 418, num_refs 1
  Root entry 69809, num_refs 0
  Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, num_refs 1
  No stacktrace support
  Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, num_refs 18446744073709551615
  No stacktrace support
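The odd num_refs value 18446744073709551615 in this dump is exactly 2^64 - 1, which is the bit pattern of -1 in a 64-bit integer. That is plausibly consistent with the first line of the dump ("Dropping a ref for a root that doesn't have a ref on the block"), where a drop with no matching add would net out to -1. Whether the checker stores the count as an unsigned 64-bit value or prints a signed one with an unsigned format is an assumption here; this short Python check only confirms the arithmetic (as_u64 and as_s64 are illustrative helpers).

```python
U64_MASK = (1 << 64) - 1


def as_u64(value: int) -> int:
    """Reinterpret an integer as a C unsigned 64-bit value (wraps modulo 2**64)."""
    return value & U64_MASK


def as_s64(value: int) -> int:
    """Reinterpret a 64-bit pattern as a signed two's-complement value."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value


print(as_u64(-1))                    # 18446744073709551615
print(as_s64(18446744073709551615))  # -1
```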


I'm assuming this was done by your patch?
Should I worry about 'No stacktrace support'?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: joining to contribute

2017-09-03 Thread Piotr Pawłow
Hello,
>
> Alongside this, there's also a requirement for being able to do
> round-trip send/receive while preserving the ability to do incremental
> sends. This is likely to be related to the above bug-fix. I did a
> complete write-up of what's happening, and what needs to happen, here:
>
> http://www.spinics.net/lists/linux-btrfs/msg44089.html

I missed that discussion, but I proposed a different solution in a similar
thread about send/receive
(https://www.spinics.net/lists/linux-btrfs/msg60694.html).

I think it's not very useful that received_uuid encodes where the subvolume
comes from. All that send/receive should care about is that the contents of the
source(s) used for incremental send match the contents of the subvolumes on the
receive side. Let's call the identifier for those contents, for example,
"contents_uuid". The rules would be simple: operations that preserve contents
preserve contents_uuid; operations that change contents change contents_uuid.
For simplicity and performance, and to avoid having to track changes, we could
allow some false positives, where contents_uuid changes even though the data did
not. A simpler-to-implement set of rules could look like this:

- rw subvolumes have no contents_uuid
- changing rw subvolume to ro assigns a random contents_uuid
- ro snapshot of rw subvolume gets a random contents_uuid
- ro snapshot of ro subvolume preserves contents_uuid
- send/receive preserves contents_uuid (after successful receive)

And then the rule for send/receive would be:
- send transmits the contents_uuid of subvolumes used as clone sources, which are
matched to subvolumes having an identical contents_uuid on the receive side.
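The proposed rules are compact enough to model directly. Below is a minimal in-memory Python sketch. Subvol, set_readonly, set_readwrite, ro_snapshot, receive and match_clone_sources are hypothetical names, not btrfs or btrfs-progs APIs, and contents_uuid is the field proposed here, not an existing on-disk item. set_readwrite is not in the list above; it is inferred from the first rule.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subvol:
    """Hypothetical model of a subvolume under the proposed contents_uuid rules."""
    name: str
    readonly: bool = False
    contents_uuid: Optional[str] = None  # rw subvolumes have no contents_uuid


def set_readonly(sub: Subvol) -> None:
    # Changing a rw subvolume to ro assigns a random contents_uuid.
    if not sub.readonly:
        sub.readonly = True
        sub.contents_uuid = str(uuid.uuid4())


def set_readwrite(sub: Subvol) -> None:
    # Inferred from "rw subvolumes have no contents_uuid".
    sub.readonly = False
    sub.contents_uuid = None


def ro_snapshot(src: Subvol, name: str) -> Subvol:
    if src.readonly:
        # ro snapshot of ro subvolume: contents unchanged, uuid preserved.
        return Subvol(name, True, src.contents_uuid)
    # ro snapshot of rw subvolume: gets a fresh random contents_uuid.
    return Subvol(name, True, str(uuid.uuid4()))


def receive(sent: Subvol, name: str) -> Subvol:
    # A successful receive preserves contents_uuid.
    return Subvol(name, True, sent.contents_uuid)


def match_clone_sources(wanted: List[str], local: List[Subvol]) -> List[Optional[Subvol]]:
    # The sender transmits the contents_uuid of each clone source; the
    # receiver looks for any local subvolume with an identical contents_uuid.
    by_uuid: Dict[str, Subvol] = {
        s.contents_uuid: s for s in local if s.contents_uuid is not None
    }
    return [by_uuid.get(u) for u in wanted]


# Round trip: host A -> host B -> back to host A.
home = Subvol("home")                        # rw on host A
snap_a = ro_snapshot(home, "home@1")         # ro, new random uuid
recv_b = receive(snap_a, "home@1")           # on host B, same uuid
copy_b = ro_snapshot(recv_b, "home@1-copy")  # ro of ro: uuid preserved
# Host B names copy_b's contents_uuid as a clone source when sending back;
# host A can match it against its original snapshot.
print(match_clone_sources([copy_b.contents_uuid], [snap_a])[0] is snap_a)  # True
```

In this model the match depends only on contents_uuid, not on which host a subvolume came from, which is what keeps the round-trip case quoted above working.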

Does it make sense? Did I miss something? I didn't receive any feedback last
time, which is why I'm bringing it up again for discussion.