Re: [PATCH V2] test online label ioctl

2018-05-14 Thread Dave Chinner
On Mon, May 14, 2018 at 06:26:07PM -0500, Eric Sandeen wrote:
> On 5/14/18 6:11 PM, Dave Chinner wrote:
> > On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote:
> >> This tests the online label ioctl that btrfs has, which has been
> >> recently proposed for XFS.
> >>
> >> To run, it requires an updated xfs_io with the label command and a
> >> filesystem that supports it
> >>
> >> A slight change here to _require_xfs_io_command as well, so that tests
> >> which simply fail with "Inappropriate ioctl" can be caught in the
> >> common case.
> >>
> >> Signed-off-by: Eric Sandeen 
> >> ---
> 
> 
>  
> >> +# The maximum filesystem label length, not including terminating NULL
> >> +_label_get_max()
> >> +{
> >> +  case $FSTYP in
> >> +  xfs)
> >> +  MAXLEN=12
> >> +  ;;
> >> +  btrfs)
> >> +  MAXLEN=255
> >> +  ;;
> >> +  *)
> >> +  MAXLEN=0
> > 
> > Why not just _notrun here?
> 
> do we want to go through the other steps only to get here and notrun
> halfway through?
> 
> >> +  ;;
> >> +  esac
> >> +
> >> +  echo $MAXLEN
> >> +}
> >> +
> >> +_require_label_get_max()
> >> +{
> >> +  if [ $(_label_get_max) -eq 0 ]; then
> >> +  _notrun "$FSTYP does not define maximum label length"
> >> +  fi
> > 
> > And this check can go away?
> 
> We'd like to know if all the validations in this can complete before we
> get started, right?  That's why I did it this way.

Sure, just trying to be a bit defensive as people often forget
_requires directives when writing new tests

> > Also, shouldn't it be _require_online_label_change() ? And then
> > maybe you can move the xfs_io label command check inside it?
> 
> Well, we want to know a lot of things:
> 
> 1) can the kernel code for this filesystem support online label
> 2) does xfs_io support the label command
> 3) does this test know the maximum label length to test for this fs
> 
> the above stuff is for 3)

What I was suggesting was doing all of these in one function similar
to _require_xfs_sparse_inodes, _require_meta_uuid, etc:

_require_online_label_change()
{
# need xfs_io support
_require_xfs_io_command "label"

# need fstests knowledge of label size
[ $(_label_get_max) -eq 0 ] && _notrun "$FSTYP does not define maximum 
label length"

# need kernel support
$XFS_IO_PROG -c label $TEST_DIR > /dev/null 2>&1
[ $? -ne 0 ] && _notrun "Kernel does not support FS_IOC_GETLABEL"
}

Which also means that the label_f command in xfs_io needs to set the
exitcode to non-zero when the ioctl fails so that xfs_io returns
non-zero on failure...

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Qu Wenruo


On 2018年05月15日 00:52, David Sterba wrote:
> On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote:
>> As btrfs(5) specified:
>>
>>  Note
>>  If nodatacow or nodatasum are enabled, compression is disabled.
>>
>> If NODATASUM or NODATACOW set, we should not compress the extent.
>>
>> And in fact, we have bug report about corrupted compressed extent
>> leading to memory corruption in mail list.
> 
> Link please.
> 
>> Although it's mostly buggy lzo implementation causing the problem, btrfs
>> still needs to be fixed to meet the specification.
> 
> That's very vague, what's the LZO bug? If the input is garbage and lzo
> decompression cannot decompress it, it's not a lzo bug.

Still digging.
Would update this content in next version.

> 
>> Reported-by: James Harvey 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/inode.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index d241285a0d2a..dbef3f404559 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode 
>> *inode, u64 start, u64 end)
>>  {
>>  struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>  
>> +/*
>> + * Btrfs doesn't support compression without csum or CoW.
>> + * This should have the highest priority.
>> + */
>> +if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
>> +BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
>> +return 0;
> 
> This is also the wrong place to fix that, NODATASUM or NODATACOW inode
> should never make it to compress_file_range (that calls
> inode_need_compress).

Nope, in run_delalloc_range(), we calls inode_need_compress() to
determine if we goes to normal cow routine or goes to async routine
(compression).

So inode_need_compress() looks like a pretty valid location.

Just mount with nodatasum and create an inode, then remount to
datasum,compress, write something to that inode, you could hit the behavior.

Thanks,
Qu

> 
>> +
>>  /* force compress */
>>  if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>>  return 1;
>> -- 
>> 2.17.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: verify key failure

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 22:35, Liu Bo wrote:
> Hi,
> 
> I got another warning of verify_level_key by running btrfs/124 in a loop, I'm 
> testing against 4.17-rc3.
> 
> Not sure if it's false positive.
> 
> [101414.336691] WARNING: CPU: 3 PID: 30194 at fs/btrfs/disk-io.c:455 
> btree_read_extent_buffer_pages+0x183/0x220 [btrfs]
> [101414.340372] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress 
> xxhash zlib_inflate lzo_compress lzo_decompress zlib_deflate raid6_pq 
> dm_flakey [last unloaded: xor]
> [101414.345713] CPU: 3 PID: 30194 Comm: btrfs Tainted: GW  O  
> 4.17.0-rc3-liubo+ #35
> [101414.348501] RIP: 0010:btree_read_extent_buffer_pages+0x183/0x220 [btrfs]
> ...
> [101414.369713] Call Trace:
> [101414.370477]  read_tree_block+0x3d/0x60 [btrfs]
> [101414.371946]  read_block_for_search.isra.11+0x19c/0x350 [btrfs]
> [101414.373915]  btrfs_search_slot+0x4a0/0xa60 [btrfs]
> [101414.375489]  ? trace_hardirqs_on_caller+0x12/0x1c0
> [101414.377080]  ? btrfs_lookup_ordered_extent+0x8b/0xd0 [btrfs]
> [101414.379007]  btrfs_lookup_csum+0x42/0x130 [btrfs]
> [101414.380456]  __btrfs_lookup_bio_sums+0x2fb/0x6a0 [btrfs]
> [101414.381554]  btrfs_submit_bio_hook+0xbb/0x180 [btrfs]
> [101414.382598]  submit_one_bio+0x57/0x80 [btrfs]
> [101414.383509]  submit_extent_page+0xd5/0x1f0 [btrfs]
> [101414.384507]  __do_readpage+0x2a6/0x770 [btrfs]
> [101414.385449]  ? btrfs_create_repair_bio+0x100/0x100 [btrfs]
> [101414.386576]  ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs]
> [101414.387569]  ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs]
> [101414.388562]  __extent_readpages+0x2e2/0x330 [btrfs]
> [101414.389584]  extent_readpages+0x10e/0x1a0 [btrfs]
> [101414.390565]  __do_page_cache_readahead+0x283/0x340
> [101414.391550]  ? ondemand_readahead+0x207/0x460
> [101414.392451]  ondemand_readahead+0x207/0x460
> [101414.393353]  relocate_file_extent_cluster+0x364/0x4c0 [btrfs]
> [101414.394546]  relocate_block_group+0x5d4/0x6e0 [btrfs]
> ...
> [101414.432616] BTRFS error (device sdb): tree first key mismatch detected, 
> bytenr=30523392 key expected=(18446744073709551606, 128, 1120665600) has=(1, 
> 204, 22020096)

The expected key is completely fine, while the found one obviously
belongs to extent tree.

Maybe that's the bug which I'm always chasing.

Can you reproduce it again with btrfs_print_tree() added to provide more
info?

Thanks,
Qu

> 
> thanks,
> -liubo
> 



signature.asc
Description: OpenPGP digital signature


[PATCH v2 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.

Symlinks should never have append/immutable flags.
While processing inodes, if found a symlink with append/immutable
flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.

This is for original mode.

Signed-off-by: Su Yue 
---
 check/main.c  | 7 +++
 check/mode-original.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/check/main.c b/check/main.c
index 68da994f7ae0..c764fc011ded 100644
--- a/check/main.c
+++ b/check/main.c
@@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
struct inode_record *rec)
fprintf(stderr, ", link count wrong");
if (errors & I_ERR_FILE_EXTENT_ORPHAN)
fprintf(stderr, ", orphan file extent");
+   if (errors & I_ERR_ODD_INODE_FLAGS)
+   fprintf(stderr, ", odd inode flags");
fprintf(stderr, "\n");
/* Print the orphan extents if needed */
if (errors & I_ERR_FILE_EXTENT_ORPHAN)
@@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
 {
struct inode_record *rec;
struct btrfs_inode_item *item;
+   u64 flags;
 
rec = active_node->current;
BUG_ON(rec->ino != key->objectid || rec->refs > 1);
@@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
rec->found_inode_item = 1;
if (rec->nlink == 0)
rec->errors |= I_ERR_NO_ORPHAN_ITEM;
+   flags = btrfs_inode_flags(eb, item);
+   if (rec->imode & BTRFS_FT_SYMLINK &&
+   flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
+   rec->errors |= I_ERR_ODD_INODE_FLAGS;
maybe_free_inode_rec(_node->inode_cache, rec);
return 0;
 }
diff --git a/check/mode-original.h b/check/mode-original.h
index 368de692fdd1..13cfa5b9e1b3 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -186,6 +186,7 @@ struct file_extent_hole {
 #define I_ERR_LINK_COUNT_WRONG (1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN   (1 << 14)
 #define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15)
+#define I_ERR_ODD_INODE_FLAGS  (1 << 16)
 
 struct inode_record {
struct list_head backrefs;
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] btrfs-progs: check: verify symlinks with append/immutable flags

2018-05-14 Thread Su Yue
This patchset can be fetch from my github:
https://github.com/Damenly/btrfs-progs/commits/odd_inode_flags
It's based on devel.

symlinks should never have append/immutable attributes.
This patchset enables btrfs check to verify such corruption.

PATCH[1] is for original mode.
PATCH[2] is for original mode.

PATCH[3] adds a test image.
For further use, the directory is called bad-inode-flags.

#issue 133

---
Changelog:
v2:
  Use "rec->errors |=" instead of "rec->errors =" in patch[1].
  Define new error bit of invalid inode flags in lowmem mode.
  Adjust print message in patch[2]. Thanks, Qu and Nikolay.
  Rename test directory from odd-inode-flags to bad-inode-flags.

Su Yue (3):
  btrfs-progs: check: check symlinks with append/immutable flags
  btrfs-progs: lowmem: check symlinks with append/immutable flags
  btrfs-progs: fsck-tests: add test case to check symlinks with bad
flags

 check/main.c |   7 +++
 check/mode-lowmem.c  |  10 ++
 check/mode-lowmem.h  |   1 +
 check/mode-original.h|   1 +
 .../034-bad-inode-flags/default_case.img | Bin 0 -> 4096 bytes
 tests/fsck-tests/034-bad-inode-flags/test.sh |  15 +++
 6 files changed, 34 insertions(+)
 create mode 100644 tests/fsck-tests/034-bad-inode-flags/default_case.img
 create mode 100755 tests/fsck-tests/034-bad-inode-flags/test.sh

-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
Define new error bit INODE_FLAGS_ERROR to represents invalid inode
flags error.

Symlinks should never have append/immutable flags set.
While checking inodes, if found a symlink with append/immutable
flags, report and record the inode flags error.

This is for lowmem mode.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 10 ++
 check/mode-lowmem.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 9890180d1d3c..f598bc364de4 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path)
struct btrfs_key last_key;
u64 inode_id;
u32 mode;
+   u64 flags;
u64 nlink;
u64 nbytes;
u64 isize;
@@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path)
isize = btrfs_inode_size(node, ii);
nbytes = btrfs_inode_nbytes(node, ii);
mode = btrfs_inode_mode(node, ii);
+   flags = btrfs_inode_flags(node, ii);
dir = imode_to_type(mode) == BTRFS_FT_DIR;
nlink = btrfs_inode_nlink(node, ii);
nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
 
+   if (mode & BTRFS_FT_SYMLINK &&
+   flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) {
+   err |= INODE_FLAGS_ERROR;
+   error(
+"symlinks must never have immutable/append flags set, root %llu inode item 
%llu flags %llu may be corrupted",
+ root->objectid, inode_id, flags);
+   }
+
while (1) {
btrfs_item_key_to_cpu(path->nodes[0], _key, 
path->slots[0]);
ret = btrfs_next_item(root, path);
diff --git a/check/mode-lowmem.h b/check/mode-lowmem.h
index e7ba62e2413e..91f7b6b1db53 100644
--- a/check/mode-lowmem.h
+++ b/check/mode-lowmem.h
@@ -44,6 +44,7 @@
 #define DIR_COUNT_AGAIN (1<<20) /* DIR isize should be recalculated */
 #define BG_ACCOUNTING_ERROR (1<<21) /* Block group accounting error */
 #define FATAL_ERROR (1<<22) /* Fatal bit for errno */
+#define INODE_FLAGS_ERROR  (1<<23) /* Invalid inode flags */
 
 /*
  * Error bit for low memory mode check.
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] btrfs-progs: fsck-tests: add test case to check symlinks with bad flags

2018-05-14 Thread Su Yue
There are two bad symlinks in the test case.
One is with immutable attribute.
Another one is with append attribute.

Signed-off-by: Su Yue 
---
 .../034-bad-inode-flags/default_case.img | Bin 0 -> 4096 bytes
 tests/fsck-tests/034-bad-inode-flags/test.sh |  15 +++
 2 files changed, 15 insertions(+)
 create mode 100644 tests/fsck-tests/034-bad-inode-flags/default_case.img
 create mode 100755 tests/fsck-tests/034-bad-inode-flags/test.sh

diff --git a/tests/fsck-tests/034-bad-inode-flags/default_case.img 
b/tests/fsck-tests/034-bad-inode-flags/default_case.img
new file mode 100644
index 
..43a2a6f62d5ef3afd77f117b577a56ad6098ed19
GIT binary patch
literal 4096
zcmeH}c{CJUAIFU?k*yI4S@RGnly%15rj(_~G8rO6vS$b*hR9G^M_GnP*~gG%*XULD
zecwiA3?YW>hM5=dd(Ly-=e+&-p7Z|Uo_l`xdw=)dbMEhRes@7VO!Ok2v8iSFcVXRY
z0S9$YY#lgx4lMhgmx00fKp$b;YjjKwbl1Mc|4Xm#`|-vHGaC;^56vNgLjwQD1pG!)
z0%s{UtRTF}`3%o*f~o}Ub~i7&5a;#~3S~wq>yb%mpwOOBQV@`K}NNkK2T)W
zP3ZBKkdz|>(+ppAd?n`Mhq5Y_Mx6#)lkr%V0RYfvXKS#j!hDNZK|Zl0`#(aZhs{~>g=;i^@`Df+(}JrN_k>!De44F)K~Z_B2ji*e|B3TM=T#=3J?hFQkLT
zcoHc*_{i*Zl|=_O)<@(9atD)c?k`SQU{$3zUo_)!00YeaVw>N{!sut0FUEY0%ig=a
zMor(Rh4`G&3Z`YTQC;Vd8DgP6v(Pk4lF8cYH?czMG@QAi}DN#2dOFHE}ny2M`mZ24yt(4GUf}VgxH+`7uL@
zKDpXm>?+9{UNN|}xP%5x8ehlk9$5E{}{L^#S!Q*F3j-dy9VK#9I*B-fuj6aJJl
z=P4E0NBM;7Fya(6l|UU2sBHD~_7;trI@t}asks;9Kk+;cNS#DJ)+5qi%9$5}Frqs<
zGo#p;!NId;WbB=u?K!Wum6^@+(Q%lm<^l5MSYiPzgj6jAV1WhvGO+kDGX
zzw5{X`z6v>V}g9P^96Zk*+d9P;fMmiO;S-TF=bQT1)?9j;R){>iIAX>R5$~D?vMpj
zBnOV)t?w>WW|3jF#YJ{gK;@)e6U`2;z47;3jE!%&}epjB+(;{Nt3NKke
zgQOrb4!>S43Sh|5t~6Q9_eALyPQFaH!cTLiXpX=W)Jx7Uxq;~%Hy)VR7GgFu`>T4N
zZ?@I};@Zg%oqD*Vk4*qEy@RB5H{Z>mf%4)?l}C7)7z9RiH3+|z}`sur%ucH}%V
zPz+A(a7rcq^2?5ccAa_hN*yFmglI!5N>l2$a={+~R77Z4R7$n%aIEVA=N$
z1b$h$FIy~<-xmLlH6{;68g#pjl7j63c)%xA>Dd9r0^C%QE)RQfgi2
zXiewE(h-{O)QWl=h!TjI+0de3-VC=YSC^0MN(pB5!!V!#68@VzTm&5G-AS>CCye#X
zC(0KE`nt?7e5TX|O%{xg!*Ab5P=2^-DRjo81w`|=zUk#|&(uj(4p2-{$h3e`sM17w
zdW#Tl75Q}}emQch`~$C9_sjTscj@M>sfi4kHHdcdX*RC$on>k-Dn@x1YE7>^O(DHf
z{Sf3DiCn-r)e#3Nqt1-yQ4Z
zjQbKaB#VjYFLQ=5E+mdcWR7)?7QwYYr_nqX2!rj`wma*sUcEyVH)-UMp9vk5p*sJi
z)uP>;M~F@^1@nip{sw)_u!t0m3wfd_^tsEjdl3JV!)ti4tO%j3AZkU&1P8pdqT2@4
z)2a@Bm48L_Y|X>TZU*bQ)wNAuogJWSw*}wLwwe>k*K*tA6k?IBPtx`FWcq8tfigF_
z`^-m<5?dEeNYIhR9?
zpXv|E5a?(uK@>aY^((xBosb-{GjPf)k(;|7FUGk~Pl7(`D#yu_syY49#0sXEQ
zq2Yh_;mDS0F#bl9X5`NV5q91498spY6Yu}FrF1zva3DCCYrYxh$_DQ7>Kut=Pt1kZ
z=

Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread james harvey
Don't know if this will help.  I just learned about pstore, and see in
there a dmesg that's interesting.

The serial port kernel errors started this time with "BUG: unable to
handle kernel paging request".  The pstore dmesg has everything from
there until the end of the first trace.

But, the interesting part is the pstore dmesg has 310 "BTRFS:
decompress failed" messages the serial port (the versions I've shared)
version doesn't. (Sometimes the serial crashes have 1 of these btrfs
errors, but never repeated like this.)

These 310 btrfs errors are all uptime-stamped from 13110.016096 -
13110.253752, when the BUG is later at 13110.370494 (when the serial
errors start.)

With the kernel trying to decompress 310 times within 0.237656
seconds, maybe that's an indication with invalid data, it retries
forever in a bad way crashing the kernel, rather than failing?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-14 Thread Marc MERLIN
static noinline struct extent_buffer *
read_node_slot(struct btrfs_fs_info *fs_info, struct extent_buffer *parent,
   int slot)
{
int level = btrfs_header_level(parent);
struct extent_buffer *eb;

if (slot < 0 || slot >= btrfs_header_nritems(parent))
return ERR_PTR(-ENOENT);

BUG_ON(level == 0);



BTRFS info (device dm-2): relocating block group 13404622290944 flags data
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): relocating block group 13403548549120 flags data
[ cut here ]
kernel BUG at fs/btrfs/ctree.c:1862!
invalid opcode:  [#1] PREEMPT SMP PTI
CPU: 5 PID: 8103 Comm: btrfs Tainted: G U   
4.15.6-amd64-preempt-sysrq-20171018 #3
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 
04/27/2013
RIP: 0010:read_node_slot+0x3c/0x9e
RSP: 0018:becfaa0b7b58 EFLAGS: 00210246
RAX: 00a0 RBX: 000c RCX: 0003
RDX: 000c RSI: 9a60e9d9de78 RDI: 00052f6e
RBP: 9a60e9d9de78 R08: 0001 R09: becfaa0b7bf6
R10: 9a64988bd7e9 R11: 9a64988bd7c8 R12: e003d4bdb800
R13: 9a64a481 R14:  R15: 
FS:  7fba34c9c8c0() GS:9a64de34() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 5a8b9c9a CR3: 0001446c6004 CR4: 001606e0
Call Trace:
 tree_advance+0xb1/0x11e
 btrfs_compare_trees+0x1c2/0x4d6
 ? process_extent+0xdcf/0xdcf
 btrfs_ioctl_send+0x81e/0xc70
 ? __kmalloc_track_caller+0xfb/0x10f
 _btrfs_ioctl_send+0xbc/0xe6
 ? paravirt_sched_clock+0x5/0x8
 ? set_task_rq+0x2f/0x80
 ? task_rq_unlock+0x22/0x36
 btrfs_ioctl+0x162f/0x1dc8
 ? select_task_rq_fair+0xb65/0xb7a
 ? update_load_avg+0x16d/0x442
 ? list_add+0x15/0x2e
 ? cfs_rq_throttled.isra.30+0x9/0x18
 ? vfs_ioctl+0x1b/0x28
 vfs_ioctl+0x1b/0x28
 do_vfs_ioctl+0x4f4/0x53f
 ? __audit_syscall_entry+0xbf/0xe3
 SyS_ioctl+0x52/0x76
 do_syscall_64+0x72/0x81
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fba34d835e7
RSP: 002b:7ffc32cf4cb8 EFLAGS: 0202 ORIG_RAX: 0010
RAX: ffda RBX: 523f RCX: 7fba34d835e7
RDX: 7ffc32cf4d40 RSI: 40489426 RDI: 0004
RBP: 0004 R08:  R09: 7fba34c9b700
R10: 7fba34c9b9d0 R11: 0202 R12: 0003
R13: 563a30b87020 R14: 0001 R15: 0001
Code: f5 53 4c 8b a6 98 00 00 00 89 d3 4c 89 e7 e8 67 fd ff ff 85 db 78 63 4c 
89 e7 41 88 c6 e8 92 fb ff ff 39 d8 76 54 45 84 f6 75 02 <0f> 0b 89 de 48 89 ef 
e8 2e ff ff ff 89 de 49 89 c4 48 89 ef e8
RIP: read_node_slot+0x3c/0x9e RSP: becfaa0b7b58
---[ end trace a24e7de6b77b5cb1 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1900 from 0x8100 (relocation range: 
0x8000-0xbfff)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] test online label ioctl

2018-05-14 Thread Eric Sandeen
On 5/14/18 6:11 PM, Dave Chinner wrote:
> On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote:
>> This tests the online label ioctl that btrfs has, which has been
>> recently proposed for XFS.
>>
>> To run, it requires an updated xfs_io with the label command and a
>> filesystem that supports it
>>
>> A slight change here to _require_xfs_io_command as well, so that tests
>> which simply fail with "Inappropriate ioctl" can be caught in the
>> common case.
>>
>> Signed-off-by: Eric Sandeen 
>> ---


 
>> +# The maximum filesystem label length, not including terminating NULL
>> +_label_get_max()
>> +{
>> +case $FSTYP in
>> +xfs)
>> +MAXLEN=12
>> +;;
>> +btrfs)
>> +MAXLEN=255
>> +;;
>> +*)
>> +MAXLEN=0
> 
> Why not just _notrun here?

do we want to go through the other steps only to get here and notrun
halfway through?

>> +;;
>> +esac
>> +
>> +echo $MAXLEN
>> +}
>> +
>> +_require_label_get_max()
>> +{
>> +if [ $(_label_get_max) -eq 0 ]; then
>> +_notrun "$FSTYP does not define maximum label length"
>> +fi
> 
> And this check can go away?

We'd like to know if all the validations in this can complete before we
get started, right?  That's why I did it this way.

> Also, shouldn't it be _require_online_label_change() ? And then
> maybe you can move the xfs_io label command check inside it?

Well, we want to know a lot of things:

1) can the kernel code for this filesystem support online label
2) does xfs_io support the label command
3) does this test know the maximum label length to test for this fs

the above stuff is for 3)

>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +_supported_fs generic
>> +_supported_os Linux
>> +_require_scratch
>> +_require_xfs_io_command "label"
>> +_require_label_get_max
>> +
>> +_scratch_mkfs > $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +# Make sure we can set & clear the label
>> +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT
>> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
>> +
>> +# And that userspace can see it now, while mounted
>> +# NB: some blkid has trailing whitespace, filter it out here
>> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
>> +
>> +# And that the it is still there when it's unmounted
>> +_scratch_unmount
>> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
> 
> Ok, so "LABEL" here is a special blkid match token

yep

>> +# And that it persists after a remount
>> +_scratch_mount
>> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
>> +
>> +# And that a too-long label is rejected, beyond the interface max:
>> +LABEL=$(perl -e "print 'l' x 257;")
> 
> And now you use it as a variable. Nasty and confusing. Using lower
> case for local variables is the norm, right? I thought we were only
> supposed to use upper case for global test harness variables...

I guess I didn't know about that convention (or forgot)

ok, yeah, that's a little confusing I guess.  *shrug*  I can change it
if you prefer.  Obviously the "blkid -s LABEL" must remain "LABEL"

> But even making it "label" is problematic:
> 
>> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT
> 
> because "label" is an xfs_io command. Perhaps call it "fs_label"?

ok

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] test online label ioctl

2018-05-14 Thread Dave Chinner
On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote:
> This tests the online label ioctl that btrfs has, which has been
> recently proposed for XFS.
> 
> To run, it requires an updated xfs_io with the label command and a
> filesystem that supports it
> 
> A slight change here to _require_xfs_io_command as well, so that tests
> which simply fail with "Inappropriate ioctl" can be caught in the
> common case.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> (urgh send as proper new thread, sorry)
> 
> This passes on btrfs, _notruns on xfs/ext4 of yore, and passes
> on xfs w/ my online label patchset (as long as xfs_io has the new
> capability)
> 
> V2: Add a max label length helper
> Set the proper btrfs max label length o_O oops
> Filter trailing whitespace from blkid output
> 
> 
> diff --git a/common/rc b/common/rc
> index 9ffab7fd..88a99cff 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2158,6 +2158,9 @@ _require_xfs_io_command()
>   echo $testio | grep -q "Inappropriate ioctl" && \
>   _notrun "xfs_io $command support is missing"
>   ;;
> + "label")
> + testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
> + ;;
>   "open")
>   # -c "open $f" is broken in xfs_io <= 4.8. Along with the fix,
>   # a new -C flag was introduced to execute one shot commands.
> @@ -2196,7 +2199,7 @@ _require_xfs_io_command()
>   rm -f $testfile 2>&1 > /dev/null
>   echo $testio | grep -q "not found" && \
>   _notrun "xfs_io $command support is missing"
> - echo $testio | grep -q "Operation not supported" && \
> + echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" 
> && \
>   _notrun "xfs_io $command failed (old kernel/wrong fs?)"
>   echo $testio | grep -q "Invalid" && \
>   _notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)"
> @@ -3802,6 +3805,31 @@ _require_scratch_feature()
>   esac
>  }
>  
> +# The maximum filesystem label length, not including terminating NULL
> +_label_get_max()
> +{
> + case $FSTYP in
> + xfs)
> + MAXLEN=12
> + ;;
> + btrfs)
> + MAXLEN=255
> + ;;
> + *)
> + MAXLEN=0

Why not just _notrun here?

> + ;;
> + esac
> +
> + echo $MAXLEN
> +}
> +
> +_require_label_get_max()
> +{
> + if [ $(_label_get_max) -eq 0 ]; then
> + _notrun "$FSTYP does not define maximum label length"
> + fi

And this check can go away?

Also, shouldn't it be _require_online_label_change() ? And then
maybe you can move the xfs_io label command check inside it?

> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_xfs_io_command "label"
> +_require_label_get_max
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount
> +
> +# Make sure we can set & clear the label
> +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT
> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
> +
> +# And that userspace can see it now, while mounted
> +# NB: some blkid has trailing whitespace, filter it out here
> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
> +
> +# And that the it is still there when it's unmounted
> +_scratch_unmount
> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"

Ok, so "LABEL" here is a special blkid match token

> +# And that it persists after a remount
> +_scratch_mount
> +$XFS_IO_PROG -c "label" $SCRATCH_MNT
> +
> +# And that a too-long label is rejected, beyond the interface max:
> +LABEL=$(perl -e "print 'l' x 257;")

And now you use it as a variable. Nasty and confusing. Using lower
case for local variables is the norm, right? I thought we were only
supposed to use upper case for global test harness variables...

But even making it "label" is problematic:

> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT

because "label" is an xfs_io command. Perhaps call it "fs_label"?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread james harvey
On Mon, May 14, 2018 at 12:52 PM, David Sterba  wrote:
> On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote:
>> As btrfs(5) specified:
>>
>>   Note
>>   If nodatacow or nodatasum are enabled, compression is disabled.
>>
>> If NODATASUM or NODATACOW set, we should not compress the extent.
>>
>> And in fact, we have bug report about corrupted compressed extent
>> leading to memory corruption in mail list.
>
> Link please.

https://bugzilla.kernel.org/show_bug.cgi?id=199707
&
https://www.spinics.net/lists/linux-btrfs/msg77971.html

>> Although it's mostly buggy lzo implementation causing the problem, btrfs
>> still needs to be fixed to meet the specification.
>
> That's very vague, what's the LZO bug? If the input is garbage and lzo
> decompression cannot decompress it, it's not a lzo bug.

The bug is the kernel doesn't just give an I/O error.  It totally
crashes the system.  Even a regular user cat'ing an invalid
btrfs-lzo-compressed file will take the system down.  Qu's current
theory which I agree with is it causes some kind of random kernel
memory corruption.  The crashes sometimes include "BTRFS: decompress
failed" at the start, but give backtraces for other things than btrfs
lzo code.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread james harvey
On Mon, May 14, 2018 at 6:35 AM, Qu Wenruo  wrote:
> And if possible, please don't just remove those offending files (yet).
> Your binary dump would help a lot locating the root case.

Absolutely.  This is on a 50G LVM root volume, so I've been able to
leave the original one unmodified and always mounted ro.  Copied it
and changed the UUID.  So, I can leave it around for a while without a
problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Timofey Titovets
пн, 14 мая 2018 г. в 20:32, David Sterba :

> On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote:
> > As btrfs(5) specified:
> >
> >   Note
> >   If nodatacow or nodatasum are enabled, compression is disabled.
> >
> > If NODATASUM or NODATACOW set, we should not compress the extent.
> >
> > And in fact, we have bug report about corrupted compressed extent
> > leading to memory corruption in mail list.

> Link please.

> > Although it's mostly buggy lzo implementation causing the problem, btrfs
> > still needs to be fixed to meet the specification.

> That's very vague, what's the LZO bug? If the input is garbage and lzo
> decompression cannot decompress it, it's not a lzo bug.

> > Reported-by: James Harvey 
> > Signed-off-by: Qu Wenruo 
> > ---
> >  fs/btrfs/inode.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index d241285a0d2a..dbef3f404559 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode
*inode, u64 start, u64 end)
> >  {
> >   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> >
> > + /*
> > +  * Btrfs doesn't support compression without csum or CoW.
> > +  * This should have the highest priority.
> > +  */
> > + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
> > + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
> > + return 0;

> This is also the wrong place to fix that, NODATASUM or NODATACOW inode
> should never make it to compress_file_range (that calls
> inode_need_compress).


David, i've talk about that some time ago:
https://www.spinics.net/lists/linux-btrfs/msg73137.html

NoCow files can be *easy* compressed.
```
➜  ~ touch test
➜  ~ chattr +C test
➜  ~ lsattr test
---C-- test
➜  ~ dd if=/dev/zero of=./test bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00099878 s, 1.0 GB/s
➜  ~ sync
➜  ~ filefrag -v test
Filesystem type is: 9123683e
File size of test is 1048576 (256 blocks of 4096 bytes)
ext: logical_offset:physical_offset: length:   expected: flags:
   0:0.. 255:   88592741..  88592996:256:
last,eof
test: 1 extent found
➜  ~ btrfs fi def -vrczstd test
test
➜  ~ filefrag -v test
Filesystem type is: 9123683e
File size of test is 1048576 (256 blocks of 4096 bytes)
ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..  31:   3125..  3156: 32: encoded
   1:   32..  63:   3180..  3211: 32:   3157: encoded
   2:   64..  95:   3185..  3216: 32:   3212: encoded
   3:   96.. 127:   3188..  3219: 32:   3217: encoded
   4:  128.. 159:   3263..  3294: 32:   3220: encoded
   5:  160.. 191:   3355..  3386: 32:   3295: encoded
   6:  192.. 223:   3376..  3407: 32:   3387: encoded
   7:  224.. 255:   3411..  3442: 32:   3408:
last,encoded,eof
test: 8 extents found
```
-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Clarification needed about libbtrfs & libbtrfsutil

2018-05-14 Thread Omar Sandoval
On Mon, May 14, 2018 at 09:40:19AM +0100, Dimitri John Ledkov wrote:
> Are both of these meant to be public libraries, installed on the user
> systems, and available in .so variant as well for 3rd party
> development and public dynamic linking?
> 
> Or are these private internal libraries, which are installed as public
> runtime only, simply to share code between the utils, but otherwise
> provide no abi stability and will forever remain libfoo.so.0?

They're both meant to be public. In fact, libbtrfsutil is already 1.0.0.

> Or should these even be a noinst_ libraries (~= Libtool Convenience
> Libraries), and are simply intermediate by-products?
> 
> I'm asking because despite compiling shared & static variants of these
> libraries, and "shared linked" and "static linked" variants of the
> utils, it appears that all utilities are statically linking against
> libbtrfs/libbtrfsutils. Thus no binaries nor bindings, dynamically
> link against neither libbtrfs nor libbtrfsutil.
> 
> Tweaking the makefile to use libs_shared variable instead of libs or
> libs_static, results in slightly smaller binaries, dynamically linked
> against libbtrfs/libbtrfsutil.
> 
> But it is hard to tell if this is a bug/mistake, or an intentional feature.

I'm not sure why we statically link libbtrfs into the the tools, and I
just copied that for libbtrfsutil.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2] test online label ioctl

2018-05-14 Thread Eric Sandeen
This tests the online label ioctl that btrfs has, which has been
recently proposed for XFS.

To run, it requires an updated xfs_io with the label command and a
filesystem that supports it

A slight change here to _require_xfs_io_command as well, so that tests
which simply fail with "Inappropriate ioctl" can be caught in the
common case.

Signed-off-by: Eric Sandeen 
---

(urgh send as proper new thread, sorry)

This passes on btrfs, _notruns on xfs/ext4 of yore, and passes
on xfs w/ my online label patchset (as long as xfs_io has the new
capability)

V2: Add a max label length helper
Set the proper btrfs max label length o_O oops
Filter trailing whitespace from blkid output


diff --git a/common/rc b/common/rc
index 9ffab7fd..88a99cff 100644
--- a/common/rc
+++ b/common/rc
@@ -2158,6 +2158,9 @@ _require_xfs_io_command()
echo $testio | grep -q "Inappropriate ioctl" && \
_notrun "xfs_io $command support is missing"
;;
+   "label")
+   testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
+   ;;
"open")
# -c "open $f" is broken in xfs_io <= 4.8. Along with the fix,
# a new -C flag was introduced to execute one shot commands.
@@ -2196,7 +2199,7 @@ _require_xfs_io_command()
rm -f $testfile 2>&1 > /dev/null
echo $testio | grep -q "not found" && \
_notrun "xfs_io $command support is missing"
-   echo $testio | grep -q "Operation not supported" && \
+   echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" 
&& \
_notrun "xfs_io $command failed (old kernel/wrong fs?)"
echo $testio | grep -q "Invalid" && \
_notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)"
@@ -3802,6 +3805,31 @@ _require_scratch_feature()
esac
 }
 
+# The maximum filesystem label length, not including terminating NULL
+_label_get_max()
+{
+   case $FSTYP in
+   xfs)
+   MAXLEN=12
+   ;;
+   btrfs)
+   MAXLEN=255
+   ;;
+   *)
+   MAXLEN=0
+   ;;
+   esac
+
+   echo $MAXLEN
+}
+
+_require_label_get_max()
+{
+   if [ $(_label_get_max) -eq 0 ]; then
+   _notrun "$FSTYP does not define maximum label length"
+   fi
+}
+
 init_rc
 
 

diff --git a/tests/generic/479 b/tests/generic/479
old mode 100644
new mode 100755
diff --git a/tests/generic/485 b/tests/generic/485
new file mode 100755
index ..941afd3d
--- /dev/null
+++ b/tests/generic/485
@@ -0,0 +1,90 @@
+#! /bin/bash
+# FS QA Test 485
+#
+# Test the online filesystem label set/get ioctls
+#
+#---
+# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
+# Author: Eric Sandeen 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "label"
+_require_label_get_max
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount
+
+# Make sure we can set & clear the label
+$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+# And that userspace can see it now, while mounted
+# NB: some blkid has trailing whitespace, filter it out here
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
+
+# And that the it is still there when it's unmounted
+_scratch_unmount
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
+
+# And that it persists after a remount
+_scratch_mount
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+# And that a too-long label is rejected, beyond the interface max:
+LABEL=$(perl -e "print 'l' x 257;")
+$XFS_IO_PROG -c "label $LABEL" 

Re: [PATCH] test online label ioctl

2018-05-14 Thread Eric Sandeen
This tests the online label ioctl that btrfs has, which has been
recently proposed for XFS.

To run, it requires an updated xfs_io with the label command and a
filesystem that supports it

A slight change here to _require_xfs_io_command as well, so that tests
which simply fail with "Inappropriate ioctl" can be caught in the
common case.

Signed-off-by: Eric Sandeen 
---

This passes on btrfs, _notruns on xfs/ext4 of yore, and passes
on xfs w/ my online label patchset (as long as xfs_io has the new
capability)

V2: Add a max label length helper
Set the proper btrfs max label length o_O oops
Filter trailing whitespace from blkid output


diff --git a/common/rc b/common/rc
index 9ffab7fd..88a99cff 100644
--- a/common/rc
+++ b/common/rc
@@ -2158,6 +2158,9 @@ _require_xfs_io_command()
echo $testio | grep -q "Inappropriate ioctl" && \
_notrun "xfs_io $command support is missing"
;;
+   "label")
+   testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1`
+   ;;
"open")
# -c "open $f" is broken in xfs_io <= 4.8. Along with the fix,
# a new -C flag was introduced to execute one shot commands.
@@ -2196,7 +2199,7 @@ _require_xfs_io_command()
rm -f $testfile 2>&1 > /dev/null
echo $testio | grep -q "not found" && \
_notrun "xfs_io $command support is missing"
-   echo $testio | grep -q "Operation not supported" && \
+   echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" 
&& \
_notrun "xfs_io $command failed (old kernel/wrong fs?)"
echo $testio | grep -q "Invalid" && \
_notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)"
@@ -3802,6 +3805,31 @@ _require_scratch_feature()
esac
 }
 
+# The maximum filesystem label length, not including terminating NULL
+_label_get_max()
+{
+   case $FSTYP in
+   xfs)
+   MAXLEN=12
+   ;;
+   btrfs)
+   MAXLEN=255
+   ;;
+   *)
+   MAXLEN=0
+   ;;
+   esac
+
+   echo $MAXLEN
+}
+
+_require_label_get_max()
+{
+   if [ $(_label_get_max) -eq 0 ]; then
+   _notrun "$FSTYP does not define maximum label length"
+   fi
+}
+
 init_rc
 
 

diff --git a/tests/generic/479 b/tests/generic/479
old mode 100644
new mode 100755
diff --git a/tests/generic/485 b/tests/generic/485
new file mode 100755
index ..941afd3d
--- /dev/null
+++ b/tests/generic/485
@@ -0,0 +1,90 @@
+#! /bin/bash
+# FS QA Test 485
+#
+# Test the online filesystem label set/get ioctls
+#
+#---
+# Copyright (c) 2018 Red Hat, Inc.  All Rights Reserved.
+# Author: Eric Sandeen 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "label"
+_require_label_get_max
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount
+
+# Make sure we can set & clear the label
+$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+# And that userspace can see it now, while mounted
+# NB: some blkid has trailing whitespace, filter it out here
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
+
+# And that the it is still there when it's unmounted
+_scratch_unmount
+blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g"
+
+# And that it persists after a remount
+_scratch_mount
+$XFS_IO_PROG -c "label" $SCRATCH_MNT
+
+# And that a too-long label is rejected, beyond the interface max:
+LABEL=$(perl -e "print 'l' x 257;")
+$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT
+
+# And it succeeds right at the 

Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread David Sterba
On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote:
> As btrfs(5) specified:
> 
>   Note
>   If nodatacow or nodatasum are enabled, compression is disabled.
> 
> If NODATASUM or NODATACOW set, we should not compress the extent.
> 
> And in fact, we have bug report about corrupted compressed extent
> leading to memory corruption in mail list.

Link please.

> Although it's mostly buggy lzo implementation causing the problem, btrfs
> still needs to be fixed to meet the specification.

That's very vague, what's the LZO bug? If the input is garbage and lzo
decompression cannot decompress it, it's not a lzo bug.

> Reported-by: James Harvey 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/inode.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d241285a0d2a..dbef3f404559 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode 
> *inode, u64 start, u64 end)
>  {
>   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  
> + /*
> +  * Btrfs doesn't support compression without csum or CoW.
> +  * This should have the highest priority.
> +  */
> + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
> + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
> + return 0;

This is also the wrong place to fix that, NODATASUM or NODATACOW inode
should never make it to compress_file_range (that calls
inode_need_compress).

> +
>   /* force compress */
>   if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>   return 1;
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread David Sterba
On Mon, May 14, 2018 at 12:46:31PM +0300, Nikolay Borisov wrote:
> [Adding David to CC]
> 
> On 14.05.2018 12:39, Roman Mamedov wrote:
> > On Mon, 14 May 2018 11:36:26 +0300
> > Nikolay Borisov  wrote:
> > 
> >> So what made you have these expectation, is it codified somewhere
> >> (docs/man pages etc)? I'm fine with that semantics IF this is what
> >> people expect.
> > 
> > "Compression ...does not work for NOCOW files":
> > https://btrfs.wiki.kernel.org/index.php/Compression
> > 
> > The mount options man page does not say that the NOCOW attribute of files 
> > will
> > be disregarded with compress-force.  It only mentions interaction with the
> > nodatacow and nodatasum mount options. So I'd expect the attribute to still
> > work and prevent compression of NOCOW files.
> 
> I wouldn't say this is very clear, it needs to be stated explicitly.
> 
> > 
> >> Now the question is why people grew up to have this expectation and not the
> >> other way round? IMO force_compress should really disregard everything else
> > 
> > Both are knobs that the user needs to explicitly set, the difference is that
> > the +C attribute is fine-grained and the mount option is global. If they are
> > set by the user to conflicting values, it seems more useful to have the
> > fine-grained control override the global one, not the other way round.
> 
> This is valid reasoning but so is mine. So I'd like to have some rules
> on that matter such that in the future things will have consistent
> semantics. Obviously in this case the "local options trump global ones"
> seems to be prevalent. I don't have problem with that but this should
> codified somewhere.
> 
> 
> David, what's your take on that. Where do you think will be the best
> place to say that local, per-inode options take precedence over global ones?

The order of precedence is roughly like this, whenever the respective
containing object exits:

- inode
- directory
- subvolume
- mount options
- filesystem defaults

There's some special cases for the compression, that are documented
separately.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()

2018-05-14 Thread David Sterba
On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > > struct block_device *bdev, struct iov_iter *iter,
> > > get_block_t get_block, dio_iodone_t end_io,
> > > -   dio_submit_t submit_io, int flags)
> > > +   dio_submit_t submit_io, int flags, void *private)
> > 
> > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > is obscene, but so is this ;-/
> 
> So looking at these one by one, obviously needed:
> 
> - iocb
> - inode
> - iter
> 
> bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> 
> These could _maybe_ go in struct kiocb:
> 
> - flags could maybe be folded into ki_flags
> - private could maybe go in iocb->private, but I haven't yet read
>   through to figure out if we're already using iocb->private for direct
>   I/O

I think the kiocb::private can be used for the purpose. There's only one
user, ext4, that also passes some DIO data around so it would in line
with the interface AFAICS.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: verify key failure

2018-05-14 Thread Chris Mason



On 14 May 2018, at 10:35, Liu Bo wrote:


Hi,

I got another warning of verify_level_key by running btrfs/124 in a 
loop, I'm testing against 4.17-rc3.


Not sure if it's false positive.


How long does this take to trigger?

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 3/3] btrfs: replace waitqueue_actvie with cond_wake_up

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 15:23, David Sterba wrote:
> Use the wrappers and reduce the amount of low-level details about the
> waitqueue management.
> 
> Signed-off-by: David Sterba 

Reviewed-by: Nikolay Borisov 

> ---
>  fs/btrfs/compression.c   |  7 +--
>  fs/btrfs/delayed-inode.c |  9 +++--
>  fs/btrfs/dev-replace.c   | 10 --
>  fs/btrfs/extent-tree.c   |  7 +--
>  fs/btrfs/inode.c |  9 +++--
>  fs/btrfs/locking.c   | 34 +++---
>  fs/btrfs/ordered-data.c  | 14 --
>  fs/btrfs/transaction.c   |  7 +--
>  fs/btrfs/tree-log.c  | 34 --
>  9 files changed, 40 insertions(+), 91 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 5268c9f85ca7..dcebb91e4e0c 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -990,12 +990,7 @@ static void __free_workspace(int type, struct list_head 
> *workspace,
>   btrfs_compress_op[idx]->free_workspace(workspace);
>   atomic_dec(total_ws);
>  wake:
> - /*
> -  * Make sure counter is updated before we wake up waiters.
> -  */
> - smp_mb();
> - if (waitqueue_active(ws_wait))
> - wake_up(ws_wait);
> + cond_wake_up(ws_wait);
>  }
>  
>  static void free_workspace(int type, struct list_head *ws)
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index a8d492dbd3e7..fe6caa7e698b 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -460,13 +460,10 @@ static void finish_one_item(struct btrfs_delayed_root 
> *delayed_root)
>  {
>   int seq = atomic_inc_return(_root->items_seq);
>  
> - /*
> -  * atomic_dec_return implies a barrier for waitqueue_active
> -  */
> + /* atomic_dec_return implies a barrier */
>   if ((atomic_dec_return(_root->items) <
> - BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) &&
> - waitqueue_active(_root->wait))
> - wake_up(_root->wait);
> + BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0))
> + cond_wake_up_nomb(_root->wait);
>  }
>  
>  static void __btrfs_remove_delayed_item(struct btrfs_delayed_item 
> *delayed_item)
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index f82be266ba4b..d33ff5b23302 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -916,9 +916,9 @@ void btrfs_dev_replace_clear_lock_blocking(
>   ASSERT(atomic_read(_replace->read_locks) > 0);
>   ASSERT(atomic_read(_replace->blocking_readers) > 0);
>   read_lock(_replace->lock);
> - if (atomic_dec_and_test(_replace->blocking_readers) &&
> - waitqueue_active(_replace->read_lock_wq))
> - wake_up(_replace->read_lock_wq);
> + /* Barrier implied by atomic_dec_and_test */
> + if (atomic_dec_and_test(_replace->blocking_readers))
> + cond_wake_up_nomb(_replace->read_lock_wq);
>  }
>  
>  void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
> @@ -929,9 +929,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info 
> *fs_info)
>  void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
>  {
>   percpu_counter_sub(_info->bio_counter, amount);
> -
> - if (waitqueue_active(_info->replace_wait))
> - wake_up(_info->replace_wait);
> + cond_wake_up_nomb(_info->replace_wait);
>  }
>  
>  void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3871658b6ab1..94fc825a126d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -11094,12 +11094,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
> struct fstrim_range *range)
>  void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
>  {
>   percpu_counter_dec(>subv_writers->counter);
> - /*
> -  * Make sure counter is updated before we wake up waiters.
> -  */
> - smp_mb();
> - if (waitqueue_active(>subv_writers->wait))
> - wake_up(>subv_writers->wait);
> + cond_wake_up(>subv_writers->wait);
>  }
>  
>  int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d241285a0d2a..cf6f2815ffee 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1156,13 +1156,10 @@ static noinline void async_cow_submit(struct 
> btrfs_work *work)
>   nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >>
>   PAGE_SHIFT;
>  
> - /*
> -  * atomic_sub_return implies a barrier for waitqueue_active
> -  */
> + /* atomic_sub_return implies a barrier */
>   if (atomic_sub_return(nr_pages, _info->async_delalloc_pages) <
> - 5 * SZ_1M &&
> - waitqueue_active(_info->async_submit_wait))
> - wake_up(_info->async_submit_wait);
> + 5 * SZ_1M)
> + 

Re: [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags

2018-05-14 Thread David Sterba
On Fri, May 11, 2018 at 01:13:39PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and
> BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make
> them consecutive again.
> 
> Signed-off-by: Omar Sandoval 
> ---
>  fs/btrfs/btrfs_inode.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 4807cde0313d..bbbe7f308d68 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -20,14 +20,14 @@
>   * new data the application may have written before commit.
>   */
>  #define BTRFS_INODE_ORDERED_DATA_CLOSE   0
> -#define BTRFS_INODE_DUMMY2
> -#define BTRFS_INODE_IN_DEFRAG3
> -#define BTRFS_INODE_HAS_ASYNC_EXTENT 5
> -#define BTRFS_INODE_NEEDS_FULL_SYNC  6
> -#define BTRFS_INODE_COPY_EVERYTHING  7
> -#define BTRFS_INODE_IN_DELALLOC_LIST 8
> -#define BTRFS_INODE_READDIO_NEED_LOCK9
> -#define BTRFS_INODE_HAS_PROPS10
> +#define BTRFS_INODE_DUMMY1
> +#define BTRFS_INODE_IN_DEFRAG2
> +#define BTRFS_INODE_HAS_ASYNC_EXTENT 3
> +#define BTRFS_INODE_NEEDS_FULL_SYNC  4
> +#define BTRFS_INODE_COPY_EVERYTHING  5
> +#define BTRFS_INODE_IN_DELALLOC_LIST 6
> +#define BTRFS_INODE_READDIO_NEED_LOCK7
> +#define BTRFS_INODE_HAS_PROPS8

I'll update it to something like this, as we want the auto-numbering
from enums:

--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -19,17 +19,19 @@
 * ordered operations list so that we make sure to flush out any
 *   * new data the application may have written before commit.
 */
 -#define BTRFS_INODE_ORDERED_DATA_CLOSE 0
 -#define BTRFS_INODE_ORPHAN_META_RESERVED   1
 -#define BTRFS_INODE_DUMMY  2
 -#define BTRFS_INODE_IN_DEFRAG  3
 -#define BTRFS_INODE_HAS_ORPHAN_ITEM4
 -#define BTRFS_INODE_HAS_ASYNC_EXTENT   5
 -#define BTRFS_INODE_NEEDS_FULL_SYNC6
 -#define BTRFS_INODE_COPY_EVERYTHING7
 -#define BTRFS_INODE_IN_DELALLOC_LIST   8
 -#define BTRFS_INODE_READDIO_NEED_LOCK  9
 -#define BTRFS_INODE_HAS_PROPS  10
 +enum {
 +   BTRFS_INODE_ORDERED_DATA_CLOSE = 0,
 +   BTRFS_INODE_ORPHAN_META_RESERVED,
 +   BTRFS_INODE_DUMMY,
 +   BTRFS_INODE_IN_DEFRAG,
 +   BTRFS_INODE_HAS_ORPHAN_ITEM,
 +   BTRFS_INODE_HAS_ASYNC_EXTENT,
 +   BTRFS_INODE_NEEDS_FULL_SYNC,
 +   BTRFS_INODE_COPY_EVERYTHING,
 +   BTRFS_INODE_IN_DELALLOC_LIST,
 +   BTRFS_INODE_READDIO_NEED_LOCK,
 +   BTRFS_INODE_HAS_PROPS,
 +};
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] vfs: dedupe should return EPERM if permission is not granted

2018-05-14 Thread David Sterba
On Fri, May 11, 2018 at 12:26:51PM -0700, Mark Fasheh wrote:
> Right now we return EINVAL if a process does not have permission to dedupe a
> file. This was an oversight on my part. EPERM gives a true description of
> the nature of our error, and EINVAL is already used for the case that the
> filesystem does not support dedupe.
> 
> Signed-off-by: Mark Fasheh 

Acked-by: David Sterba 

We've been using EINVAL when the request is invalid in the ioctls, eg.
combination of arguments that does not make sense, while EPERM is for
cases when the request is ok but there's still some permission missing,

>   } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE) ||
>uid_eq(current_fsuid(), dst->i_uid))) {
> - info->status = -EINVAL;
> + info->status = -EPERM;

exactly like this.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix duplicate extents after fsync of file with prealloc extents

2018-05-14 Thread David Sterba
On Wed, May 09, 2018 at 04:01:46PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
...
> Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after 
> fsync log replay")
> Signed-off-by: Filipe Manana 

Thanks for the excellent and very educational writeup!

Added to 4.17-rc queue as it fixes a regression.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 15:23, David Sterba wrote:
> Currently the code assumes that there's an implied barrier by the
> sequence of code preceding the wakeup, namely the mutex unlock.
> 
> As Nikolay pointed out:
> 
> I think this is wrong (not your code) but the original assumption that
> the RELEASE semantics provided by mutex_unlock is sufficient.
> According to memory-barriers.txt:
> 
> Section 'LOCK ACQUISITION FUNCTIONS' states:
> 
>  (2) RELEASE operation implication:
> 
>  Memory operations issued before the RELEASE will be completed before the
>  RELEASE operation has completed.
> 
>  Memory operations issued after the RELEASE *may* be completed before the
>  RELEASE operation has completed.
> 
> (I've bolded the may portion)
> 
> The example given there:
> 
> As an example, consider the following:
> 
> *A = a;
> *B = b;
> ACQUIRE
> *C = c;
> *D = d;
> RELEASE
> *E = e;
> *F = f;
> 
> The following sequence of events is acceptable:
> 
> ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
> 
> So if we assume that *C is modifying the flag which the waitqueue is checking,
> and *E is the actual wakeup, then those accesses can be re-ordered...
> 
> IMHO this code should be considered broken...
> ---
> 
> To be on the safe side, add the barriers. The synchronization logic
> around log using the mutexes and several other threads does not make it
> easy to reason for/against the barrier.
> 
> CC: Nikolay Borisov 
> Link: https://lkml.kernel.org/r/6ee068d8-1a69-3728-00d1-d86293d43...@suse.com
> Signed-off-by: David Sterba 
> ---

Apart from what I said initially which prompted introducing this patch I
can't say anything else. I think the fsync code is in dire need of being
rewritten/simplified.

But in so far as the newly introduced barriers are concerned:

Reviewed-by: Nikolay Borisov 
>  fs/btrfs/tree-log.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 43758e30aa7a..fa5b3dc5f4d5 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -3116,8 +3116,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
>   mutex_unlock(_root_tree->log_mutex);
>  
>   /*
> -  * The barrier before waitqueue_active is implied by mutex_unlock
> +  * The barrier before waitqueue_active is needed so all the updates
> +  * above are seen by the woken threads. It might not be necessary, but
> +  * proving that seems to be hard.
>*/
> + smp_mb();
>   if (waitqueue_active(_root_tree->log_commit_wait[index2]))
>   wake_up(_root_tree->log_commit_wait[index2]);
>  out:
> @@ -3128,8 +3131,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
>   mutex_unlock(>log_mutex);
>  
>   /*
> -  * The barrier before waitqueue_active is implied by mutex_unlock
> +  * The barrier before waitqueue_active is needed so all the updates
> +  * above are seen by the woken threads. It might not be necessary, but
> +  * proving that seems to be hard.
>*/
> + smp_mb();
>   if (waitqueue_active(>log_commit_wait[index1]))
>   wake_up(>log_commit_wait[index1]);
>   return ret;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


verify key failure

2018-05-14 Thread Liu Bo
Hi,

I got another warning of verify_level_key by running btrfs/124 in a loop, I'm 
testing against 4.17-rc3.

Not sure if it's false positive.

[101414.336691] WARNING: CPU: 3 PID: 30194 at fs/btrfs/disk-io.c:455 
btree_read_extent_buffer_pages+0x183/0x220 [btrfs]
[101414.340372] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress 
xxhash zlib_inflate lzo_compress lzo_decompress zlib_deflate raid6_pq dm_flakey 
[last unloaded: xor]
[101414.345713] CPU: 3 PID: 30194 Comm: btrfs Tainted: GW  O  
4.17.0-rc3-liubo+ #35
[101414.348501] RIP: 0010:btree_read_extent_buffer_pages+0x183/0x220 [btrfs]
...
[101414.369713] Call Trace:
[101414.370477]  read_tree_block+0x3d/0x60 [btrfs]
[101414.371946]  read_block_for_search.isra.11+0x19c/0x350 [btrfs]
[101414.373915]  btrfs_search_slot+0x4a0/0xa60 [btrfs]
[101414.375489]  ? trace_hardirqs_on_caller+0x12/0x1c0
[101414.377080]  ? btrfs_lookup_ordered_extent+0x8b/0xd0 [btrfs]
[101414.379007]  btrfs_lookup_csum+0x42/0x130 [btrfs]
[101414.380456]  __btrfs_lookup_bio_sums+0x2fb/0x6a0 [btrfs]
[101414.381554]  btrfs_submit_bio_hook+0xbb/0x180 [btrfs]
[101414.382598]  submit_one_bio+0x57/0x80 [btrfs]
[101414.383509]  submit_extent_page+0xd5/0x1f0 [btrfs]
[101414.384507]  __do_readpage+0x2a6/0x770 [btrfs]
[101414.385449]  ? btrfs_create_repair_bio+0x100/0x100 [btrfs]
[101414.386576]  ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs]
[101414.387569]  ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs]
[101414.388562]  __extent_readpages+0x2e2/0x330 [btrfs]
[101414.389584]  extent_readpages+0x10e/0x1a0 [btrfs]
[101414.390565]  __do_page_cache_readahead+0x283/0x340
[101414.391550]  ? ondemand_readahead+0x207/0x460
[101414.392451]  ondemand_readahead+0x207/0x460
[101414.393353]  relocate_file_extent_cluster+0x364/0x4c0 [btrfs]
[101414.394546]  relocate_block_group+0x5d4/0x6e0 [btrfs]
...
[101414.432616] BTRFS error (device sdb): tree first key mismatch detected, 
bytenr=30523392 key expected=(18446744073709551606, 128, 1120665600) has=(1, 
204, 22020096)

thanks,
-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix xattr loss after power failure

2018-05-14 Thread David Sterba
On Fri, May 11, 2018 at 04:42:42PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If a file has xattrs, we fsync it, to ensure we clear the flags
> BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
> inode, the current transaction commits and then we fsync it (without
> either of those bits being set in its inode), we end up not logging
> all its xattrs. This results in deleting all xattrs when replying the
> log after a power failure.
> 
> Trivial reproducer
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
> 
>   $ touch /mnt/foobar
>   $ setfattr -n user.xa -v qwerty /mnt/foobar
>   $ xfs_io -c "fsync" /mnt/foobar
> 
>   $ sync
> 
>   $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
>   $ xfs_io -c "fsync" /mnt/foobar
>   
> 
>   $ mount /dev/sdb /mnt
>   $ getfattr --absolute-names --dump /mnt/foobar
>   
>   $
> 
> So fix this by making sure all xattrs are logged if we log a file's inode
> item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor
> BTRFS_INODE_COPY_EVERYTHING were set in the inode.
> 
> Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path")
> Cc: 
> Signed-off-by: Filipe Manana 

Added to next, thanks, and I'll probably add that to 4.17-rc too.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: incremental send, fix BUG when invalid memory access

2018-05-14 Thread David Sterba
On Mon, May 14, 2018 at 12:39:36PM +0100, Filipe Manana wrote:
> On Mon, May 14, 2018 at 3:51 AM, robbieko  wrote:
> > Signed-off-by: Robbie Ko 
> Reviewed-by: Filipe Manana 
> 
> Looks good, I would only change the subject to something like:
> 
> Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
> 
> David can probably do that when picking this patch.

Patch updated, thank you both.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] btrfs: qgroup rescan fixes part 1

2018-05-14 Thread David Sterba
On Mon, May 14, 2018 at 09:38:11AM +0800, Qu Wenruo wrote:
> This patchset is mainly focused on fixing qgroup rescan corruption.
> 
> Since the whole btrfs qgroup is based on the modification between 2
> transactions, it only has correct qgroup delta.
> While if the rescan can't provide a correct result from the very
> beginning, qgroup numbers are corrupted.
> 
> The patchset is fixing two types of qgroup corruption which could happen
> by looping btrfs/017 with some possibility.
> 
> 1) Not accounting tree blocks
>Caused by the fact that qgroup rescan only searches commit root for
>backref, while we're passing current extent root to search.
>Fix it by also passing commit extent root.
> 
> 2) Double accounting tree blocks
>Caused by wrong rescan exit condition.
>Currently qgroup only exit when it can't find any leaves beyond
>rescan progress.
>However it could cause problem when new transaction happens after
>last rescan, and old leaves CoWed to new location, and double
>accounting could happen.
>Fix it by checking and leave qgroup rescan if we have hit last leaf,
>instead of leaving it to next leaf rescan.
> 
> 
> Changelog:
> v2:
>   Remove unused tree_mod_seq_elem for the 1st patch.
>   Fix double unlock in 2nd patch.
>   Thanks Jeff for the update.

V1 patches replaced, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hang/deadlock in 4.14.40 running rsync

2018-05-14 Thread E V
Running an rsync to copy data onto a btrfs filesystem used for
backups. It's appearing to deadlock/hang. The rsync process stops
doing any IO, and no other IO is ongoing against the filesystem. The
rsync is in state D in ps and is unkillable even with kill -9. The
/proc//stack for the rsync:

[] btrfs_async_run_delayed_refs+0x110/0x130 [btrfs]
[] __btrfs_end_transaction+0x1cc/0x2d0 [btrfs]
[] btrfs_dirty_inode+0x65/0xc0 [btrfs]
[] btrfs_setattr+0x90/0x380 [btrfs]
[] notify_change+0x2ce/0x400
[] chown_common+0xfa/0x150
[] SyS_lchown+0x6c/0xb0
[] do_syscall_64+0x55/0x100
[] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[] 0x


dmesg with the btrfs stuff, including the initial mount:

[ 6813.213251] BTRFS info (device sdc): enabling ssd optimizations
[ 6813.213296] BTRFS info (device sdc): using spread ssd allocation scheme
[ 6813.213342] BTRFS info (device sdc): use zstd compression
[ 6813.213373] BTRFS info (device sdc): using free space tree
[ 6813.213404] BTRFS info (device sdc): has skinny extents
[36775.729252] BTRFS info (device sdc): left=1048576, need=2097152, flags=20
[36775.729333] BTRFS info (device sdc): space_info 2 has 1048576 free,
is not full
[36775.729376] BTRFS info (device sdc): space_info total=8388608,
used=3145728, pinned=0, reserved=0, may_use=4194304, readonly=0
[82104.402887] [ cut here ]
[82104.402948] WARNING: CPU: 10 PID: 6015 at fs/btrfs/locking.c:251
btrfs_tree_lock+0x1be/0x1d0 [btrfs]
[82104.402996] Modules linked in: ipmi_si mpt3sas raid_class
scsi_transport_sas dell_rbu nfsv3 nfsv4 nfs fscache ext2 mgag200
i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm drm intel_powerclamp iTCO_wdt coretemp
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
iTCO_vendor_support dcdbas joydev evdev aes_x86_64 crypto_simd cryptd
serio_raw pcspkr glue_helper sg ipmi_devintf ipmi_msghandler
acpi_power_meter lpc_ich i7core_edac mfd_core button nfsd auth_rpcgss
oid_registry nfs_acl lockd grace sunrpc loop autofs4 ext4
crc32c_generic crc16 mbcache jbd2 btrfs xor zstd_decompress
zstd_compress xxhash raid6_pq hid_generic usbhid hid sd_mod
crc32c_intel psmouse i2c_core ehci_pci uhci_hcd ehci_hcd megaraid_sas
usbcore ixgbe usb_common mdio ptp pps_core scsi_mod bnx2
[82104.403404]  [last unloaded: ipmi_si]
[82104.403429] CPU: 10 PID: 6015 Comm: btrfs-transacti Tainted: G
I 4.14.40 #1
[82104.403513] task: 88041df0f040 task.stack: c9000425
[82104.403562] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
[82104.403592] RSP: 0018:c90004253848 EFLAGS: 00010246
[82104.403622] RAX: 177f RBX: 88013dbd9c48 RCX: 0001
[82104.403660] RDX: 0001 RSI: 0002 RDI: 88013dbd9c48
[82104.403698] RBP: 0001 R08: 88041df0f040 R09: 8800
[82104.403736] R10: 0040 R11: 88013dbd9c48 R12: 88081c1af470
[82104.403773] R13: 0001 R14: 88081c1af468 R15: 88041eba5000
[82104.403812] FS:  () GS:88082fd4()
knlGS:
[82104.403855] CS:  0010 DS:  ES:  CR0: 80050033
[82104.403886] CR2: 557275467dd0 CR3: 01c09001 CR4: 000206e0
[82104.403923] Call Trace:
[82104.405350]  btrfs_search_slot+0x761/0xa60 [btrfs]
[82104.406762]  btrfs_insert_empty_items+0x62/0xb0 [btrfs]
[82104.408194]  btrfs_insert_item+0x5b/0xc0 [btrfs]
[82104.409595]  btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
[82104.410997]  do_chunk_alloc+0x1e5/0x2a0 [btrfs]
[82104.412404]  find_free_extent+0xcd0/0xf60 [btrfs]
[82104.413801]  btrfs_reserve_extent+0x96/0x1e0 [btrfs]
[82104.415191]  btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
[82104.416569]  __btrfs_cow_block+0x127/0x570 [btrfs]
[82104.417913]  btrfs_cow_block+0xdd/0x1c0 [btrfs]
[82104.419247]  btrfs_search_slot+0x227/0xa60 [btrfs]
[82104.420570]  ? btrfs_update_inode_item+0x59/0x100 [btrfs]
[82104.421883]  ? iput+0x72/0x1e0
[82104.423200]  write_one_cache_group.isra.45+0x20/0x90 [btrfs]
[82104.424518]  btrfs_start_dirty_block_groups+0xff/0x420 [btrfs]
[82104.425837]  btrfs_commit_transaction+0x121/0x890 [btrfs]
[82104.427161]  ? start_transaction+0x84/0x410 [btrfs]
[82104.428500]  transaction_kthread+0x184/0x1a0 [btrfs]
[82104.429826]  kthread+0xf7/0x130
[82104.431128]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[82104.432420]  ? kthread_create_on_node+0x40/0x40
[82104.433700]  ? SyS_exit_group+0xb/0x10
[82104.434958]  ret_from_fork+0x1f/0x30
[82104.436182] Code: 43 58 85 c0 75 2c f0 ff 43 58 f0 ff 43 44 65 48
8b 04 25 00 4d 01 00 8b 80 10 04 00 00 89 43 40 48 83 c4 28 5b 5d 41
5c 41 5d c3 <0f> 0b e9 60 fe ff ff 0f 0b eb d0 0f 1f 80 00 00 00 00 8b
47 4c
[82104.438766] ---[ end trace 2d8eafbb1dc47717 ]---
[82286.641371] INFO: task btrfs-transacti:6015 blocked for more than
120 seconds.
[82286.642665]   Tainted: GW I 4.14.40 #1
[82286.643866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this 

[PATCH v3 3/3] btrfs: replace waitqueue_actvie with cond_wake_up

2018-05-14 Thread David Sterba
Use the wrappers and reduce the amount of low-level details about the
waitqueue management.

Signed-off-by: David Sterba 
---
 fs/btrfs/compression.c   |  7 +--
 fs/btrfs/delayed-inode.c |  9 +++--
 fs/btrfs/dev-replace.c   | 10 --
 fs/btrfs/extent-tree.c   |  7 +--
 fs/btrfs/inode.c |  9 +++--
 fs/btrfs/locking.c   | 34 +++---
 fs/btrfs/ordered-data.c  | 14 --
 fs/btrfs/transaction.c   |  7 +--
 fs/btrfs/tree-log.c  | 34 --
 9 files changed, 40 insertions(+), 91 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5268c9f85ca7..dcebb91e4e0c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -990,12 +990,7 @@ static void __free_workspace(int type, struct list_head 
*workspace,
btrfs_compress_op[idx]->free_workspace(workspace);
atomic_dec(total_ws);
 wake:
-   /*
-* Make sure counter is updated before we wake up waiters.
-*/
-   smp_mb();
-   if (waitqueue_active(ws_wait))
-   wake_up(ws_wait);
+   cond_wake_up(ws_wait);
 }
 
 static void free_workspace(int type, struct list_head *ws)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a8d492dbd3e7..fe6caa7e698b 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -460,13 +460,10 @@ static void finish_one_item(struct btrfs_delayed_root 
*delayed_root)
 {
int seq = atomic_inc_return(_root->items_seq);
 
-   /*
-* atomic_dec_return implies a barrier for waitqueue_active
-*/
+   /* atomic_dec_return implies a barrier */
if ((atomic_dec_return(_root->items) <
-   BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) &&
-   waitqueue_active(_root->wait))
-   wake_up(_root->wait);
+   BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0))
+   cond_wake_up_nomb(_root->wait);
 }
 
 static void __btrfs_remove_delayed_item(struct btrfs_delayed_item 
*delayed_item)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index f82be266ba4b..d33ff5b23302 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -916,9 +916,9 @@ void btrfs_dev_replace_clear_lock_blocking(
ASSERT(atomic_read(_replace->read_locks) > 0);
ASSERT(atomic_read(_replace->blocking_readers) > 0);
read_lock(_replace->lock);
-   if (atomic_dec_and_test(_replace->blocking_readers) &&
-   waitqueue_active(_replace->read_lock_wq))
-   wake_up(_replace->read_lock_wq);
+   /* Barrier implied by atomic_dec_and_test */
+   if (atomic_dec_and_test(_replace->blocking_readers))
+   cond_wake_up_nomb(_replace->read_lock_wq);
 }
 
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
@@ -929,9 +929,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info 
*fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
percpu_counter_sub(_info->bio_counter, amount);
-
-   if (waitqueue_active(_info->replace_wait))
-   wake_up(_info->replace_wait);
+   cond_wake_up_nomb(_info->replace_wait);
 }
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3871658b6ab1..94fc825a126d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -11094,12 +11094,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, 
struct fstrim_range *range)
 void btrfs_end_write_no_snapshotting(struct btrfs_root *root)
 {
percpu_counter_dec(>subv_writers->counter);
-   /*
-* Make sure counter is updated before we wake up waiters.
-*/
-   smp_mb();
-   if (waitqueue_active(>subv_writers->wait))
-   wake_up(>subv_writers->wait);
+   cond_wake_up(>subv_writers->wait);
 }
 
 int btrfs_start_write_no_snapshotting(struct btrfs_root *root)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..cf6f2815ffee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1156,13 +1156,10 @@ static noinline void async_cow_submit(struct btrfs_work 
*work)
nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >>
PAGE_SHIFT;
 
-   /*
-* atomic_sub_return implies a barrier for waitqueue_active
-*/
+   /* atomic_sub_return implies a barrier */
if (atomic_sub_return(nr_pages, _info->async_delalloc_pages) <
-   5 * SZ_1M &&
-   waitqueue_active(_info->async_submit_wait))
-   wake_up(_info->async_submit_wait);
+   5 * SZ_1M)
+   cond_wake_up_nomb(_info->async_submit_wait);
 
if (async_cow->inode)
submit_compressed_extents(async_cow->inode, async_cow);
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index e4faefac9d16..1da768e5ef75 100644
--- 

[PATCH v3 2/3] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups

2018-05-14 Thread David Sterba
Currently the code assumes that there's an implied barrier by the
sequence of code preceding the wakeup, namely the mutex unlock.

As Nikolay pointed out:

I think this is wrong (not your code) but the original assumption that
the RELEASE semantics provided by mutex_unlock is sufficient.
According to memory-barriers.txt:

Section 'LOCK ACQUISITION FUNCTIONS' states:

 (2) RELEASE operation implication:

 Memory operations issued before the RELEASE will be completed before the
 RELEASE operation has completed.

 Memory operations issued after the RELEASE *may* be completed before the
 RELEASE operation has completed.

(I've bolded the may portion)

The example given there:

As an example, consider the following:

*A = a;
*B = b;
ACQUIRE
*C = c;
*D = d;
RELEASE
*E = e;
*F = f;

The following sequence of events is acceptable:

ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE

So if we assume that *C is modifying the flag which the waitqueue is checking,
and *E is the actual wakeup, then those accesses can be re-ordered...

IMHO this code should be considered broken...
---

To be on the safe side, add the barriers. The synchronization logic
around log using the mutexes and several other threads does not make it
easy to reason for/against the barrier.

CC: Nikolay Borisov 
Link: https://lkml.kernel.org/r/6ee068d8-1a69-3728-00d1-d86293d43...@suse.com
Signed-off-by: David Sterba 
---
 fs/btrfs/tree-log.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 43758e30aa7a..fa5b3dc5f4d5 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3116,8 +3116,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
mutex_unlock(_root_tree->log_mutex);
 
/*
-* The barrier before waitqueue_active is implied by mutex_unlock
+* The barrier before waitqueue_active is needed so all the updates
+* above are seen by the woken threads. It might not be necessary, but
+* proving that seems to be hard.
 */
+   smp_mb();
if (waitqueue_active(_root_tree->log_commit_wait[index2]))
wake_up(_root_tree->log_commit_wait[index2]);
 out:
@@ -3128,8 +3131,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
mutex_unlock(>log_mutex);
 
/*
-* The barrier before waitqueue_active is implied by mutex_unlock
+* The barrier before waitqueue_active is needed so all the updates
+* above are seen by the woken threads. It might not be necessary, but
+* proving that seems to be hard.
 */
+   smp_mb();
if (waitqueue_active(>log_commit_wait[index1]))
wake_up(>log_commit_wait[index1]);
return ret;
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/3] Cleanup waitqueue_active and barriers

2018-05-14 Thread David Sterba
Reduce number of standalone barriers before waitqueue_active calls.

Changes v3:
* fix wrong use of the _nomb variant in tree-log.c:btrfs_sync_log,
* update comments to be more specific about the waitqueue_active and
  barrier

Changes v2:
* add 2 barriers to btrfs_sync_log and do not assume they're implied,
  (pointed out by Nikolay)

git://github.com/kdave/btrfs-devel.git cleanup/cond-wake

David Sterba (3):
  btrfs: introduce conditional wakeup helpers
  btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
  btrfs: replace waitqueue_actvie with cond_wake_up

 fs/btrfs/compression.c   |  7 +--
 fs/btrfs/ctree.h | 22 ++
 fs/btrfs/delayed-inode.c |  9 +++--
 fs/btrfs/dev-replace.c   | 10 --
 fs/btrfs/extent-tree.c   |  7 +--
 fs/btrfs/inode.c |  9 +++--
 fs/btrfs/locking.c   | 34 +++---
 fs/btrfs/ordered-data.c  | 14 --
 fs/btrfs/transaction.c   |  7 +--
 fs/btrfs/tree-log.c  | 28 
 10 files changed, 62 insertions(+), 85 deletions(-)

-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] btrfs: introduce conditional wakeup helpers

2018-05-14 Thread David Sterba
Add convenience wrappers for the waitqueue management that involves
memory barriers to prevent deadlocks. The helpers will let us remove
barriers and the necessary comments in several places.

Reviewed-by: Nikolay Borisov 
Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2771cc56a622..1389942f3be1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3765,4 +3765,26 @@ static inline int btrfs_is_testing(struct btrfs_fs_info 
*fs_info)
return 0;
 }
 
+static inline void cond_wake_up(struct wait_queue_head *wq)
+{
+   /*
+* This implies a full smp_mb barrier, see comments for
+* waitqueue_active why.
+*/
+   if (wq_has_sleeper(wq))
+   wake_up(wq);
+}
+
+static inline void cond_wake_up_nomb(struct wait_queue_head *wq)
+{
+   /*
+* Special case for conditional wakeup where the barrier required for
+* waitqueue_active is implied by some of the preceding code. Eg. one
+* of such atomic operations (atomic_dec_and_return, ...), or a
+* unlock/lock sequence, etc.
+*/
+   if (waitqueue_active(wq))
+   wake_up(wq);
+}
+
 #endif
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
On 2018/5/14 7:18 PM, Nikolay Borisov wrote:
> 
> 
> On 14.05.2018 13:29, Su Yue wrote:
>> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.
>>
>> Symlinks should never have append/immutable flags.
>> While processing inodes, if found a symlink with append/immutable
>> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.
>>
>> This is for original mode.
>>
>> Signed-off-by: Su Yue 
>> ---
>>  check/main.c  | 7 +++
>>  check/mode-original.h | 1 +
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 68da994f7ae0..f5f52c406046 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
>> struct inode_record *rec)
>>  fprintf(stderr, ", link count wrong");
>>  if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>>  fprintf(stderr, ", orphan file extent");
>> +if (errors & I_ERR_ODD_INODE_FLAGS)
>> +fprintf(stderr, ", odd inode flags");
>>  fprintf(stderr, "\n");
>>  /* Print the orphan extents if needed */
>>  if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
>>  {
>>  struct inode_record *rec;
>>  struct btrfs_inode_item *item;
>> +u64 flags;
>>  
>>  rec = active_node->current;
>>  BUG_ON(rec->ino != key->objectid || rec->refs > 1);
>> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
>>  rec->found_inode_item = 1;
>>  if (rec->nlink == 0)
>>  rec->errors |= I_ERR_NO_ORPHAN_ITEM;
>> +flags = btrfs_inode_flags(eb, item);
>> +if (rec->imode & BTRFS_FT_SYMLINK &&
>> +flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
>> +rec->errors = I_ERR_ODD_INODE_FLAGS;
>>  maybe_free_inode_rec(_node->inode_cache, rec);
>>  return 0;
>>  }
>> diff --git a/check/mode-original.h b/check/mode-original.h
>> index 368de692fdd1..13cfa5b9e1b3 100644
>> --- a/check/mode-original.h
>> +++ b/check/mode-original.h
>> @@ -186,6 +186,7 @@ struct file_extent_hole {
>>  #define I_ERR_LINK_COUNT_WRONG  (1 << 13)
>>  #define I_ERR_FILE_EXTENT_ORPHAN(1 << 14)
>>  #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15)
>> +#define I_ERR_ODD_INODE_FLAGS   (1 << 16)
> 
> I don't like the name of the flag. How about:
> 
> I_ERR_INVALID_SYMLINK_FLAGS
> 
Before, original mode doesn't check inode flags.
So I'd rather to keep INODE_FLAGS for further use.

As for "ODD" part, it is just inherited from name style of above lines.
I don't like "ODD" too much either.
Anyway, thanks.

Su
>>  
>>  struct inode_record {
>>  struct list_head backrefs;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



pEpkey.asc
Description: application/pgp-keys


Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
Indeed better. Will do it in V2.

Thanks,
Su

On Mon, May 14, 2018 at 7:19 PM Nikolay Borisov  wrote:



> On 14.05.2018 13:29, Su Yue wrote:
> > Symlinks should never have append/immutable flags.
> > While checking inodes, if found a symlink with append/immutable
> > flags, report and record the fatal error.
> >
> > This is for lowmem mode.
> >
> > Signed-off-by: Su Yue 
> > ---
> >  check/mode-lowmem.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> > index 9890180d1d3c..61c4e7f23d47 100644
> > --- a/check/mode-lowmem.c
> > +++ b/check/mode-lowmem.c
> > @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root
*root, struct btrfs_path *path)
> >   struct btrfs_key last_key;
> >   u64 inode_id;
> >   u32 mode;
> > + u64 flags;
> >   u64 nlink;
> >   u64 nbytes;
> >   u64 isize;
> > @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root
*root, struct btrfs_path *path)
> >   isize = btrfs_inode_size(node, ii);
> >   nbytes = btrfs_inode_nbytes(node, ii);
> >   mode = btrfs_inode_mode(node, ii);
> > + flags = btrfs_inode_flags(node, ii);
> >   dir = imode_to_type(mode) == BTRFS_FT_DIR;
> >   nlink = btrfs_inode_nlink(node, ii);
> >   nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
> >
> > + if (mode & BTRFS_FT_SYMLINK &&
> > + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) {
> > + err = FATAL_ERROR;
> > + error(
> > +"symlinks shall never be with immutable/append, root %llu inode item
[%llu] flags %llu may be corrupted",

> How about:

> "symlinks must never have immutable/append flags set, root %llu ino %llu
flag %llu may be corrupted".

> I think printing the ino number should suffice, no need for the "fancy"
[] around the inode number.
> > +   root->objectid, inode_id, flags);
> > + }
> > +
> >   while (1) {
> >   btrfs_item_key_to_cpu(path->nodes[0], _key,
path->slots[0]);
> >   ret = btrfs_next_item(root, path);
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
On 2018/5/14 7:22 PM, Qu Wenruo wrote:
> 
> 
> On 2018年05月14日 18:29, Su Yue wrote:
>> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.
>>
>> Symlinks should never have append/immutable flags.
>> While processing inodes, if found a symlink with append/immutable
>> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.
>>
>> This is for original mode.
>>
>> Signed-off-by: Su Yue 
>> ---
>>  check/main.c  | 7 +++
>>  check/mode-original.h | 1 +
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/check/main.c b/check/main.c
>> index 68da994f7ae0..f5f52c406046 100644
>> --- a/check/main.c
>> +++ b/check/main.c
>> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
>> struct inode_record *rec)
>>  fprintf(stderr, ", link count wrong");
>>  if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>>  fprintf(stderr, ", orphan file extent");
>> +if (errors & I_ERR_ODD_INODE_FLAGS)
>> +fprintf(stderr, ", odd inode flags");
>>  fprintf(stderr, "\n");
>>  /* Print the orphan extents if needed */
>>  if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
>>  {
>>  struct inode_record *rec;
>>  struct btrfs_inode_item *item;
>> +u64 flags;
>>  
>>  rec = active_node->current;
>>  BUG_ON(rec->ino != key->objectid || rec->refs > 1);
>> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
>>  rec->found_inode_item = 1;
>>  if (rec->nlink == 0)
>>  rec->errors |= I_ERR_NO_ORPHAN_ITEM;
>> +flags = btrfs_inode_flags(eb, item);
>> +if (rec->imode & BTRFS_FT_SYMLINK &&
>> +flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
>> +rec->errors = I_ERR_ODD_INODE_FLAGS;
> 
> Shouldn't it be "|=" instead of "=" ?
> 
> Thanks,
> Qu

Oh... Should be "|=".

Thanks,
Su

> 
>>  maybe_free_inode_rec(_node->inode_cache, rec);
>>  return 0;
>>  }
>> diff --git a/check/mode-original.h b/check/mode-original.h
>> index 368de692fdd1..13cfa5b9e1b3 100644
>> --- a/check/mode-original.h
>> +++ b/check/mode-original.h
>> @@ -186,6 +186,7 @@ struct file_extent_hole {
>>  #define I_ERR_LINK_COUNT_WRONG  (1 << 13)
>>  #define I_ERR_FILE_EXTENT_ORPHAN(1 << 14)
>>  #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15)
>> +#define I_ERR_ODD_INODE_FLAGS   (1 << 16)
>>  
>>  struct inode_record {
>>  struct list_head backrefs;
>>
> 



pEpkey.asc
Description: application/pgp-keys


Re: Btrfs installation advices

2018-05-14 Thread Austin S. Hemmelgarn

On 2018-05-12 21:58, faurepi...@gmail.com wrote:

Thanks you two very much for your answers.

So if I sum up correctly, I could:

1- use Self-Encrypting Drive (SED), since my drive is a Samsung NVMe 960 
EVO, which is supposed to support SED according to 
http://www.samsung.com/semiconductor/minisite/ssd/support/faqs-nvmessd:

"*Do Samsung NVMe M.2 SSDs have hardware encryption?*
Samsung NVMe SSDs provide internal hardware encryption of all data 
stored on the SSD, including the operating system. Data is decrypted 
through a pre-boot authentication process.
Because all user data is encrypted, private information is protected 
against loss or theft.
Encryption is done by hardware, which provides a safer environment 
without sacrificing performance.


The encryption methods provided by each Samsung NVMe SSD are: AES 
(Advanced Encryption Standard, Class0 SED) TCG/OPAL, and eDrive


Please note that you cannot use more than one encryption method 
simultaneously.



*Do Samsung NVMe M.2 SSDs support TCG Opal?*
TCG Opal is supported by Samsung NVMe SSDs (960EVO / PRO and newer). It 
is an authentication method that employs the protocol specified by the 
Trusted Computing Group (TCG) meaning that you will need to install TCG 
software supplied by a TCG OPAL software development company.


User authentication is done by pre-boot authentication provided by the 
software. For more detailed information and instructions, please contact 
a TCG software company. In addition, TCG/opal can only be enabled / 
disabled by using special security software. "


For the moment, I don't know how to use that self-encryption from linux. 
Could you please give me some tips or links about how you did?


2- now that the full drive is self-encrypted, I can build manually the 
three partitions from a live system: boot with ext(2,3,4), swap with 
swap, and root with btrfs


3- and finally install debian sid in the dedicaced partitions.

Am I right? :)
Yes, that approach will work, assuming you trust Samsung (since they're 
the ones who wrote the code responsible for the encryption, and you 
can't inspect that code yourself).



Le 08/05/2018 à 13:32, Austin S. Hemmelgarn a écrit :

On 2018-05-08 03:50, Rolf Wald wrote:

Hello,

some hints inside

Am 08.05.2018 um 02:22 schrieb faurepi...@gmail.com:

Hi,

I'm curious about btrfs, and maybe considering it for my new laptop
installation (a Lenovo T470).
I was going to install my usual lvm+ext4+full disk encryption setup, 
but

thought I should maybe give a try to btrfs.


Is it possible to meet all these criteria?
- operating system: debian sid
- file system: btrfs
- disk encryption (or at least of sensitives partitions)
- hibernation feature (which implies a swap partition or file, and I've
read btrfs is not a big fan of the latter)


A swap partition is not possible inside or with btrfs alone.

You can choose btrfs filesystem out of the box in debian install, but 
that would mean full-disk-encryption with lvm and btrfs. The extra 
layer lvm doesn't hurt, but you have two layers with many functions 
double, e.g. snapshotting, resize.
Um, this isn't really as much of an issue as you might think.  LVM has 
near zero overhead unless you're actually doing any of that stuff (as 
long as the LV is just a simple linear mapping, it has less than 1% 
more overhead than just using partitions).  The only real caveat here 
is to make _ABSOLUTELY CERTAIN_ that you _DO NOT_ make LVM snapshots 
of _ANY_ BTRFS volumes.  Doing so is a recipe for disaster, and will 
likely eat at least your data, and possibly your children.


The bigger issue is that dm-crypt generally slows down device access, 
which BTRFS is very sensitive to.  Using BTRFS with FDE works, but 
it's slow, so I would only suggest doing it with an SSD (and if you're 
using an SSD, you may be better off getting a TCG Opal compliant 
self-encrypting drive and just using the self-encryption functionality 
instead of FDE).




If yes, how would you suggest me to achieve it?


Yes, there is a solution, and it works for me now several years.
You need to build three partitions, e.g. named boot, swap, root. The 
sizes choose to your need. the boot partition remains unencrypted, 
but the other two partitions are encrypted with cryptsetup (luks) 
separately. Normally there are two passphrases to type in (and to 
remember), but there is an option in the cryptsetup scripts 
(/lib/cryptsetup/scripts) decrypt_derived, which could take the key 
from the root partition to decrypt the swap partition also. The 
filesystems then on the partitions are boot with ext(2,3,4), swap 
with swap and root with btrfs.
This configuration is not reachable with a standard debian 
installation. Debian always choose lvm if you want full encryption. 
You have to do the first steps manually: make partitions, 
cryptsetup(luks) for the partitions swap and root, and open the 
encrypted partitions manually. After that you can install your OS. 
The manual steps you have to make 

Re: [PATCH v3] btrfs: incremental send, fix BUG when invalid memory access

2018-05-14 Thread Filipe Manana
On Mon, May 14, 2018 at 3:51 AM, robbieko  wrote:
> From: Robbie Ko 
>
> [BUG]
> btrfs incremental send BUG happens when creating a snapshot of snapshot
> that is being used by send.
>
> [REASON]
> The problem can happen if while we are doing a send one of the snapshots
> used (parent or send) is snapshotted, because snapshoting implies COWing
> the root of the source subvolume/snapshot.
>
> 1. When doing an incremental send, the send process will get the commit
>roots from the parent and send snapshots, and add references to them
>through extent_buffer_get().
>
> 2. When a snapshot/subvolume is snapshotted, its root node is COWed
>(transaction.c:create_pending_snapshot()).
>
> 3. COWing releases the space used by the node immediately, through:
>
>__btrfs_cow_block()
>--btrfs_free_tree_block()
>btrfs_add_free_space(bytenr of node)
>
> 4. Because send doesn't hold a transaction open, it's possible that
>the transaction used to create the snapshot commits, switches the
>commit root and the old space used by the previous root node gets
>assigned to some other node allocation. Allocation of a new node will
>use the existing extent buffer found in memory, which we previously
>got a reference through extent_buffer_get(), and allow the extent
>buffer's content (pages) to be modified:
>
>btrfs_alloc_tree_block
>--btrfs_reserve_extent
>find_free_extent (get bytenr of old node)
>--btrfs_init_new_buffer (use bytenr of old node)
>btrfs_find_create_tree_block
>--alloc_extent_buffer
>find_extent_buffer (get old node)
>
> 5. So send can access invalid memory content and have unpredictable
>behaviour.
>
> [FIX]
> So we fix the problem by copying the commit roots of the send and
> parent snapshots and use those copies.
>
> CallTrace looks like this:
>  [ cut here ]
>  kernel BUG at fs/btrfs/ctree.c:1861!
>  invalid opcode:  [#1] SMP
>  CPU: 6 PID: 24235 Comm: btrfs Tainted: P   O 3.10.105 #23721
>  88046652d680 ti: 88041b72 task.ti: 88041b72
>  RIP: 0010:[] read_node_slot+0x108/0x110 [btrfs]
>  RSP: 0018:88041b723b68  EFLAGS: 00010246
>  RAX: 88043ca6b000 RBX: 88041b723c50 RCX: 8800
>  RDX: 004c RSI: 880314b133f8 RDI: 880458b24000
>  RBP:  R08: 0001 R09: 88041b723c66
>  R10: 0001 R11: 1000 R12: 8803f3e48890
>  R13: 8803f3e48880 R14: 880466351800 R15: 0001
>  FS:  7f8c321dc8c0() GS:88047fcc()
>  CS:  0010 DS:  ES:  CR0: 80050033
>  R2: 7efd1006d000 CR3: 000213a24000 CR4: 003407e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  Stack:
>  88041b723c50 8803f3e48880 8803f3e48890 8803f3e48880
>  880466351800 0001 a08dd9d7 88041b723c50
>  8803f3e48880 88041b723c66 a08dde85 a9ff88042d2c4400
>  Call Trace:
>  [] ? tree_move_down.isra.33+0x27/0x50 [btrfs]
>  [] ? tree_advance+0xb5/0xc0 [btrfs]
>  [] ? btrfs_compare_trees+0x2d4/0x760 [btrfs]
>  [] ? finish_inode_if_needed+0x870/0x870 [btrfs]
>  [] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs]
>  [] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs]
>  [] ? handle_pte_fault+0x373/0x990
>  [] ? atomic_notifier_call_chain+0x16/0x20
>  [] ? set_task_cpu+0xb6/0x1d0
>  [] ? handle_mm_fault+0x143/0x2a0
>  [] ? __do_page_fault+0x1d0/0x500
>  [] ? check_preempt_curr+0x57/0x90
>  [] ? do_vfs_ioctl+0x4aa/0x990
>  [] ? do_fork+0x113/0x3b0
>  [] ? trace_hardirqs_off_thunk+0x3a/0x6c
>  [] ? SyS_ioctl+0x88/0xa0
>  [] ? system_call_fastpath+0x16/0x1b
>  ---[ end trace 29576629ee80b2e1 ]---
>
> Signed-off-by: Robbie Ko 
Reviewed-by: Filipe Manana 

Looks good, I would only change the subject to something like:

Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting

David can probably do that when picking this patch.

thanks

> ---
> V3: use btrfs_clone_extent_buffer
> V2: fix comments
>  fs/btrfs/ctree.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index b88a79e..8b6b554 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -5460,12 +5460,24 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
> down_read(_info->commit_root_sem);
> left_level = btrfs_header_level(left_root->commit_root);
> left_root_level = left_level;
> -   left_path->nodes[left_level] = left_root->commit_root;
> +   left_path->nodes[left_level] =
> +   btrfs_clone_extent_buffer(left_root->commit_root);
> +   if (!left_path->nodes[left_level]) {
> +   up_read(_info->commit_root_sem);
> +   

Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 18:29, Su Yue wrote:
> Symlinks should never have append/immutable flags.
> While checking inodes, if found a symlink with append/immutable
> flags, report and record the fatal error.
> 
> This is for lowmem mode.
> 
> Signed-off-by: Su Yue 
> ---
>  check/mode-lowmem.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 9890180d1d3c..61c4e7f23d47 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   struct btrfs_key last_key;
>   u64 inode_id;
>   u32 mode;
> + u64 flags;
>   u64 nlink;
>   u64 nbytes;
>   u64 isize;
> @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   isize = btrfs_inode_size(node, ii);
>   nbytes = btrfs_inode_nbytes(node, ii);
>   mode = btrfs_inode_mode(node, ii);
> + flags = btrfs_inode_flags(node, ii);
>   dir = imode_to_type(mode) == BTRFS_FT_DIR;
>   nlink = btrfs_inode_nlink(node, ii);
>   nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
>  
> + if (mode & BTRFS_FT_SYMLINK &&
> + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) {
> + err = FATAL_ERROR;

Where is the FATAL_ERROR defined?
I can't find it anyway on v4.16 branch.

And I assume that FATAL_ERROR is defined to abort check, if so, in that
case the corruption is not that serious, just some unexpected behavior,
not a fatal error.

If not, just ignore this comment.

Thanks,
Qu

> + error(
> +"symlinks shall never be with immutable/append, root %llu inode item [%llu] 
> flags %llu may be corrupted",
> +   root->objectid, inode_id, flags);
> + }
> +
>   while (1) {
>   btrfs_item_key_to_cpu(path->nodes[0], _key, 
> path->slots[0]);
>   ret = btrfs_next_item(root, path);
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 18:29, Su Yue wrote:
> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.
> 
> Symlinks should never have append/immutable flags.
> While processing inodes, if found a symlink with append/immutable
> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.
> 
> This is for original mode.
> 
> Signed-off-by: Su Yue 
> ---
>  check/main.c  | 7 +++
>  check/mode-original.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/check/main.c b/check/main.c
> index 68da994f7ae0..f5f52c406046 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
> struct inode_record *rec)
>   fprintf(stderr, ", link count wrong");
>   if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>   fprintf(stderr, ", orphan file extent");
> + if (errors & I_ERR_ODD_INODE_FLAGS)
> + fprintf(stderr, ", odd inode flags");
>   fprintf(stderr, "\n");
>   /* Print the orphan extents if needed */
>   if (errors & I_ERR_FILE_EXTENT_ORPHAN)
> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
>  {
>   struct inode_record *rec;
>   struct btrfs_inode_item *item;
> + u64 flags;
>  
>   rec = active_node->current;
>   BUG_ON(rec->ino != key->objectid || rec->refs > 1);
> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
>   rec->found_inode_item = 1;
>   if (rec->nlink == 0)
>   rec->errors |= I_ERR_NO_ORPHAN_ITEM;
> + flags = btrfs_inode_flags(eb, item);
> + if (rec->imode & BTRFS_FT_SYMLINK &&
> + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
> + rec->errors = I_ERR_ODD_INODE_FLAGS;

Shouldn't it be "|=" instead of "=" ?

Thanks,
Qu

>   maybe_free_inode_rec(_node->inode_cache, rec);
>   return 0;
>  }
> diff --git a/check/mode-original.h b/check/mode-original.h
> index 368de692fdd1..13cfa5b9e1b3 100644
> --- a/check/mode-original.h
> +++ b/check/mode-original.h
> @@ -186,6 +186,7 @@ struct file_extent_hole {
>  #define I_ERR_LINK_COUNT_WRONG   (1 << 13)
>  #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14)
>  #define I_ERR_FILE_EXTENT_TOO_LARGE  (1 << 15)
> +#define I_ERR_ODD_INODE_FLAGS(1 << 16)
>  
>  struct inode_record {
>   struct list_head backrefs;
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 13:29, Su Yue wrote:
> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.
> 
> Symlinks should never have append/immutable flags.
> While processing inodes, if found a symlink with append/immutable
> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.
> 
> This is for original mode.
> 
> Signed-off-by: Su Yue 
> ---
>  check/main.c  | 7 +++
>  check/mode-original.h | 1 +
>  2 files changed, 8 insertions(+)
> 
> diff --git a/check/main.c b/check/main.c
> index 68da994f7ae0..f5f52c406046 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
> struct inode_record *rec)
>   fprintf(stderr, ", link count wrong");
>   if (errors & I_ERR_FILE_EXTENT_ORPHAN)
>   fprintf(stderr, ", orphan file extent");
> + if (errors & I_ERR_ODD_INODE_FLAGS)
> + fprintf(stderr, ", odd inode flags");
>   fprintf(stderr, "\n");
>   /* Print the orphan extents if needed */
>   if (errors & I_ERR_FILE_EXTENT_ORPHAN)
> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
>  {
>   struct inode_record *rec;
>   struct btrfs_inode_item *item;
> + u64 flags;
>  
>   rec = active_node->current;
>   BUG_ON(rec->ino != key->objectid || rec->refs > 1);
> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
>   rec->found_inode_item = 1;
>   if (rec->nlink == 0)
>   rec->errors |= I_ERR_NO_ORPHAN_ITEM;
> + flags = btrfs_inode_flags(eb, item);
> + if (rec->imode & BTRFS_FT_SYMLINK &&
> + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
> + rec->errors = I_ERR_ODD_INODE_FLAGS;
>   maybe_free_inode_rec(_node->inode_cache, rec);
>   return 0;
>  }
> diff --git a/check/mode-original.h b/check/mode-original.h
> index 368de692fdd1..13cfa5b9e1b3 100644
> --- a/check/mode-original.h
> +++ b/check/mode-original.h
> @@ -186,6 +186,7 @@ struct file_extent_hole {
>  #define I_ERR_LINK_COUNT_WRONG   (1 << 13)
>  #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14)
>  #define I_ERR_FILE_EXTENT_TOO_LARGE  (1 << 15)
> +#define I_ERR_ODD_INODE_FLAGS(1 << 16)

I don't like the name of the flag. How about:

I_ERR_INVALID_SYMLINK_FLAGS

>  
>  struct inode_record {
>   struct list_head backrefs;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 13:29, Su Yue wrote:
> Symlinks should never have append/immutable flags.
> While checking inodes, if found a symlink with append/immutable
> flags, report and record the fatal error.
> 
> This is for lowmem mode.
> 
> Signed-off-by: Su Yue 
> ---
>  check/mode-lowmem.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 9890180d1d3c..61c4e7f23d47 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   struct btrfs_key last_key;
>   u64 inode_id;
>   u32 mode;
> + u64 flags;
>   u64 nlink;
>   u64 nbytes;
>   u64 isize;
> @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, 
> struct btrfs_path *path)
>   isize = btrfs_inode_size(node, ii);
>   nbytes = btrfs_inode_nbytes(node, ii);
>   mode = btrfs_inode_mode(node, ii);
> + flags = btrfs_inode_flags(node, ii);
>   dir = imode_to_type(mode) == BTRFS_FT_DIR;
>   nlink = btrfs_inode_nlink(node, ii);
>   nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
>  
> + if (mode & BTRFS_FT_SYMLINK &&
> + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) {
> + err = FATAL_ERROR;
> + error(
> +"symlinks shall never be with immutable/append, root %llu inode item [%llu] 
> flags %llu may be corrupted",

How about: 

"symlinks must never have immutable/append flags set, root %llu ino %llu flag 
%llu may be corrupted". 

I think printing the ino number should suffice, no need for the "fancy" [] 
around the inode number. 
> +   root->objectid, inode_id, flags);
> + }
> +
>   while (1) {
>   btrfs_item_key_to_cpu(path->nodes[0], _key, 
> path->slots[0]);
>   ret = btrfs_next_item(root, path);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/9] btrfs-progs: btrfs-corrupt-block: Change -I flag parameter format

2018-05-14 Thread Nikolay Borisov
Presently, if we want to corrupt a particular item we need to call
corrupt block like so:

btrfs-corrupt-block -I -K  -r "numeric rootid"

This is problematic because the -K option not only sets the key to the
item that the -I option should corrupt but it also signals that we
want to corrupt the key itself. This way we cannot really use the
program with the following semantics:

"Corrupt only the item which corresponds to this key but leave the
key intact"

This patch aims to fix this by changing the format of the -I flag. So
if one wants to corrupt only an item (and leave the key intact) they
should use:

  btrfs-corrupt-block -r  -I 

In addition to this problem, -K doesn't really understand the the "-r"
argument, so when using it in conjunction with -I to corrupt an item,
not part of the root tree, it will always produce an error:

  Couldn't find the key to corrupt

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 39a611d89d55..4cc3528df105 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -116,7 +116,7 @@ static void print_usage(int ret)
printf("\t-m   The metadata block to corrupt (must also specify -f for 
the field to corrupt)\n");
printf("\t-K   The key to corrupt in the format ,, (must 
also specify -f for the field)\n");
printf("\t-f   The field in the item to corrupt\n");
-   printf("\t-I   An item to corrupt (must also specify the field to 
corrupt and a root+key for the item)\n");
+   printf("\t-I  Corrupt an item corresponding to the passed 
key triplet (must also specify the field to corrupt and root for the item)\n");
printf("\t-D   Corrupt a dir item, must specify key and field\n");
printf("\t-d   Delete this item (must specify -K)\n");
printf("\t-r   Operate on this root (only works with -d)\n");
@@ -1165,7 +1165,7 @@ int main(int argc, char **argv)
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:IDdr:C:",
+   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Ddr:C:",
long_options, NULL);
if (c < 0)
break;
@@ -1222,6 +1222,7 @@ int main(int argc, char **argv)
break;
case 'I':
corrupt_item = 1;
+   parse_key(, , 
);
break;
case 'd':
delete = 1;
@@ -1356,6 +1357,7 @@ int main(int argc, char **argv)
target = open_root(root->fs_info, root_objectid);
 
ret = corrupt_btrfs_item(target, , field);
+   goto out_close;
}
if (delete) {
struct btrfs_root *target = root;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/9] btrfs-progs: btrfs-corrupt-block: Factor out key parsing function

2018-05-14 Thread Nikolay Borisov
Currently passing a key with -K handling is open coded. I intend on
changing the interface a bit to make the program more usable. To aid
in this factor out common code which parses a triplet of the
"u64,u8,u64" format, corresponding to a btrfs key. No functional
changes.

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 0018b6c9662d..39a611d89d55 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1081,6 +1081,16 @@ static int find_chunk_offset(struct btrfs_root *root,
 
 }
 
+static void parse_key(u64 *objectid, u8 *type, u64 *offset)
+{
+
+   int ret = sscanf(optarg, "%llu,%hhu,%llu", objectid, type, offset);
+   if (ret != 3) {
+   fprintf(stderr, "Error parsing key %d\n", errno);
+   print_usage(1);
+   }
+}
+
 static struct btrfs_root *open_root(struct btrfs_fs_info *fs_info,
u64 root_objectid)
 {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/9] btrfs-progs: btrfs-corrupt-block: Convert -K flag argument handling to common function

2018-05-14 Thread Nikolay Borisov
There is now a common function used to parse btrfs keys triplets so
use that one. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 4cc3528df105..7a5513c455e6 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -114,7 +114,7 @@ static void print_usage(int ret)
printf("\t-i   The inode item to corrupt (must also specify the field 
to corrupt)\n");
printf("\t-x   The file extent item to corrupt (must also specify -i 
for the inode and -f for the field to corrupt)\n");
printf("\t-m   The metadata block to corrupt (must also specify -f for 
the field to corrupt)\n");
-   printf("\t-K   The key to corrupt in the format ,, (must 
also specify -f for the field)\n");
+   printf("\t-K  Corrupt the given key (must also specify -f 
for the field)\n");
printf("\t-f   The field in the item to corrupt\n");
printf("\t-I  Corrupt an item corresponding to the passed 
key triplet (must also specify the field to corrupt and root for the item)\n");
printf("\t-D   Corrupt a dir item, must specify key and field\n");
@@ -1129,6 +1129,7 @@ int main(int argc, char **argv)
int corrupt_item = 0;
int corrupt_di = 0;
int delete = 0;
+   int should_corrupt_key = 0;
u64 metadata_block = 0;
u64 inode = 0;
u64 file_extent = (u64)-1;
@@ -1207,15 +1208,8 @@ int main(int argc, char **argv)
metadata_block = arg_strtou64(optarg);
break;
case 'K':
-   ret = sscanf(optarg, "%llu,%u,%llu",
-,
-(unsigned int *),
-);
-   if (ret != 3) {
-   fprintf(stderr, "error reading key "
-   "%d\n", errno);
-   print_usage(1);
-   }
+   should_corrupt_key = 1;
+   parse_key(, , 
);
break;
case 'D':
corrupt_di = 1;
@@ -1370,7 +1364,7 @@ int main(int argc, char **argv)
ret = delete_item(target, );
goto out_close;
}
-   if (key.objectid || key.offset || key.type) {
+   if (should_corrupt_key) {
if (*field == 0)
print_usage(1);
ret = corrupt_key(root, , field);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/9] Overhaul btrfs-corrupt-block

2018-05-14 Thread Nikolay Borisov
btrfs-corrupt-block is a very useful tool albeit very neglected. This series 
aims to give it much needed attention. There is a mix of code-improvements and 
bug fixes. Code improvement mainly consists of factoring our duplicated code 
(Patch 1,3,6) and improving the interface of some options (4,5,8,9). The
recurring topic here is that instead of having to pass 

btrfs-corrupt-block -K  -f  - 

make each corrupting option to take the key as an argument to it. Say we want
to corrupt an item field (-I options) we now do: 
 
 btrfs-corrupt-block -I  -r  /dev/blah 

 instead of 

 btrfs-corrupt-block -I -K  -r  /dev/blah 

Some patches also incorporate fixes for bugs (patch 2,7 and 9) I found during 
my testing. Those usability improvements are needed to enable me to produce 
tests for the pending free space tree support in userspace. 


Nikolay Borisov (9):
  btrfs-progs: btrfs-corrupt-block: Factor out specific-root code
  btrfs-progs: btrfs-corrupt-block: Correctly handle -r when passing -I
  btrfs-progs: btrfs-corrupt-block: Factor out key parsing function
  btrfs-progs: btrfs-corrupt-block: Change -I flag parameter format
  btrfs-progs: btrfs-corrupt-block: Convert -K flag argument handling to
common function
  btrfs-progs: btrfs-corrupt-block: Factor out common "-r" handling code
  btrfs-progs: btrfs-corrupt-block: Add support for handling specific
root when using -K option
  btrfs-progs: btrfs-corrupt-block: Change format of -d option
  btrfs-progs: btrfs-corrupt-block: Fix -D option

 btrfs-corrupt-block.c | 94 +++
 1 file changed, 57 insertions(+), 37 deletions(-)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] btrfs-progs: btrfs-corrupt-block: Factor out common "-r" handling code

2018-05-14 Thread Nikolay Borisov
Since more and more of the "corrupt XXX" options are going to support
combination with -r option, let's extract the common code needed for
this. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 7a5513c455e6..8bbfe1cdb855 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1114,7 +1114,7 @@ int main(int argc, char **argv)
 {
struct cache_tree root_cache;
struct btrfs_key key;
-   struct btrfs_root *root;
+   struct btrfs_root *root, *target_root;
char *dev;
/* chunk offset can be 0,so change to (u64)-1 */
u64 logical = (u64)-1;
@@ -1245,6 +1245,10 @@ int main(int argc, char **argv)
fprintf(stderr, "Open ctree failed\n");
exit(1);
}
+   target_root = root;
+   if (root_objectid)
+   target_root = open_root(root->fs_info, root_objectid);
+
if (extent_rec) {
struct btrfs_trans_handle *trans;
 
@@ -1342,26 +1346,19 @@ int main(int argc, char **argv)
goto out_close;
}
if (corrupt_item) {
-   struct btrfs_root *target;
if (!key.objectid)
print_usage(1);
if (!root_objectid)
print_usage(1);
 
-   target = open_root(root->fs_info, root_objectid);
-
-   ret = corrupt_btrfs_item(target, , field);
+   ret = corrupt_btrfs_item(target_root, , field);
goto out_close;
}
if (delete) {
-   struct btrfs_root *target = root;
-
if (!key.objectid)
print_usage(1);
-   if (root_objectid)
-   target = open_root(root->fs_info, root_objectid);
 
-   ret = delete_item(target, );
+   ret = delete_item(target_root, );
goto out_close;
}
if (should_corrupt_key) {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/9] btrfs-progs: btrfs-corrupt-block: Change format of -d option

2018-05-14 Thread Nikolay Borisov
Currently if we want to delete an item we need to invoke corrupt block
like so:

  btrfs-corrupt-block [-r ] -K  -d

Instead, this patch converts the format to:

  btrfs-corrupt-block [-r ] -d 

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 74928eae839e..10a0b2d13b97 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -118,7 +118,7 @@ static void print_usage(int ret)
printf("\t-f   The field in the item to corrupt\n");
printf("\t-I  Corrupt an item corresponding to the passed 
key triplet (must also specify the field to corrupt and root for the item)\n");
printf("\t-D   Corrupt a dir item, must specify key and field\n");
-   printf("\t-d   Delete this item (must specify -K)\n");
+   printf("\t-d  Delete item corresponding to passed key 
triplet\n");
printf("\t-r   Operate on this root (only works with -d)\n");
printf("\t-C   Delete a csum for the specified bytenr.  When used with 
-b it'll delete that many bytes, otherwise it's just sectorsize\n");
exit(ret);
@@ -1165,7 +1165,7 @@ int main(int argc, char **argv)
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Ddr:C:",
+   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Dd:r:C:",
long_options, NULL);
if (c < 0)
break;
@@ -1219,6 +1219,7 @@ int main(int argc, char **argv)
break;
case 'd':
delete = 1;
+   parse_key(, , 
);
break;
case 'r':
root_objectid = arg_strtou64(optarg);
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/9] btrfs-progs: btrfs-corrupt-block: Correctly handle -r when passing -I

2018-05-14 Thread Nikolay Borisov
The documentation for the -I option (corrupt an item) states:

  An item to corrupt (must also specify the field to corrupt and a root+key for 
the item)

The code on the other hand doesn't check whether -r is in fact passed,
and even if it is it's not handled at all. This means presently -I
is possible to corrupt items only in the root tree. Fix this by
correctly checking -r is passed and fail otherwise and passing the
correct root to corrupt_btrfs_item.

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index ab6ca0a1e90a..0018b6c9662d 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1337,9 +1337,15 @@ int main(int argc, char **argv)
goto out_close;
}
if (corrupt_item) {
+   struct btrfs_root *target;
if (!key.objectid)
print_usage(1);
-   ret = corrupt_btrfs_item(root, , field);
+   if (!root_objectid)
+   print_usage(1);
+
+   target = open_root(root->fs_info, root_objectid);
+
+   ret = corrupt_btrfs_item(target, , field);
}
if (delete) {
struct btrfs_root *target = root;
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 9/9] btrfs-progs: btrfs-corrupt-block: Fix -D option

2018-05-14 Thread Nikolay Borisov
Currently the -D option is essentially defunct since it's the root,
where we are going to corrupt a dir item is always set to the tree
root. Fix this by passing the root from the "-r" option. Aditionally
convert the interface for this option to the new format. So if one
wants to corrupt a dir item in the default fs tree, they should now
invoke:

 btrfs-corrupt-block -r 5 -D  -f name

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index 10a0b2d13b97..37f850b5b239 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -117,7 +117,7 @@ static void print_usage(int ret)
printf("\t-K  Corrupt the given key (must also specify -f 
for the field and optionally -r for the root)\n");
printf("\t-f   The field in the item to corrupt\n");
printf("\t-I  Corrupt an item corresponding to the passed 
key triplet (must also specify the field to corrupt and root for the item)\n");
-   printf("\t-D   Corrupt a dir item, must specify key and field\n");
+   printf("\t-D  Corrupt a dir item corresponding to the 
passed key triplet, must also specify a field\n");
printf("\t-d  Delete item corresponding to passed key 
triplet\n");
printf("\t-r   Operate on this root (only works with -d)\n");
printf("\t-C   Delete a csum for the specified bytenr.  When used with 
-b it'll delete that many bytes, otherwise it's just sectorsize\n");
@@ -1165,7 +1165,7 @@ int main(int argc, char **argv)
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Dd:r:C:",
+   c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:D:d:r:C:",
long_options, NULL);
if (c < 0)
break;
@@ -1212,6 +1212,7 @@ int main(int argc, char **argv)
break;
case 'D':
corrupt_di = 1;
+   parse_key(, , 
);
break;
case 'I':
corrupt_item = 1;
@@ -1338,7 +1339,7 @@ int main(int argc, char **argv)
if (corrupt_di) {
if (!key.objectid || *field == 0)
print_usage(1);
-   ret = corrupt_dir_item(root, , field);
+   ret = corrupt_dir_item(target_root, , field);
goto out_close;
}
if (csum_bytenr) {
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/9] btrfs-progs: btrfs-corrupt-block: Factor out specific-root code

2018-05-14 Thread Nikolay Borisov
Some options operate on a specific root so let's extract the code which
deals with this. No functional change.

Signed-off-by: Nikolay Borisov 
---
 btrfs-corrupt-block.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index da0ec8c51e5a..ab6ca0a1e90a 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1080,6 +1080,26 @@ static int find_chunk_offset(struct btrfs_root *root,
return ret;
 
 }
+
+static struct btrfs_root *open_root(struct btrfs_fs_info *fs_info,
+   u64 root_objectid)
+{
+
+   struct btrfs_key root_key;
+   struct btrfs_root *root;
+
+   root_key.objectid = root_objectid;
+   root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root_key.offset = (u64)-1;
+
+   root = btrfs_read_fs_root(fs_info, _key);
+   if (IS_ERR(root)) {
+   fprintf(stderr, "Couldn't find root %llu\n", root_objectid);
+   print_usage(1);
+   }
+
+   return root;
+}
 int main(int argc, char **argv)
 {
struct cache_tree root_cache;
@@ -1326,20 +1346,9 @@ int main(int argc, char **argv)
 
if (!key.objectid)
print_usage(1);
-   if (root_objectid) {
-   struct btrfs_key root_key;
-
-   root_key.objectid = root_objectid;
-   root_key.type = BTRFS_ROOT_ITEM_KEY;
-   root_key.offset = (u64)-1;
-
-   target = btrfs_read_fs_root(root->fs_info, _key);
-   if (IS_ERR(target)) {
-   fprintf(stderr, "Couldn't find root %llu\n",
-   (unsigned long long)root_objectid);
-   print_usage(1);
-   }
-   }
+   if (root_objectid)
+   target = open_root(root->fs_info, root_objectid);
+
ret = delete_item(target, );
goto out_close;
}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 18:29, james harvey wrote:
> On Mon, May 14, 2018 at 2:36 AM, Qu Wenruo  wrote:
>> OK, I could reproduce it now.
>>
>> Just mount with -o nodatasum, then create a file.
>> Remount with compress-force=lzo, then write something.
>>
>> So at least btrfs should disallow such thing.
>>
>> Thanks,
>> Qu
> 
> Would the corrupted dump and correct one of the file, and kernel with
> kasan output still help?  Or, with what you reproduced, do you have
> what you need?

The dumps are good enough, kasan will be a little over killed.

For my reproduced case, the data is all good, thus unable to reproduce
the wildly corrupted kernel memory.

I could try corrupt them, but I'm not sure if the same symptom can be
reproduced, thus binary good/bad dump still helps.
The heavy lifting kasan can be done in my environment.

> 
> 
> On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo  wrote:
>> So there is something wrong that btrfs allows compressed data to be
>> generated for such file.
>> (Could not reproduce the same behavior with 4.16 kernel, could such
>> problem happens in older kernels? Or just get fixed recently?)
>>
>> Then some corruption screwed up the compressed data, and when we
>> decompress, the kernel is screwed up.
> 
> In this thread, Chris Murphy noted systemd sets the "C" attribute, and
> discussed what sounds to me like what happened here: "Usually nocow
> also means no compression. But in the archives is a thread where I
> found that compression can be forced on nocow if the file is submitted
> for defragmentation

Oh, defrag is making things more complex here.

But at least the kernel patch should also address that case.

> and either the volume is mounted with compression
> or the file has inherited chattr +c (I don't remember which or
> possibly both). And systemd does submit rotated logs for
> defragmentation."
> 
> 
> 
> (If you don't need the dumps and kernel kasan output, you can skip the
> rest of this reply)
> 
> 
> 
>> Yep, even the last case it still looks like that it's kernel memory get
>> corrupted.
>>
>> From the thread, since you have already located the corrupted mirror,
>> would you please provide the corrupted dump along with correct one?
>>
>> It would help a lot for us to under stand what's going on.
> 
> Absolutely.  I'm not sure how to best get you that, though.

It's great you'd like to help.
Considering you have experienced btrfs-map-block usage, it should be a
piece of cake.
And considering you're an Arch user, it won't really take you much time.
(Yeah, Arch users rock!)

> 
> "filefrag -v" on one of the files can be seen here:
> https://bugzilla.kernel.org/attachment.cgi?id=275953
> 
> It lists 58 fragments.

That filefrag output is less useful here, the main problem is it only
provides the uncompressed extent size.
That's why I asked for debug-tree dump.

And now those debug tree dumps would shine.

I'll take the 72267 dump as an example.

--
item 182 key (72267 EXTENT_DATA *0* ) itemoff 6375 itemsize 53
generation 41083 type 1 (regular)
extent data disk byte *2625134592* nr *4096*
extent data offset 0 nr 131072 ram 131072
extent compression 2 (lzo)
--
Important numbers are surrounded by '*'.

The "0" means the offset inside the file.
"2625134592" means the logical address that compressed extent lies.
"4096" means the compressed size of that extent.

So to dump it, passing all needed numbers to btrfs-map-logical, ins this
example you just need:

# btrfs-map-logical -l 2625134592 -b 4096 

And would get the result like:
mirror 1 logical 1104150528 physical 9437184 device
/dev/mapper/test-scratch2
mirror 2 logical 1104150528 physical 1095761920 device
/dev/mapper/test-scratch1

Then grab them using dd.

> 
> filefrag lists the ending offsets and length based on the uncompressed
> sizes.  filefrag doesn't account for the compression.

Now with debug-tree dump, you have the length you need. :)

> 
> So, in this thread, I compared the first 4k of fragments 0-57 on each
> disk and found all the corruption was on disk 1.  (And the entire
> 207*4096 bytes on fragment 58.)  Grabbing more than 4k of each
> fragment brings in data from other files.

And indeed, they are all 4K sized from the dump.

>  So, I might have compared
> all of the data (fragments 0-57 are 128k uncompressed, and at least
> fragment 0 uncompressed does lzop down to about 2k, so perhaps all the
> other 128k fragments compress to within 4k, but maybe not) but this
> might not have grabbed all the data.

Any bad/good pair is enough.

> 
> I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but
> they'd include extra data not involved, so you'd have to find a way to
> use them, and without the metadata saying how many bytes of each
> fragment to use, it might not be easy to put together.  (Maybe
> chopping off all the trailing 0's in each fragment would do the
> trick.)  Maybe 

Re: [RFC][PATCH] btrfs: take the last remnants of ->d_fsdata use out

2018-05-14 Thread David Sterba
On Sun, May 13, 2018 at 07:03:18PM +0100, Al Viro wrote:
> [spotted while going through ->d_fsdata handling around d_splice_alias();
> don't really care which tree that goes through]
> 
> The only thing even looking at ->d_fsdata in there (since 2012)
> had been kfree(dentry->d_fsdata) in btrfs_dentry_delete().  Which,
> incidentally, is all btrfs_dentry_delete() does.

Right, it's a leftover after the DCACHE_NEED_LOOKUP removal. I'll add it
to btrfs tree, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 17:30, james harvey wrote:
> On Mon, May 14, 2018 at 4:20 AM, Roman Mamedov  wrote:
>> On Mon, 14 May 2018 11:10:34 +0300
>> Nikolay Borisov  wrote:
>>
>>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
>>> the inode flags, presumably the admin knows what he is doing?
>>
>> Please don't. Personally I always assumed chattr +C would prevent both CoW 
>> and
>> compression, and used that as a way to override volume-wide compress-force 
>> for
>> a particular folder. Now that it turns out this wasn't working, the patch
>> would fix it to behave in line with prior expectations.
> 
> "the patch would fix it to behave" might be missing one part.  Qu, am
> I right (best guess, not familiar with btrfs internals enough) this
> patch will prevent new files and files of size 0 from getting
> compression with NODATASUM or NODATACOW,

Yep.

> but would leave existing
> compressed files (that shouldn't have been compressed) as they are?

Yep again, unfortunately.

> After getting an error from btrfs check,

BTW for this particular case, --mode=lowmem would provide more detailed
info.

> presumably, user would need
> to copy the file(s) to a temporary and copy it back over itself, to
> get rid of the unwanted compression?

Unfortunately, reading (copying) data from the original fs is no longer
that safe (just like your case).

But if reading works fine, then copying things back will trigger normal
write routine, thus it would be plain text extent.

And if possible, please don't just remove those offending files (yet).
Your binary dump would help a lot locating the root case.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass

2018-05-14 Thread james harvey
On Mon, May 14, 2018 at 2:36 AM, Qu Wenruo  wrote:
> OK, I could reproduce it now.
>
> Just mount with -o nodatasum, then create a file.
> Remount with compress-force=lzo, then write something.
>
> So at least btrfs should disallow such thing.
>
> Thanks,
> Qu

Would the corrupted dump and correct one of the file, and kernel with
kasan output still help?  Or, with what you reproduced, do you have
what you need?


On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo  wrote:
> So there is something wrong that btrfs allows compressed data to be
> generated for such file.
> (Could not reproduce the same behavior with 4.16 kernel, could such
> problem happens in older kernels? Or just get fixed recently?)
>
> Then some corruption screwed up the compressed data, and when we
> decompress, the kernel is screwed up.

In this thread, Chris Murphy noted systemd sets the "C" attribute, and
discussed what sounds to me like what happened here: "Usually nocow
also means no compression. But in the archives is a thread where I
found that compression can be forced on nocow if the file is submitted
for defragmentation and either the volume is mounted with compression
or the file has inherited chattr +c (I don't remember which or
possibly both). And systemd does submit rotated logs for
defragmentation."



(If you don't need the dumps and kernel kasan output, you can skip the
rest of this reply)



> Yep, even the last case it still looks like that it's kernel memory get
> corrupted.
>
> From the thread, since you have already located the corrupted mirror,
> would you please provide the corrupted dump along with correct one?
>
> It would help a lot for us to under stand what's going on.

Absolutely.  I'm not sure how to best get you that, though.

"filefrag -v" on one of the files can be seen here:
https://bugzilla.kernel.org/attachment.cgi?id=275953

It lists 58 fragments.

filefrag lists the ending offsets and length based on the uncompressed
sizes.  filefrag doesn't account for the compression.

So, in this thread, I compared the first 4k of fragments 0-57 on each
disk and found all the corruption was on disk 1.  (And the entire
207*4096 bytes on fragment 58.)  Grabbing more than 4k of each
fragment brings in data from other files.  So, I might have compared
all of the data (fragments 0-57 are 128k uncompressed, and at least
fragment 0 uncompressed does lzop down to about 2k, so perhaps all the
other 128k fragments compress to within 4k, but maybe not) but this
might not have grabbed all the data.

I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but
they'd include extra data not involved, so you'd have to find a way to
use them, and without the metadata saying how many bytes of each
fragment to use, it might not be easy to put together.  (Maybe
chopping off all the trailing 0's in each fragment would do the
trick.)  Maybe the first 9 byte header on each fragment encodes the
length actually used?

If this is useful to you, I'd be happy to provide it, along with the
correct one.

If there's a better way than this, I'd be happy to do that instead.  I
of course can't just copy the file, so have to do something like dd or
"btrfs-map-logical -o".  "btrfs-map-logical -o" can't automatically
grab the proper length, because it needs a size argument, and if not
given, defaults to the 16k nodesize.

> The dump indicates the same conclusion you reached.
> The inode has NODATACOW NODATASUM flag, which means it should not has
> csum nor has data compressed.
> While in fact we have tons of compressed extents.
>
> But the following fiemap result also shows that these extents get
> shared. This could happen when there is a snapshot.

I do run snapper.

> To pindown the lzo decompress corruption, kasan would be a nice try.
> However this means you need to enable it at compile time, and recompile
> a kernel.
> Not to mention kasan has a great impact on performance.
>
> But it should provide more info before memory get corrupted.

Sure, it's compiling.  I'll probably be available to run it and send
results in 14 hours, if needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: fsck-tests: add test case to check symlinks with odd flags

2018-05-14 Thread Su Yue
There are two bad symlinks in the test case.
One is with immutable attribute.
Another one is with append attribute.

Signed-off-by: Su Yue 
---
 .../034-odd-inode-flags/default_case.img | Bin 0 -> 4096 bytes
 tests/fsck-tests/034-odd-inode-flags/test.sh |  15 +++
 2 files changed, 15 insertions(+)
 create mode 100644 tests/fsck-tests/034-odd-inode-flags/default_case.img
 create mode 100755 tests/fsck-tests/034-odd-inode-flags/test.sh

diff --git a/tests/fsck-tests/034-odd-inode-flags/default_case.img 
b/tests/fsck-tests/034-odd-inode-flags/default_case.img
new file mode 100644
index 
..43a2a6f62d5ef3afd77f117b577a56ad6098ed19
GIT binary patch
literal 4096
zcmeH}c{CJUAIFU?k*yI4S@RGnly%15rj(_~G8rO6vS$b*hR9G^M_GnP*~gG%*XULD
zecwiA3?YW>hM5=dd(Ly-=e+&-p7Z|Uo_l`xdw=)dbMEhRes@7VO!Ok2v8iSFcVXRY
z0S9$YY#lgx4lMhgmx00fKp$b;YjjKwbl1Mc|4Xm#`|-vHGaC;^56vNgLjwQD1pG!)
z0%s{UtRTF}`3%o*f~o}Ub~i7&5a;#~3S~wq>yb%mpwOOBQV@`K}NNkK2T)W
zP3ZBKkdz|>(+ppAd?n`Mhq5Y_Mx6#)lkr%V0RYfvXKS#j!hDNZK|Zl0`#(aZhs{~>g=;i^@`Df+(}JrN_k>!De44F)K~Z_B2ji*e|B3TM=T#=3J?hFQkLT
zcoHc*_{i*Zl|=_O)<@(9atD)c?k`SQU{$3zUo_)!00YeaVw>N{!sut0FUEY0%ig=a
zMor(Rh4`G&3Z`YTQC;Vd8DgP6v(Pk4lF8cYH?czMG@QAi}DN#2dOFHE}ny2M`mZ24yt(4GUf}VgxH+`7uL@
zKDpXm>?+9{UNN|}xP%5x8ehlk9$5E{}{L^#S!Q*F3j-dy9VK#9I*B-fuj6aJJl
z=P4E0NBM;7Fya(6l|UU2sBHD~_7;trI@t}asks;9Kk+;cNS#DJ)+5qi%9$5}Frqs<
zGo#p;!NId;WbB=u?K!Wum6^@+(Q%lm<^l5MSYiPzgj6jAV1WhvGO+kDGX
zzw5{X`z6v>V}g9P^96Zk*+d9P;fMmiO;S-TF=bQT1)?9j;R){>iIAX>R5$~D?vMpj
zBnOV)t?w>WW|3jF#YJ{gK;@)e6U`2;z47;3jE!%&}epjB+(;{Nt3NKke
zgQOrb4!>S43Sh|5t~6Q9_eALyPQFaH!cTLiXpX=W)Jx7Uxq;~%Hy)VR7GgFu`>T4N
zZ?@I};@Zg%oqD*Vk4*qEy@RB5H{Z>mf%4)?l}C7)7z9RiH3+|z}`sur%ucH}%V
zPz+A(a7rcq^2?5ccAa_hN*yFmglI!5N>l2$a={+~R77Z4R7$n%aIEVA=N$
z1b$h$FIy~<-xmLlH6{;68g#pjl7j63c)%xA>Dd9r0^C%QE)RQfgi2
zXiewE(h-{O)QWl=h!TjI+0de3-VC=YSC^0MN(pB5!!V!#68@VzTm&5G-AS>CCye#X
zC(0KE`nt?7e5TX|O%{xg!*Ab5P=2^-DRjo81w`|=zUk#|&(uj(4p2-{$h3e`sM17w
zdW#Tl75Q}}emQch`~$C9_sjTscj@M>sfi4kHHdcdX*RC$on>k-Dn@x1YE7>^O(DHf
z{Sf3DiCn-r)e#3Nqt1-yQ4Z
zjQbKaB#VjYFLQ=5E+mdcWR7)?7QwYYr_nqX2!rj`wma*sUcEyVH)-UMp9vk5p*sJi
z)uP>;M~F@^1@nip{sw)_u!t0m3wfd_^tsEjdl3JV!)ti4tO%j3AZkU&1P8pdqT2@4
z)2a@Bm48L_Y|X>TZU*bQ)wNAuogJWSw*}wLwwe>k*K*tA6k?IBPtx`FWcq8tfigF_
z`^-m<5?dEeNYIhR9?
zpXv|E5a?(uK@>aY^((xBosb-{GjPf)k(;|7FUGk~Pl7(`D#yu_syY49#0sXEQ
zq2Yh_;mDS0F#bl9X5`NV5q91498spY6Yu}FrF1zva3DCCYrYxh$_DQ7>Kut=Pt1kZ
z=

[PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags.

Symlinks should never have append/immutable flags.
While processing inodes, if found a symlink with append/immutable
flags, mark the inode record with I_ERR_ODD_INODE_FLAGS.

This is for original mode.

Signed-off-by: Su Yue 
---
 check/main.c  | 7 +++
 check/mode-original.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/check/main.c b/check/main.c
index 68da994f7ae0..f5f52c406046 100644
--- a/check/main.c
+++ b/check/main.c
@@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, 
struct inode_record *rec)
fprintf(stderr, ", link count wrong");
if (errors & I_ERR_FILE_EXTENT_ORPHAN)
fprintf(stderr, ", orphan file extent");
+   if (errors & I_ERR_ODD_INODE_FLAGS)
+   fprintf(stderr, ", odd inode flags");
fprintf(stderr, "\n");
/* Print the orphan extents if needed */
if (errors & I_ERR_FILE_EXTENT_ORPHAN)
@@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb,
 {
struct inode_record *rec;
struct btrfs_inode_item *item;
+   u64 flags;
 
rec = active_node->current;
BUG_ON(rec->ino != key->objectid || rec->refs > 1);
@@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb,
rec->found_inode_item = 1;
if (rec->nlink == 0)
rec->errors |= I_ERR_NO_ORPHAN_ITEM;
+   flags = btrfs_inode_flags(eb, item);
+   if (rec->imode & BTRFS_FT_SYMLINK &&
+   flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))
+   rec->errors = I_ERR_ODD_INODE_FLAGS;
maybe_free_inode_rec(_node->inode_cache, rec);
return 0;
 }
diff --git a/check/mode-original.h b/check/mode-original.h
index 368de692fdd1..13cfa5b9e1b3 100644
--- a/check/mode-original.h
+++ b/check/mode-original.h
@@ -186,6 +186,7 @@ struct file_extent_hole {
 #define I_ERR_LINK_COUNT_WRONG (1 << 13)
 #define I_ERR_FILE_EXTENT_ORPHAN   (1 << 14)
 #define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15)
+#define I_ERR_ODD_INODE_FLAGS  (1 << 16)
 
 struct inode_record {
struct list_head backrefs;
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] btrfs-progs: check: verify symlinks with append/immutable flags

2018-05-14 Thread Su Yue
This patchset can be fetch from my github:
https://github.com/Damenly/btrfs-progs/commits/odd_inode_flags

symlinks should never have append/immutable attributes.
This patchset enables btrfs check to verify such corruption.

PATCH[1] is for original mode.
PATCH[2] is for original mode.

PATCH[3] adds a test image.
For further use, the directory is called odd-inode-flags.

#issue 133

Su Yue (3):
  btrfs-progs: check: check symlinks with append/immutable flags
  btrfs-progs: lowmem: check symlinks with append/immutable flags
  btrfs-progs: fsck-tests: add test case to check symlinks with odd
flags

 check/main.c |   7 +++
 check/mode-lowmem.c  |  10 ++
 check/mode-original.h|   1 +
 .../034-odd-inode-flags/default_case.img | Bin 0 -> 4096 bytes
 tests/fsck-tests/034-odd-inode-flags/test.sh |  15 +++
 5 files changed, 33 insertions(+)
 create mode 100644 tests/fsck-tests/034-odd-inode-flags/default_case.img
 create mode 100755 tests/fsck-tests/034-odd-inode-flags/test.sh

-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags

2018-05-14 Thread Su Yue
Symlinks should never have append/immutable flags.
While checking inodes, if found a symlink with append/immutable
flags, report and record the fatal error.

This is for lowmem mode.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 9890180d1d3c..61c4e7f23d47 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path)
struct btrfs_key last_key;
u64 inode_id;
u32 mode;
+   u64 flags;
u64 nlink;
u64 nbytes;
u64 isize;
@@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, 
struct btrfs_path *path)
isize = btrfs_inode_size(node, ii);
nbytes = btrfs_inode_nbytes(node, ii);
mode = btrfs_inode_mode(node, ii);
+   flags = btrfs_inode_flags(node, ii);
dir = imode_to_type(mode) == BTRFS_FT_DIR;
nlink = btrfs_inode_nlink(node, ii);
nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM;
 
+   if (mode & BTRFS_FT_SYMLINK &&
+   (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) {
+   err = FATAL_ERROR;
+   error(
+"symlinks shall never be with immutable/append, root %llu inode item [%llu] 
flags %llu may be corrupted",
+ root->objectid, inode_id, flags);
+   }
+
while (1) {
btrfs_item_key_to_cpu(path->nodes[0], _key, 
path->slots[0]);
ret = btrfs_next_item(root, path);
-- 
2.17.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Nikolay Borisov
[Adding David to CC]

On 14.05.2018 12:39, Roman Mamedov wrote:
> On Mon, 14 May 2018 11:36:26 +0300
> Nikolay Borisov  wrote:
> 
>> So what made you have these expectation, is it codified somewhere
>> (docs/man pages etc)? I'm fine with that semantics IF this is what
>> people expect.
> 
> "Compression ...does not work for NOCOW files":
> https://btrfs.wiki.kernel.org/index.php/Compression
> 
> The mount options man page does not say that the NOCOW attribute of files will
> be disregarded with compress-force.  It only mentions interaction with the
> nodatacow and nodatasum mount options. So I'd expect the attribute to still
> work and prevent compression of NOCOW files.

I wouldn't say this is very clear, it needs to be stated explicitly.

> 
>> Now the question is why people grew up to have this expectation and not the
>> other way round? IMO force_compress should really disregard everything else
> 
> Both are knobs that the user needs to explicitly set, the difference is that
> the +C attribute is fine-grained and the mount option is global. If they are
> set by the user to conflicting values, it seems more useful to have the
> fine-grained control override the global one, not the other way round.

This is valid reasoning but so is mine. So I'd like to have some rules
on that matter such that in the future things will have consistent
semantics. Obviously in this case the "local options trump global ones"
seems to be prevalent. I don't have problem with that but this should
codified somewhere.


David, what's your take on that. Where do you think will be the best
place to say that local, per-inode options take precedence over global ones?

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Roman Mamedov
On Mon, 14 May 2018 11:36:26 +0300
Nikolay Borisov  wrote:

> So what made you have these expectation, is it codified somewhere
> (docs/man pages etc)? I'm fine with that semantics IF this is what
> people expect.

"Compression ...does not work for NOCOW files":
https://btrfs.wiki.kernel.org/index.php/Compression

The mount options man page does not say that the NOCOW attribute of files will
be disregarded with compress-force.  It only mentions interaction with the
nodatacow and nodatasum mount options. So I'd expect the attribute to still
work and prevent compression of NOCOW files.

> Now the question is why people grew up to have this expectation and not the
> other way round? IMO force_compress should really disregard everything else

Both are knobs that the user needs to explicitly set, the difference is that
the +C attribute is fine-grained and the mount option is global. If they are
set by the user to conflicting values, it seems more useful to have the
fine-grained control override the global one, not the other way round.

-- 
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread james harvey
On Mon, May 14, 2018 at 4:36 AM, Nikolay Borisov  wrote:
> On 14.05.2018 11:20, Roman Mamedov wrote:
>> On Mon, 14 May 2018 11:10:34 +0300
>> Nikolay Borisov  wrote:
>>
>>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
>>> the inode flags, presumably the admin knows what he is doing?
>>
>> Please don't. Personally I always assumed chattr +C would prevent both CoW 
>> and
>> compression, and used that as a way to override volume-wide compress-force 
>> for
>> a particular folder. Now that it turns out this wasn't working, the patch
>> would fix it to behave in line with prior expectations.
>
>
> So what made you have these expectation, is it codified somewhere
> (docs/man pages etc)? I'm fine with that semantics IF this is what
> people expect.

https://btrfs.wiki.kernel.org/index.php/Compression

"Compression... does not work for NOCOW files."

IMO, without a checksum, allowing compression is too much of a data
risk.  Uncompressed, user may be able to salvage most of the data with
partial corruption.  Compressed, they can't get anything to even look
at, unless they have a mirrored copy that is fine, figure out which
disk is corrupted, and mount degraded without it.

> Now the question is why people grew up to have this expectation and not
> the other way round? IMO force_compress should really disregard
> everything else, if not then it should be documented. The man pages [0]
> say:
>
> "
> If compress-force is specified, the compression will allways be
> attempted, but the data may end up uncompressed if the compression would
> make them larger.
>
> As this is too simple, the compress-force is a workaround that will
> compress most of the files at the cost of some wasted CPU cycles on
> failed attempts. The heuristics of compress will improve in the future
> so this will not be necessary.
> "
>
> In this case I think the man page for this option should be documented
> to say that inode-specific flags preclude mount flags.
>
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

Agreed the man page should be clarified.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread james harvey
On Mon, May 14, 2018 at 4:20 AM, Roman Mamedov  wrote:
> On Mon, 14 May 2018 11:10:34 +0300
> Nikolay Borisov  wrote:
>
>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
>> the inode flags, presumably the admin knows what he is doing?
>
> Please don't. Personally I always assumed chattr +C would prevent both CoW and
> compression, and used that as a way to override volume-wide compress-force for
> a particular folder. Now that it turns out this wasn't working, the patch
> would fix it to behave in line with prior expectations.

"the patch would fix it to behave" might be missing one part.  Qu, am
I right (best guess, not familiar with btrfs internals enough) this
patch will prevent new files and files of size 0 from getting
compression with NODATASUM or NODATACOW, but would leave existing
compressed files (that shouldn't have been compressed) as they are?
After getting an error from btrfs check, presumably, user would need
to copy the file(s) to a temporary and copy it back over itself, to
get rid of the unwanted compression?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


4.13 call trace in btrfs_update_device

2018-05-14 Thread Lukas Tribus
Hello,


seeing calltraces like this on a Ubuntu 4.13 kernel, not sure what
happens here and what the impact is:


[13729.871604] [ cut here ]
[13729.871647] WARNING: CPU: 1 PID: 3048 at
/build/linux-hwe-rDkE7z/linux-hwe-4.13.0/fs/btrfs/ctree.h:1559
btrfs_update_device+0x1b4/0x1c0 [btrfs]
[13729.871648] Modules linked in: xen_pciback xt_physdev br_netfilter
iptable_filter ip_tables x_tables xen_netback xen_blkback
algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd
nls_iso8859_1 dm_crypt intel_rapl bridge stp llc x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_rapl_perf shpchp ie31200_edac mei_me mei
nuvoton_cir rc_core lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm
ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 aesni_intel firewire_ohci i2c_algo_bit
aes_x86_64 tg3 crypto_simd drm_kms_helper ptp firewire_core cryptd
syscopyarea mxm_wmi glue_helper
[13729.871702]  crc_itu_t pps_core sysfillrect ahci sysimgblt
fb_sys_fops libahci drm wmi video
[13729.871714] CPU: 1 PID: 3048 Comm: btrfs-cleaner Tainted: G
W   4.13.0-41-generic #46~16.04.1-Ubuntu
[13729.871715] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[13729.871717] task: 8802a1211740 task.stack: c9004231
[13729.871743] RIP: e030:btrfs_update_device+0x1b4/0x1c0 [btrfs]
[13729.871744] RSP: e02b:c90042313d10 EFLAGS: 00010206
[13729.871746] RAX: 0fff RBX:  RCX: 0917dfcfbe00
[13729.871748] RDX: 0004 RSI: 3dd4 RDI: 8801126aaaf0
[13729.871749] RBP: c90042313d58 R08: 3dd8 R09: c90042313cc8
[13729.871751] R10: 1000 R11: 0003 R12: 8802a2779a10
[13729.871752] R13: 8802a30ad000 R14: 3db4 R15: 8801126aaaf0
[13729.871761] FS:  () GS:8802b248()
knlGS:8802b248
[13729.871762] CS:  e033 DS:  ES:  CR0: 80050033
[13729.871764] CR2: 7f870df80480 CR3: 0001a1618000 CR4: 00042660
[13729.871766] Call Trace:
[13729.871795]  btrfs_remove_chunk+0x2b8/0x880 [btrfs]
[13729.871819]  btrfs_delete_unused_bgs+0x336/0x3e0 [btrfs]
[13729.871841]  cleaner_kthread+0x155/0x190 [btrfs]
[13729.871847]  kthread+0x10c/0x140
[13729.871868]  ? run_one_async_done+0xc0/0xc0 [btrfs]
[13729.871871]  ? kthread_create_on_node+0x70/0x70
[13729.871875]  ret_from_fork+0x35/0x40
[13729.871877] Code: 00 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8
73 23 ff ff 4c 89 ff e8 7b f5 fc ff e9 d9 fe ff ff b8 f4 ff ff ff e9
d9 fe ff ff <0f> ff eb b8 e8 53 e8 ba c0 0f 1f 00 0f 1f 44 00 00 55 48
89 e5
[13729.871923] ---[ end trace 2e6095c7c763f649 ]---
[13729.885370] [ cut here ]
[13729.885411] WARNING: CPU: 2 PID: 3048 at
/build/linux-hwe-rDkE7z/linux-hwe-4.13.0/fs/btrfs/ctree.h:1559
btrfs_update_device+0x1b4/0x1c0 [btrfs]
[13729.885412] Modules linked in: xen_pciback xt_physdev br_netfilter
iptable_filter ip_tables x_tables xen_netback xen_blkback
algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd
nls_iso8859_1 dm_crypt intel_rapl bridge stp llc x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel intel_rapl_perf shpchp ie31200_edac mei_me mei
nuvoton_cir rc_core lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm
ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4
btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
hid_generic usbhid hid i915 aesni_intel firewire_ohci i2c_algo_bit
aes_x86_64 tg3 crypto_simd drm_kms_helper ptp firewire_core cryptd
syscopyarea mxm_wmi glue_helper
[13729.885466]  crc_itu_t pps_core sysfillrect ahci sysimgblt
fb_sys_fops libahci drm wmi video
[13729.885478] CPU: 2 PID: 3048 Comm: btrfs-cleaner Tainted: G
W   4.13.0-41-generic #46~16.04.1-Ubuntu
[13729.885479] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[13729.885481] task: 8802a1211740 task.stack: c9004231
[13729.885506] RIP: e030:btrfs_update_device+0x1b4/0x1c0 [btrfs]
[13729.885508] RSP: e02b:c90042313d10 EFLAGS: 00010206
[13729.885510] RAX: 0fff RBX:  RCX: 0917dfcfbe00
[13729.885512] RDX: 0004 RSI: 3dd4 RDI: 8801126aaaf0
[13729.885513] RBP: c90042313d58 R08: 3dd8 R09: c90042313cc8
[13729.885514] R10: 1000 R11: 0003 R12: 8802a25414d0
[13729.885516] R13: 8802a30ad000 R14: 3db4 R15: 8801126aaaf0
[13729.885524] FS:  () GS:8802b250()
knlGS:88003d30
[13729.885525] CS:  e033 DS:  ES:  CR0: 

Clarification needed about libbtrfs & libbtrfsutil

2018-05-14 Thread Dimitri John Ledkov
Are both of these meant to be public libraries, installed on the user
systems, and available in .so variant as well for 3rd party
development and public dynamic linking?

Or are these private internal libraries, which are installed as public
runtime only, simply to share code between the utils, but otherwise
provide no abi stability and will forever remain libfoo.so.0?

Or should these even be a noinst_ libraries (~= Libtool Convenience
Libraries), and are simply intermediate by-products?

I'm asking because despite compiling shared & static variants of these
libraries, and "shared linked" and "static linked" variants of the
utils, it appears that all utilities are statically linking against
libbtrfs/libbtrfsutils. Thus no binaries nor bindings, dynamically
link against neither libbtrfs nor libbtrfsutil.

Tweaking the makefile to use libs_shared variable instead of libs or
libs_static, results in slightly smaller binaries, dynamically linked
against libbtrfs/libbtrfsutil.

But it is hard to tell if this is a bug/mistake, or an intentional feature.

-- 
Regards,

Dimitri.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 11:20, Roman Mamedov wrote:
> On Mon, 14 May 2018 11:10:34 +0300
> Nikolay Borisov  wrote:
> 
>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
>> the inode flags, presumably the admin knows what he is doing?
> 
> Please don't. Personally I always assumed chattr +C would prevent both CoW and
> compression, and used that as a way to override volume-wide compress-force for
> a particular folder. Now that it turns out this wasn't working, the patch
> would fix it to behave in line with prior expectations.


So what made you have these expectation, is it codified somewhere
(docs/man pages etc)? I'm fine with that semantics IF this is what
people expect.


Now the question is why people grew up to have this expectation and not
the other way round? IMO force_compress should really disregard
everything else, if not then it should be documented. The man pages [0]
say:

"
If compress-force is specified, the compression will allways be
attempted, but the data may end up uncompressed if the compression would
make them larger.

As this is too simple, the compress-force is a workaround that will
compress most of the files at the cost of some wasted CPU cycles on
failed attempts. The heuristics of compress will improve in the future
so this will not be necessary.
"

In this case I think the man page for this option should be documented
to say that inode-specific flags preclude mount flags.

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Roman Mamedov
On Mon, 14 May 2018 11:10:34 +0300
Nikolay Borisov  wrote:

> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
> the inode flags, presumably the admin knows what he is doing?

Please don't. Personally I always assumed chattr +C would prevent both CoW and
compression, and used that as a way to override volume-wide compress-force for
a particular folder. Now that it turns out this wasn't working, the patch
would fix it to behave in line with prior expectations.

-- 
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 16:10, Nikolay Borisov wrote:
> 
> 
> On 14.05.2018 10:02, Qu Wenruo wrote:
>> As btrfs(5) specified:
>>
>>  Note
>>  If nodatacow or nodatasum are enabled, compression is disabled.
>>
>> If NODATASUM or NODATACOW set, we should not compress the extent.
>>
>> And in fact, we have bug report about corrupted compressed extent
>> leading to memory corruption in mail list.
>> Although it's mostly buggy lzo implementation causing the problem, btrfs
> 
> What does it mean "it's mostly buggy lzo implementation"? If we have bug
> in the LZO implementation that btrfs uses shouldn't it be fixed as well ?

It looks like buggy lzo decompress code which can't handle corrupted
data and results some random kernel memory corruption.
It's also possible that some incorrect function call in btrfs that
caused the problem.

It needs to be fixed of course, but right now, no binary dump of the
offending data, so we can't verify or further investigate the root cause.

Thanks,
Qu

> 
>> still needs to be fixed to meet the specification.
>>
>> Reported-by: James Harvey 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  fs/btrfs/inode.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index d241285a0d2a..dbef3f404559 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode 
>> *inode, u64 start, u64 end)
>>  {
>>  struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>  
>> +/*
>> + * Btrfs doesn't support compression without csum or CoW.
>> + * This should have the highest priority.
>> + */
>> +if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
>> +BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
>> +return 0;
>> +
>>  /* force compress */
>>  if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>>  return 1;
> 
> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
> the inode flags,

I'm afraid not.
AFAIK the purpose of force compress is to bypass the compression ratio
detection, other than generating compressed extent for NODATASUM use case.

Thanks,
Qu

> presumably the admin knows what he is doing? Won't it
> be better if somewhere further in the call chain we check if
> FORCE_COMPRESS is set and if so override any inode flags. This can be
> implemented by _always_ calling inode_need_compress to decide if we
> should compress or not?
> 
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Nikolay Borisov


On 14.05.2018 10:02, Qu Wenruo wrote:
> As btrfs(5) specified:
> 
>   Note
>   If nodatacow or nodatasum are enabled, compression is disabled.
> 
> If NODATASUM or NODATACOW set, we should not compress the extent.
> 
> And in fact, we have bug report about corrupted compressed extent
> leading to memory corruption in mail list.
> Although it's mostly buggy lzo implementation causing the problem, btrfs

What does it mean "it's mostly buggy lzo implementation"? If we have bug
in the LZO implementation that btrfs uses shouldn't it be fixed as well ?

> still needs to be fixed to meet the specification.
> 
> Reported-by: James Harvey 
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/inode.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d241285a0d2a..dbef3f404559 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode 
> *inode, u64 start, u64 end)
>  {
>   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  
> + /*
> +  * Btrfs doesn't support compression without csum or CoW.
> +  * This should have the highest priority.
> +  */
> + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
> + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
> + return 0;
> +
>   /* force compress */
>   if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>   return 1;

But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard
the inode flags, presumably the admin knows what he is doing? Won't it
be better if somewhere further in the call chain we check if
FORCE_COMPRESS is set and if so override any inode flags. This can be
implemented by _always_ calling inode_need_compress to decide if we
should compress or not?

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum

2018-05-14 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes
 .../032-compressed-nodatasum/test.sh  |  24 ++
 2 files changed, 24 insertions(+)
 create mode 100644 
tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh

diff --git 
a/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 
b/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
new file mode 100644
index 
..72a3bfd8f8b0bf2d610ce84af0abf4003204a9a1
GIT binary patch
literal 21996
zcmeI4c{J4jzsG075Hj{%+4r4FX))?EVxdjMBJ~vpQ
zaRS5#<_3X4^dHT;NhCpX6KfFY&(`-Y@>`5KGo1zUr+38Hhkk(9eDGpEBo%Y3+a
z39N$)?!aIWGP$QlLSB4}7GN-%M6!SI!A$!Rs}e(Kof{^+wd}33Zf48xDWoJETEpI1
zJaKTil_;sFO{y5G4$!50I$USB9T)ZvTBJJ#(kP=uiH=7`xOc6GK=#_tkR
z!d{gr!mh{ZL~)Oii%0PJQei`H+mC;Yxx{KdS%S%NR`Bvi*^?KY$%coNW6;V}UKynZ
z3=&39BUc+4ThIxAm!uXIYjteyW8|r>EMS(~?

[PATCH 0/3] btrfs-progs: Detect compressed extent without csum

2018-05-14 Thread Qu Wenruo
Patches can be fetch from github:
https://github.com/adam900710/btrfs-progs/tree/compress_nodatasum

It's based on v4.16 stable branch.

James Harvey from mail list reports a strange kernel panic, whichs show
obviously kernel memory corruption, while after btrfs decompression
failure.

It turns out that, some compressed extent get corrupted on-disk, while
the inode has NODATASUM set, there is no csum to prevent corrupted
mirror being used.

Although the root cause should be buggy lzo implementation, it still
shows that btrfs is not following the behavior defined in btrfs(5):

 Note
 If nodatacow or nodatasum are enabled, compression is disabled.

So at least make btrfs check to detect such problem.

Qu Wenruo (3):
  btrfs-progs: check/lowmem: Add checks for compressed extent without
csum
  btrfs-progs: check/original: Add checks for compressed extent without
csum
  btrfs-progs: fsck-tests: Add test case for detecting compressed extent
without csum

 check/main.c  |  10 ++--
 check/mode-lowmem.c   |  18 +
 .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes
 .../032-compressed-nodatasum/test.sh  |  24 ++
 4 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 
tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh

-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum

2018-05-14 Thread Su Yue


On 05/14/2018 03:03 PM, Qu Wenruo wrote:
> There is one report of compressed extent happens in btrfs, but has no
> csum and then leads to possible decompress error screwing up kernel
> memory.
> 
> Although it's a kernel bug, and won't cause problem until compressed
> data get corrupted, let's catch such problem in advance.
> 
> This patch will catch any unexpected compressed extent with:
> 
> 1) 0 or less than expected csum
> 
> 2) nodatasum flag set in the inode item
> 
> This is for lowmem mode.
> 
> Reported-by: James Harvey 
> Issue: #134
> Signed-off-by: Qu Wenruo 

Reviewed-by: Su Yue 

> ---
>  check/mode-lowmem.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index dac3201b7d99..8e6d5e8de12a 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, 
> struct btrfs_key *fkey,
> csum_found);
>   }
>   }
> + /*
> +  * Extra check for compressed extents.
> +  * Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus
> +  * all compressed extent should have csum.
> +  */
> + if (compressed && csum_found < search_len) {
> + error(
> +"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only 
> %llu bytes has csum, expect %llu",
> +   root->objectid, fkey->objectid, fkey->offset, csum_found,
> +   search_len);
> + err |= CSUM_ITEM_MISSING;
> + }
> + if (compressed && nodatasum) {
> + error(
> +"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't 
> allow it",
> +   root->objectid, fkey->objectid, fkey->offset);
> + err |= FILE_EXTENT_ERROR;
> + }
>  
>   /* Check EXTENT_DATA hole */
>   if (!no_holes && *end != fkey->offset) {
> 




pEpkey.asc
Description: application/pgp-keys


Re: [PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum

2018-05-14 Thread Su Yue


On 05/14/2018 03:03 PM, Qu Wenruo wrote:
> There is one report of compressed extent happens in btrfs, but has no
> csum and then leads to possible decompress error screwing up kernel
> memory.
> 
> Although it's a kernel bug, and won't cause problem until compressed
> data get corrupted, let's catch such problem in advance.
> 
> This patch will catch any unexpected compressed extent with:
> 
> 1) 0 or less than expected csum
> 
> 2) nodatasum flag set in the inode item
> 
> This is for original mode.
> 
> Reported-by: James Harvey 
> Issue: #134
> Signed-off-by: Qu Wenruo 

Reviewed-by: Su Yue 

> ---
>  check/main.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 891a6d797756..db064b27e84b 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root,
>   u64 mask = root->fs_info->sectorsize - 1;
>   u32 max_inline_size = min_t(u32, mask,
>   BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
> + u8 compression;
>   int extent_type;
>   int ret;
>  
> @@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root,
>  
>   fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
>   extent_type = btrfs_file_extent_type(eb, fi);
> + compression = btrfs_file_extent_compression(eb, fi);
>  
>   if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
> - u8 compression = btrfs_file_extent_compression(eb, fi);
>   struct btrfs_item *item = btrfs_item_nr(slot);
>  
>   num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
> @@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root,
>btrfs_file_extent_encryption(eb, fi) ||
>btrfs_file_extent_other_encoding(eb, fi)))
>   rec->errors |= I_ERR_BAD_FILE_EXTENT;
> + if (compression && rec->nodatasum)
> + rec->errors |= I_ERR_BAD_FILE_EXTENT;
>   if (disk_bytenr > 0)
>   rec->found_size += num_bytes;
>   } else {
> @@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root,
>   if (disk_bytenr > 0 &&
>   btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) {
>   u64 found;
> - if (btrfs_file_extent_compression(eb, fi))
> +
> + if (compression)
>   num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
>   else
>   disk_bytenr += extent_offset;
> @@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root,
>   rec->found_csum_item = 1;
>   if (found < num_bytes)
>   rec->some_csum_missing = 1;
> + if (compression && found < num_bytes)
> + rec->errors |= I_ERR_SOME_CSUM_MISSING;
>   } else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
>   if (found > 0) {
>   ret = 
> check_prealloc_extent_written(root->fs_info,
> 




pEpkey.asc
Description: application/pgp-keys


[PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum

2018-05-14 Thread Qu Wenruo
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.

Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.

This patch will catch any unexpected compressed extent with:

1) 0 or less than expected csum

2) nodatasum flag set in the inode item

This is for lowmem mode.

Reported-by: James Harvey 
Issue: #134
Signed-off-by: Qu Wenruo 
---
 check/mode-lowmem.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index dac3201b7d99..8e6d5e8de12a 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
  csum_found);
}
}
+   /*
+* Extra check for compressed extents.
+* Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus
+* all compressed extent should have csum.
+*/
+   if (compressed && csum_found < search_len) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only 
%llu bytes has csum, expect %llu",
+ root->objectid, fkey->objectid, fkey->offset, csum_found,
+ search_len);
+   err |= CSUM_ITEM_MISSING;
+   }
+   if (compressed && nodatasum) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't allow 
it",
+ root->objectid, fkey->objectid, fkey->offset);
+   err |= FILE_EXTENT_ERROR;
+   }
 
/* Check EXTENT_DATA hole */
if (!no_holes && *end != fkey->offset) {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum

2018-05-14 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes
 .../032-compressed-nodatasum/test.sh  |  24 ++
 2 files changed, 24 insertions(+)
 create mode 100644 
tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh

diff --git 
a/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 
b/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
new file mode 100644
index 
..72a3bfd8f8b0bf2d610ce84af0abf4003204a9a1
GIT binary patch
literal 21996
zcmeI4c{J4jzsG075Hj{%+4r4FX))?EVxdjMBJ~vpQ
zaRS5#<_3X4^dHT;NhCpX6KfFY&(`-Y@>`5KGo1zUr+38Hhkk(9eDGpEBo%Y3+a
z39N$)?!aIWGP$QlLSB4}7GN-%M6!SI!A$!Rs}e(Kof{^+wd}33Zf48xDWoJETEpI1
zJaKTil_;sFO{y5G4$!50I$USB9T)ZvTBJJ#(kP=uiH=7`xOc6GK=#_tkR
z!d{gr!mh{ZL~)Oii%0PJQei`H+mC;Yxx{KdS%S%NR`Bvi*^?KY$%coNW6;V}UKynZ
z3=&39BUc+4ThIxAm!uXIYjteyW8|r>EMS(~?

[PATCH 0/3] btrfs-progs: Detect compressed extent without csum

2018-05-14 Thread Qu Wenruo
Patches can be fetch from github:
https://github.com/adam900710/btrfs-progs/tree/compress_nodatasum

It's based on v4.16 stable branch.

James Harvey from mail list reports a strange kernel panic, whichs show
obviously kernel memory corruption, while after btrfs decompression
failure.

It turns out that, some compressed extent get corrupted on-disk, while
the inode has NODATASUM set, there is no csum to prevent corrupted
mirror being used.

Although the root cause should be buggy lzo implementation, it still
shows that btrfs is not following the behavior defined in btrfs(5):

 Note
 If nodatacow or nodatasum are enabled, compression is disabled.

So at least make btrfs check to detect such problem.

Qu Wenruo (3):
  btrfs-progs: check/lowmem: Add checks for compressed extent without
csum
  btrfs-progs: check/original: Add checks for compressed extent without
csum
  btrfs-progs: fsck-tests: Add test case for detecting compressed extent
without csum

 check/main.c  |  10 ++--
 check/mode-lowmem.c   |  18 +
 .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes
 .../032-compressed-nodatasum/test.sh  |  24 ++
 4 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 
tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz
 create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh

-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum

2018-05-14 Thread Qu Wenruo
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.

Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.

This patch will catch any unexpected compressed extent with:

1) 0 or less than expected csum

2) nodatasum flag set in the inode item

This is for original mode.

Reported-by: James Harvey 
Issue: #134
Signed-off-by: Qu Wenruo 
---
 check/main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/check/main.c b/check/main.c
index 891a6d797756..db064b27e84b 100644
--- a/check/main.c
+++ b/check/main.c
@@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root,
u64 mask = root->fs_info->sectorsize - 1;
u32 max_inline_size = min_t(u32, mask,
BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
+   u8 compression;
int extent_type;
int ret;
 
@@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root,
 
fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
extent_type = btrfs_file_extent_type(eb, fi);
+   compression = btrfs_file_extent_compression(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   u8 compression = btrfs_file_extent_compression(eb, fi);
struct btrfs_item *item = btrfs_item_nr(slot);
 
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
@@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root,
 btrfs_file_extent_encryption(eb, fi) ||
 btrfs_file_extent_other_encoding(eb, fi)))
rec->errors |= I_ERR_BAD_FILE_EXTENT;
+   if (compression && rec->nodatasum)
+   rec->errors |= I_ERR_BAD_FILE_EXTENT;
if (disk_bytenr > 0)
rec->found_size += num_bytes;
} else {
@@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root,
if (disk_bytenr > 0 &&
btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) {
u64 found;
-   if (btrfs_file_extent_compression(eb, fi))
+
+   if (compression)
num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
else
disk_bytenr += extent_offset;
@@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root,
rec->found_csum_item = 1;
if (found < num_bytes)
rec->some_csum_missing = 1;
+   if (compression && found < num_bytes)
+   rec->errors |= I_ERR_SOME_CSUM_MISSING;
} else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
if (found > 0) {
ret = 
check_prealloc_extent_written(root->fs_info,
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set

2018-05-14 Thread Qu Wenruo
As btrfs(5) specified:

Note
If nodatacow or nodatasum are enabled, compression is disabled.

If NODATASUM or NODATACOW set, we should not compress the extent.

And in fact, we have bug report about corrupted compressed extent
leading to memory corruption in mail list.
Although it's mostly buggy lzo implementation causing the problem, btrfs
still needs to be fixed to meet the specification.

Reported-by: James Harvey 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/inode.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..dbef3f404559 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode *inode, 
u64 start, u64 end)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 
+   /*
+* Btrfs doesn't support compression without csum or CoW.
+* This should have the highest priority.
+*/
+   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
+   BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
+   return 0;
+
/* force compress */
if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
return 1;
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum

2018-05-14 Thread Qu Wenruo
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.

Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.

This patch will catch any unexpected compressed extent with:

1) 0 or less than expected csum

2) nodatasum flag set in the inode item

This is for original mode.

Reported-by: James Harvey 
Issue: #134
Signed-off-by: Qu Wenruo 
---
 check/main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/check/main.c b/check/main.c
index 891a6d797756..db064b27e84b 100644
--- a/check/main.c
+++ b/check/main.c
@@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root,
u64 mask = root->fs_info->sectorsize - 1;
u32 max_inline_size = min_t(u32, mask,
BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info));
+   u8 compression;
int extent_type;
int ret;
 
@@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root,
 
fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
extent_type = btrfs_file_extent_type(eb, fi);
+   compression = btrfs_file_extent_compression(eb, fi);
 
if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
-   u8 compression = btrfs_file_extent_compression(eb, fi);
struct btrfs_item *item = btrfs_item_nr(slot);
 
num_bytes = btrfs_file_extent_inline_len(eb, slot, fi);
@@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root,
 btrfs_file_extent_encryption(eb, fi) ||
 btrfs_file_extent_other_encoding(eb, fi)))
rec->errors |= I_ERR_BAD_FILE_EXTENT;
+   if (compression && rec->nodatasum)
+   rec->errors |= I_ERR_BAD_FILE_EXTENT;
if (disk_bytenr > 0)
rec->found_size += num_bytes;
} else {
@@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root,
if (disk_bytenr > 0 &&
btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) {
u64 found;
-   if (btrfs_file_extent_compression(eb, fi))
+
+   if (compression)
num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
else
disk_bytenr += extent_offset;
@@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root,
rec->found_csum_item = 1;
if (found < num_bytes)
rec->some_csum_missing = 1;
+   if (compression && found < num_bytes)
+   rec->errors |= I_ERR_SOME_CSUM_MISSING;
} else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
if (found > 0) {
ret = 
check_prealloc_extent_written(root->fs_info,
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum

2018-05-14 Thread Qu Wenruo
There is one report of compressed extent happens in btrfs, but has no
csum and then leads to possible decompress error screwing up kernel
memory.

Although it's a kernel bug, and won't cause problem until compressed
data get corrupted, let's catch such problem in advance.

This patch will catch any unexpected compressed extent with:

1) 0 or less than expected csum

2) nodatasum flag set in the inode item

This is for lowmem mode.

Reported-by: James Harvey 
Issue: #134
Signed-off-by: Qu Wenruo 
---
 check/mode-lowmem.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index dac3201b7d99..8e6d5e8de12a 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, 
struct btrfs_key *fkey,
  csum_found);
}
}
+   /*
+* Extra check for compressed extents.
+* Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus
+* all compressed extent should have csum.
+*/
+   if (compressed && csum_found < search_len) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only 
%llu bytes has csum, expect %llu",
+ root->objectid, fkey->objectid, fkey->offset, csum_found,
+ search_len);
+   err |= CSUM_ITEM_MISSING;
+   }
+   if (compressed && nodatasum) {
+   error(
+"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't allow 
it",
+ root->objectid, fkey->objectid, fkey->offset);
+   err |= FILE_EXTENT_ERROR;
+   }
 
/* Check EXTENT_DATA hole */
if (!no_holes && *end != fkey->offset) {
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass

2018-05-14 Thread Qu Wenruo


On 2018年05月14日 13:30, Qu Wenruo wrote:
> 
> 
> On 2018年05月14日 12:41, james harvey wrote:
>> On Sun, May 13, 2018 at 10:08 PM, Qu Wenruo  wrote:
>>> On 2018年05月12日 13:08, james harvey wrote:
 Hardware is fine.  Passes memtest86+ in SMP mode.  Works fine on all
 other files.



 [  381.869940] BUG: unable to handle kernel paging request at 
 00390e50
 [  381.870881] BTRFS: decompress failed
 [  381.891775] IP: rebalance_domains+0x8a/0x2c0
>>>
>>> The interesting part here is, btrfs is not showing up the call trace,
>>> not even lzo code.
>>> (Despite of the "decompress failed" message).
>>> Maybe some corrupted data is screwing up some random kernel memory?
>>
>> I've been surprised by this too.  I've seen a few "styles" of crashes from 
>> this.
>>
>> The fuller version of the one I posted in original post:
>> https://bugzilla.kernel.org/attachment.cgi?id=275949
>>
>> One that starts with a "general protection fault":
>> https://bugzilla.kernel.org/attachment.cgi?id=275951
>>
>> And my most recent version, starts with "BTRFS: decompress failed"
>> then "BUG: unable to handle kernel NULL pointer dereference at
>> 0001":
>> https://bugzilla.kernel.org/attachment.cgi?id=275961
>>
>> This latest one does have a call trace including btrfs.  The top of
>> the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and
>> although it includes the word compressed, I'm not sure that's actually
>> having to do with lzo compression.  The call stack doesn't scream that
>> to me.
>>
>> It seems like when the invalid decompression happens, that code itself
>> doesn't give any kernel errors, but the rest of the kernel starts
>> spazzing.
> 
> Yep, even the last case it still looks like that it's kernel memory get
> corrupted.
> 
>>
>> I've replicated this probably about 15 times now.  Only happens on
>> these files that have inconsistent mirrored data.
> 
> From the thread, since you have already located the corrupted mirror,
> would you please provide the corrupted dump along with correct one?
> 
> It would help a lot for us to under stand what's going on.
> 
>>
>>
>>
>>> Would you please get the inode number of that corrupted files, and throw
>>> it through btrfs-debug-tree?
>>>
>>> # btrfs-debug-tree -t   | grep -A 50 \(
>>>
>>> This is the preferred method as it would provide all the details we
>>> need. But since it could contain sensitive info like filename, please
>>> double check before posting it.
>>
>> # ls -i 
>> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal
>> 291489 
>> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal
>>
>> # ls -i 
>> user-1000@b70add0ef010457d933fec23a2afa48a-0495-00053b6b6e65e9cf.journal
>> 72267 
>> user-1000@b70add0ef010457d933fec23a2afa48a-0495-00053b6b6e65e9cf.journal
>>
>> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 >
>> debug.tree.291489
>> Available at: http://termbin.com/kegj
>>
>> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 >
>> debug.tree.72267
>> Available at: http://termbin.com/xhdc
> 
> The dump indicates the same conclusion you reached.
> The inode has NODATACOW NODATASUM flag, which means it should not has
> csum nor has data compressed.
> While in fact we have tons of compressed extents.
> 
> But the following fiemap result also shows that these extents get
> shared. This could happen when there is a snapshot.
> 
> So there is something wrong that btrfs allows compressed data to be
> generated for such file.
> (Could not reproduce the same behavior with 4.16 kernel, could such
> problem happens in older kernels? Or just get fixed recently?)

OK, I could reproduce it now.

Just mount with -o nodatasum, then create a file.
Remount with compress-force=lzo, then write something.

So at least btrfs should disallow such thing.

Thanks,
Qu

> 
> Then some corruption screwed up the compressed data, and when we
> decompress, the kernel is screwed up.
> 
> 
> To pindown the lzo decompress corruption, kasan would be a nice try.
> However this means you need to enable it at compile time, and recompile
> a kernel.
> Not to mention kasan has a great impact on performance.
> 
> But it should provide more info before memory get corrupted.
> 
> Thanks,
> Qu
> 
>>
>>
>>
>>> Or fiemap of that file could also help:
>>>
>>> # xfs_io -c "fiemap -v" 
>>>
>>> This is completely safe, but I'm not 100% sure about if the info is enough.
>>
>> # xfs_io -c "fiemap -v"
>> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal
>> Available at: http://termbin.com/nsej
>>
>> # xfs_io -c "fiemap -v"
>> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal
>> Available at: http://termbin.com/4fiz
> 
> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org