Re: [PATCH V2] test online label ioctl
On Mon, May 14, 2018 at 06:26:07PM -0500, Eric Sandeen wrote: > On 5/14/18 6:11 PM, Dave Chinner wrote: > > On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote: > >> This tests the online label ioctl that btrfs has, which has been > >> recently proposed for XFS. > >> > >> To run, it requires an updated xfs_io with the label command and a > >> filesystem that supports it > >> > >> A slight change here to _require_xfs_io_command as well, so that tests > >> which simply fail with "Inappropriate ioctl" can be caught in the > >> common case. > >> > >> Signed-off-by: Eric Sandeen> >> --- > > > > >> +# The maximum filesystem label length, not including terminating NULL > >> +_label_get_max() > >> +{ > >> + case $FSTYP in > >> + xfs) > >> + MAXLEN=12 > >> + ;; > >> + btrfs) > >> + MAXLEN=255 > >> + ;; > >> + *) > >> + MAXLEN=0 > > > > Why not just _notrun here? > > do we want to go through the other steps only to get here and notrun > halfway through? > > >> + ;; > >> + esac > >> + > >> + echo $MAXLEN > >> +} > >> + > >> +_require_label_get_max() > >> +{ > >> + if [ $(_label_get_max) -eq 0 ]; then > >> + _notrun "$FSTYP does not define maximum label length" > >> + fi > > > > And this check can go away? > > We'd like to know if all the validations in this can complete before we > get started, right? That's why I did it this way. Sure, just trying to be a bit defensive as people often forget _requires directives when writing new tests > > Also, shouldn't it be _require_online_label_change() ? And then > > maybe you can move the xfs_io label command check inside it? > > Well, we want to know a lot of things: > > 1) can the kernel code for this filesystem support online label > 2) does xfs_io support the label command > 3) does this test know the maximum label length to test for this fs > > the above stuff is for 3) What I was suggesting was doing all of these in one function similar to _require_xfs_sparse_inodes, _require_meta_uuid, etc: _require_online_label_change() { # need xfs_io support _require_xfs_io_command "label" # need fstests knowledge of label size [ $(_label_get_max) -eq 0 ] && _notrun "$FSTYP does not define maximum label length" # need kernel support $XFS_IO_PROG -c label $TEST_DIR > /dev/null 2>&1 [ $? -ne 0 ] && _notrun "Kernel does not support FS_IOC_GETLABEL" } Which also means that the label_f command in xfs_io needs to set the exitcode to non-zero when the ioctl fails so that xfs_io returns non-zero on failure... Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On 2018年05月15日 00:52, David Sterba wrote: > On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote: >> As btrfs(5) specified: >> >> Note >> If nodatacow or nodatasum are enabled, compression is disabled. >> >> If NODATASUM or NODATACOW set, we should not compress the extent. >> >> And in fact, we have bug report about corrupted compressed extent >> leading to memory corruption in mail list. > > Link please. > >> Although it's mostly buggy lzo implementation causing the problem, btrfs >> still needs to be fixed to meet the specification. > > That's very vague, what's the LZO bug? If the input is garbage and lzo > decompression cannot decompress it, it's not a lzo bug. Still digging. Would update this content in next version. > >> Reported-by: James Harvey>> Signed-off-by: Qu Wenruo >> --- >> fs/btrfs/inode.c | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index d241285a0d2a..dbef3f404559 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode >> *inode, u64 start, u64 end) >> { >> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >> >> +/* >> + * Btrfs doesn't support compression without csum or CoW. >> + * This should have the highest priority. >> + */ >> +if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || >> +BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) >> +return 0; > > This is also the wrong place to fix that, NODATASUM or NODATACOW inode > should never make it to compress_file_range (that calls > inode_need_compress). Nope, in run_delalloc_range(), we calls inode_need_compress() to determine if we goes to normal cow routine or goes to async routine (compression). So inode_need_compress() looks like a pretty valid location. Just mount with nodatasum and create an inode, then remount to datasum,compress, write something to that inode, you could hit the behavior. Thanks, Qu > >> + >> /* force compress */ >> if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) >> return 1; >> -- >> 2.17.0 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > signature.asc Description: OpenPGP digital signature
Re: verify key failure
On 2018年05月14日 22:35, Liu Bo wrote: > Hi, > > I got another warning of verify_level_key by running btrfs/124 in a loop, I'm > testing against 4.17-rc3. > > Not sure if it's false positive. > > [101414.336691] WARNING: CPU: 3 PID: 30194 at fs/btrfs/disk-io.c:455 > btree_read_extent_buffer_pages+0x183/0x220 [btrfs] > [101414.340372] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress > xxhash zlib_inflate lzo_compress lzo_decompress zlib_deflate raid6_pq > dm_flakey [last unloaded: xor] > [101414.345713] CPU: 3 PID: 30194 Comm: btrfs Tainted: GW O > 4.17.0-rc3-liubo+ #35 > [101414.348501] RIP: 0010:btree_read_extent_buffer_pages+0x183/0x220 [btrfs] > ... > [101414.369713] Call Trace: > [101414.370477] read_tree_block+0x3d/0x60 [btrfs] > [101414.371946] read_block_for_search.isra.11+0x19c/0x350 [btrfs] > [101414.373915] btrfs_search_slot+0x4a0/0xa60 [btrfs] > [101414.375489] ? trace_hardirqs_on_caller+0x12/0x1c0 > [101414.377080] ? btrfs_lookup_ordered_extent+0x8b/0xd0 [btrfs] > [101414.379007] btrfs_lookup_csum+0x42/0x130 [btrfs] > [101414.380456] __btrfs_lookup_bio_sums+0x2fb/0x6a0 [btrfs] > [101414.381554] btrfs_submit_bio_hook+0xbb/0x180 [btrfs] > [101414.382598] submit_one_bio+0x57/0x80 [btrfs] > [101414.383509] submit_extent_page+0xd5/0x1f0 [btrfs] > [101414.384507] __do_readpage+0x2a6/0x770 [btrfs] > [101414.385449] ? btrfs_create_repair_bio+0x100/0x100 [btrfs] > [101414.386576] ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs] > [101414.387569] ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs] > [101414.388562] __extent_readpages+0x2e2/0x330 [btrfs] > [101414.389584] extent_readpages+0x10e/0x1a0 [btrfs] > [101414.390565] __do_page_cache_readahead+0x283/0x340 > [101414.391550] ? ondemand_readahead+0x207/0x460 > [101414.392451] ondemand_readahead+0x207/0x460 > [101414.393353] relocate_file_extent_cluster+0x364/0x4c0 [btrfs] > [101414.394546] relocate_block_group+0x5d4/0x6e0 [btrfs] > ... > [101414.432616] BTRFS error (device sdb): tree first key mismatch detected, > bytenr=30523392 key expected=(18446744073709551606, 128, 1120665600) has=(1, > 204, 22020096) The expected key is completely fine, while the found one obviously belongs to extent tree. Maybe that's the bug which I'm always chasing. Can you reproduce it again with btrfs_print_tree() added to provide more info? Thanks, Qu > > thanks, > -liubo > signature.asc Description: OpenPGP digital signature
[PATCH v2 1/3] btrfs-progs: check: check symlinks with append/immutable flags
Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. Symlinks should never have append/immutable flags. While processing inodes, if found a symlink with append/immutable flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. This is for original mode. Signed-off-by: Su Yue--- check/main.c | 7 +++ check/mode-original.h | 1 + 2 files changed, 8 insertions(+) diff --git a/check/main.c b/check/main.c index 68da994f7ae0..c764fc011ded 100644 --- a/check/main.c +++ b/check/main.c @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, struct inode_record *rec) fprintf(stderr, ", link count wrong"); if (errors & I_ERR_FILE_EXTENT_ORPHAN) fprintf(stderr, ", orphan file extent"); + if (errors & I_ERR_ODD_INODE_FLAGS) + fprintf(stderr, ", odd inode flags"); fprintf(stderr, "\n"); /* Print the orphan extents if needed */ if (errors & I_ERR_FILE_EXTENT_ORPHAN) @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, { struct inode_record *rec; struct btrfs_inode_item *item; + u64 flags; rec = active_node->current; BUG_ON(rec->ino != key->objectid || rec->refs > 1); @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, rec->found_inode_item = 1; if (rec->nlink == 0) rec->errors |= I_ERR_NO_ORPHAN_ITEM; + flags = btrfs_inode_flags(eb, item); + if (rec->imode & BTRFS_FT_SYMLINK && + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) + rec->errors |= I_ERR_ODD_INODE_FLAGS; maybe_free_inode_rec(_node->inode_cache, rec); return 0; } diff --git a/check/mode-original.h b/check/mode-original.h index 368de692fdd1..13cfa5b9e1b3 100644 --- a/check/mode-original.h +++ b/check/mode-original.h @@ -186,6 +186,7 @@ struct file_extent_hole { #define I_ERR_LINK_COUNT_WRONG (1 << 13) #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14) #define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15) +#define I_ERR_ODD_INODE_FLAGS (1 << 16) struct inode_record { struct list_head backrefs; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/3] btrfs-progs: check: verify symlinks with append/immutable flags
This patchset can be fetch from my github: https://github.com/Damenly/btrfs-progs/commits/odd_inode_flags It's based on devel. symlinks should never have append/immutable attributes. This patchset enables btrfs check to verify such corruption. PATCH[1] is for original mode. PATCH[2] is for original mode. PATCH[3] adds a test image. For further use, the directory is called bad-inode-flags. #issue 133 --- Changelog: v2: Use "rec->errors |=" instead of "rec->errors =" in patch[1]. Define new error bit of invalid inode flags in lowmem mode. Adjust print message in patch[2]. Thanks, Qu and Nikolay. Rename test directory from odd-inode-flags to bad-inode-flags. Su Yue (3): btrfs-progs: check: check symlinks with append/immutable flags btrfs-progs: lowmem: check symlinks with append/immutable flags btrfs-progs: fsck-tests: add test case to check symlinks with bad flags check/main.c | 7 +++ check/mode-lowmem.c | 10 ++ check/mode-lowmem.h | 1 + check/mode-original.h| 1 + .../034-bad-inode-flags/default_case.img | Bin 0 -> 4096 bytes tests/fsck-tests/034-bad-inode-flags/test.sh | 15 +++ 6 files changed, 34 insertions(+) create mode 100644 tests/fsck-tests/034-bad-inode-flags/default_case.img create mode 100755 tests/fsck-tests/034-bad-inode-flags/test.sh -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags
Define new error bit INODE_FLAGS_ERROR to represents invalid inode flags error. Symlinks should never have append/immutable flags set. While checking inodes, if found a symlink with append/immutable flags, report and record the inode flags error. This is for lowmem mode. Signed-off-by: Su Yue--- check/mode-lowmem.c | 10 ++ check/mode-lowmem.h | 1 + 2 files changed, 11 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 9890180d1d3c..f598bc364de4 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) struct btrfs_key last_key; u64 inode_id; u32 mode; + u64 flags; u64 nlink; u64 nbytes; u64 isize; @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) isize = btrfs_inode_size(node, ii); nbytes = btrfs_inode_nbytes(node, ii); mode = btrfs_inode_mode(node, ii); + flags = btrfs_inode_flags(node, ii); dir = imode_to_type(mode) == BTRFS_FT_DIR; nlink = btrfs_inode_nlink(node, ii); nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; + if (mode & BTRFS_FT_SYMLINK && + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) { + err |= INODE_FLAGS_ERROR; + error( +"symlinks must never have immutable/append flags set, root %llu inode item %llu flags %llu may be corrupted", + root->objectid, inode_id, flags); + } + while (1) { btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); ret = btrfs_next_item(root, path); diff --git a/check/mode-lowmem.h b/check/mode-lowmem.h index e7ba62e2413e..91f7b6b1db53 100644 --- a/check/mode-lowmem.h +++ b/check/mode-lowmem.h @@ -44,6 +44,7 @@ #define DIR_COUNT_AGAIN (1<<20) /* DIR isize should be recalculated */ #define BG_ACCOUNTING_ERROR (1<<21) /* Block group accounting error */ #define FATAL_ERROR (1<<22) /* Fatal bit for errno */ +#define INODE_FLAGS_ERROR (1<<23) /* Invalid inode flags */ /* * Error bit for low memory mode check. -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/3] btrfs-progs: fsck-tests: add test case to check symlinks with bad flags
There are two bad symlinks in the test case. One is with immutable attribute. Another one is with append attribute. Signed-off-by: Su Yue--- .../034-bad-inode-flags/default_case.img | Bin 0 -> 4096 bytes tests/fsck-tests/034-bad-inode-flags/test.sh | 15 +++ 2 files changed, 15 insertions(+) create mode 100644 tests/fsck-tests/034-bad-inode-flags/default_case.img create mode 100755 tests/fsck-tests/034-bad-inode-flags/test.sh diff --git a/tests/fsck-tests/034-bad-inode-flags/default_case.img b/tests/fsck-tests/034-bad-inode-flags/default_case.img new file mode 100644 index ..43a2a6f62d5ef3afd77f117b577a56ad6098ed19 GIT binary patch literal 4096 zcmeH}c{CJUAIFU?k*yI4S@RGnly%15rj(_~G8rO6vS$b*hR9G^M_GnP*~gG%*XULD zecwiA3?YW>hM5=dd(Ly-=e+&-p7Z|Uo_l`xdw=)dbMEhRes@7VO!Ok2v8iSFcVXRY z0S9$YY#lgx4lMhgmx00fKp$b;YjjKwbl1Mc|4Xm#`|-vHGaC;^56vNgLjwQD1pG!) z0%s{UtRTF}`3%o*f~o}Ub~i7&5a;#~3S~wq>yb%mpwOOBQV@`K}NNkK2T)W zP3ZBKkdz|>(+ppAd?n`Mhq5Y_Mx6#)lkr%V0RYfv XKS#j!hDNZK|Zl0`#(aZhs{~>g=;i^@`D f+(}JrN_k>!De44F)K~Z_B2ji*e|B3TM=T#=3J?hFQkLT zcoHc*_{i*Zl|=_O)<@(9atD)c?k`SQU{$3zUo_)!00YeaVw>N{!sut0FUEY0%ig=a zMor(Rh4`G&3Z`YTQC;Vd8DgP6v(Pk4lF8cYH?czMG@QAi}DN #2dOFHE}ny2M`mZ24yt(4GUf}VgxH+`7uL@ zKDpXm>?+9{UNN|}xP%5x8e hlk9$5E{}{L^#S!Q*F3j-dy9VK#9I*B-fuj6aJJl z=P4E0NBM;7Fya(6l|UU2sBHD~_7;trI@t}asks;9Kk+;cNS#DJ)+5qi%9$5}Frqs< zGo#p;!NId;WbB=u?K!Wum6^@+(Q%lm<^l5MSYiPzgj6jAV1WhvGO+kDGX zzw5{X`z6v>V}g9P^96Zk*+d9P;fMmiO;S-TF=bQT1)?9j;R){>iIAX>R5$~D?vMpj zBnOV)t?w>WW|3jF#YJ{gK;@)e6U`2;z47;3jE!%&}epjB+(;{Nt3NKke zgQOrb4!>S43Sh|5t~6Q9_eALyPQFaH!cTLiXpX=W)Jx7Uxq;~%Hy)VR7GgFu`>T4N zZ?@I};@Zg%oqD*Vk4*qEy@RB5H{Z >mf%4)?l}C7)7z9RiH3+|z}`sur%ucH}%V zPz+A(a7rcq^2?5ccAa_hN*yFmglI!5N>l2$a={+~R77Z4R7$n%aIEVA=N$ z1b$h$FIy~<-xmLlH6{;68g#pjl7j63c)%xA>Dd9r0^C%QE)RQfgi2 zXiewE(h-{O)QWl=h!TjI+0de3-VC=YSC^0MN(pB5!!V!#68@VzTm&5G-AS>CCye#X zC(0KE`nt?7e5TX|O%{xg!*Ab5P=2^-DRjo81w`|=zUk#|&(uj(4p2-{$h3e`sM17w zdW#Tl75Q}}emQch`~$C9_sjTscj@M>sfi4kHHdcdX*RC$on>k-Dn@x1YE7>^O(DHf z{Sf3DiCn-r) e#3Nqt1-yQ4Z zjQbKaB#VjYFLQ=5E+mdcWR7)?7QwYYr_nqX2!rj`wma*sUcEyVH)-UMp9vk5p*sJi z)uP>;M~F@^1@nip{sw)_u!t0m3wfd_^tsEjdl3JV!)ti4tO%j3AZkU&1P8pdqT2@4 z)2a@Bm48L_Y|X>TZU*bQ)wNAuogJWSw*}wLwwe>k*K*tA6k?IBPtx`FWcq8tfigF_ z`^-m<5?dEeNYIhR9? zpXv|E5a?(uK@>aY^((xBosb-{GjPf)k(;|7FUGk~Pl7(`D#yu_syY49#0sXEQ zq2Yh_;mDS0F#bl9X5`NV5q91498spY6Yu}FrF1zva3DCCYrYxh$_DQ7>Kut=Pt1kZ z=
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
Don't know if this will help. I just learned about pstore, and see in there a dmesg that's interesting. The serial port kernel errors started this time with "BUG: unable to handle kernel paging request". The pstore dmesg has everything from there until the end of the first trace. But, the interesting part is the pstore dmesg has 310 "BTRFS: decompress failed" messages the serial port (the versions I've shared) version doesn't. (Sometimes the serial crashes have 1 of these btrfs errors, but never repeated like this.) These 310 btrfs errors are all uptime-stamped from 13110.016096 - 13110.253752, when the BUG is later at 13110.370494 (when the serial errors start.) With the kernel trying to decompress 310 times within 0.237656 seconds, maybe that's an indication with invalid data, it retries forever in a bad way crashing the kernel, rather than failing? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
4.15.6 crash: BUG at fs/btrfs/ctree.c:1862
static noinline struct extent_buffer * read_node_slot(struct btrfs_fs_info *fs_info, struct extent_buffer *parent, int slot) { int level = btrfs_header_level(parent); struct extent_buffer *eb; if (slot < 0 || slot >= btrfs_header_nritems(parent)) return ERR_PTR(-ENOENT); BUG_ON(level == 0); BTRFS info (device dm-2): relocating block group 13404622290944 flags data BTRFS info (device dm-2): found 9959 extents BTRFS info (device dm-2): found 9959 extents BTRFS info (device dm-2): relocating block group 13403548549120 flags data [ cut here ] kernel BUG at fs/btrfs/ctree.c:1862! invalid opcode: [#1] PREEMPT SMP PTI CPU: 5 PID: 8103 Comm: btrfs Tainted: G U 4.15.6-amd64-preempt-sysrq-20171018 #3 Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 RIP: 0010:read_node_slot+0x3c/0x9e RSP: 0018:becfaa0b7b58 EFLAGS: 00210246 RAX: 00a0 RBX: 000c RCX: 0003 RDX: 000c RSI: 9a60e9d9de78 RDI: 00052f6e RBP: 9a60e9d9de78 R08: 0001 R09: becfaa0b7bf6 R10: 9a64988bd7e9 R11: 9a64988bd7c8 R12: e003d4bdb800 R13: 9a64a481 R14: R15: FS: 7fba34c9c8c0() GS:9a64de34() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 5a8b9c9a CR3: 0001446c6004 CR4: 001606e0 Call Trace: tree_advance+0xb1/0x11e btrfs_compare_trees+0x1c2/0x4d6 ? process_extent+0xdcf/0xdcf btrfs_ioctl_send+0x81e/0xc70 ? __kmalloc_track_caller+0xfb/0x10f _btrfs_ioctl_send+0xbc/0xe6 ? paravirt_sched_clock+0x5/0x8 ? set_task_rq+0x2f/0x80 ? task_rq_unlock+0x22/0x36 btrfs_ioctl+0x162f/0x1dc8 ? select_task_rq_fair+0xb65/0xb7a ? update_load_avg+0x16d/0x442 ? list_add+0x15/0x2e ? cfs_rq_throttled.isra.30+0x9/0x18 ? vfs_ioctl+0x1b/0x28 vfs_ioctl+0x1b/0x28 do_vfs_ioctl+0x4f4/0x53f ? __audit_syscall_entry+0xbf/0xe3 SyS_ioctl+0x52/0x76 do_syscall_64+0x72/0x81 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fba34d835e7 RSP: 002b:7ffc32cf4cb8 EFLAGS: 0202 ORIG_RAX: 0010 RAX: ffda RBX: 523f RCX: 7fba34d835e7 RDX: 7ffc32cf4d40 RSI: 40489426 RDI: 0004 RBP: 0004 R08: R09: 7fba34c9b700 R10: 7fba34c9b9d0 R11: 0202 R12: 0003 R13: 563a30b87020 R14: 0001 R15: 0001 Code: f5 53 4c 8b a6 98 00 00 00 89 d3 4c 89 e7 e8 67 fd ff ff 85 db 78 63 4c 89 e7 41 88 c6 e8 92 fb ff ff 39 d8 76 54 45 84 f6 75 02 <0f> 0b 89 de 48 89 ef e8 2e ff ff ff 89 de 49 89 c4 48 89 ef e8 RIP: read_node_slot+0x3c/0x9e RSP: becfaa0b7b58 ---[ end trace a24e7de6b77b5cb1 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1900 from 0x8100 (relocation range: 0x8000-0xbfff) -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2] test online label ioctl
On 5/14/18 6:11 PM, Dave Chinner wrote: > On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote: >> This tests the online label ioctl that btrfs has, which has been >> recently proposed for XFS. >> >> To run, it requires an updated xfs_io with the label command and a >> filesystem that supports it >> >> A slight change here to _require_xfs_io_command as well, so that tests >> which simply fail with "Inappropriate ioctl" can be caught in the >> common case. >> >> Signed-off-by: Eric Sandeen>> --- >> +# The maximum filesystem label length, not including terminating NULL >> +_label_get_max() >> +{ >> +case $FSTYP in >> +xfs) >> +MAXLEN=12 >> +;; >> +btrfs) >> +MAXLEN=255 >> +;; >> +*) >> +MAXLEN=0 > > Why not just _notrun here? do we want to go through the other steps only to get here and notrun halfway through? >> +;; >> +esac >> + >> +echo $MAXLEN >> +} >> + >> +_require_label_get_max() >> +{ >> +if [ $(_label_get_max) -eq 0 ]; then >> +_notrun "$FSTYP does not define maximum label length" >> +fi > > And this check can go away? We'd like to know if all the validations in this can complete before we get started, right? That's why I did it this way. > Also, shouldn't it be _require_online_label_change() ? And then > maybe you can move the xfs_io label command check inside it? Well, we want to know a lot of things: 1) can the kernel code for this filesystem support online label 2) does xfs_io support the label command 3) does this test know the maximum label length to test for this fs the above stuff is for 3) >> +# remove previous $seqres.full before test >> +rm -f $seqres.full >> + >> +# real QA test starts here >> + >> +_supported_fs generic >> +_supported_os Linux >> +_require_scratch >> +_require_xfs_io_command "label" >> +_require_label_get_max >> + >> +_scratch_mkfs > $seqres.full 2>&1 >> +_scratch_mount >> + >> +# Make sure we can set & clear the label >> +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT >> +$XFS_IO_PROG -c "label" $SCRATCH_MNT >> + >> +# And that userspace can see it now, while mounted >> +# NB: some blkid has trailing whitespace, filter it out here >> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" >> + >> +# And that the it is still there when it's unmounted >> +_scratch_unmount >> +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" > > Ok, so "LABEL" here is a special blkid match token yep >> +# And that it persists after a remount >> +_scratch_mount >> +$XFS_IO_PROG -c "label" $SCRATCH_MNT >> + >> +# And that a too-long label is rejected, beyond the interface max: >> +LABEL=$(perl -e "print 'l' x 257;") > > And now you use it as a variable. Nasty and confusing. Using lower > case for local variables is the norm, right? I thought we were only > supposed to use upper case for global test harness variables... I guess I didn't know about that convention (or forgot) ok, yeah, that's a little confusing I guess. *shrug* I can change it if you prefer. Obviously the "blkid -s LABEL" must remain "LABEL" > But even making it "label" is problematic: > >> +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT > > because "label" is an xfs_io command. Perhaps call it "fs_label"? ok -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V2] test online label ioctl
On Mon, May 14, 2018 at 12:09:16PM -0500, Eric Sandeen wrote: > This tests the online label ioctl that btrfs has, which has been > recently proposed for XFS. > > To run, it requires an updated xfs_io with the label command and a > filesystem that supports it > > A slight change here to _require_xfs_io_command as well, so that tests > which simply fail with "Inappropriate ioctl" can be caught in the > common case. > > Signed-off-by: Eric Sandeen> --- > > (urgh send as proper new thread, sorry) > > This passes on btrfs, _notruns on xfs/ext4 of yore, and passes > on xfs w/ my online label patchset (as long as xfs_io has the new > capability) > > V2: Add a max label length helper > Set the proper btrfs max label length o_O oops > Filter trailing whitespace from blkid output > > > diff --git a/common/rc b/common/rc > index 9ffab7fd..88a99cff 100644 > --- a/common/rc > +++ b/common/rc > @@ -2158,6 +2158,9 @@ _require_xfs_io_command() > echo $testio | grep -q "Inappropriate ioctl" && \ > _notrun "xfs_io $command support is missing" > ;; > + "label") > + testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1` > + ;; > "open") > # -c "open $f" is broken in xfs_io <= 4.8. Along with the fix, > # a new -C flag was introduced to execute one shot commands. > @@ -2196,7 +2199,7 @@ _require_xfs_io_command() > rm -f $testfile 2>&1 > /dev/null > echo $testio | grep -q "not found" && \ > _notrun "xfs_io $command support is missing" > - echo $testio | grep -q "Operation not supported" && \ > + echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" > && \ > _notrun "xfs_io $command failed (old kernel/wrong fs?)" > echo $testio | grep -q "Invalid" && \ > _notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)" > @@ -3802,6 +3805,31 @@ _require_scratch_feature() > esac > } > > +# The maximum filesystem label length, not including terminating NULL > +_label_get_max() > +{ > + case $FSTYP in > + xfs) > + MAXLEN=12 > + ;; > + btrfs) > + MAXLEN=255 > + ;; > + *) > + MAXLEN=0 Why not just _notrun here? > + ;; > + esac > + > + echo $MAXLEN > +} > + > +_require_label_get_max() > +{ > + if [ $(_label_get_max) -eq 0 ]; then > + _notrun "$FSTYP does not define maximum label length" > + fi And this check can go away? Also, shouldn't it be _require_online_label_change() ? And then maybe you can move the xfs_io label command check inside it? > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# real QA test starts here > + > +_supported_fs generic > +_supported_os Linux > +_require_scratch > +_require_xfs_io_command "label" > +_require_label_get_max > + > +_scratch_mkfs > $seqres.full 2>&1 > +_scratch_mount > + > +# Make sure we can set & clear the label > +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT > +$XFS_IO_PROG -c "label" $SCRATCH_MNT > + > +# And that userspace can see it now, while mounted > +# NB: some blkid has trailing whitespace, filter it out here > +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" > + > +# And that the it is still there when it's unmounted > +_scratch_unmount > +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" Ok, so "LABEL" here is a special blkid match token > +# And that it persists after a remount > +_scratch_mount > +$XFS_IO_PROG -c "label" $SCRATCH_MNT > + > +# And that a too-long label is rejected, beyond the interface max: > +LABEL=$(perl -e "print 'l' x 257;") And now you use it as a variable. Nasty and confusing. Using lower case for local variables is the norm, right? I thought we were only supposed to use upper case for global test harness variables... But even making it "label" is problematic: > +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT because "label" is an xfs_io command. Perhaps call it "fs_label"? Cheers, Dave. -- Dave Chinner da...@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 12:52 PM, David Sterbawrote: > On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote: >> As btrfs(5) specified: >> >> Note >> If nodatacow or nodatasum are enabled, compression is disabled. >> >> If NODATASUM or NODATACOW set, we should not compress the extent. >> >> And in fact, we have bug report about corrupted compressed extent >> leading to memory corruption in mail list. > > Link please. https://bugzilla.kernel.org/show_bug.cgi?id=199707 & https://www.spinics.net/lists/linux-btrfs/msg77971.html >> Although it's mostly buggy lzo implementation causing the problem, btrfs >> still needs to be fixed to meet the specification. > > That's very vague, what's the LZO bug? If the input is garbage and lzo > decompression cannot decompress it, it's not a lzo bug. The bug is the kernel doesn't just give an I/O error. It totally crashes the system. Even a regular user cat'ing an invalid btrfs-lzo-compressed file will take the system down. Qu's current theory which I agree with is it causes some kind of random kernel memory corruption. The crashes sometimes include "BTRFS: decompress failed" at the start, but give backtraces for other things than btrfs lzo code. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 6:35 AM, Qu Wenruowrote: > And if possible, please don't just remove those offending files (yet). > Your binary dump would help a lot locating the root case. Absolutely. This is on a 50G LVM root volume, so I've been able to leave the original one unmodified and always mounted ro. Copied it and changed the UUID. So, I can leave it around for a while without a problem. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
пн, 14 мая 2018 г. в 20:32, David Sterba: > On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote: > > As btrfs(5) specified: > > > > Note > > If nodatacow or nodatasum are enabled, compression is disabled. > > > > If NODATASUM or NODATACOW set, we should not compress the extent. > > > > And in fact, we have bug report about corrupted compressed extent > > leading to memory corruption in mail list. > Link please. > > Although it's mostly buggy lzo implementation causing the problem, btrfs > > still needs to be fixed to meet the specification. > That's very vague, what's the LZO bug? If the input is garbage and lzo > decompression cannot decompress it, it's not a lzo bug. > > Reported-by: James Harvey > > Signed-off-by: Qu Wenruo > > --- > > fs/btrfs/inode.c | 8 > > 1 file changed, 8 insertions(+) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index d241285a0d2a..dbef3f404559 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) > > { > > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > > > > + /* > > + * Btrfs doesn't support compression without csum or CoW. > > + * This should have the highest priority. > > + */ > > + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || > > + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) > > + return 0; > This is also the wrong place to fix that, NODATASUM or NODATACOW inode > should never make it to compress_file_range (that calls > inode_need_compress). David, i've talk about that some time ago: https://www.spinics.net/lists/linux-btrfs/msg73137.html NoCow files can be *easy* compressed. ``` ➜ ~ touch test ➜ ~ chattr +C test ➜ ~ lsattr test ---C-- test ➜ ~ dd if=/dev/zero of=./test bs=1M count=1 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00099878 s, 1.0 GB/s ➜ ~ sync ➜ ~ filefrag -v test Filesystem type is: 9123683e File size of test is 1048576 (256 blocks of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0.. 255: 88592741.. 88592996:256: last,eof test: 1 extent found ➜ ~ btrfs fi def -vrczstd test test ➜ ~ filefrag -v test Filesystem type is: 9123683e File size of test is 1048576 (256 blocks of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0.. 31: 3125.. 3156: 32: encoded 1: 32.. 63: 3180.. 3211: 32: 3157: encoded 2: 64.. 95: 3185.. 3216: 32: 3212: encoded 3: 96.. 127: 3188.. 3219: 32: 3217: encoded 4: 128.. 159: 3263.. 3294: 32: 3220: encoded 5: 160.. 191: 3355.. 3386: 32: 3295: encoded 6: 192.. 223: 3376.. 3407: 32: 3387: encoded 7: 224.. 255: 3411.. 3442: 32: 3408: last,encoded,eof test: 8 extents found ``` -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Clarification needed about libbtrfs & libbtrfsutil
On Mon, May 14, 2018 at 09:40:19AM +0100, Dimitri John Ledkov wrote: > Are both of these meant to be public libraries, installed on the user > systems, and available in .so variant as well for 3rd party > development and public dynamic linking? > > Or are these private internal libraries, which are installed as public > runtime only, simply to share code between the utils, but otherwise > provide no abi stability and will forever remain libfoo.so.0? They're both meant to be public. In fact, libbtrfsutil is already 1.0.0. > Or should these even be a noinst_ libraries (~= Libtool Convenience > Libraries), and are simply intermediate by-products? > > I'm asking because despite compiling shared & static variants of these > libraries, and "shared linked" and "static linked" variants of the > utils, it appears that all utilities are statically linking against > libbtrfs/libbtrfsutils. Thus no binaries nor bindings, dynamically > link against neither libbtrfs nor libbtrfsutil. > > Tweaking the makefile to use libs_shared variable instead of libs or > libs_static, results in slightly smaller binaries, dynamically linked > against libbtrfs/libbtrfsutil. > > But it is hard to tell if this is a bug/mistake, or an intentional feature. I'm not sure why we statically link libbtrfs into the the tools, and I just copied that for libbtrfsutil. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2] test online label ioctl
This tests the online label ioctl that btrfs has, which has been recently proposed for XFS. To run, it requires an updated xfs_io with the label command and a filesystem that supports it A slight change here to _require_xfs_io_command as well, so that tests which simply fail with "Inappropriate ioctl" can be caught in the common case. Signed-off-by: Eric Sandeen--- (urgh send as proper new thread, sorry) This passes on btrfs, _notruns on xfs/ext4 of yore, and passes on xfs w/ my online label patchset (as long as xfs_io has the new capability) V2: Add a max label length helper Set the proper btrfs max label length o_O oops Filter trailing whitespace from blkid output diff --git a/common/rc b/common/rc index 9ffab7fd..88a99cff 100644 --- a/common/rc +++ b/common/rc @@ -2158,6 +2158,9 @@ _require_xfs_io_command() echo $testio | grep -q "Inappropriate ioctl" && \ _notrun "xfs_io $command support is missing" ;; + "label") + testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1` + ;; "open") # -c "open $f" is broken in xfs_io <= 4.8. Along with the fix, # a new -C flag was introduced to execute one shot commands. @@ -2196,7 +2199,7 @@ _require_xfs_io_command() rm -f $testfile 2>&1 > /dev/null echo $testio | grep -q "not found" && \ _notrun "xfs_io $command support is missing" - echo $testio | grep -q "Operation not supported" && \ + echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" && \ _notrun "xfs_io $command failed (old kernel/wrong fs?)" echo $testio | grep -q "Invalid" && \ _notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)" @@ -3802,6 +3805,31 @@ _require_scratch_feature() esac } +# The maximum filesystem label length, not including terminating NULL +_label_get_max() +{ + case $FSTYP in + xfs) + MAXLEN=12 + ;; + btrfs) + MAXLEN=255 + ;; + *) + MAXLEN=0 + ;; + esac + + echo $MAXLEN +} + +_require_label_get_max() +{ + if [ $(_label_get_max) -eq 0 ]; then + _notrun "$FSTYP does not define maximum label length" + fi +} + init_rc diff --git a/tests/generic/479 b/tests/generic/479 old mode 100644 new mode 100755 diff --git a/tests/generic/485 b/tests/generic/485 new file mode 100755 index ..941afd3d --- /dev/null +++ b/tests/generic/485 @@ -0,0 +1,90 @@ +#! /bin/bash +# FS QA Test 485 +# +# Test the online filesystem label set/get ioctls +# +#--- +# Copyright (c) 2018 Red Hat, Inc. All Rights Reserved. +# Author: Eric Sandeen +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +_supported_fs generic +_supported_os Linux +_require_scratch +_require_xfs_io_command "label" +_require_label_get_max + +_scratch_mkfs > $seqres.full 2>&1 +_scratch_mount + +# Make sure we can set & clear the label +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT +$XFS_IO_PROG -c "label" $SCRATCH_MNT + +# And that userspace can see it now, while mounted +# NB: some blkid has trailing whitespace, filter it out here +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" + +# And that the it is still there when it's unmounted +_scratch_unmount +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" + +# And that it persists after a remount +_scratch_mount +$XFS_IO_PROG -c "label" $SCRATCH_MNT + +# And that a too-long label is rejected, beyond the interface max: +LABEL=$(perl -e "print 'l' x 257;") +$XFS_IO_PROG -c "label $LABEL"
Re: [PATCH] test online label ioctl
This tests the online label ioctl that btrfs has, which has been recently proposed for XFS. To run, it requires an updated xfs_io with the label command and a filesystem that supports it A slight change here to _require_xfs_io_command as well, so that tests which simply fail with "Inappropriate ioctl" can be caught in the common case. Signed-off-by: Eric Sandeen--- This passes on btrfs, _notruns on xfs/ext4 of yore, and passes on xfs w/ my online label patchset (as long as xfs_io has the new capability) V2: Add a max label length helper Set the proper btrfs max label length o_O oops Filter trailing whitespace from blkid output diff --git a/common/rc b/common/rc index 9ffab7fd..88a99cff 100644 --- a/common/rc +++ b/common/rc @@ -2158,6 +2158,9 @@ _require_xfs_io_command() echo $testio | grep -q "Inappropriate ioctl" && \ _notrun "xfs_io $command support is missing" ;; + "label") + testio=`$XFS_IO_PROG -c "label" $TEST_DIR 2>&1` + ;; "open") # -c "open $f" is broken in xfs_io <= 4.8. Along with the fix, # a new -C flag was introduced to execute one shot commands. @@ -2196,7 +2199,7 @@ _require_xfs_io_command() rm -f $testfile 2>&1 > /dev/null echo $testio | grep -q "not found" && \ _notrun "xfs_io $command support is missing" - echo $testio | grep -q "Operation not supported" && \ + echo $testio | grep -q "Operation not supported\|Inappropriate ioctl" && \ _notrun "xfs_io $command failed (old kernel/wrong fs?)" echo $testio | grep -q "Invalid" && \ _notrun "xfs_io $command failed (old kernel/wrong fs/bad args?)" @@ -3802,6 +3805,31 @@ _require_scratch_feature() esac } +# The maximum filesystem label length, not including terminating NULL +_label_get_max() +{ + case $FSTYP in + xfs) + MAXLEN=12 + ;; + btrfs) + MAXLEN=255 + ;; + *) + MAXLEN=0 + ;; + esac + + echo $MAXLEN +} + +_require_label_get_max() +{ + if [ $(_label_get_max) -eq 0 ]; then + _notrun "$FSTYP does not define maximum label length" + fi +} + init_rc diff --git a/tests/generic/479 b/tests/generic/479 old mode 100644 new mode 100755 diff --git a/tests/generic/485 b/tests/generic/485 new file mode 100755 index ..941afd3d --- /dev/null +++ b/tests/generic/485 @@ -0,0 +1,90 @@ +#! /bin/bash +# FS QA Test 485 +# +# Test the online filesystem label set/get ioctls +# +#--- +# Copyright (c) 2018 Red Hat, Inc. All Rights Reserved. +# Author: Eric Sandeen +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# real QA test starts here + +_supported_fs generic +_supported_os Linux +_require_scratch +_require_xfs_io_command "label" +_require_label_get_max + +_scratch_mkfs > $seqres.full 2>&1 +_scratch_mount + +# Make sure we can set & clear the label +$XFS_IO_PROG -c "label label.$seq" $SCRATCH_MNT +$XFS_IO_PROG -c "label" $SCRATCH_MNT + +# And that userspace can see it now, while mounted +# NB: some blkid has trailing whitespace, filter it out here +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" + +# And that the it is still there when it's unmounted +_scratch_unmount +blkid -s LABEL $SCRATCH_DEV | _filter_scratch | sed -e "s/ $//g" + +# And that it persists after a remount +_scratch_mount +$XFS_IO_PROG -c "label" $SCRATCH_MNT + +# And that a too-long label is rejected, beyond the interface max: +LABEL=$(perl -e "print 'l' x 257;") +$XFS_IO_PROG -c "label $LABEL" $SCRATCH_MNT + +# And it succeeds right at the
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 03:02:10PM +0800, Qu Wenruo wrote: > As btrfs(5) specified: > > Note > If nodatacow or nodatasum are enabled, compression is disabled. > > If NODATASUM or NODATACOW set, we should not compress the extent. > > And in fact, we have bug report about corrupted compressed extent > leading to memory corruption in mail list. Link please. > Although it's mostly buggy lzo implementation causing the problem, btrfs > still needs to be fixed to meet the specification. That's very vague, what's the LZO bug? If the input is garbage and lzo decompression cannot decompress it, it's not a lzo bug. > Reported-by: James Harvey> Signed-off-by: Qu Wenruo > --- > fs/btrfs/inode.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d241285a0d2a..dbef3f404559 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode > *inode, u64 start, u64 end) > { > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > > + /* > + * Btrfs doesn't support compression without csum or CoW. > + * This should have the highest priority. > + */ > + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || > + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) > + return 0; This is also the wrong place to fix that, NODATASUM or NODATACOW inode should never make it to compress_file_range (that calls inode_need_compress). > + > /* force compress */ > if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) > return 1; > -- > 2.17.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 12:46:31PM +0300, Nikolay Borisov wrote: > [Adding David to CC] > > On 14.05.2018 12:39, Roman Mamedov wrote: > > On Mon, 14 May 2018 11:36:26 +0300 > > Nikolay Borisovwrote: > > > >> So what made you have these expectation, is it codified somewhere > >> (docs/man pages etc)? I'm fine with that semantics IF this is what > >> people expect. > > > > "Compression ...does not work for NOCOW files": > > https://btrfs.wiki.kernel.org/index.php/Compression > > > > The mount options man page does not say that the NOCOW attribute of files > > will > > be disregarded with compress-force. It only mentions interaction with the > > nodatacow and nodatasum mount options. So I'd expect the attribute to still > > work and prevent compression of NOCOW files. > > I wouldn't say this is very clear, it needs to be stated explicitly. > > > > >> Now the question is why people grew up to have this expectation and not the > >> other way round? IMO force_compress should really disregard everything else > > > > Both are knobs that the user needs to explicitly set, the difference is that > > the +C attribute is fine-grained and the mount option is global. If they are > > set by the user to conflicting values, it seems more useful to have the > > fine-grained control override the global one, not the other way round. > > This is valid reasoning but so is mine. So I'd like to have some rules > on that matter such that in the future things will have consistent > semantics. Obviously in this case the "local options trump global ones" > seems to be prevalent. I don't have problem with that but this should > codified somewhere. > > > David, what's your take on that. Where do you think will be the best > place to say that local, per-inode options take precedence over global ones? The order of precedence is roughly like this, whenever the respective containing object exits: - inode - directory - subvolume - mount options - filesystem defaults There's some special cases for the compression, that are documented separately. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()
On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote: > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote: > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote: > > > do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, > > > struct block_device *bdev, struct iov_iter *iter, > > > get_block_t get_block, dio_iodone_t end_io, > > > - dio_submit_t submit_io, int flags) > > > + dio_submit_t submit_io, int flags, void *private) > > > > Oh, dear... That's what, 9 arguments? I agree that the hack in question > > is obscene, but so is this ;-/ > > So looking at these one by one, obviously needed: > > - iocb > - inode > - iter > > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :( > > These could _maybe_ go in struct kiocb: > > - flags could maybe be folded into ki_flags > - private could maybe go in iocb->private, but I haven't yet read > through to figure out if we're already using iocb->private for direct > I/O I think the kiocb::private can be used for the purpose. There's only one user, ext4, that also passes some DIO data around so it would in line with the interface AFAICS. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: verify key failure
On 14 May 2018, at 10:35, Liu Bo wrote: Hi, I got another warning of verify_level_key by running btrfs/124 in a loop, I'm testing against 4.17-rc3. Not sure if it's false positive. How long does this take to trigger? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/3] btrfs: replace waitqueue_actvie with cond_wake_up
On 14.05.2018 15:23, David Sterba wrote: > Use the wrappers and reduce the amount of low-level details about the > waitqueue management. > > Signed-off-by: David SterbaReviewed-by: Nikolay Borisov > --- > fs/btrfs/compression.c | 7 +-- > fs/btrfs/delayed-inode.c | 9 +++-- > fs/btrfs/dev-replace.c | 10 -- > fs/btrfs/extent-tree.c | 7 +-- > fs/btrfs/inode.c | 9 +++-- > fs/btrfs/locking.c | 34 +++--- > fs/btrfs/ordered-data.c | 14 -- > fs/btrfs/transaction.c | 7 +-- > fs/btrfs/tree-log.c | 34 -- > 9 files changed, 40 insertions(+), 91 deletions(-) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index 5268c9f85ca7..dcebb91e4e0c 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -990,12 +990,7 @@ static void __free_workspace(int type, struct list_head > *workspace, > btrfs_compress_op[idx]->free_workspace(workspace); > atomic_dec(total_ws); > wake: > - /* > - * Make sure counter is updated before we wake up waiters. > - */ > - smp_mb(); > - if (waitqueue_active(ws_wait)) > - wake_up(ws_wait); > + cond_wake_up(ws_wait); > } > > static void free_workspace(int type, struct list_head *ws) > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c > index a8d492dbd3e7..fe6caa7e698b 100644 > --- a/fs/btrfs/delayed-inode.c > +++ b/fs/btrfs/delayed-inode.c > @@ -460,13 +460,10 @@ static void finish_one_item(struct btrfs_delayed_root > *delayed_root) > { > int seq = atomic_inc_return(_root->items_seq); > > - /* > - * atomic_dec_return implies a barrier for waitqueue_active > - */ > + /* atomic_dec_return implies a barrier */ > if ((atomic_dec_return(_root->items) < > - BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) && > - waitqueue_active(_root->wait)) > - wake_up(_root->wait); > + BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0)) > + cond_wake_up_nomb(_root->wait); > } > > static void __btrfs_remove_delayed_item(struct btrfs_delayed_item > *delayed_item) > diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c > index f82be266ba4b..d33ff5b23302 100644 > --- a/fs/btrfs/dev-replace.c > +++ b/fs/btrfs/dev-replace.c > @@ -916,9 +916,9 @@ void btrfs_dev_replace_clear_lock_blocking( > ASSERT(atomic_read(_replace->read_locks) > 0); > ASSERT(atomic_read(_replace->blocking_readers) > 0); > read_lock(_replace->lock); > - if (atomic_dec_and_test(_replace->blocking_readers) && > - waitqueue_active(_replace->read_lock_wq)) > - wake_up(_replace->read_lock_wq); > + /* Barrier implied by atomic_dec_and_test */ > + if (atomic_dec_and_test(_replace->blocking_readers)) > + cond_wake_up_nomb(_replace->read_lock_wq); > } > > void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) > @@ -929,9 +929,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info > *fs_info) > void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount) > { > percpu_counter_sub(_info->bio_counter, amount); > - > - if (waitqueue_active(_info->replace_wait)) > - wake_up(_info->replace_wait); > + cond_wake_up_nomb(_info->replace_wait); > } > > void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info) > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 3871658b6ab1..94fc825a126d 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -11094,12 +11094,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, > struct fstrim_range *range) > void btrfs_end_write_no_snapshotting(struct btrfs_root *root) > { > percpu_counter_dec(>subv_writers->counter); > - /* > - * Make sure counter is updated before we wake up waiters. > - */ > - smp_mb(); > - if (waitqueue_active(>subv_writers->wait)) > - wake_up(>subv_writers->wait); > + cond_wake_up(>subv_writers->wait); > } > > int btrfs_start_write_no_snapshotting(struct btrfs_root *root) > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d241285a0d2a..cf6f2815ffee 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1156,13 +1156,10 @@ static noinline void async_cow_submit(struct > btrfs_work *work) > nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >> > PAGE_SHIFT; > > - /* > - * atomic_sub_return implies a barrier for waitqueue_active > - */ > + /* atomic_sub_return implies a barrier */ > if (atomic_sub_return(nr_pages, _info->async_delalloc_pages) < > - 5 * SZ_1M && > - waitqueue_active(_info->async_submit_wait)) > - wake_up(_info->async_submit_wait); > + 5 * SZ_1M) > +
Re: [PATCH v4 11/12] Btrfs: renumber BTRFS_INODE_ runtime flags
On Fri, May 11, 2018 at 01:13:39PM -0700, Omar Sandoval wrote: > From: Omar Sandoval> > We got rid of BTRFS_INODE_HAS_ORPHAN_ITEM and > BTRFS_INODE_ORPHAN_META_RESERVED, so we can renumber the flags to make > them consecutive again. > > Signed-off-by: Omar Sandoval > --- > fs/btrfs/btrfs_inode.h | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > index 4807cde0313d..bbbe7f308d68 100644 > --- a/fs/btrfs/btrfs_inode.h > +++ b/fs/btrfs/btrfs_inode.h > @@ -20,14 +20,14 @@ > * new data the application may have written before commit. > */ > #define BTRFS_INODE_ORDERED_DATA_CLOSE 0 > -#define BTRFS_INODE_DUMMY2 > -#define BTRFS_INODE_IN_DEFRAG3 > -#define BTRFS_INODE_HAS_ASYNC_EXTENT 5 > -#define BTRFS_INODE_NEEDS_FULL_SYNC 6 > -#define BTRFS_INODE_COPY_EVERYTHING 7 > -#define BTRFS_INODE_IN_DELALLOC_LIST 8 > -#define BTRFS_INODE_READDIO_NEED_LOCK9 > -#define BTRFS_INODE_HAS_PROPS10 > +#define BTRFS_INODE_DUMMY1 > +#define BTRFS_INODE_IN_DEFRAG2 > +#define BTRFS_INODE_HAS_ASYNC_EXTENT 3 > +#define BTRFS_INODE_NEEDS_FULL_SYNC 4 > +#define BTRFS_INODE_COPY_EVERYTHING 5 > +#define BTRFS_INODE_IN_DELALLOC_LIST 6 > +#define BTRFS_INODE_READDIO_NEED_LOCK7 > +#define BTRFS_INODE_HAS_PROPS8 I'll update it to something like this, as we want the auto-numbering from enums: --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -19,17 +19,19 @@ * ordered operations list so that we make sure to flush out any * * new data the application may have written before commit. */ -#define BTRFS_INODE_ORDERED_DATA_CLOSE 0 -#define BTRFS_INODE_ORPHAN_META_RESERVED 1 -#define BTRFS_INODE_DUMMY 2 -#define BTRFS_INODE_IN_DEFRAG 3 -#define BTRFS_INODE_HAS_ORPHAN_ITEM4 -#define BTRFS_INODE_HAS_ASYNC_EXTENT 5 -#define BTRFS_INODE_NEEDS_FULL_SYNC6 -#define BTRFS_INODE_COPY_EVERYTHING7 -#define BTRFS_INODE_IN_DELALLOC_LIST 8 -#define BTRFS_INODE_READDIO_NEED_LOCK 9 -#define BTRFS_INODE_HAS_PROPS 10 +enum { + BTRFS_INODE_ORDERED_DATA_CLOSE = 0, + BTRFS_INODE_ORPHAN_META_RESERVED, + BTRFS_INODE_DUMMY, + BTRFS_INODE_IN_DEFRAG, + BTRFS_INODE_HAS_ORPHAN_ITEM, + BTRFS_INODE_HAS_ASYNC_EXTENT, + BTRFS_INODE_NEEDS_FULL_SYNC, + BTRFS_INODE_COPY_EVERYTHING, + BTRFS_INODE_IN_DELALLOC_LIST, + BTRFS_INODE_READDIO_NEED_LOCK, + BTRFS_INODE_HAS_PROPS, +}; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] vfs: dedupe should return EPERM if permission is not granted
On Fri, May 11, 2018 at 12:26:51PM -0700, Mark Fasheh wrote: > Right now we return EINVAL if a process does not have permission to dedupe a > file. This was an oversight on my part. EPERM gives a true description of > the nature of our error, and EINVAL is already used for the case that the > filesystem does not support dedupe. > > Signed-off-by: Mark FashehAcked-by: David Sterba We've been using EINVAL when the request is invalid in the ioctls, eg. combination of arguments that does not make sense, while EPERM is for cases when the request is ok but there's still some permission missing, > } else if (!(is_admin || (dst_file->f_mode & FMODE_WRITE) || >uid_eq(current_fsuid(), dst->i_uid))) { > - info->status = -EINVAL; > + info->status = -EPERM; exactly like this. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix duplicate extents after fsync of file with prealloc extents
On Wed, May 09, 2018 at 04:01:46PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana... > Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after > fsync log replay") > Signed-off-by: Filipe Manana Thanks for the excellent and very educational writeup! Added to 4.17-rc queue as it fixes a regression. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
On 14.05.2018 15:23, David Sterba wrote: > Currently the code assumes that there's an implied barrier by the > sequence of code preceding the wakeup, namely the mutex unlock. > > As Nikolay pointed out: > > I think this is wrong (not your code) but the original assumption that > the RELEASE semantics provided by mutex_unlock is sufficient. > According to memory-barriers.txt: > > Section 'LOCK ACQUISITION FUNCTIONS' states: > > (2) RELEASE operation implication: > > Memory operations issued before the RELEASE will be completed before the > RELEASE operation has completed. > > Memory operations issued after the RELEASE *may* be completed before the > RELEASE operation has completed. > > (I've bolded the may portion) > > The example given there: > > As an example, consider the following: > > *A = a; > *B = b; > ACQUIRE > *C = c; > *D = d; > RELEASE > *E = e; > *F = f; > > The following sequence of events is acceptable: > > ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE > > So if we assume that *C is modifying the flag which the waitqueue is checking, > and *E is the actual wakeup, then those accesses can be re-ordered... > > IMHO this code should be considered broken... > --- > > To be on the safe side, add the barriers. The synchronization logic > around log using the mutexes and several other threads does not make it > easy to reason for/against the barrier. > > CC: Nikolay Borisov> Link: https://lkml.kernel.org/r/6ee068d8-1a69-3728-00d1-d86293d43...@suse.com > Signed-off-by: David Sterba > --- Apart from what I said initially which prompted introducing this patch I can't say anything else. I think the fsync code is in dire need of being rewritten/simplified. But in so far as the newly introduced barriers are concerned: Reviewed-by: Nikolay Borisov > fs/btrfs/tree-log.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > index 43758e30aa7a..fa5b3dc5f4d5 100644 > --- a/fs/btrfs/tree-log.c > +++ b/fs/btrfs/tree-log.c > @@ -3116,8 +3116,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, > mutex_unlock(_root_tree->log_mutex); > > /* > - * The barrier before waitqueue_active is implied by mutex_unlock > + * The barrier before waitqueue_active is needed so all the updates > + * above are seen by the woken threads. It might not be necessary, but > + * proving that seems to be hard. >*/ > + smp_mb(); > if (waitqueue_active(_root_tree->log_commit_wait[index2])) > wake_up(_root_tree->log_commit_wait[index2]); > out: > @@ -3128,8 +3131,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, > mutex_unlock(>log_mutex); > > /* > - * The barrier before waitqueue_active is implied by mutex_unlock > + * The barrier before waitqueue_active is needed so all the updates > + * above are seen by the woken threads. It might not be necessary, but > + * proving that seems to be hard. >*/ > + smp_mb(); > if (waitqueue_active(>log_commit_wait[index1])) > wake_up(>log_commit_wait[index1]); > return ret; > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
verify key failure
Hi, I got another warning of verify_level_key by running btrfs/124 in a loop, I'm testing against 4.17-rc3. Not sure if it's false positive. [101414.336691] WARNING: CPU: 3 PID: 30194 at fs/btrfs/disk-io.c:455 btree_read_extent_buffer_pages+0x183/0x220 [btrfs] [101414.340372] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress xxhash zlib_inflate lzo_compress lzo_decompress zlib_deflate raid6_pq dm_flakey [last unloaded: xor] [101414.345713] CPU: 3 PID: 30194 Comm: btrfs Tainted: GW O 4.17.0-rc3-liubo+ #35 [101414.348501] RIP: 0010:btree_read_extent_buffer_pages+0x183/0x220 [btrfs] ... [101414.369713] Call Trace: [101414.370477] read_tree_block+0x3d/0x60 [btrfs] [101414.371946] read_block_for_search.isra.11+0x19c/0x350 [btrfs] [101414.373915] btrfs_search_slot+0x4a0/0xa60 [btrfs] [101414.375489] ? trace_hardirqs_on_caller+0x12/0x1c0 [101414.377080] ? btrfs_lookup_ordered_extent+0x8b/0xd0 [btrfs] [101414.379007] btrfs_lookup_csum+0x42/0x130 [btrfs] [101414.380456] __btrfs_lookup_bio_sums+0x2fb/0x6a0 [btrfs] [101414.381554] btrfs_submit_bio_hook+0xbb/0x180 [btrfs] [101414.382598] submit_one_bio+0x57/0x80 [btrfs] [101414.383509] submit_extent_page+0xd5/0x1f0 [btrfs] [101414.384507] __do_readpage+0x2a6/0x770 [btrfs] [101414.385449] ? btrfs_create_repair_bio+0x100/0x100 [btrfs] [101414.386576] ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs] [101414.387569] ? btrfs_direct_IO+0x3a0/0x3a0 [btrfs] [101414.388562] __extent_readpages+0x2e2/0x330 [btrfs] [101414.389584] extent_readpages+0x10e/0x1a0 [btrfs] [101414.390565] __do_page_cache_readahead+0x283/0x340 [101414.391550] ? ondemand_readahead+0x207/0x460 [101414.392451] ondemand_readahead+0x207/0x460 [101414.393353] relocate_file_extent_cluster+0x364/0x4c0 [btrfs] [101414.394546] relocate_block_group+0x5d4/0x6e0 [btrfs] ... [101414.432616] BTRFS error (device sdb): tree first key mismatch detected, bytenr=30523392 key expected=(18446744073709551606, 128, 1120665600) has=(1, 204, 22020096) thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix xattr loss after power failure
On Fri, May 11, 2018 at 04:42:42PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana> > If a file has xattrs, we fsync it, to ensure we clear the flags > BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its > inode, the current transaction commits and then we fsync it (without > either of those bits being set in its inode), we end up not logging > all its xattrs. This results in deleting all xattrs when replying the > log after a power failure. > > Trivial reproducer > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ touch /mnt/foobar > $ setfattr -n user.xa -v qwerty /mnt/foobar > $ xfs_io -c "fsync" /mnt/foobar > > $ sync > > $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar > $ xfs_io -c "fsync" /mnt/foobar > > > $ mount /dev/sdb /mnt > $ getfattr --absolute-names --dump /mnt/foobar > > $ > > So fix this by making sure all xattrs are logged if we log a file's inode > item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor > BTRFS_INODE_COPY_EVERYTHING were set in the inode. > > Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") > Cc: > Signed-off-by: Filipe Manana Added to next, thanks, and I'll probably add that to 4.17-rc too. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] btrfs: incremental send, fix BUG when invalid memory access
On Mon, May 14, 2018 at 12:39:36PM +0100, Filipe Manana wrote: > On Mon, May 14, 2018 at 3:51 AM, robbiekowrote: > > Signed-off-by: Robbie Ko > Reviewed-by: Filipe Manana > > Looks good, I would only change the subject to something like: > > Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting > > David can probably do that when picking this patch. Patch updated, thank you both. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/2] btrfs: qgroup rescan fixes part 1
On Mon, May 14, 2018 at 09:38:11AM +0800, Qu Wenruo wrote: > This patchset is mainly focused on fixing qgroup rescan corruption. > > Since the whole btrfs qgroup is based on the modification between 2 > transactions, it only has correct qgroup delta. > While if the rescan can't provide a correct result from the very > beginning, qgroup numbers are corrupted. > > The patchset is fixing two types of qgroup corruption which could happen > by looping btrfs/017 with some possibility. > > 1) Not accounting tree blocks >Caused by the fact that qgroup rescan only searches commit root for >backref, while we're passing current extent root to search. >Fix it by also passing commit extent root. > > 2) Double accounting tree blocks >Caused by wrong rescan exit condition. >Currently qgroup only exit when it can't find any leaves beyond >rescan progress. >However it could cause problem when new transaction happens after >last rescan, and old leaves CoWed to new location, and double >accounting could happen. >Fix it by checking and leave qgroup rescan if we have hit last leaf, >instead of leaving it to next leaf rescan. > > > Changelog: > v2: > Remove unused tree_mod_seq_elem for the 1st patch. > Fix double unlock in 2nd patch. > Thanks Jeff for the update. V1 patches replaced, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hang/deadlock in 4.14.40 running rsync
Running an rsync to copy data onto a btrfs filesystem used for backups. It's appearing to deadlock/hang. The rsync process stops doing any IO, and no other IO is ongoing against the filesystem. The rsync is in state D in ps and is unkillable even with kill -9. The /proc//stack for the rsync: [] btrfs_async_run_delayed_refs+0x110/0x130 [btrfs] [] __btrfs_end_transaction+0x1cc/0x2d0 [btrfs] [] btrfs_dirty_inode+0x65/0xc0 [btrfs] [] btrfs_setattr+0x90/0x380 [btrfs] [] notify_change+0x2ce/0x400 [] chown_common+0xfa/0x150 [] SyS_lchown+0x6c/0xb0 [] do_syscall_64+0x55/0x100 [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [] 0x dmesg with the btrfs stuff, including the initial mount: [ 6813.213251] BTRFS info (device sdc): enabling ssd optimizations [ 6813.213296] BTRFS info (device sdc): using spread ssd allocation scheme [ 6813.213342] BTRFS info (device sdc): use zstd compression [ 6813.213373] BTRFS info (device sdc): using free space tree [ 6813.213404] BTRFS info (device sdc): has skinny extents [36775.729252] BTRFS info (device sdc): left=1048576, need=2097152, flags=20 [36775.729333] BTRFS info (device sdc): space_info 2 has 1048576 free, is not full [36775.729376] BTRFS info (device sdc): space_info total=8388608, used=3145728, pinned=0, reserved=0, may_use=4194304, readonly=0 [82104.402887] [ cut here ] [82104.402948] WARNING: CPU: 10 PID: 6015 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs] [82104.402996] Modules linked in: ipmi_si mpt3sas raid_class scsi_transport_sas dell_rbu nfsv3 nfsv4 nfs fscache ext2 mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm intel_powerclamp iTCO_wdt coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel iTCO_vendor_support dcdbas joydev evdev aes_x86_64 crypto_simd cryptd serio_raw pcspkr glue_helper sg ipmi_devintf ipmi_msghandler acpi_power_meter lpc_ich i7core_edac mfd_core button nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc loop autofs4 ext4 crc32c_generic crc16 mbcache jbd2 btrfs xor zstd_decompress zstd_compress xxhash raid6_pq hid_generic usbhid hid sd_mod crc32c_intel psmouse i2c_core ehci_pci uhci_hcd ehci_hcd megaraid_sas usbcore ixgbe usb_common mdio ptp pps_core scsi_mod bnx2 [82104.403404] [last unloaded: ipmi_si] [82104.403429] CPU: 10 PID: 6015 Comm: btrfs-transacti Tainted: G I 4.14.40 #1 [82104.403513] task: 88041df0f040 task.stack: c9000425 [82104.403562] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs] [82104.403592] RSP: 0018:c90004253848 EFLAGS: 00010246 [82104.403622] RAX: 177f RBX: 88013dbd9c48 RCX: 0001 [82104.403660] RDX: 0001 RSI: 0002 RDI: 88013dbd9c48 [82104.403698] RBP: 0001 R08: 88041df0f040 R09: 8800 [82104.403736] R10: 0040 R11: 88013dbd9c48 R12: 88081c1af470 [82104.403773] R13: 0001 R14: 88081c1af468 R15: 88041eba5000 [82104.403812] FS: () GS:88082fd4() knlGS: [82104.403855] CS: 0010 DS: ES: CR0: 80050033 [82104.403886] CR2: 557275467dd0 CR3: 01c09001 CR4: 000206e0 [82104.403923] Call Trace: [82104.405350] btrfs_search_slot+0x761/0xa60 [btrfs] [82104.406762] btrfs_insert_empty_items+0x62/0xb0 [btrfs] [82104.408194] btrfs_insert_item+0x5b/0xc0 [btrfs] [82104.409595] btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs] [82104.410997] do_chunk_alloc+0x1e5/0x2a0 [btrfs] [82104.412404] find_free_extent+0xcd0/0xf60 [btrfs] [82104.413801] btrfs_reserve_extent+0x96/0x1e0 [btrfs] [82104.415191] btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs] [82104.416569] __btrfs_cow_block+0x127/0x570 [btrfs] [82104.417913] btrfs_cow_block+0xdd/0x1c0 [btrfs] [82104.419247] btrfs_search_slot+0x227/0xa60 [btrfs] [82104.420570] ? btrfs_update_inode_item+0x59/0x100 [btrfs] [82104.421883] ? iput+0x72/0x1e0 [82104.423200] write_one_cache_group.isra.45+0x20/0x90 [btrfs] [82104.424518] btrfs_start_dirty_block_groups+0xff/0x420 [btrfs] [82104.425837] btrfs_commit_transaction+0x121/0x890 [btrfs] [82104.427161] ? start_transaction+0x84/0x410 [btrfs] [82104.428500] transaction_kthread+0x184/0x1a0 [btrfs] [82104.429826] kthread+0xf7/0x130 [82104.431128] ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs] [82104.432420] ? kthread_create_on_node+0x40/0x40 [82104.433700] ? SyS_exit_group+0xb/0x10 [82104.434958] ret_from_fork+0x1f/0x30 [82104.436182] Code: 43 58 85 c0 75 2c f0 ff 43 58 f0 ff 43 44 65 48 8b 04 25 00 4d 01 00 8b 80 10 04 00 00 89 43 40 48 83 c4 28 5b 5d 41 5c 41 5d c3 <0f> 0b e9 60 fe ff ff 0f 0b eb d0 0f 1f 80 00 00 00 00 8b 47 4c [82104.438766] ---[ end trace 2d8eafbb1dc47717 ]--- [82286.641371] INFO: task btrfs-transacti:6015 blocked for more than 120 seconds. [82286.642665] Tainted: GW I 4.14.40 #1 [82286.643866] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
[PATCH v3 3/3] btrfs: replace waitqueue_actvie with cond_wake_up
Use the wrappers and reduce the amount of low-level details about the waitqueue management. Signed-off-by: David Sterba--- fs/btrfs/compression.c | 7 +-- fs/btrfs/delayed-inode.c | 9 +++-- fs/btrfs/dev-replace.c | 10 -- fs/btrfs/extent-tree.c | 7 +-- fs/btrfs/inode.c | 9 +++-- fs/btrfs/locking.c | 34 +++--- fs/btrfs/ordered-data.c | 14 -- fs/btrfs/transaction.c | 7 +-- fs/btrfs/tree-log.c | 34 -- 9 files changed, 40 insertions(+), 91 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 5268c9f85ca7..dcebb91e4e0c 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -990,12 +990,7 @@ static void __free_workspace(int type, struct list_head *workspace, btrfs_compress_op[idx]->free_workspace(workspace); atomic_dec(total_ws); wake: - /* -* Make sure counter is updated before we wake up waiters. -*/ - smp_mb(); - if (waitqueue_active(ws_wait)) - wake_up(ws_wait); + cond_wake_up(ws_wait); } static void free_workspace(int type, struct list_head *ws) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index a8d492dbd3e7..fe6caa7e698b 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -460,13 +460,10 @@ static void finish_one_item(struct btrfs_delayed_root *delayed_root) { int seq = atomic_inc_return(_root->items_seq); - /* -* atomic_dec_return implies a barrier for waitqueue_active -*/ + /* atomic_dec_return implies a barrier */ if ((atomic_dec_return(_root->items) < - BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0) && - waitqueue_active(_root->wait)) - wake_up(_root->wait); + BTRFS_DELAYED_BACKGROUND || seq % BTRFS_DELAYED_BATCH == 0)) + cond_wake_up_nomb(_root->wait); } static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item) diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index f82be266ba4b..d33ff5b23302 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -916,9 +916,9 @@ void btrfs_dev_replace_clear_lock_blocking( ASSERT(atomic_read(_replace->read_locks) > 0); ASSERT(atomic_read(_replace->blocking_readers) > 0); read_lock(_replace->lock); - if (atomic_dec_and_test(_replace->blocking_readers) && - waitqueue_active(_replace->read_lock_wq)) - wake_up(_replace->read_lock_wq); + /* Barrier implied by atomic_dec_and_test */ + if (atomic_dec_and_test(_replace->blocking_readers)) + cond_wake_up_nomb(_replace->read_lock_wq); } void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) @@ -929,9 +929,7 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info) void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount) { percpu_counter_sub(_info->bio_counter, amount); - - if (waitqueue_active(_info->replace_wait)) - wake_up(_info->replace_wait); + cond_wake_up_nomb(_info->replace_wait); } void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3871658b6ab1..94fc825a126d 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -11094,12 +11094,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info, struct fstrim_range *range) void btrfs_end_write_no_snapshotting(struct btrfs_root *root) { percpu_counter_dec(>subv_writers->counter); - /* -* Make sure counter is updated before we wake up waiters. -*/ - smp_mb(); - if (waitqueue_active(>subv_writers->wait)) - wake_up(>subv_writers->wait); + cond_wake_up(>subv_writers->wait); } int btrfs_start_write_no_snapshotting(struct btrfs_root *root) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d241285a0d2a..cf6f2815ffee 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1156,13 +1156,10 @@ static noinline void async_cow_submit(struct btrfs_work *work) nr_pages = (async_cow->end - async_cow->start + PAGE_SIZE) >> PAGE_SHIFT; - /* -* atomic_sub_return implies a barrier for waitqueue_active -*/ + /* atomic_sub_return implies a barrier */ if (atomic_sub_return(nr_pages, _info->async_delalloc_pages) < - 5 * SZ_1M && - waitqueue_active(_info->async_submit_wait)) - wake_up(_info->async_submit_wait); + 5 * SZ_1M) + cond_wake_up_nomb(_info->async_submit_wait); if (async_cow->inode) submit_compressed_extents(async_cow->inode, async_cow); diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c index e4faefac9d16..1da768e5ef75 100644 ---
[PATCH v3 2/3] btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
Currently the code assumes that there's an implied barrier by the sequence of code preceding the wakeup, namely the mutex unlock. As Nikolay pointed out: I think this is wrong (not your code) but the original assumption that the RELEASE semantics provided by mutex_unlock is sufficient. According to memory-barriers.txt: Section 'LOCK ACQUISITION FUNCTIONS' states: (2) RELEASE operation implication: Memory operations issued before the RELEASE will be completed before the RELEASE operation has completed. Memory operations issued after the RELEASE *may* be completed before the RELEASE operation has completed. (I've bolded the may portion) The example given there: As an example, consider the following: *A = a; *B = b; ACQUIRE *C = c; *D = d; RELEASE *E = e; *F = f; The following sequence of events is acceptable: ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE So if we assume that *C is modifying the flag which the waitqueue is checking, and *E is the actual wakeup, then those accesses can be re-ordered... IMHO this code should be considered broken... --- To be on the safe side, add the barriers. The synchronization logic around log using the mutexes and several other threads does not make it easy to reason for/against the barrier. CC: Nikolay BorisovLink: https://lkml.kernel.org/r/6ee068d8-1a69-3728-00d1-d86293d43...@suse.com Signed-off-by: David Sterba --- fs/btrfs/tree-log.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 43758e30aa7a..fa5b3dc5f4d5 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -3116,8 +3116,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(_root_tree->log_mutex); /* -* The barrier before waitqueue_active is implied by mutex_unlock +* The barrier before waitqueue_active is needed so all the updates +* above are seen by the woken threads. It might not be necessary, but +* proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(_root_tree->log_commit_wait[index2])) wake_up(_root_tree->log_commit_wait[index2]); out: @@ -3128,8 +3131,11 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans, mutex_unlock(>log_mutex); /* -* The barrier before waitqueue_active is implied by mutex_unlock +* The barrier before waitqueue_active is needed so all the updates +* above are seen by the woken threads. It might not be necessary, but +* proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(>log_commit_wait[index1])) wake_up(>log_commit_wait[index1]); return ret; -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/3] Cleanup waitqueue_active and barriers
Reduce number of standalone barriers before waitqueue_active calls. Changes v3: * fix wrong use of the _nomb variant in tree-log.c:btrfs_sync_log, * update comments to be more specific about the waitqueue_active and barrier Changes v2: * add 2 barriers to btrfs_sync_log and do not assume they're implied, (pointed out by Nikolay) git://github.com/kdave/btrfs-devel.git cleanup/cond-wake David Sterba (3): btrfs: introduce conditional wakeup helpers btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups btrfs: replace waitqueue_actvie with cond_wake_up fs/btrfs/compression.c | 7 +-- fs/btrfs/ctree.h | 22 ++ fs/btrfs/delayed-inode.c | 9 +++-- fs/btrfs/dev-replace.c | 10 -- fs/btrfs/extent-tree.c | 7 +-- fs/btrfs/inode.c | 9 +++-- fs/btrfs/locking.c | 34 +++--- fs/btrfs/ordered-data.c | 14 -- fs/btrfs/transaction.c | 7 +-- fs/btrfs/tree-log.c | 28 10 files changed, 62 insertions(+), 85 deletions(-) -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/3] btrfs: introduce conditional wakeup helpers
Add convenience wrappers for the waitqueue management that involves memory barriers to prevent deadlocks. The helpers will let us remove barriers and the necessary comments in several places. Reviewed-by: Nikolay BorisovSigned-off-by: David Sterba --- fs/btrfs/ctree.h | 22 ++ 1 file changed, 22 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 2771cc56a622..1389942f3be1 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3765,4 +3765,26 @@ static inline int btrfs_is_testing(struct btrfs_fs_info *fs_info) return 0; } +static inline void cond_wake_up(struct wait_queue_head *wq) +{ + /* +* This implies a full smp_mb barrier, see comments for +* waitqueue_active why. +*/ + if (wq_has_sleeper(wq)) + wake_up(wq); +} + +static inline void cond_wake_up_nomb(struct wait_queue_head *wq) +{ + /* +* Special case for conditional wakeup where the barrier required for +* waitqueue_active is implied by some of the preceding code. Eg. one +* of such atomic operations (atomic_dec_and_return, ...), or a +* unlock/lock sequence, etc. +*/ + if (waitqueue_active(wq)) + wake_up(wq); +} + #endif -- 2.16.2 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags
On 2018/5/14 7:18 PM, Nikolay Borisov wrote: > > > On 14.05.2018 13:29, Su Yue wrote: >> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. >> >> Symlinks should never have append/immutable flags. >> While processing inodes, if found a symlink with append/immutable >> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. >> >> This is for original mode. >> >> Signed-off-by: Su Yue>> --- >> check/main.c | 7 +++ >> check/mode-original.h | 1 + >> 2 files changed, 8 insertions(+) >> >> diff --git a/check/main.c b/check/main.c >> index 68da994f7ae0..f5f52c406046 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, >> struct inode_record *rec) >> fprintf(stderr, ", link count wrong"); >> if (errors & I_ERR_FILE_EXTENT_ORPHAN) >> fprintf(stderr, ", orphan file extent"); >> +if (errors & I_ERR_ODD_INODE_FLAGS) >> +fprintf(stderr, ", odd inode flags"); >> fprintf(stderr, "\n"); >> /* Print the orphan extents if needed */ >> if (errors & I_ERR_FILE_EXTENT_ORPHAN) >> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, >> { >> struct inode_record *rec; >> struct btrfs_inode_item *item; >> +u64 flags; >> >> rec = active_node->current; >> BUG_ON(rec->ino != key->objectid || rec->refs > 1); >> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, >> rec->found_inode_item = 1; >> if (rec->nlink == 0) >> rec->errors |= I_ERR_NO_ORPHAN_ITEM; >> +flags = btrfs_inode_flags(eb, item); >> +if (rec->imode & BTRFS_FT_SYMLINK && >> +flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) >> +rec->errors = I_ERR_ODD_INODE_FLAGS; >> maybe_free_inode_rec(_node->inode_cache, rec); >> return 0; >> } >> diff --git a/check/mode-original.h b/check/mode-original.h >> index 368de692fdd1..13cfa5b9e1b3 100644 >> --- a/check/mode-original.h >> +++ b/check/mode-original.h >> @@ -186,6 +186,7 @@ struct file_extent_hole { >> #define I_ERR_LINK_COUNT_WRONG (1 << 13) >> #define I_ERR_FILE_EXTENT_ORPHAN(1 << 14) >> #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15) >> +#define I_ERR_ODD_INODE_FLAGS (1 << 16) > > I don't like the name of the flag. How about: > > I_ERR_INVALID_SYMLINK_FLAGS > Before, original mode doesn't check inode flags. So I'd rather to keep INODE_FLAGS for further use. As for "ODD" part, it is just inherited from name style of above lines. I don't like "ODD" too much either. Anyway, thanks. Su >> >> struct inode_record { >> struct list_head backrefs; >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > pEpkey.asc Description: application/pgp-keys
Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags
Indeed better. Will do it in V2. Thanks, Su On Mon, May 14, 2018 at 7:19 PM Nikolay Borisovwrote: > On 14.05.2018 13:29, Su Yue wrote: > > Symlinks should never have append/immutable flags. > > While checking inodes, if found a symlink with append/immutable > > flags, report and record the fatal error. > > > > This is for lowmem mode. > > > > Signed-off-by: Su Yue > > --- > > check/mode-lowmem.c | 10 ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > > index 9890180d1d3c..61c4e7f23d47 100644 > > --- a/check/mode-lowmem.c > > +++ b/check/mode-lowmem.c > > @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) > > struct btrfs_key last_key; > > u64 inode_id; > > u32 mode; > > + u64 flags; > > u64 nlink; > > u64 nbytes; > > u64 isize; > > @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) > > isize = btrfs_inode_size(node, ii); > > nbytes = btrfs_inode_nbytes(node, ii); > > mode = btrfs_inode_mode(node, ii); > > + flags = btrfs_inode_flags(node, ii); > > dir = imode_to_type(mode) == BTRFS_FT_DIR; > > nlink = btrfs_inode_nlink(node, ii); > > nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; > > > > + if (mode & BTRFS_FT_SYMLINK && > > + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) { > > + err = FATAL_ERROR; > > + error( > > +"symlinks shall never be with immutable/append, root %llu inode item [%llu] flags %llu may be corrupted", > How about: > "symlinks must never have immutable/append flags set, root %llu ino %llu flag %llu may be corrupted". > I think printing the ino number should suffice, no need for the "fancy" [] around the inode number. > > + root->objectid, inode_id, flags); > > + } > > + > > while (1) { > > btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); > > ret = btrfs_next_item(root, path); > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags
On 2018/5/14 7:22 PM, Qu Wenruo wrote: > > > On 2018年05月14日 18:29, Su Yue wrote: >> Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. >> >> Symlinks should never have append/immutable flags. >> While processing inodes, if found a symlink with append/immutable >> flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. >> >> This is for original mode. >> >> Signed-off-by: Su Yue>> --- >> check/main.c | 7 +++ >> check/mode-original.h | 1 + >> 2 files changed, 8 insertions(+) >> >> diff --git a/check/main.c b/check/main.c >> index 68da994f7ae0..f5f52c406046 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, >> struct inode_record *rec) >> fprintf(stderr, ", link count wrong"); >> if (errors & I_ERR_FILE_EXTENT_ORPHAN) >> fprintf(stderr, ", orphan file extent"); >> +if (errors & I_ERR_ODD_INODE_FLAGS) >> +fprintf(stderr, ", odd inode flags"); >> fprintf(stderr, "\n"); >> /* Print the orphan extents if needed */ >> if (errors & I_ERR_FILE_EXTENT_ORPHAN) >> @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, >> { >> struct inode_record *rec; >> struct btrfs_inode_item *item; >> +u64 flags; >> >> rec = active_node->current; >> BUG_ON(rec->ino != key->objectid || rec->refs > 1); >> @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, >> rec->found_inode_item = 1; >> if (rec->nlink == 0) >> rec->errors |= I_ERR_NO_ORPHAN_ITEM; >> +flags = btrfs_inode_flags(eb, item); >> +if (rec->imode & BTRFS_FT_SYMLINK && >> +flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) >> +rec->errors = I_ERR_ODD_INODE_FLAGS; > > Shouldn't it be "|=" instead of "=" ? > > Thanks, > Qu Oh... Should be "|=". Thanks, Su > >> maybe_free_inode_rec(_node->inode_cache, rec); >> return 0; >> } >> diff --git a/check/mode-original.h b/check/mode-original.h >> index 368de692fdd1..13cfa5b9e1b3 100644 >> --- a/check/mode-original.h >> +++ b/check/mode-original.h >> @@ -186,6 +186,7 @@ struct file_extent_hole { >> #define I_ERR_LINK_COUNT_WRONG (1 << 13) >> #define I_ERR_FILE_EXTENT_ORPHAN(1 << 14) >> #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15) >> +#define I_ERR_ODD_INODE_FLAGS (1 << 16) >> >> struct inode_record { >> struct list_head backrefs; >> > pEpkey.asc Description: application/pgp-keys
Re: Btrfs installation advices
On 2018-05-12 21:58, faurepi...@gmail.com wrote: Thanks you two very much for your answers. So if I sum up correctly, I could: 1- use Self-Encrypting Drive (SED), since my drive is a Samsung NVMe 960 EVO, which is supposed to support SED according to http://www.samsung.com/semiconductor/minisite/ssd/support/faqs-nvmessd: "*Do Samsung NVMe M.2 SSDs have hardware encryption?* Samsung NVMe SSDs provide internal hardware encryption of all data stored on the SSD, including the operating system. Data is decrypted through a pre-boot authentication process. Because all user data is encrypted, private information is protected against loss or theft. Encryption is done by hardware, which provides a safer environment without sacrificing performance. The encryption methods provided by each Samsung NVMe SSD are: AES (Advanced Encryption Standard, Class0 SED) TCG/OPAL, and eDrive Please note that you cannot use more than one encryption method simultaneously. *Do Samsung NVMe M.2 SSDs support TCG Opal?* TCG Opal is supported by Samsung NVMe SSDs (960EVO / PRO and newer). It is an authentication method that employs the protocol specified by the Trusted Computing Group (TCG) meaning that you will need to install TCG software supplied by a TCG OPAL software development company. User authentication is done by pre-boot authentication provided by the software. For more detailed information and instructions, please contact a TCG software company. In addition, TCG/opal can only be enabled / disabled by using special security software. " For the moment, I don't know how to use that self-encryption from linux. Could you please give me some tips or links about how you did? 2- now that the full drive is self-encrypted, I can build manually the three partitions from a live system: boot with ext(2,3,4), swap with swap, and root with btrfs 3- and finally install debian sid in the dedicaced partitions. Am I right? :) Yes, that approach will work, assuming you trust Samsung (since they're the ones who wrote the code responsible for the encryption, and you can't inspect that code yourself). Le 08/05/2018 à 13:32, Austin S. Hemmelgarn a écrit : On 2018-05-08 03:50, Rolf Wald wrote: Hello, some hints inside Am 08.05.2018 um 02:22 schrieb faurepi...@gmail.com: Hi, I'm curious about btrfs, and maybe considering it for my new laptop installation (a Lenovo T470). I was going to install my usual lvm+ext4+full disk encryption setup, but thought I should maybe give a try to btrfs. Is it possible to meet all these criteria? - operating system: debian sid - file system: btrfs - disk encryption (or at least of sensitives partitions) - hibernation feature (which implies a swap partition or file, and I've read btrfs is not a big fan of the latter) A swap partition is not possible inside or with btrfs alone. You can choose btrfs filesystem out of the box in debian install, but that would mean full-disk-encryption with lvm and btrfs. The extra layer lvm doesn't hurt, but you have two layers with many functions double, e.g. snapshotting, resize. Um, this isn't really as much of an issue as you might think. LVM has near zero overhead unless you're actually doing any of that stuff (as long as the LV is just a simple linear mapping, it has less than 1% more overhead than just using partitions). The only real caveat here is to make _ABSOLUTELY CERTAIN_ that you _DO NOT_ make LVM snapshots of _ANY_ BTRFS volumes. Doing so is a recipe for disaster, and will likely eat at least your data, and possibly your children. The bigger issue is that dm-crypt generally slows down device access, which BTRFS is very sensitive to. Using BTRFS with FDE works, but it's slow, so I would only suggest doing it with an SSD (and if you're using an SSD, you may be better off getting a TCG Opal compliant self-encrypting drive and just using the self-encryption functionality instead of FDE). If yes, how would you suggest me to achieve it? Yes, there is a solution, and it works for me now several years. You need to build three partitions, e.g. named boot, swap, root. The sizes choose to your need. the boot partition remains unencrypted, but the other two partitions are encrypted with cryptsetup (luks) separately. Normally there are two passphrases to type in (and to remember), but there is an option in the cryptsetup scripts (/lib/cryptsetup/scripts) decrypt_derived, which could take the key from the root partition to decrypt the swap partition also. The filesystems then on the partitions are boot with ext(2,3,4), swap with swap and root with btrfs. This configuration is not reachable with a standard debian installation. Debian always choose lvm if you want full encryption. You have to do the first steps manually: make partitions, cryptsetup(luks) for the partitions swap and root, and open the encrypted partitions manually. After that you can install your OS. The manual steps you have to make
Re: [PATCH v3] btrfs: incremental send, fix BUG when invalid memory access
On Mon, May 14, 2018 at 3:51 AM, robbiekowrote: > From: Robbie Ko > > [BUG] > btrfs incremental send BUG happens when creating a snapshot of snapshot > that is being used by send. > > [REASON] > The problem can happen if while we are doing a send one of the snapshots > used (parent or send) is snapshotted, because snapshoting implies COWing > the root of the source subvolume/snapshot. > > 1. When doing an incremental send, the send process will get the commit >roots from the parent and send snapshots, and add references to them >through extent_buffer_get(). > > 2. When a snapshot/subvolume is snapshotted, its root node is COWed >(transaction.c:create_pending_snapshot()). > > 3. COWing releases the space used by the node immediately, through: > >__btrfs_cow_block() >--btrfs_free_tree_block() >btrfs_add_free_space(bytenr of node) > > 4. Because send doesn't hold a transaction open, it's possible that >the transaction used to create the snapshot commits, switches the >commit root and the old space used by the previous root node gets >assigned to some other node allocation. Allocation of a new node will >use the existing extent buffer found in memory, which we previously >got a reference through extent_buffer_get(), and allow the extent >buffer's content (pages) to be modified: > >btrfs_alloc_tree_block >--btrfs_reserve_extent >find_free_extent (get bytenr of old node) >--btrfs_init_new_buffer (use bytenr of old node) >btrfs_find_create_tree_block >--alloc_extent_buffer >find_extent_buffer (get old node) > > 5. So send can access invalid memory content and have unpredictable >behaviour. > > [FIX] > So we fix the problem by copying the commit roots of the send and > parent snapshots and use those copies. > > CallTrace looks like this: > [ cut here ] > kernel BUG at fs/btrfs/ctree.c:1861! > invalid opcode: [#1] SMP > CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721 > 88046652d680 ti: 88041b72 task.ti: 88041b72 > RIP: 0010:[] read_node_slot+0x108/0x110 [btrfs] > RSP: 0018:88041b723b68 EFLAGS: 00010246 > RAX: 88043ca6b000 RBX: 88041b723c50 RCX: 8800 > RDX: 004c RSI: 880314b133f8 RDI: 880458b24000 > RBP: R08: 0001 R09: 88041b723c66 > R10: 0001 R11: 1000 R12: 8803f3e48890 > R13: 8803f3e48880 R14: 880466351800 R15: 0001 > FS: 7f8c321dc8c0() GS:88047fcc() > CS: 0010 DS: ES: CR0: 80050033 > R2: 7efd1006d000 CR3: 000213a24000 CR4: 003407e0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Stack: > 88041b723c50 8803f3e48880 8803f3e48890 8803f3e48880 > 880466351800 0001 a08dd9d7 88041b723c50 > 8803f3e48880 88041b723c66 a08dde85 a9ff88042d2c4400 > Call Trace: > [] ? tree_move_down.isra.33+0x27/0x50 [btrfs] > [] ? tree_advance+0xb5/0xc0 [btrfs] > [] ? btrfs_compare_trees+0x2d4/0x760 [btrfs] > [] ? finish_inode_if_needed+0x870/0x870 [btrfs] > [] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs] > [] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs] > [] ? handle_pte_fault+0x373/0x990 > [] ? atomic_notifier_call_chain+0x16/0x20 > [] ? set_task_cpu+0xb6/0x1d0 > [] ? handle_mm_fault+0x143/0x2a0 > [] ? __do_page_fault+0x1d0/0x500 > [] ? check_preempt_curr+0x57/0x90 > [] ? do_vfs_ioctl+0x4aa/0x990 > [] ? do_fork+0x113/0x3b0 > [] ? trace_hardirqs_off_thunk+0x3a/0x6c > [] ? SyS_ioctl+0x88/0xa0 > [] ? system_call_fastpath+0x16/0x1b > ---[ end trace 29576629ee80b2e1 ]--- > > Signed-off-by: Robbie Ko Reviewed-by: Filipe Manana Looks good, I would only change the subject to something like: Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting David can probably do that when picking this patch. thanks > --- > V3: use btrfs_clone_extent_buffer > V2: fix comments > fs/btrfs/ctree.c | 16 ++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c > index b88a79e..8b6b554 100644 > --- a/fs/btrfs/ctree.c > +++ b/fs/btrfs/ctree.c > @@ -5460,12 +5460,24 @@ int btrfs_compare_trees(struct btrfs_root *left_root, > down_read(_info->commit_root_sem); > left_level = btrfs_header_level(left_root->commit_root); > left_root_level = left_level; > - left_path->nodes[left_level] = left_root->commit_root; > + left_path->nodes[left_level] = > + btrfs_clone_extent_buffer(left_root->commit_root); > + if (!left_path->nodes[left_level]) { > + up_read(_info->commit_root_sem); > +
Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags
On 2018年05月14日 18:29, Su Yue wrote: > Symlinks should never have append/immutable flags. > While checking inodes, if found a symlink with append/immutable > flags, report and record the fatal error. > > This is for lowmem mode. > > Signed-off-by: Su Yue> --- > check/mode-lowmem.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 9890180d1d3c..61c4e7f23d47 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > struct btrfs_key last_key; > u64 inode_id; > u32 mode; > + u64 flags; > u64 nlink; > u64 nbytes; > u64 isize; > @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > isize = btrfs_inode_size(node, ii); > nbytes = btrfs_inode_nbytes(node, ii); > mode = btrfs_inode_mode(node, ii); > + flags = btrfs_inode_flags(node, ii); > dir = imode_to_type(mode) == BTRFS_FT_DIR; > nlink = btrfs_inode_nlink(node, ii); > nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; > > + if (mode & BTRFS_FT_SYMLINK && > + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) { > + err = FATAL_ERROR; Where is the FATAL_ERROR defined? I can't find it anyway on v4.16 branch. And I assume that FATAL_ERROR is defined to abort check, if so, in that case the corruption is not that serious, just some unexpected behavior, not a fatal error. If not, just ignore this comment. Thanks, Qu > + error( > +"symlinks shall never be with immutable/append, root %llu inode item [%llu] > flags %llu may be corrupted", > + root->objectid, inode_id, flags); > + } > + > while (1) { > btrfs_item_key_to_cpu(path->nodes[0], _key, > path->slots[0]); > ret = btrfs_next_item(root, path); > signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags
On 2018年05月14日 18:29, Su Yue wrote: > Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. > > Symlinks should never have append/immutable flags. > While processing inodes, if found a symlink with append/immutable > flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. > > This is for original mode. > > Signed-off-by: Su Yue> --- > check/main.c | 7 +++ > check/mode-original.h | 1 + > 2 files changed, 8 insertions(+) > > diff --git a/check/main.c b/check/main.c > index 68da994f7ae0..f5f52c406046 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, > struct inode_record *rec) > fprintf(stderr, ", link count wrong"); > if (errors & I_ERR_FILE_EXTENT_ORPHAN) > fprintf(stderr, ", orphan file extent"); > + if (errors & I_ERR_ODD_INODE_FLAGS) > + fprintf(stderr, ", odd inode flags"); > fprintf(stderr, "\n"); > /* Print the orphan extents if needed */ > if (errors & I_ERR_FILE_EXTENT_ORPHAN) > @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, > { > struct inode_record *rec; > struct btrfs_inode_item *item; > + u64 flags; > > rec = active_node->current; > BUG_ON(rec->ino != key->objectid || rec->refs > 1); > @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, > rec->found_inode_item = 1; > if (rec->nlink == 0) > rec->errors |= I_ERR_NO_ORPHAN_ITEM; > + flags = btrfs_inode_flags(eb, item); > + if (rec->imode & BTRFS_FT_SYMLINK && > + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) > + rec->errors = I_ERR_ODD_INODE_FLAGS; Shouldn't it be "|=" instead of "=" ? Thanks, Qu > maybe_free_inode_rec(_node->inode_cache, rec); > return 0; > } > diff --git a/check/mode-original.h b/check/mode-original.h > index 368de692fdd1..13cfa5b9e1b3 100644 > --- a/check/mode-original.h > +++ b/check/mode-original.h > @@ -186,6 +186,7 @@ struct file_extent_hole { > #define I_ERR_LINK_COUNT_WRONG (1 << 13) > #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14) > #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15) > +#define I_ERR_ODD_INODE_FLAGS(1 << 16) > > struct inode_record { > struct list_head backrefs; > signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags
On 14.05.2018 13:29, Su Yue wrote: > Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. > > Symlinks should never have append/immutable flags. > While processing inodes, if found a symlink with append/immutable > flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. > > This is for original mode. > > Signed-off-by: Su Yue> --- > check/main.c | 7 +++ > check/mode-original.h | 1 + > 2 files changed, 8 insertions(+) > > diff --git a/check/main.c b/check/main.c > index 68da994f7ae0..f5f52c406046 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, > struct inode_record *rec) > fprintf(stderr, ", link count wrong"); > if (errors & I_ERR_FILE_EXTENT_ORPHAN) > fprintf(stderr, ", orphan file extent"); > + if (errors & I_ERR_ODD_INODE_FLAGS) > + fprintf(stderr, ", odd inode flags"); > fprintf(stderr, "\n"); > /* Print the orphan extents if needed */ > if (errors & I_ERR_FILE_EXTENT_ORPHAN) > @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, > { > struct inode_record *rec; > struct btrfs_inode_item *item; > + u64 flags; > > rec = active_node->current; > BUG_ON(rec->ino != key->objectid || rec->refs > 1); > @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, > rec->found_inode_item = 1; > if (rec->nlink == 0) > rec->errors |= I_ERR_NO_ORPHAN_ITEM; > + flags = btrfs_inode_flags(eb, item); > + if (rec->imode & BTRFS_FT_SYMLINK && > + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) > + rec->errors = I_ERR_ODD_INODE_FLAGS; > maybe_free_inode_rec(_node->inode_cache, rec); > return 0; > } > diff --git a/check/mode-original.h b/check/mode-original.h > index 368de692fdd1..13cfa5b9e1b3 100644 > --- a/check/mode-original.h > +++ b/check/mode-original.h > @@ -186,6 +186,7 @@ struct file_extent_hole { > #define I_ERR_LINK_COUNT_WRONG (1 << 13) > #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14) > #define I_ERR_FILE_EXTENT_TOO_LARGE (1 << 15) > +#define I_ERR_ODD_INODE_FLAGS(1 << 16) I don't like the name of the flag. How about: I_ERR_INVALID_SYMLINK_FLAGS > > struct inode_record { > struct list_head backrefs; > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags
On 14.05.2018 13:29, Su Yue wrote: > Symlinks should never have append/immutable flags. > While checking inodes, if found a symlink with append/immutable > flags, report and record the fatal error. > > This is for lowmem mode. > > Signed-off-by: Su Yue> --- > check/mode-lowmem.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index 9890180d1d3c..61c4e7f23d47 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > struct btrfs_key last_key; > u64 inode_id; > u32 mode; > + u64 flags; > u64 nlink; > u64 nbytes; > u64 isize; > @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, > struct btrfs_path *path) > isize = btrfs_inode_size(node, ii); > nbytes = btrfs_inode_nbytes(node, ii); > mode = btrfs_inode_mode(node, ii); > + flags = btrfs_inode_flags(node, ii); > dir = imode_to_type(mode) == BTRFS_FT_DIR; > nlink = btrfs_inode_nlink(node, ii); > nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; > > + if (mode & BTRFS_FT_SYMLINK && > + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) { > + err = FATAL_ERROR; > + error( > +"symlinks shall never be with immutable/append, root %llu inode item [%llu] > flags %llu may be corrupted", How about: "symlinks must never have immutable/append flags set, root %llu ino %llu flag %llu may be corrupted". I think printing the ino number should suffice, no need for the "fancy" [] around the inode number. > + root->objectid, inode_id, flags); > + } > + > while (1) { > btrfs_item_key_to_cpu(path->nodes[0], _key, > path->slots[0]); > ret = btrfs_next_item(root, path); > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/9] btrfs-progs: btrfs-corrupt-block: Change -I flag parameter format
Presently, if we want to corrupt a particular item we need to call corrupt block like so: btrfs-corrupt-block -I -K-r "numeric rootid" This is problematic because the -K option not only sets the key to the item that the -I option should corrupt but it also signals that we want to corrupt the key itself. This way we cannot really use the program with the following semantics: "Corrupt only the item which corresponds to this key but leave the key intact" This patch aims to fix this by changing the format of the -I flag. So if one wants to corrupt only an item (and leave the key intact) they should use: btrfs-corrupt-block -r -I In addition to this problem, -K doesn't really understand the the "-r" argument, so when using it in conjunction with -I to corrupt an item, not part of the root tree, it will always produce an error: Couldn't find the key to corrupt Signed-off-by: Nikolay Borisov --- btrfs-corrupt-block.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 39a611d89d55..4cc3528df105 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -116,7 +116,7 @@ static void print_usage(int ret) printf("\t-m The metadata block to corrupt (must also specify -f for the field to corrupt)\n"); printf("\t-K The key to corrupt in the format ,, (must also specify -f for the field)\n"); printf("\t-f The field in the item to corrupt\n"); - printf("\t-I An item to corrupt (must also specify the field to corrupt and a root+key for the item)\n"); + printf("\t-I Corrupt an item corresponding to the passed key triplet (must also specify the field to corrupt and root for the item)\n"); printf("\t-D Corrupt a dir item, must specify key and field\n"); printf("\t-d Delete this item (must specify -K)\n"); printf("\t-r Operate on this root (only works with -d)\n"); @@ -1165,7 +1165,7 @@ int main(int argc, char **argv) { NULL, 0, NULL, 0 } }; - c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:IDdr:C:", + c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Ddr:C:", long_options, NULL); if (c < 0) break; @@ -1222,6 +1222,7 @@ int main(int argc, char **argv) break; case 'I': corrupt_item = 1; + parse_key(, , ); break; case 'd': delete = 1; @@ -1356,6 +1357,7 @@ int main(int argc, char **argv) target = open_root(root->fs_info, root_objectid); ret = corrupt_btrfs_item(target, , field); + goto out_close; } if (delete) { struct btrfs_root *target = root; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/9] btrfs-progs: btrfs-corrupt-block: Factor out key parsing function
Currently passing a key with -K handling is open coded. I intend on changing the interface a bit to make the program more usable. To aid in this factor out common code which parses a triplet of the "u64,u8,u64" format, corresponding to a btrfs key. No functional changes. Signed-off-by: Nikolay Borisov--- btrfs-corrupt-block.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 0018b6c9662d..39a611d89d55 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1081,6 +1081,16 @@ static int find_chunk_offset(struct btrfs_root *root, } +static void parse_key(u64 *objectid, u8 *type, u64 *offset) +{ + + int ret = sscanf(optarg, "%llu,%hhu,%llu", objectid, type, offset); + if (ret != 3) { + fprintf(stderr, "Error parsing key %d\n", errno); + print_usage(1); + } +} + static struct btrfs_root *open_root(struct btrfs_fs_info *fs_info, u64 root_objectid) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/9] btrfs-progs: btrfs-corrupt-block: Convert -K flag argument handling to common function
There is now a common function used to parse btrfs keys triplets so use that one. No functional changes. Signed-off-by: Nikolay Borisov--- btrfs-corrupt-block.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 4cc3528df105..7a5513c455e6 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -114,7 +114,7 @@ static void print_usage(int ret) printf("\t-i The inode item to corrupt (must also specify the field to corrupt)\n"); printf("\t-x The file extent item to corrupt (must also specify -i for the inode and -f for the field to corrupt)\n"); printf("\t-m The metadata block to corrupt (must also specify -f for the field to corrupt)\n"); - printf("\t-K The key to corrupt in the format ,, (must also specify -f for the field)\n"); + printf("\t-K Corrupt the given key (must also specify -f for the field)\n"); printf("\t-f The field in the item to corrupt\n"); printf("\t-I Corrupt an item corresponding to the passed key triplet (must also specify the field to corrupt and root for the item)\n"); printf("\t-D Corrupt a dir item, must specify key and field\n"); @@ -1129,6 +1129,7 @@ int main(int argc, char **argv) int corrupt_item = 0; int corrupt_di = 0; int delete = 0; + int should_corrupt_key = 0; u64 metadata_block = 0; u64 inode = 0; u64 file_extent = (u64)-1; @@ -1207,15 +1208,8 @@ int main(int argc, char **argv) metadata_block = arg_strtou64(optarg); break; case 'K': - ret = sscanf(optarg, "%llu,%u,%llu", -, -(unsigned int *), -); - if (ret != 3) { - fprintf(stderr, "error reading key " - "%d\n", errno); - print_usage(1); - } + should_corrupt_key = 1; + parse_key(, , ); break; case 'D': corrupt_di = 1; @@ -1370,7 +1364,7 @@ int main(int argc, char **argv) ret = delete_item(target, ); goto out_close; } - if (key.objectid || key.offset || key.type) { + if (should_corrupt_key) { if (*field == 0) print_usage(1); ret = corrupt_key(root, , field); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/9] Overhaul btrfs-corrupt-block
btrfs-corrupt-block is a very useful tool albeit very neglected. This series aims to give it much needed attention. There is a mix of code-improvements and bug fixes. Code improvement mainly consists of factoring our duplicated code (Patch 1,3,6) and improving the interface of some options (4,5,8,9). The recurring topic here is that instead of having to pass btrfs-corrupt-block -K -f - make each corrupting option to take the key as an argument to it. Say we want to corrupt an item field (-I options) we now do: btrfs-corrupt-block -I -r /dev/blah instead of btrfs-corrupt-block -I -K -r /dev/blah Some patches also incorporate fixes for bugs (patch 2,7 and 9) I found during my testing. Those usability improvements are needed to enable me to produce tests for the pending free space tree support in userspace. Nikolay Borisov (9): btrfs-progs: btrfs-corrupt-block: Factor out specific-root code btrfs-progs: btrfs-corrupt-block: Correctly handle -r when passing -I btrfs-progs: btrfs-corrupt-block: Factor out key parsing function btrfs-progs: btrfs-corrupt-block: Change -I flag parameter format btrfs-progs: btrfs-corrupt-block: Convert -K flag argument handling to common function btrfs-progs: btrfs-corrupt-block: Factor out common "-r" handling code btrfs-progs: btrfs-corrupt-block: Add support for handling specific root when using -K option btrfs-progs: btrfs-corrupt-block: Change format of -d option btrfs-progs: btrfs-corrupt-block: Fix -D option btrfs-corrupt-block.c | 94 +++ 1 file changed, 57 insertions(+), 37 deletions(-) -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/9] btrfs-progs: btrfs-corrupt-block: Factor out common "-r" handling code
Since more and more of the "corrupt XXX" options are going to support combination with -r option, let's extract the common code needed for this. No functional changes. Signed-off-by: Nikolay Borisov--- btrfs-corrupt-block.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 7a5513c455e6..8bbfe1cdb855 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1114,7 +1114,7 @@ int main(int argc, char **argv) { struct cache_tree root_cache; struct btrfs_key key; - struct btrfs_root *root; + struct btrfs_root *root, *target_root; char *dev; /* chunk offset can be 0,so change to (u64)-1 */ u64 logical = (u64)-1; @@ -1245,6 +1245,10 @@ int main(int argc, char **argv) fprintf(stderr, "Open ctree failed\n"); exit(1); } + target_root = root; + if (root_objectid) + target_root = open_root(root->fs_info, root_objectid); + if (extent_rec) { struct btrfs_trans_handle *trans; @@ -1342,26 +1346,19 @@ int main(int argc, char **argv) goto out_close; } if (corrupt_item) { - struct btrfs_root *target; if (!key.objectid) print_usage(1); if (!root_objectid) print_usage(1); - target = open_root(root->fs_info, root_objectid); - - ret = corrupt_btrfs_item(target, , field); + ret = corrupt_btrfs_item(target_root, , field); goto out_close; } if (delete) { - struct btrfs_root *target = root; - if (!key.objectid) print_usage(1); - if (root_objectid) - target = open_root(root->fs_info, root_objectid); - ret = delete_item(target, ); + ret = delete_item(target_root, ); goto out_close; } if (should_corrupt_key) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 8/9] btrfs-progs: btrfs-corrupt-block: Change format of -d option
Currently if we want to delete an item we need to invoke corrupt block like so: btrfs-corrupt-block [-r ] -K-d Instead, this patch converts the format to: btrfs-corrupt-block [-r ] -d Signed-off-by: Nikolay Borisov --- btrfs-corrupt-block.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 74928eae839e..10a0b2d13b97 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -118,7 +118,7 @@ static void print_usage(int ret) printf("\t-f The field in the item to corrupt\n"); printf("\t-I Corrupt an item corresponding to the passed key triplet (must also specify the field to corrupt and root for the item)\n"); printf("\t-D Corrupt a dir item, must specify key and field\n"); - printf("\t-d Delete this item (must specify -K)\n"); + printf("\t-d Delete item corresponding to passed key triplet\n"); printf("\t-r Operate on this root (only works with -d)\n"); printf("\t-C Delete a csum for the specified bytenr. When used with -b it'll delete that many bytes, otherwise it's just sectorsize\n"); exit(ret); @@ -1165,7 +1165,7 @@ int main(int argc, char **argv) { NULL, 0, NULL, 0 } }; - c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Ddr:C:", + c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Dd:r:C:", long_options, NULL); if (c < 0) break; @@ -1219,6 +1219,7 @@ int main(int argc, char **argv) break; case 'd': delete = 1; + parse_key(, , ); break; case 'r': root_objectid = arg_strtou64(optarg); -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] btrfs-progs: btrfs-corrupt-block: Correctly handle -r when passing -I
The documentation for the -I option (corrupt an item) states: An item to corrupt (must also specify the field to corrupt and a root+key for the item) The code on the other hand doesn't check whether -r is in fact passed, and even if it is it's not handled at all. This means presently -I is possible to corrupt items only in the root tree. Fix this by correctly checking -r is passed and fail otherwise and passing the correct root to corrupt_btrfs_item. Signed-off-by: Nikolay Borisov--- btrfs-corrupt-block.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index ab6ca0a1e90a..0018b6c9662d 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1337,9 +1337,15 @@ int main(int argc, char **argv) goto out_close; } if (corrupt_item) { + struct btrfs_root *target; if (!key.objectid) print_usage(1); - ret = corrupt_btrfs_item(root, , field); + if (!root_objectid) + print_usage(1); + + target = open_root(root->fs_info, root_objectid); + + ret = corrupt_btrfs_item(target, , field); } if (delete) { struct btrfs_root *target = root; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 9/9] btrfs-progs: btrfs-corrupt-block: Fix -D option
Currently the -D option is essentially defunct since it's the root, where we are going to corrupt a dir item is always set to the tree root. Fix this by passing the root from the "-r" option. Aditionally convert the interface for this option to the new format. So if one wants to corrupt a dir item in the default fs tree, they should now invoke: btrfs-corrupt-block -r 5 -D-f name Signed-off-by: Nikolay Borisov --- btrfs-corrupt-block.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index 10a0b2d13b97..37f850b5b239 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -117,7 +117,7 @@ static void print_usage(int ret) printf("\t-K Corrupt the given key (must also specify -f for the field and optionally -r for the root)\n"); printf("\t-f The field in the item to corrupt\n"); printf("\t-I Corrupt an item corresponding to the passed key triplet (must also specify the field to corrupt and root for the item)\n"); - printf("\t-D Corrupt a dir item, must specify key and field\n"); + printf("\t-D Corrupt a dir item corresponding to the passed key triplet, must also specify a field\n"); printf("\t-d Delete item corresponding to passed key triplet\n"); printf("\t-r Operate on this root (only works with -d)\n"); printf("\t-C Delete a csum for the specified bytenr. When used with -b it'll delete that many bytes, otherwise it's just sectorsize\n"); @@ -1165,7 +1165,7 @@ int main(int argc, char **argv) { NULL, 0, NULL, 0 } }; - c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:Dd:r:C:", + c = getopt_long(argc, argv, "l:c:b:eEkuUi:f:x:m:K:I:D:d:r:C:", long_options, NULL); if (c < 0) break; @@ -1212,6 +1212,7 @@ int main(int argc, char **argv) break; case 'D': corrupt_di = 1; + parse_key(, , ); break; case 'I': corrupt_item = 1; @@ -1338,7 +1339,7 @@ int main(int argc, char **argv) if (corrupt_di) { if (!key.objectid || *field == 0) print_usage(1); - ret = corrupt_dir_item(root, , field); + ret = corrupt_dir_item(target_root, , field); goto out_close; } if (csum_bytenr) { -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/9] btrfs-progs: btrfs-corrupt-block: Factor out specific-root code
Some options operate on a specific root so let's extract the code which deals with this. No functional change. Signed-off-by: Nikolay Borisov--- btrfs-corrupt-block.c | 37 +++-- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c index da0ec8c51e5a..ab6ca0a1e90a 100644 --- a/btrfs-corrupt-block.c +++ b/btrfs-corrupt-block.c @@ -1080,6 +1080,26 @@ static int find_chunk_offset(struct btrfs_root *root, return ret; } + +static struct btrfs_root *open_root(struct btrfs_fs_info *fs_info, + u64 root_objectid) +{ + + struct btrfs_key root_key; + struct btrfs_root *root; + + root_key.objectid = root_objectid; + root_key.type = BTRFS_ROOT_ITEM_KEY; + root_key.offset = (u64)-1; + + root = btrfs_read_fs_root(fs_info, _key); + if (IS_ERR(root)) { + fprintf(stderr, "Couldn't find root %llu\n", root_objectid); + print_usage(1); + } + + return root; +} int main(int argc, char **argv) { struct cache_tree root_cache; @@ -1326,20 +1346,9 @@ int main(int argc, char **argv) if (!key.objectid) print_usage(1); - if (root_objectid) { - struct btrfs_key root_key; - - root_key.objectid = root_objectid; - root_key.type = BTRFS_ROOT_ITEM_KEY; - root_key.offset = (u64)-1; - - target = btrfs_read_fs_root(root->fs_info, _key); - if (IS_ERR(target)) { - fprintf(stderr, "Couldn't find root %llu\n", - (unsigned long long)root_objectid); - print_usage(1); - } - } + if (root_objectid) + target = open_root(root->fs_info, root_objectid); + ret = delete_item(target, ); goto out_close; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
On 2018年05月14日 18:29, james harvey wrote: > On Mon, May 14, 2018 at 2:36 AM, Qu Wenruowrote: >> OK, I could reproduce it now. >> >> Just mount with -o nodatasum, then create a file. >> Remount with compress-force=lzo, then write something. >> >> So at least btrfs should disallow such thing. >> >> Thanks, >> Qu > > Would the corrupted dump and correct one of the file, and kernel with > kasan output still help? Or, with what you reproduced, do you have > what you need? The dumps are good enough, kasan will be a little over killed. For my reproduced case, the data is all good, thus unable to reproduce the wildly corrupted kernel memory. I could try corrupt them, but I'm not sure if the same symptom can be reproduced, thus binary good/bad dump still helps. The heavy lifting kasan can be done in my environment. > > > On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo wrote: >> So there is something wrong that btrfs allows compressed data to be >> generated for such file. >> (Could not reproduce the same behavior with 4.16 kernel, could such >> problem happens in older kernels? Or just get fixed recently?) >> >> Then some corruption screwed up the compressed data, and when we >> decompress, the kernel is screwed up. > > In this thread, Chris Murphy noted systemd sets the "C" attribute, and > discussed what sounds to me like what happened here: "Usually nocow > also means no compression. But in the archives is a thread where I > found that compression can be forced on nocow if the file is submitted > for defragmentation Oh, defrag is making things more complex here. But at least the kernel patch should also address that case. > and either the volume is mounted with compression > or the file has inherited chattr +c (I don't remember which or > possibly both). And systemd does submit rotated logs for > defragmentation." > > > > (If you don't need the dumps and kernel kasan output, you can skip the > rest of this reply) > > > >> Yep, even the last case it still looks like that it's kernel memory get >> corrupted. >> >> From the thread, since you have already located the corrupted mirror, >> would you please provide the corrupted dump along with correct one? >> >> It would help a lot for us to under stand what's going on. > > Absolutely. I'm not sure how to best get you that, though. It's great you'd like to help. Considering you have experienced btrfs-map-block usage, it should be a piece of cake. And considering you're an Arch user, it won't really take you much time. (Yeah, Arch users rock!) > > "filefrag -v" on one of the files can be seen here: > https://bugzilla.kernel.org/attachment.cgi?id=275953 > > It lists 58 fragments. That filefrag output is less useful here, the main problem is it only provides the uncompressed extent size. That's why I asked for debug-tree dump. And now those debug tree dumps would shine. I'll take the 72267 dump as an example. -- item 182 key (72267 EXTENT_DATA *0* ) itemoff 6375 itemsize 53 generation 41083 type 1 (regular) extent data disk byte *2625134592* nr *4096* extent data offset 0 nr 131072 ram 131072 extent compression 2 (lzo) -- Important numbers are surrounded by '*'. The "0" means the offset inside the file. "2625134592" means the logical address that compressed extent lies. "4096" means the compressed size of that extent. So to dump it, passing all needed numbers to btrfs-map-logical, ins this example you just need: # btrfs-map-logical -l 2625134592 -b 4096 And would get the result like: mirror 1 logical 1104150528 physical 9437184 device /dev/mapper/test-scratch2 mirror 2 logical 1104150528 physical 1095761920 device /dev/mapper/test-scratch1 Then grab them using dd. > > filefrag lists the ending offsets and length based on the uncompressed > sizes. filefrag doesn't account for the compression. Now with debug-tree dump, you have the length you need. :) > > So, in this thread, I compared the first 4k of fragments 0-57 on each > disk and found all the corruption was on disk 1. (And the entire > 207*4096 bytes on fragment 58.) Grabbing more than 4k of each > fragment brings in data from other files. And indeed, they are all 4K sized from the dump. > So, I might have compared > all of the data (fragments 0-57 are 128k uncompressed, and at least > fragment 0 uncompressed does lzop down to about 2k, so perhaps all the > other 128k fragments compress to within 4k, but maybe not) but this > might not have grabbed all the data. Any bad/good pair is enough. > > I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but > they'd include extra data not involved, so you'd have to find a way to > use them, and without the metadata saying how many bytes of each > fragment to use, it might not be easy to put together. (Maybe > chopping off all the trailing 0's in each fragment would do the > trick.) Maybe
Re: [RFC][PATCH] btrfs: take the last remnants of ->d_fsdata use out
On Sun, May 13, 2018 at 07:03:18PM +0100, Al Viro wrote: > [spotted while going through ->d_fsdata handling around d_splice_alias(); > don't really care which tree that goes through] > > The only thing even looking at ->d_fsdata in there (since 2012) > had been kfree(dentry->d_fsdata) in btrfs_dentry_delete(). Which, > incidentally, is all btrfs_dentry_delete() does. Right, it's a leftover after the DCACHE_NEED_LOOKUP removal. I'll add it to btrfs tree, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On 2018年05月14日 17:30, james harvey wrote: > On Mon, May 14, 2018 at 4:20 AM, Roman Mamedovwrote: >> On Mon, 14 May 2018 11:10:34 +0300 >> Nikolay Borisov wrote: >> >>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard >>> the inode flags, presumably the admin knows what he is doing? >> >> Please don't. Personally I always assumed chattr +C would prevent both CoW >> and >> compression, and used that as a way to override volume-wide compress-force >> for >> a particular folder. Now that it turns out this wasn't working, the patch >> would fix it to behave in line with prior expectations. > > "the patch would fix it to behave" might be missing one part. Qu, am > I right (best guess, not familiar with btrfs internals enough) this > patch will prevent new files and files of size 0 from getting > compression with NODATASUM or NODATACOW, Yep. > but would leave existing > compressed files (that shouldn't have been compressed) as they are? Yep again, unfortunately. > After getting an error from btrfs check, BTW for this particular case, --mode=lowmem would provide more detailed info. > presumably, user would need > to copy the file(s) to a temporary and copy it back over itself, to > get rid of the unwanted compression? Unfortunately, reading (copying) data from the original fs is no longer that safe (just like your case). But if reading works fine, then copying things back will trigger normal write routine, thus it would be plain text extent. And if possible, please don't just remove those offending files (yet). Your binary dump would help a lot locating the root case. Thanks, Qu > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
On Mon, May 14, 2018 at 2:36 AM, Qu Wenruowrote: > OK, I could reproduce it now. > > Just mount with -o nodatasum, then create a file. > Remount with compress-force=lzo, then write something. > > So at least btrfs should disallow such thing. > > Thanks, > Qu Would the corrupted dump and correct one of the file, and kernel with kasan output still help? Or, with what you reproduced, do you have what you need? On Mon, May 14, 2018 at 1:30 AM, Qu Wenruo wrote: > So there is something wrong that btrfs allows compressed data to be > generated for such file. > (Could not reproduce the same behavior with 4.16 kernel, could such > problem happens in older kernels? Or just get fixed recently?) > > Then some corruption screwed up the compressed data, and when we > decompress, the kernel is screwed up. In this thread, Chris Murphy noted systemd sets the "C" attribute, and discussed what sounds to me like what happened here: "Usually nocow also means no compression. But in the archives is a thread where I found that compression can be forced on nocow if the file is submitted for defragmentation and either the volume is mounted with compression or the file has inherited chattr +c (I don't remember which or possibly both). And systemd does submit rotated logs for defragmentation." (If you don't need the dumps and kernel kasan output, you can skip the rest of this reply) > Yep, even the last case it still looks like that it's kernel memory get > corrupted. > > From the thread, since you have already located the corrupted mirror, > would you please provide the corrupted dump along with correct one? > > It would help a lot for us to under stand what's going on. Absolutely. I'm not sure how to best get you that, though. "filefrag -v" on one of the files can be seen here: https://bugzilla.kernel.org/attachment.cgi?id=275953 It lists 58 fragments. filefrag lists the ending offsets and length based on the uncompressed sizes. filefrag doesn't account for the compression. So, in this thread, I compared the first 4k of fragments 0-57 on each disk and found all the corruption was on disk 1. (And the entire 207*4096 bytes on fragment 58.) Grabbing more than 4k of each fragment brings in data from other files. So, I might have compared all of the data (fragments 0-57 are 128k uncompressed, and at least fragment 0 uncompressed does lzop down to about 2k, so perhaps all the other 128k fragments compress to within 4k, but maybe not) but this might not have grabbed all the data. I could give you (56) 128k, (1) 68k, and (1) 828k fragments, but they'd include extra data not involved, so you'd have to find a way to use them, and without the metadata saying how many bytes of each fragment to use, it might not be easy to put together. (Maybe chopping off all the trailing 0's in each fragment would do the trick.) Maybe the first 9 byte header on each fragment encodes the length actually used? If this is useful to you, I'd be happy to provide it, along with the correct one. If there's a better way than this, I'd be happy to do that instead. I of course can't just copy the file, so have to do something like dd or "btrfs-map-logical -o". "btrfs-map-logical -o" can't automatically grab the proper length, because it needs a size argument, and if not given, defaults to the 16k nodesize. > The dump indicates the same conclusion you reached. > The inode has NODATACOW NODATASUM flag, which means it should not has > csum nor has data compressed. > While in fact we have tons of compressed extents. > > But the following fiemap result also shows that these extents get > shared. This could happen when there is a snapshot. I do run snapper. > To pindown the lzo decompress corruption, kasan would be a nice try. > However this means you need to enable it at compile time, and recompile > a kernel. > Not to mention kasan has a great impact on performance. > > But it should provide more info before memory get corrupted. Sure, it's compiling. I'll probably be available to run it and send results in 14 hours, if needed. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: fsck-tests: add test case to check symlinks with odd flags
There are two bad symlinks in the test case. One is with immutable attribute. Another one is with append attribute. Signed-off-by: Su Yue--- .../034-odd-inode-flags/default_case.img | Bin 0 -> 4096 bytes tests/fsck-tests/034-odd-inode-flags/test.sh | 15 +++ 2 files changed, 15 insertions(+) create mode 100644 tests/fsck-tests/034-odd-inode-flags/default_case.img create mode 100755 tests/fsck-tests/034-odd-inode-flags/test.sh diff --git a/tests/fsck-tests/034-odd-inode-flags/default_case.img b/tests/fsck-tests/034-odd-inode-flags/default_case.img new file mode 100644 index ..43a2a6f62d5ef3afd77f117b577a56ad6098ed19 GIT binary patch literal 4096 zcmeH}c{CJUAIFU?k*yI4S@RGnly%15rj(_~G8rO6vS$b*hR9G^M_GnP*~gG%*XULD zecwiA3?YW>hM5=dd(Ly-=e+&-p7Z|Uo_l`xdw=)dbMEhRes@7VO!Ok2v8iSFcVXRY z0S9$YY#lgx4lMhgmx00fKp$b;YjjKwbl1Mc|4Xm#`|-vHGaC;^56vNgLjwQD1pG!) z0%s{UtRTF}`3%o*f~o}Ub~i7&5a;#~3S~wq>yb%mpwOOBQV@`K}NNkK2T)W zP3ZBKkdz|>(+ppAd?n`Mhq5Y_Mx6#)lkr%V0RYfv XKS#j!hDNZK|Zl0`#(aZhs{~>g=;i^@`D f+(}JrN_k>!De44F)K~Z_B2ji*e|B3TM=T#=3J?hFQkLT zcoHc*_{i*Zl|=_O)<@(9atD)c?k`SQU{$3zUo_)!00YeaVw>N{!sut0FUEY0%ig=a zMor(Rh4`G&3Z`YTQC;Vd8DgP6v(Pk4lF8cYH?czMG@QAi}DN #2dOFHE}ny2M`mZ24yt(4GUf}VgxH+`7uL@ zKDpXm>?+9{UNN|}xP%5x8e hlk9$5E{}{L^#S!Q*F3j-dy9VK#9I*B-fuj6aJJl z=P4E0NBM;7Fya(6l|UU2sBHD~_7;trI@t}asks;9Kk+;cNS#DJ)+5qi%9$5}Frqs< zGo#p;!NId;WbB=u?K!Wum6^@+(Q%lm<^l5MSYiPzgj6jAV1WhvGO+kDGX zzw5{X`z6v>V}g9P^96Zk*+d9P;fMmiO;S-TF=bQT1)?9j;R){>iIAX>R5$~D?vMpj zBnOV)t?w>WW|3jF#YJ{gK;@)e6U`2;z47;3jE!%&}epjB+(;{Nt3NKke zgQOrb4!>S43Sh|5t~6Q9_eALyPQFaH!cTLiXpX=W)Jx7Uxq;~%Hy)VR7GgFu`>T4N zZ?@I};@Zg%oqD*Vk4*qEy@RB5H{Z >mf%4)?l}C7)7z9RiH3+|z}`sur%ucH}%V zPz+A(a7rcq^2?5ccAa_hN*yFmglI!5N>l2$a={+~R77Z4R7$n%aIEVA=N$ z1b$h$FIy~<-xmLlH6{;68g#pjl7j63c)%xA>Dd9r0^C%QE)RQfgi2 zXiewE(h-{O)QWl=h!TjI+0de3-VC=YSC^0MN(pB5!!V!#68@VzTm&5G-AS>CCye#X zC(0KE`nt?7e5TX|O%{xg!*Ab5P=2^-DRjo81w`|=zUk#|&(uj(4p2-{$h3e`sM17w zdW#Tl75Q}}emQch`~$C9_sjTscj@M>sfi4kHHdcdX*RC$on>k-Dn@x1YE7>^O(DHf z{Sf3DiCn-r) e#3Nqt1-yQ4Z zjQbKaB#VjYFLQ=5E+mdcWR7)?7QwYYr_nqX2!rj`wma*sUcEyVH)-UMp9vk5p*sJi z)uP>;M~F@^1@nip{sw)_u!t0m3wfd_^tsEjdl3JV!)ti4tO%j3AZkU&1P8pdqT2@4 z)2a@Bm48L_Y|X>TZU*bQ)wNAuogJWSw*}wLwwe>k*K*tA6k?IBPtx`FWcq8tfigF_ z`^-m<5?dEeNYIhR9? zpXv|E5a?(uK@>aY^((xBosb-{GjPf)k(;|7FUGk~Pl7(`D#yu_syY49#0sXEQ zq2Yh_;mDS0F#bl9X5`NV5q91498spY6Yu}FrF1zva3DCCYrYxh$_DQ7>Kut=Pt1kZ z=
[PATCH 1/3] btrfs-progs: check: check symlinks with append/immutable flags
Define new macro I_ERR_ODD_INODE_FLAGS to represents odd inode flags. Symlinks should never have append/immutable flags. While processing inodes, if found a symlink with append/immutable flags, mark the inode record with I_ERR_ODD_INODE_FLAGS. This is for original mode. Signed-off-by: Su Yue--- check/main.c | 7 +++ check/mode-original.h | 1 + 2 files changed, 8 insertions(+) diff --git a/check/main.c b/check/main.c index 68da994f7ae0..f5f52c406046 100644 --- a/check/main.c +++ b/check/main.c @@ -576,6 +576,8 @@ static void print_inode_error(struct btrfs_root *root, struct inode_record *rec) fprintf(stderr, ", link count wrong"); if (errors & I_ERR_FILE_EXTENT_ORPHAN) fprintf(stderr, ", orphan file extent"); + if (errors & I_ERR_ODD_INODE_FLAGS) + fprintf(stderr, ", odd inode flags"); fprintf(stderr, "\n"); /* Print the orphan extents if needed */ if (errors & I_ERR_FILE_EXTENT_ORPHAN) @@ -805,6 +807,7 @@ static int process_inode_item(struct extent_buffer *eb, { struct inode_record *rec; struct btrfs_inode_item *item; + u64 flags; rec = active_node->current; BUG_ON(rec->ino != key->objectid || rec->refs > 1); @@ -822,6 +825,10 @@ static int process_inode_item(struct extent_buffer *eb, rec->found_inode_item = 1; if (rec->nlink == 0) rec->errors |= I_ERR_NO_ORPHAN_ITEM; + flags = btrfs_inode_flags(eb, item); + if (rec->imode & BTRFS_FT_SYMLINK && + flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND)) + rec->errors = I_ERR_ODD_INODE_FLAGS; maybe_free_inode_rec(_node->inode_cache, rec); return 0; } diff --git a/check/mode-original.h b/check/mode-original.h index 368de692fdd1..13cfa5b9e1b3 100644 --- a/check/mode-original.h +++ b/check/mode-original.h @@ -186,6 +186,7 @@ struct file_extent_hole { #define I_ERR_LINK_COUNT_WRONG (1 << 13) #define I_ERR_FILE_EXTENT_ORPHAN (1 << 14) #define I_ERR_FILE_EXTENT_TOO_LARGE(1 << 15) +#define I_ERR_ODD_INODE_FLAGS (1 << 16) struct inode_record { struct list_head backrefs; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] btrfs-progs: check: verify symlinks with append/immutable flags
This patchset can be fetch from my github: https://github.com/Damenly/btrfs-progs/commits/odd_inode_flags symlinks should never have append/immutable attributes. This patchset enables btrfs check to verify such corruption. PATCH[1] is for original mode. PATCH[2] is for original mode. PATCH[3] adds a test image. For further use, the directory is called odd-inode-flags. #issue 133 Su Yue (3): btrfs-progs: check: check symlinks with append/immutable flags btrfs-progs: lowmem: check symlinks with append/immutable flags btrfs-progs: fsck-tests: add test case to check symlinks with odd flags check/main.c | 7 +++ check/mode-lowmem.c | 10 ++ check/mode-original.h| 1 + .../034-odd-inode-flags/default_case.img | Bin 0 -> 4096 bytes tests/fsck-tests/034-odd-inode-flags/test.sh | 15 +++ 5 files changed, 33 insertions(+) create mode 100644 tests/fsck-tests/034-odd-inode-flags/default_case.img create mode 100755 tests/fsck-tests/034-odd-inode-flags/test.sh -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: lowmem: check symlinks with append/immutable flags
Symlinks should never have append/immutable flags. While checking inodes, if found a symlink with append/immutable flags, report and record the fatal error. This is for lowmem mode. Signed-off-by: Su Yue--- check/mode-lowmem.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 9890180d1d3c..61c4e7f23d47 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -2274,6 +2274,7 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) struct btrfs_key last_key; u64 inode_id; u32 mode; + u64 flags; u64 nlink; u64 nbytes; u64 isize; @@ -2307,10 +2308,19 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path) isize = btrfs_inode_size(node, ii); nbytes = btrfs_inode_nbytes(node, ii); mode = btrfs_inode_mode(node, ii); + flags = btrfs_inode_flags(node, ii); dir = imode_to_type(mode) == BTRFS_FT_DIR; nlink = btrfs_inode_nlink(node, ii); nodatasum = btrfs_inode_flags(node, ii) & BTRFS_INODE_NODATASUM; + if (mode & BTRFS_FT_SYMLINK && + (flags & (BTRFS_INODE_IMMUTABLE | BTRFS_INODE_APPEND))) { + err = FATAL_ERROR; + error( +"symlinks shall never be with immutable/append, root %llu inode item [%llu] flags %llu may be corrupted", + root->objectid, inode_id, flags); + } + while (1) { btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); ret = btrfs_next_item(root, path); -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
[Adding David to CC] On 14.05.2018 12:39, Roman Mamedov wrote: > On Mon, 14 May 2018 11:36:26 +0300 > Nikolay Borisovwrote: > >> So what made you have these expectation, is it codified somewhere >> (docs/man pages etc)? I'm fine with that semantics IF this is what >> people expect. > > "Compression ...does not work for NOCOW files": > https://btrfs.wiki.kernel.org/index.php/Compression > > The mount options man page does not say that the NOCOW attribute of files will > be disregarded with compress-force. It only mentions interaction with the > nodatacow and nodatasum mount options. So I'd expect the attribute to still > work and prevent compression of NOCOW files. I wouldn't say this is very clear, it needs to be stated explicitly. > >> Now the question is why people grew up to have this expectation and not the >> other way round? IMO force_compress should really disregard everything else > > Both are knobs that the user needs to explicitly set, the difference is that > the +C attribute is fine-grained and the mount option is global. If they are > set by the user to conflicting values, it seems more useful to have the > fine-grained control override the global one, not the other way round. This is valid reasoning but so is mine. So I'd like to have some rules on that matter such that in the future things will have consistent semantics. Obviously in this case the "local options trump global ones" seems to be prevalent. I don't have problem with that but this should codified somewhere. David, what's your take on that. Where do you think will be the best place to say that local, per-inode options take precedence over global ones? > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, 14 May 2018 11:36:26 +0300 Nikolay Borisovwrote: > So what made you have these expectation, is it codified somewhere > (docs/man pages etc)? I'm fine with that semantics IF this is what > people expect. "Compression ...does not work for NOCOW files": https://btrfs.wiki.kernel.org/index.php/Compression The mount options man page does not say that the NOCOW attribute of files will be disregarded with compress-force. It only mentions interaction with the nodatacow and nodatasum mount options. So I'd expect the attribute to still work and prevent compression of NOCOW files. > Now the question is why people grew up to have this expectation and not the > other way round? IMO force_compress should really disregard everything else Both are knobs that the user needs to explicitly set, the difference is that the +C attribute is fine-grained and the mount option is global. If they are set by the user to conflicting values, it seems more useful to have the fine-grained control override the global one, not the other way round. -- With respect, Roman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 4:36 AM, Nikolay Borisovwrote: > On 14.05.2018 11:20, Roman Mamedov wrote: >> On Mon, 14 May 2018 11:10:34 +0300 >> Nikolay Borisov wrote: >> >>> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard >>> the inode flags, presumably the admin knows what he is doing? >> >> Please don't. Personally I always assumed chattr +C would prevent both CoW >> and >> compression, and used that as a way to override volume-wide compress-force >> for >> a particular folder. Now that it turns out this wasn't working, the patch >> would fix it to behave in line with prior expectations. > > > So what made you have these expectation, is it codified somewhere > (docs/man pages etc)? I'm fine with that semantics IF this is what > people expect. https://btrfs.wiki.kernel.org/index.php/Compression "Compression... does not work for NOCOW files." IMO, without a checksum, allowing compression is too much of a data risk. Uncompressed, user may be able to salvage most of the data with partial corruption. Compressed, they can't get anything to even look at, unless they have a mirrored copy that is fine, figure out which disk is corrupted, and mount degraded without it. > Now the question is why people grew up to have this expectation and not > the other way round? IMO force_compress should really disregard > everything else, if not then it should be documented. The man pages [0] > say: > > " > If compress-force is specified, the compression will allways be > attempted, but the data may end up uncompressed if the compression would > make them larger. > > As this is too simple, the compress-force is a workaround that will > compress most of the files at the cost of some wasted CPU cycles on > failed attempts. The heuristics of compress will improve in the future > so this will not be necessary. > " > > In this case I think the man page for this option should be documented > to say that inode-specific flags preclude mount flags. > > https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS Agreed the man page should be clarified. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, May 14, 2018 at 4:20 AM, Roman Mamedovwrote: > On Mon, 14 May 2018 11:10:34 +0300 > Nikolay Borisov wrote: > >> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard >> the inode flags, presumably the admin knows what he is doing? > > Please don't. Personally I always assumed chattr +C would prevent both CoW and > compression, and used that as a way to override volume-wide compress-force for > a particular folder. Now that it turns out this wasn't working, the patch > would fix it to behave in line with prior expectations. "the patch would fix it to behave" might be missing one part. Qu, am I right (best guess, not familiar with btrfs internals enough) this patch will prevent new files and files of size 0 from getting compression with NODATASUM or NODATACOW, but would leave existing compressed files (that shouldn't have been compressed) as they are? After getting an error from btrfs check, presumably, user would need to copy the file(s) to a temporary and copy it back over itself, to get rid of the unwanted compression? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
4.13 call trace in btrfs_update_device
Hello, seeing calltraces like this on a Ubuntu 4.13 kernel, not sure what happens here and what the impact is: [13729.871604] [ cut here ] [13729.871647] WARNING: CPU: 1 PID: 3048 at /build/linux-hwe-rDkE7z/linux-hwe-4.13.0/fs/btrfs/ctree.h:1559 btrfs_update_device+0x1b4/0x1c0 [btrfs] [13729.871648] Modules linked in: xen_pciback xt_physdev br_netfilter iptable_filter ip_tables x_tables xen_netback xen_blkback algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd nls_iso8859_1 dm_crypt intel_rapl bridge stp llc x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf shpchp ie31200_edac mei_me mei nuvoton_cir rc_core lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid i915 aesni_intel firewire_ohci i2c_algo_bit aes_x86_64 tg3 crypto_simd drm_kms_helper ptp firewire_core cryptd syscopyarea mxm_wmi glue_helper [13729.871702] crc_itu_t pps_core sysfillrect ahci sysimgblt fb_sys_fops libahci drm wmi video [13729.871714] CPU: 1 PID: 3048 Comm: btrfs-cleaner Tainted: G W 4.13.0-41-generic #46~16.04.1-Ubuntu [13729.871715] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013 [13729.871717] task: 8802a1211740 task.stack: c9004231 [13729.871743] RIP: e030:btrfs_update_device+0x1b4/0x1c0 [btrfs] [13729.871744] RSP: e02b:c90042313d10 EFLAGS: 00010206 [13729.871746] RAX: 0fff RBX: RCX: 0917dfcfbe00 [13729.871748] RDX: 0004 RSI: 3dd4 RDI: 8801126aaaf0 [13729.871749] RBP: c90042313d58 R08: 3dd8 R09: c90042313cc8 [13729.871751] R10: 1000 R11: 0003 R12: 8802a2779a10 [13729.871752] R13: 8802a30ad000 R14: 3db4 R15: 8801126aaaf0 [13729.871761] FS: () GS:8802b248() knlGS:8802b248 [13729.871762] CS: e033 DS: ES: CR0: 80050033 [13729.871764] CR2: 7f870df80480 CR3: 0001a1618000 CR4: 00042660 [13729.871766] Call Trace: [13729.871795] btrfs_remove_chunk+0x2b8/0x880 [btrfs] [13729.871819] btrfs_delete_unused_bgs+0x336/0x3e0 [btrfs] [13729.871841] cleaner_kthread+0x155/0x190 [btrfs] [13729.871847] kthread+0x10c/0x140 [13729.871868] ? run_one_async_done+0xc0/0xc0 [btrfs] [13729.871871] ? kthread_create_on_node+0x70/0x70 [13729.871875] ret_from_fork+0x35/0x40 [13729.871877] Code: 00 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 73 23 ff ff 4c 89 ff e8 7b f5 fc ff e9 d9 fe ff ff b8 f4 ff ff ff e9 d9 fe ff ff <0f> ff eb b8 e8 53 e8 ba c0 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 [13729.871923] ---[ end trace 2e6095c7c763f649 ]--- [13729.885370] [ cut here ] [13729.885411] WARNING: CPU: 2 PID: 3048 at /build/linux-hwe-rDkE7z/linux-hwe-4.13.0/fs/btrfs/ctree.h:1559 btrfs_update_device+0x1b4/0x1c0 [btrfs] [13729.885412] Modules linked in: xen_pciback xt_physdev br_netfilter iptable_filter ip_tables x_tables xen_netback xen_blkback algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd nls_iso8859_1 dm_crypt intel_rapl bridge stp llc x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_rapl_perf shpchp ie31200_edac mei_me mei nuvoton_cir rc_core lpc_ich mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid i915 aesni_intel firewire_ohci i2c_algo_bit aes_x86_64 tg3 crypto_simd drm_kms_helper ptp firewire_core cryptd syscopyarea mxm_wmi glue_helper [13729.885466] crc_itu_t pps_core sysfillrect ahci sysimgblt fb_sys_fops libahci drm wmi video [13729.885478] CPU: 2 PID: 3048 Comm: btrfs-cleaner Tainted: G W 4.13.0-41-generic #46~16.04.1-Ubuntu [13729.885479] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013 [13729.885481] task: 8802a1211740 task.stack: c9004231 [13729.885506] RIP: e030:btrfs_update_device+0x1b4/0x1c0 [btrfs] [13729.885508] RSP: e02b:c90042313d10 EFLAGS: 00010206 [13729.885510] RAX: 0fff RBX: RCX: 0917dfcfbe00 [13729.885512] RDX: 0004 RSI: 3dd4 RDI: 8801126aaaf0 [13729.885513] RBP: c90042313d58 R08: 3dd8 R09: c90042313cc8 [13729.885514] R10: 1000 R11: 0003 R12: 8802a25414d0 [13729.885516] R13: 8802a30ad000 R14: 3db4 R15: 8801126aaaf0 [13729.885524] FS: () GS:8802b250() knlGS:88003d30 [13729.885525] CS: e033 DS: ES: CR0:
Clarification needed about libbtrfs & libbtrfsutil
Are both of these meant to be public libraries, installed on the user systems, and available in .so variant as well for 3rd party development and public dynamic linking? Or are these private internal libraries, which are installed as public runtime only, simply to share code between the utils, but otherwise provide no abi stability and will forever remain libfoo.so.0? Or should these even be a noinst_ libraries (~= Libtool Convenience Libraries), and are simply intermediate by-products? I'm asking because despite compiling shared & static variants of these libraries, and "shared linked" and "static linked" variants of the utils, it appears that all utilities are statically linking against libbtrfs/libbtrfsutils. Thus no binaries nor bindings, dynamically link against neither libbtrfs nor libbtrfsutil. Tweaking the makefile to use libs_shared variable instead of libs or libs_static, results in slightly smaller binaries, dynamically linked against libbtrfs/libbtrfsutil. But it is hard to tell if this is a bug/mistake, or an intentional feature. -- Regards, Dimitri. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On 14.05.2018 11:20, Roman Mamedov wrote: > On Mon, 14 May 2018 11:10:34 +0300 > Nikolay Borisovwrote: > >> But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard >> the inode flags, presumably the admin knows what he is doing? > > Please don't. Personally I always assumed chattr +C would prevent both CoW and > compression, and used that as a way to override volume-wide compress-force for > a particular folder. Now that it turns out this wasn't working, the patch > would fix it to behave in line with prior expectations. So what made you have these expectation, is it codified somewhere (docs/man pages etc)? I'm fine with that semantics IF this is what people expect. Now the question is why people grew up to have this expectation and not the other way round? IMO force_compress should really disregard everything else, if not then it should be documented. The man pages [0] say: " If compress-force is specified, the compression will allways be attempted, but the data may end up uncompressed if the compression would make them larger. As this is too simple, the compress-force is a workaround that will compress most of the files at the cost of some wasted CPU cycles on failed attempts. The heuristics of compress will improve in the future so this will not be necessary. " In this case I think the man page for this option should be documented to say that inode-specific flags preclude mount flags. https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On Mon, 14 May 2018 11:10:34 +0300 Nikolay Borisovwrote: > But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard > the inode flags, presumably the admin knows what he is doing? Please don't. Personally I always assumed chattr +C would prevent both CoW and compression, and used that as a way to override volume-wide compress-force for a particular folder. Now that it turns out this wasn't working, the patch would fix it to behave in line with prior expectations. -- With respect, Roman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On 2018年05月14日 16:10, Nikolay Borisov wrote: > > > On 14.05.2018 10:02, Qu Wenruo wrote: >> As btrfs(5) specified: >> >> Note >> If nodatacow or nodatasum are enabled, compression is disabled. >> >> If NODATASUM or NODATACOW set, we should not compress the extent. >> >> And in fact, we have bug report about corrupted compressed extent >> leading to memory corruption in mail list. >> Although it's mostly buggy lzo implementation causing the problem, btrfs > > What does it mean "it's mostly buggy lzo implementation"? If we have bug > in the LZO implementation that btrfs uses shouldn't it be fixed as well ? It looks like buggy lzo decompress code which can't handle corrupted data and results some random kernel memory corruption. It's also possible that some incorrect function call in btrfs that caused the problem. It needs to be fixed of course, but right now, no binary dump of the offending data, so we can't verify or further investigate the root cause. Thanks, Qu > >> still needs to be fixed to meet the specification. >> >> Reported-by: James Harvey>> Signed-off-by: Qu Wenruo >> --- >> fs/btrfs/inode.c | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index d241285a0d2a..dbef3f404559 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode >> *inode, u64 start, u64 end) >> { >> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >> >> +/* >> + * Btrfs doesn't support compression without csum or CoW. >> + * This should have the highest priority. >> + */ >> +if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || >> +BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) >> +return 0; >> + >> /* force compress */ >> if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) >> return 1; > > But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard > the inode flags, I'm afraid not. AFAIK the purpose of force compress is to bypass the compression ratio detection, other than generating compressed extent for NODATASUM use case. Thanks, Qu > presumably the admin knows what he is doing? Won't it > be better if somewhere further in the call chain we check if > FORCE_COMPRESS is set and if so override any inode flags. This can be > implemented by _always_ calling inode_need_compress to decide if we > should compress or not? > >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
On 14.05.2018 10:02, Qu Wenruo wrote: > As btrfs(5) specified: > > Note > If nodatacow or nodatasum are enabled, compression is disabled. > > If NODATASUM or NODATACOW set, we should not compress the extent. > > And in fact, we have bug report about corrupted compressed extent > leading to memory corruption in mail list. > Although it's mostly buggy lzo implementation causing the problem, btrfs What does it mean "it's mostly buggy lzo implementation"? If we have bug in the LZO implementation that btrfs uses shouldn't it be fixed as well ? > still needs to be fixed to meet the specification. > > Reported-by: James Harvey> Signed-off-by: Qu Wenruo > --- > fs/btrfs/inode.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index d241285a0d2a..dbef3f404559 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode > *inode, u64 start, u64 end) > { > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > > + /* > + * Btrfs doesn't support compression without csum or CoW. > + * This should have the highest priority. > + */ > + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || > + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) > + return 0; > + > /* force compress */ > if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) > return 1; But if we have mounted the fs with FORCE_COMPRESS shouldn't we disregard the inode flags, presumably the admin knows what he is doing? Won't it be better if somewhere further in the call chain we check if FORCE_COMPRESS is set and if so override any inode flags. This can be implemented by _always_ calling inode_need_compress to decide if we should compress or not? > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum
Signed-off-by: Qu Wenruo--- .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes .../032-compressed-nodatasum/test.sh | 24 ++ 2 files changed, 24 insertions(+) create mode 100644 tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh diff --git a/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz b/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz new file mode 100644 index ..72a3bfd8f8b0bf2d610ce84af0abf4003204a9a1 GIT binary patch literal 21996 zcmeI4c{J4jzsG075Hj{%+4r4FX) )?EVxdjMBJ~vpQ zaRS5#<_3X4^dHT;NhCpX6KfFY&(`-Y@>`5KGo1zUr+38Hhkk(9eDGpEBo%Y3+a z39N$)?!aIWGP$QlLSB4}7GN-%M6!SI!A$!Rs}e(Kof{^+wd}33Zf48xDWoJETEpI1 zJaKTil_;sFO{y5G4$!50I$USB9T)ZvTBJJ#(kP=uiH=7`xOc6GK=#_tkR z!d{gr!mh{ZL~)Oii%0PJQei`H+mC;Yxx{KdS%S%NR`Bvi*^?KY$%coNW6;V}UKynZ z3=&39BUc+4ThIxAm!uXIYjteyW8|r>EMS(~?
[PATCH 0/3] btrfs-progs: Detect compressed extent without csum
Patches can be fetch from github: https://github.com/adam900710/btrfs-progs/tree/compress_nodatasum It's based on v4.16 stable branch. James Harvey from mail list reports a strange kernel panic, whichs show obviously kernel memory corruption, while after btrfs decompression failure. It turns out that, some compressed extent get corrupted on-disk, while the inode has NODATASUM set, there is no csum to prevent corrupted mirror being used. Although the root cause should be buggy lzo implementation, it still shows that btrfs is not following the behavior defined in btrfs(5): Note If nodatacow or nodatasum are enabled, compression is disabled. So at least make btrfs check to detect such problem. Qu Wenruo (3): btrfs-progs: check/lowmem: Add checks for compressed extent without csum btrfs-progs: check/original: Add checks for compressed extent without csum btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum check/main.c | 10 ++-- check/mode-lowmem.c | 18 + .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes .../032-compressed-nodatasum/test.sh | 24 ++ 4 files changed, 50 insertions(+), 2 deletions(-) create mode 100644 tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum
On 05/14/2018 03:03 PM, Qu Wenruo wrote: > There is one report of compressed extent happens in btrfs, but has no > csum and then leads to possible decompress error screwing up kernel > memory. > > Although it's a kernel bug, and won't cause problem until compressed > data get corrupted, let's catch such problem in advance. > > This patch will catch any unexpected compressed extent with: > > 1) 0 or less than expected csum > > 2) nodatasum flag set in the inode item > > This is for lowmem mode. > > Reported-by: James Harvey> Issue: #134 > Signed-off-by: Qu Wenruo Reviewed-by: Su Yue > --- > check/mode-lowmem.c | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c > index dac3201b7d99..8e6d5e8de12a 100644 > --- a/check/mode-lowmem.c > +++ b/check/mode-lowmem.c > @@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, > struct btrfs_key *fkey, > csum_found); > } > } > + /* > + * Extra check for compressed extents. > + * Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus > + * all compressed extent should have csum. > + */ > + if (compressed && csum_found < search_len) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only > %llu bytes has csum, expect %llu", > + root->objectid, fkey->objectid, fkey->offset, csum_found, > + search_len); > + err |= CSUM_ITEM_MISSING; > + } > + if (compressed && nodatasum) { > + error( > +"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't > allow it", > + root->objectid, fkey->objectid, fkey->offset); > + err |= FILE_EXTENT_ERROR; > + } > > /* Check EXTENT_DATA hole */ > if (!no_holes && *end != fkey->offset) { > pEpkey.asc Description: application/pgp-keys
Re: [PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum
On 05/14/2018 03:03 PM, Qu Wenruo wrote: > There is one report of compressed extent happens in btrfs, but has no > csum and then leads to possible decompress error screwing up kernel > memory. > > Although it's a kernel bug, and won't cause problem until compressed > data get corrupted, let's catch such problem in advance. > > This patch will catch any unexpected compressed extent with: > > 1) 0 or less than expected csum > > 2) nodatasum flag set in the inode item > > This is for original mode. > > Reported-by: James Harvey> Issue: #134 > Signed-off-by: Qu Wenruo Reviewed-by: Su Yue > --- > check/main.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/check/main.c b/check/main.c > index 891a6d797756..db064b27e84b 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root, > u64 mask = root->fs_info->sectorsize - 1; > u32 max_inline_size = min_t(u32, mask, > BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); > + u8 compression; > int extent_type; > int ret; > > @@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root, > > fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); > extent_type = btrfs_file_extent_type(eb, fi); > + compression = btrfs_file_extent_compression(eb, fi); > > if (extent_type == BTRFS_FILE_EXTENT_INLINE) { > - u8 compression = btrfs_file_extent_compression(eb, fi); > struct btrfs_item *item = btrfs_item_nr(slot); > > num_bytes = btrfs_file_extent_inline_len(eb, slot, fi); > @@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root, >btrfs_file_extent_encryption(eb, fi) || >btrfs_file_extent_other_encoding(eb, fi))) > rec->errors |= I_ERR_BAD_FILE_EXTENT; > + if (compression && rec->nodatasum) > + rec->errors |= I_ERR_BAD_FILE_EXTENT; > if (disk_bytenr > 0) > rec->found_size += num_bytes; > } else { > @@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root, > if (disk_bytenr > 0 && > btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) { > u64 found; > - if (btrfs_file_extent_compression(eb, fi)) > + > + if (compression) > num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); > else > disk_bytenr += extent_offset; > @@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root, > rec->found_csum_item = 1; > if (found < num_bytes) > rec->some_csum_missing = 1; > + if (compression && found < num_bytes) > + rec->errors |= I_ERR_SOME_CSUM_MISSING; > } else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) { > if (found > 0) { > ret = > check_prealloc_extent_written(root->fs_info, > pEpkey.asc Description: application/pgp-keys
[PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum
There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) 0 or less than expected csum 2) nodatasum flag set in the inode item This is for lowmem mode. Reported-by: James HarveyIssue: #134 Signed-off-by: Qu Wenruo --- check/mode-lowmem.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index dac3201b7d99..8e6d5e8de12a 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey, csum_found); } } + /* +* Extra check for compressed extents. +* Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus +* all compressed extent should have csum. +*/ + if (compressed && csum_found < search_len) { + error( +"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only %llu bytes has csum, expect %llu", + root->objectid, fkey->objectid, fkey->offset, csum_found, + search_len); + err |= CSUM_ITEM_MISSING; + } + if (compressed && nodatasum) { + error( +"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't allow it", + root->objectid, fkey->objectid, fkey->offset); + err |= FILE_EXTENT_ERROR; + } /* Check EXTENT_DATA hole */ if (!no_holes && *end != fkey->offset) { -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum
Signed-off-by: Qu Wenruo--- .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes .../032-compressed-nodatasum/test.sh | 24 ++ 2 files changed, 24 insertions(+) create mode 100644 tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh diff --git a/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz b/tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz new file mode 100644 index ..72a3bfd8f8b0bf2d610ce84af0abf4003204a9a1 GIT binary patch literal 21996 zcmeI4c{J4jzsG075Hj{%+4r4FX) )?EVxdjMBJ~vpQ zaRS5#<_3X4^dHT;NhCpX6KfFY&(`-Y@>`5KGo1zUr+38Hhkk(9eDGpEBo%Y3+a z39N$)?!aIWGP$QlLSB4}7GN-%M6!SI!A$!Rs}e(Kof{^+wd}33Zf48xDWoJETEpI1 zJaKTil_;sFO{y5G4$!50I$USB9T)ZvTBJJ#(kP=uiH=7`xOc6GK=#_tkR z!d{gr!mh{ZL~)Oii%0PJQei`H+mC;Yxx{KdS%S%NR`Bvi*^?KY$%coNW6;V}UKynZ z3=&39BUc+4ThIxAm!uXIYjteyW8|r>EMS(~?
[PATCH 0/3] btrfs-progs: Detect compressed extent without csum
Patches can be fetch from github: https://github.com/adam900710/btrfs-progs/tree/compress_nodatasum It's based on v4.16 stable branch. James Harvey from mail list reports a strange kernel panic, whichs show obviously kernel memory corruption, while after btrfs decompression failure. It turns out that, some compressed extent get corrupted on-disk, while the inode has NODATASUM set, there is no csum to prevent corrupted mirror being used. Although the root cause should be buggy lzo implementation, it still shows that btrfs is not following the behavior defined in btrfs(5): Note If nodatacow or nodatasum are enabled, compression is disabled. So at least make btrfs check to detect such problem. Qu Wenruo (3): btrfs-progs: check/lowmem: Add checks for compressed extent without csum btrfs-progs: check/original: Add checks for compressed extent without csum btrfs-progs: fsck-tests: Add test case for detecting compressed extent without csum check/main.c | 10 ++-- check/mode-lowmem.c | 18 + .../compressed_extent_without_csum.raw.xz | Bin 0 -> 21996 bytes .../032-compressed-nodatasum/test.sh | 24 ++ 4 files changed, 50 insertions(+), 2 deletions(-) create mode 100644 tests/fsck-tests/032-compressed-nodatasum/compressed_extent_without_csum.raw.xz create mode 100755 tests/fsck-tests/032-compressed-nodatasum/test.sh -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum
There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) 0 or less than expected csum 2) nodatasum flag set in the inode item This is for original mode. Reported-by: James HarveyIssue: #134 Signed-off-by: Qu Wenruo --- check/main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/check/main.c b/check/main.c index 891a6d797756..db064b27e84b 100644 --- a/check/main.c +++ b/check/main.c @@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root, u64 mask = root->fs_info->sectorsize - 1; u32 max_inline_size = min_t(u32, mask, BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); + u8 compression; int extent_type; int ret; @@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root, fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); extent_type = btrfs_file_extent_type(eb, fi); + compression = btrfs_file_extent_compression(eb, fi); if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - u8 compression = btrfs_file_extent_compression(eb, fi); struct btrfs_item *item = btrfs_item_nr(slot); num_bytes = btrfs_file_extent_inline_len(eb, slot, fi); @@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root, btrfs_file_extent_encryption(eb, fi) || btrfs_file_extent_other_encoding(eb, fi))) rec->errors |= I_ERR_BAD_FILE_EXTENT; + if (compression && rec->nodatasum) + rec->errors |= I_ERR_BAD_FILE_EXTENT; if (disk_bytenr > 0) rec->found_size += num_bytes; } else { @@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root, if (disk_bytenr > 0 && btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) { u64 found; - if (btrfs_file_extent_compression(eb, fi)) + + if (compression) num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); else disk_bytenr += extent_offset; @@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root, rec->found_csum_item = 1; if (found < num_bytes) rec->some_csum_missing = 1; + if (compression && found < num_bytes) + rec->errors |= I_ERR_SOME_CSUM_MISSING; } else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) { if (found > 0) { ret = check_prealloc_extent_written(root->fs_info, -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
As btrfs(5) specified: Note If nodatacow or nodatasum are enabled, compression is disabled. If NODATASUM or NODATACOW set, we should not compress the extent. And in fact, we have bug report about corrupted compressed extent leading to memory corruption in mail list. Although it's mostly buggy lzo implementation causing the problem, btrfs still needs to be fixed to meet the specification. Reported-by: James HarveySigned-off-by: Qu Wenruo --- fs/btrfs/inode.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d241285a0d2a..dbef3f404559 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -396,6 +396,14 @@ static inline int inode_need_compress(struct inode *inode, u64 start, u64 end) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + /* +* Btrfs doesn't support compression without csum or CoW. +* This should have the highest priority. +*/ + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW || + BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) + return 0; + /* force compress */ if (btrfs_test_opt(fs_info, FORCE_COMPRESS)) return 1; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] btrfs-progs: check/original: Add checks for compressed extent without csum
There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) 0 or less than expected csum 2) nodatasum flag set in the inode item This is for original mode. Reported-by: James HarveyIssue: #134 Signed-off-by: Qu Wenruo --- check/main.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/check/main.c b/check/main.c index 891a6d797756..db064b27e84b 100644 --- a/check/main.c +++ b/check/main.c @@ -1437,6 +1437,7 @@ static int process_file_extent(struct btrfs_root *root, u64 mask = root->fs_info->sectorsize - 1; u32 max_inline_size = min_t(u32, mask, BTRFS_MAX_INLINE_DATA_SIZE(root->fs_info)); + u8 compression; int extent_type; int ret; @@ -1460,9 +1461,9 @@ static int process_file_extent(struct btrfs_root *root, fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); extent_type = btrfs_file_extent_type(eb, fi); + compression = btrfs_file_extent_compression(eb, fi); if (extent_type == BTRFS_FILE_EXTENT_INLINE) { - u8 compression = btrfs_file_extent_compression(eb, fi); struct btrfs_item *item = btrfs_item_nr(slot); num_bytes = btrfs_file_extent_inline_len(eb, slot, fi); @@ -1494,6 +1495,8 @@ static int process_file_extent(struct btrfs_root *root, btrfs_file_extent_encryption(eb, fi) || btrfs_file_extent_other_encoding(eb, fi))) rec->errors |= I_ERR_BAD_FILE_EXTENT; + if (compression && rec->nodatasum) + rec->errors |= I_ERR_BAD_FILE_EXTENT; if (disk_bytenr > 0) rec->found_size += num_bytes; } else { @@ -1512,7 +1515,8 @@ static int process_file_extent(struct btrfs_root *root, if (disk_bytenr > 0 && btrfs_header_owner(eb) != BTRFS_DATA_RELOC_TREE_OBJECTID) { u64 found; - if (btrfs_file_extent_compression(eb, fi)) + + if (compression) num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); else disk_bytenr += extent_offset; @@ -1526,6 +1530,8 @@ static int process_file_extent(struct btrfs_root *root, rec->found_csum_item = 1; if (found < num_bytes) rec->some_csum_missing = 1; + if (compression && found < num_bytes) + rec->errors |= I_ERR_SOME_CSUM_MISSING; } else if (extent_type == BTRFS_FILE_EXTENT_PREALLOC) { if (found > 0) { ret = check_prealloc_extent_written(root->fs_info, -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] btrfs-progs: check/lowmem: Add checks for compressed extent without csum
There is one report of compressed extent happens in btrfs, but has no csum and then leads to possible decompress error screwing up kernel memory. Although it's a kernel bug, and won't cause problem until compressed data get corrupted, let's catch such problem in advance. This patch will catch any unexpected compressed extent with: 1) 0 or less than expected csum 2) nodatasum flag set in the inode item This is for lowmem mode. Reported-by: James HarveyIssue: #134 Signed-off-by: Qu Wenruo --- check/mode-lowmem.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index dac3201b7d99..8e6d5e8de12a 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -1543,6 +1543,24 @@ static int check_file_extent(struct btrfs_root *root, struct btrfs_key *fkey, csum_found); } } + /* +* Extra check for compressed extents. +* Btrfs doesn't allow NODATASUM and compressed extent co-exist, thus +* all compressed extent should have csum. +*/ + if (compressed && csum_found < search_len) { + error( +"root %llu EXTENT_DATA[%llu %llu] compressed extent must have csum, but only %llu bytes has csum, expect %llu", + root->objectid, fkey->objectid, fkey->offset, csum_found, + search_len); + err |= CSUM_ITEM_MISSING; + } + if (compressed && nodatasum) { + error( +"root %llu EXTENT_DATA[%llu %llu] is compressed, but inode flag doesn't allow it", + root->objectid, fkey->objectid, fkey->offset); + err |= FILE_EXTENT_ERROR; + } /* Check EXTENT_DATA hole */ if (!no_holes && *end != fkey->offset) { -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: "decompress failed" in 1-2 files always causes kernel oops, check/scrub pass
On 2018年05月14日 13:30, Qu Wenruo wrote: > > > On 2018年05月14日 12:41, james harvey wrote: >> On Sun, May 13, 2018 at 10:08 PM, Qu Wenruowrote: >>> On 2018年05月12日 13:08, james harvey wrote: Hardware is fine. Passes memtest86+ in SMP mode. Works fine on all other files. [ 381.869940] BUG: unable to handle kernel paging request at 00390e50 [ 381.870881] BTRFS: decompress failed [ 381.891775] IP: rebalance_domains+0x8a/0x2c0 >>> >>> The interesting part here is, btrfs is not showing up the call trace, >>> not even lzo code. >>> (Despite of the "decompress failed" message). >>> Maybe some corrupted data is screwing up some random kernel memory? >> >> I've been surprised by this too. I've seen a few "styles" of crashes from >> this. >> >> The fuller version of the one I posted in original post: >> https://bugzilla.kernel.org/attachment.cgi?id=275949 >> >> One that starts with a "general protection fault": >> https://bugzilla.kernel.org/attachment.cgi?id=275951 >> >> And my most recent version, starts with "BTRFS: decompress failed" >> then "BUG: unable to handle kernel NULL pointer dereference at >> 0001": >> https://bugzilla.kernel.org/attachment.cgi?id=275961 >> >> This latest one does have a call trace including btrfs. The top of >> the call trace is "end_compressed_bio_read+0x34e/0x3d0 [btrfs]", and >> although it includes the word compressed, I'm not sure that's actually >> having to do with lzo compression. The call stack doesn't scream that >> to me. >> >> It seems like when the invalid decompression happens, that code itself >> doesn't give any kernel errors, but the rest of the kernel starts >> spazzing. > > Yep, even the last case it still looks like that it's kernel memory get > corrupted. > >> >> I've replicated this probably about 15 times now. Only happens on >> these files that have inconsistent mirrored data. > > From the thread, since you have already located the corrupted mirror, > would you please provide the corrupted dump along with correct one? > > It would help a lot for us to under stand what's going on. > >> >> >> >>> Would you please get the inode number of that corrupted files, and throw >>> it through btrfs-debug-tree? >>> >>> # btrfs-debug-tree -t | grep -A 50 \( >>> >>> This is the preferred method as it would provide all the details we >>> need. But since it could contain sensitive info like filename, please >>> double check before posting it. >> >> # ls -i >> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal >> 291489 >> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal >> >> # ls -i >> user-1000@b70add0ef010457d933fec23a2afa48a-0495-00053b6b6e65e9cf.journal >> 72267 >> user-1000@b70add0ef010457d933fec23a2afa48a-0495-00053b6b6e65e9cf.journal >> >> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(291489 > >> debug.tree.291489 >> Available at: http://termbin.com/kegj >> >> # btrfs-debug-tree -t 5 /dev/lvm/newMain1 | grep -A 50 \(72267 > >> debug.tree.72267 >> Available at: http://termbin.com/xhdc > > The dump indicates the same conclusion you reached. > The inode has NODATACOW NODATASUM flag, which means it should not has > csum nor has data compressed. > While in fact we have tons of compressed extents. > > But the following fiemap result also shows that these extents get > shared. This could happen when there is a snapshot. > > So there is something wrong that btrfs allows compressed data to be > generated for such file. > (Could not reproduce the same behavior with 4.16 kernel, could such > problem happens in older kernels? Or just get fixed recently?) OK, I could reproduce it now. Just mount with -o nodatasum, then create a file. Remount with compress-force=lzo, then write something. So at least btrfs should disallow such thing. Thanks, Qu > > Then some corruption screwed up the compressed data, and when we > decompress, the kernel is screwed up. > > > To pindown the lzo decompress corruption, kasan would be a nice try. > However this means you need to enable it at compile time, and recompile > a kernel. > Not to mention kasan has a great impact on performance. > > But it should provide more info before memory get corrupted. > > Thanks, > Qu > >> >> >> >>> Or fiemap of that file could also help: >>> >>> # xfs_io -c "fiemap -v" >>> >>> This is completely safe, but I'm not 100% sure about if the info is enough. >> >> # xfs_io -c "fiemap -v" >> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal >> Available at: http://termbin.com/nsej >> >> # xfs_io -c "fiemap -v" >> system@00fa3c0596e64d2e84096520ca46f008-0001-00053cd2c1756577.journal >> Available at: http://termbin.com/4fiz > > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org