Re: [PATCH v2 09/10] btrfs-progs: tests: add testcase for undelete-subvol
On 2018年03月27日 15:06, Lu Fengqi wrote: > The testcase checks the functionality of "btrfs rescue undelete-subvol", > including recovering an intact subvolume, and handling correctly > incomplete subvolume. > > Signed-off-by: Lu Fengqi> --- > v2: add shell quoting to test script. > > .../030-undelete-subvol/deleted_subvolume.img | Bin 0 -> 4096 bytes > .../030-undelete-subvol/drop_progress.raw.xz | Bin 0 -> 23452 bytes 030 is already taken. And just a nitpick, the file name is not obvious for its content. In fact for "drop_progress", it have 2 orphan roots while only one can be recovered. For "deleted_subvolume" it has 1 orphan root and it can be recovered. Despite that, it looks good to me. Thanks, Qu > tests/misc-tests/030-undelete-subvol/test.sh | 34 > + > 3 files changed, 34 insertions(+) > create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img > create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz > create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh > > diff --git a/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img > b/tests/misc-tests/030-undelete-subvol/deleted_subvolume.img > new file mode 100644 > index > ..9168a8e33fbbb14f9ea78514721a52191b94d1a0 > GIT binary patch > literal 4096 > zcmeH|do+|=AIB%x8p^0djl0}JQb@$O%TY$;lEx)vlF+ymV;WO45{D?%IqnhB$S}Fi > zjN6EihKkC4NQhyKaT(V$2k-mO={@KF_pD{Fy}r-)`R-@^*504B_p@W+PlCQF!fF8j > zZGs!m9b0|F@NPJOvGJw?_&4=#{gw~i3;@(EFS(@+TiUf5-~Erz<=ODYja#|MmTnGi > zw`~I31pbc*g!Km3O1HeZD~KRV`{Pj~uzt3r2|UgTH}a > zWDkSP-%;<0K6oq%LMk0=5)g}2I{;`k(RwtnGuI_hB01WRSAongy!#b`_p*d$38#+; > zcYI;9gzxul-@Q0?=XZCRp1a_O(t&`~#!SFI<>VfKQp$%5D%q@qRO^IYiZQ11 > z*LJ*O@akHeln2DGHANbna7)texSTKS8w%9X2EBg2OP_*;ZbHjccM=V#lO%} > zATc9#siZ*izH<@v{m4ksm!$X!TLEok)EiGD1DIiBhF{t3@g30ydV%CQ75thpDR_Sz > zkNpn7>SYJJu-@rDqHlw-Wb-Wdq8i)|MhOAQWLvnU+THRaIvD@>=47X$XKjKl#3c>c > z+gVL~u-1qB(5Zq5RlSf2!q|mGkUWzvayXPkcZ+}`@>>iAb(MsWDCeg9_g?z@ > z+UzN}@~(J8&7SGm+x+QKwvX*meP^nxg%)4Y+tC!c%#Li1uw)!=uIZLYL%h({>6 > zI?6W8VFdIKTctfwulDqceyAL)1|1Zhpij6oC^|RGxbla6`S|q7kxq3u8hQMPc;I2Y > zCWZ`c;*ciOjl@3Y^DqdF9BWsR_*&1bIYVLtMwmWM2N+tNJ-7j(O%B-L35XizYGSd1 > zB~$y~k-c18T=w!oFaLH~OTx)ncn*m3#tpvpB({MHrvM%M=n2J!-i}LGsv{? > z<~5bwAY@lJ)i%#ao zFaVHTbX7=>Op^!R$6CJL^-x?kT(roP@)*Q8XnwB2j3pd-#ynd#jp7WCx#4WSfc|{y > z#5H*01m+|Z94ps6)UYtvKCre(ZF$dTAxE?$+j2>nxqYpZ_Rkl)A)IQAwvH~^(SzZi > z0p9u3gpBilS%_h>xu4tFT+N#=O{4QmFL|iRrJJjnU7WSMuKUyHtOvks7;`zGw0BVH > zJQ}jPdQ!xy(s!Us?OIYAl7nvL$7V~c{(G8ho_TID+~?_UeBFyGcNvvctBnsrrf > zR>%@OeHtPSWh}(@*~6Mo%wR1Xc{kY<0UPWj-p+;|;pI=V<}0+sd36pSJ9HL1 zedyJJH`V;qs}eahO35L0A>GOmQHYWmh`5IF7C{XLOE6~i$Hfktz9xZp$KS@J7FBer > zU6XsQh2^|0*VWg^M;!+bUUhGGt|)c?CClbV{)#jL~#x7@Hgi$!??u)-rbQ^;oY > zY!$-MsH;KQ%fzKtwafY8t$(=9mvBoO{a>gHD^*bc*g7T=1Gzb*{j2yz>ok& > z7M6gGH4Q*~983Vsv#8?bD~SiDobt+CdWEqjraBk8ESoH7vV6%oI3Ejb*$ > zuNVEyMF_j+Q>(9d(j#nI_W+HW?IqPn%^pKydSPdh>>Y3s5VIg;-RjrS*7CMI zy@k<|nP3VL@ndp87{u~HVW_{Y*$se=Z}$CDIdvvKrhnca(1M-26(9HuWE?Ohji > zuBpmHt(QTgQ-Yq{m73~S!#xX-w!hB|`k?xhx6>;m!B6zHz=2>h)6=4D_a-K3kJ6t; > zWMpX8X0U`fZB*msQ+^4vT8hol-6nS`h35;Ds!prmWNZ?5xhs3K9QX}xIY9op0XE<2 > zd-c9BbluK;C2_kTap|WlT=|N;-RvR trenZQU > zPiVGB&=zwO#labtRhCASg?G_h$wLNDK8)U6ui@n~0zl=OVap45dP71-m zMM_GKki$jYRwe^&@*Ja~QGj{(V%y{8|cwbYtaLwY@PFu7*0-Gi2}@`5%J{ > zXI9Rmc-)7felm=~CiMi=s78?Ut6J(^D8gq>(j?er4a{gMzE>X8BhF^HR&+yN > z``L#iGzIqve?b}mDKJ3R?!Sj~hEkClMj*aQPHs-je`Hw)G=a^_UGlEFuf?-JGn > zrt3Oz*O1f$_uqL2)~`vD^hS#j3mRtrVnrHk6fM)Az8Q5M9FVc;SjPt^dWinLHEw > z(Yd%niJ8}8Uk@IFxs`OJKq)!4{LOe!xf }C > zCCb1~=m9Lg$9ZHABR*mWv80B!dG3fdJeMV!XUBJIiz)c*3_^%D@4@C+zdY>o!34x- > zf=~s)a%m<7<=N$mEk3g{K0aS)5Q~o%4 zYQBQoFFhvh0X-TkNY8+hCRNwn{k z20Fx*h7a@QQ97Te92F>(9R>QL7W2uspH*KopFWYyiKEB0WdA*o`T(uUGBdW_*E > zzTw9jeaVG?gtI%=*AAzOt@~-PogS@pC ZeM(;pZJdD++rKZroo*A@Ch+ek@Fx_v;e7xA > > literal 0 > HcmV?d1 > > diff --git a/tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz > b/tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz > new file mode 100644 > index > ..3db2c80e61fa9bb99db374509dbb95910965d467 > GIT binary patch > literal 23452 > zcmeHPXH=63mktC7z4zV}r8hyO7ZF71J@gJj=p6y+(xijbKnNWvN|D}+^dbU^sPx_m > zAee9V?4CV)c4qg}`DSLFv-y{QZ{GXfd!PF}w_J((=^Fw77^^d{6wv`JX!HO8fOt%Q > z7>Rs>(P<0-IA0-=um{T^q#LWS2+;ZNGGoXTU+*0WG?kjb!RY9Rd@zE! > zprWN%HWIB*2GqTt(OQjR79XdXGC|eojjqRj@Ntje7V=+!*iK3 > ze=gkgvO#(Wb3wy1yH#b!yCt$!YIsJE^$1(}*0xh|G%dc{X8vwN4zHQ}bVmsR zo1FQ@#en!Cnsw{>RrWRKCJQwJZ4r;|fIzaNqWP$xtDSB^E`4R#+kNfFz-Pu^-_90d6FamaI%>#ZlImmLpgm-*(wek`j)@7COOkP^)r&;tzG%wF$uJ>jkzy^dj > z*q1CnCUcRGf<3#$dF~&$F~=`u6%?zfjrjSgZ#=WTN%JruO@f4|;_Itq@hv`` >
Re: [PATCH v2 08/10] btrfs-progs: undelete-subvol: add undelete-subvol subcommand
On 2018年03月27日 15:06, Lu Fengqi wrote: > Add the undelete-subvol subcommand for btrfs rescue. This subcommand is > used to recover deleted subvolume left intact on the device. > > Signed-off-by: Lu Fengqi> --- > v2: add -s option to specify subvol_id. > > cmds-rescue.c | 70 > +++ > 1 file changed, 70 insertions(+) > > diff --git a/cmds-rescue.c b/cmds-rescue.c > index c40088ad374e..c5132126e922 100644 > --- a/cmds-rescue.c > +++ b/cmds-rescue.c > @@ -25,6 +25,7 @@ > #include "disk-io.h" > #include "commands.h" > #include "utils.h" > +#include "undelete-subvol.h" > #include "help.h" > > static const char * const rescue_cmd_group_usage[] = { > @@ -248,6 +249,73 @@ out: > return !!ret; > } > > +static const char * const cmd_rescue_undelete_subvol_usage[] = { > + "btrfs rescue undelete-subvol [-s ] ", > + "Undelete deleted subvolume", > + "All deleted subvolume that still left intact on the device will be", > + "recovered. If -s option is given, then just recover the", > + "subvolume which specified by .", > + "", > + "-s specify the subvolume which will be recovered.", > + NULL > +}; > + > +static int cmd_rescue_undelete_subvol(int argc, char **argv) > +{ > + struct btrfs_fs_info *fs_info; > + char *devname; > + u64 subvol_id = 0; > + int ret; > + > + while (1) { > + int c = getopt(argc, argv, "s:"); > + > + if (c < 0) > + break; > + switch (c) { > + case 's': > + subvol_id = arg_strtou64(optarg); > + if (!is_fstree(subvol_id)) { > + error("%llu is not a valid subvolume id", > + subvol_id); > + ret = -EINVAL; > + goto out; > + } > + break; > + default: > + usage(cmd_rescue_undelete_subvol_usage); > + } > + } > + > + if (check_argc_exact(argc - optind, 1)) > + usage(cmd_rescue_undelete_subvol_usage); > + > + devname = argv[optind]; > + ret = check_mounted(devname); > + if (ret < 0) { > + error("could not check mount status: %s", strerror(-ret)); > + goto out; > + } else if (ret) { > + error("%s is currently mounted", devname); > + ret = -EBUSY; > + goto out; > + } > + > + fs_info = open_ctree_fs_info(devname, 0, 0, 0, OPEN_CTREE_WRITES | > + OPEN_CTREE_PARTIAL); I'm not sure if using OPEN_CTREE_PARTIAL here is a good idea. As the undelete-subvol looks like a tool to revert stupid user mistake, and we expect a healthy fs to be provided. And for corrupted fs, we may make the case worse. Thanks, Qu > + if (!fs_info) { > + error("could not open btrfs"); > + ret = -EIO; > + goto out; > + } > + > + ret = btrfs_undelete_intact_subvols(fs_info->tree_root, subvol_id); > + > + close_ctree(fs_info->tree_root); > +out: > + return ret; > +} > + > static const char rescue_cmd_group_info[] = > "toolbox for specific rescue operations"; > > @@ -260,6 +328,8 @@ const struct cmd_group rescue_cmd_group = { > { "zero-log", cmd_rescue_zero_log, cmd_rescue_zero_log_usage, > NULL, 0}, > { "fix-device-size", cmd_rescue_fix_device_size, > cmd_rescue_fix_device_size_usage, NULL, 0}, > + { "undelete-subvol", cmd_rescue_undelete_subvol, > + cmd_rescue_undelete_subvol_usage, NULL, 0}, > NULL_CMD_STRUCT > } > }; > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 07/10] btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols
On 2018年03月27日 15:06, Lu Fengqi wrote: > The function default will traverse the all orphan items on the tree root, > and recover the all intact subvolumes. If subvol_id is specified, then only > the corresponding subvolume will be recovered. > > Signed-off-by: Lu Fengqi> --- > v2: add subvol_id argumenta to specify subvol_id instead of recovering > all subvolumes. > > undelete-subvol.c | 70 > +++ > undelete-subvol.h | 2 ++ > 2 files changed, 72 insertions(+) > > diff --git a/undelete-subvol.c b/undelete-subvol.c > index 9243e35545c5..5b494ca086ab 100644 > --- a/undelete-subvol.c > +++ b/undelete-subvol.c > @@ -15,6 +15,7 @@ > #include "transaction.h" > #include "disk-io.h" > #include "messages.h" > +#include "undelete-subvol.h" > > /* > * Determines whether the subvolume is intact, according to the drop_progress > @@ -182,3 +183,72 @@ static int link_subvol_to_lostfound(struct btrfs_root > *root, u64 subvol_id) > out: > return ret; > } > + > +/* > + * Traverse all orphan items on the root tree, restore them to the lost+found > + * directory if the corresponding subvolumes are still intact left on the > disk. > + * > + * @root the root of the root tree. Same comment here. > + * @subvol_idif not set to 0, skip other subvolumes and only recover > the > + * subvolume specified by @subvol_id. > + * > + * Return 0 if no error occurred even if no subvolume was recovered. > + */ > +int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id) I prefer to remove the word "_intact". > +{ > + struct btrfs_key key; > + struct btrfs_path path; > + u64 found_count = 0; > + u64 recovered_count = 0; > + int ret = 0; > + > + key.objectid = BTRFS_ORPHAN_OBJECTID; > + key.type = BTRFS_ORPHAN_ITEM_KEY; > + key.offset = subvol_id ? subvol_id + 1 : (u64)-1; > + > + btrfs_init_path(); I would prefer to do btrfs_search_slot() here, and then use btrfs_previous_item() to iterate, other than calling btrfs_search_slot() several times. > + while (subvol_id != key.offset) { > + ret = btrfs_search_slot(NULL, root, , , 0, 0); > + if (ret < 0) { > + error("search ORPHAN_ITEM for %llu failed.\n", > + key.offset); > + break; > + } > + > + path.slots[0]--; Btrfs_previous_item() is much better here. Especially when above btrfs_search_slot() could return 0 (key found) and path.slots[0] could be 0. That's also the reason why I prefer btrfs_search_slot() then btrfs_previous_item() in the loop. Thanks, Qu > + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); > + > + btrfs_release_path(); > + > + /* No more BTRFS_ORPHAN_ITEM, so we don't need to continue. */ > + if (key.type != BTRFS_ORPHAN_ITEM_KEY) { > + ret = 0; > + break; > + } > + > + /* If subvol_id is non-zero, skip other deleted subvolume. */ > + if (subvol_id && subvol_id != key.offset) { > + ret = -ENOENT; > + break; > + } > + > + if (!is_subvol_intact(root, key.offset)) > + continue; > + > + /* Here we can confirm there is an intact subvolume. */ > + found_count++; > + ret = link_subvol_to_lostfound(root, key.offset); > + if (ret == 0) { > + recovered_count++; > + printf( > + "Recovered subvolume %llu to lost+found successfully.\n", > + key.offset); > + } > + > + } > + > + printf("Found %llu subvols left intact\n", found_count); > + printf("Recovered %llu subvols\n", found_count); > + > + return ret; > +} > diff --git a/undelete-subvol.h b/undelete-subvol.h > index 7cfd100cce37..f773210c46fe 100644 > --- a/undelete-subvol.h > +++ b/undelete-subvol.h > @@ -14,4 +14,6 @@ > #ifndef __BTRFS_UNDELETE_SUBVOLUME_H__ > #define __BTRFS_UNDELETE_SUBVOLUME_H__ > > +int btrfs_undelete_intact_subvols(struct btrfs_root *root, u64 subvol_id); > + > #endif > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 06/10] btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound
On 2018年03月27日 15:06, Lu Fengqi wrote: > The function will create lost+found directory, link the deleted > subvolume specified by the subvol_id to the directory, update the > information of root_item and cleanup the associated orphan item. > > Signed-off-by: Lu Fengqi> --- > undelete-subvol.c | 76 > +++ > 1 file changed, 76 insertions(+) > > diff --git a/undelete-subvol.c b/undelete-subvol.c > index 781057df2b84..9243e35545c5 100644 > --- a/undelete-subvol.c > +++ b/undelete-subvol.c > @@ -106,3 +106,79 @@ out: > btrfs_release_path(); > return ret; > } > + > +/* > + * Recover a subvolume specified by subvol_id, and link it to the lost+found > + * directory. > + * > + * @root: the root of the root tree. In that case, @fs_info would case less confusion. Otherwise looks good. Reviewed-by: Qu Wenruo Thanks, Qu > + * @subvol_id: specify the subvolume which will be linked, and also be the > part > + * of the subvolume name. > + * > + * Return 0 if no error occurred. > + */ > +static int link_subvol_to_lostfound(struct btrfs_root *root, u64 subvol_id) > +{ > + struct btrfs_trans_handle *trans; > + struct btrfs_fs_info *fs_info = root->fs_info; > + struct btrfs_root *fs_root = fs_info->fs_root; > + char buf[BTRFS_NAME_LEN + 1] = {0}; /* 1 for snprintf null */ > + char *dir_name = "lost+found"; > + u64 lost_found_ino = 0; > + u32 mode = 0700; > + int ret; > + > + /* > + * For link subvolume to lost+found, > + * 2 for parent(256)'s dir_index and dir_item > + * 2 for lost+found dir's inode_item and inode_ref > + * 2 for lost+found dir's dir_index and dir_item for the subvolume > + * 2 for the subvolume's root_ref and root_backref > + */ > + trans = btrfs_start_transaction(fs_root, 8); > + if (IS_ERR(trans)) { > + error("unable to start transaction"); > + ret = PTR_ERR(trans); > + goto out; > + } > + > + /* Create lost+found directory */ > + ret = btrfs_mkdir(trans, fs_root, dir_name, strlen(dir_name), > + BTRFS_FIRST_FREE_OBJECTID, _found_ino, > + mode); > + if (ret < 0) { > + error("failed to create '%s' dir: %d", dir_name, ret); > + goto out; > + } > + > + /* Link the subvolume to lost+found directory */ > + snprintf(buf, BTRFS_NAME_LEN + 1, "sub%llu", subvol_id); > + ret = btrfs_link_subvol(trans, fs_root, buf, subvol_id, lost_found_ino, > + false); > + if (ret) { > + error("failed to link the subvol %llu: %d", subvol_id, ret); > + goto out; > + } > + > + /* Clear root flags BTRFS_ROOT_SUBVOL_DEAD and increase root refs */ > + ret = recover_dead_root(trans, root, subvol_id); > + if (ret) > + goto out; > + > + /* Delete the orphan item after undeletion is completed. */ > + ret = btrfs_del_orphan_item(trans, root, subvol_id); > + if (ret) { > + error("failed to delete the orphan_item for %llu: %d", > + subvol_id, ret); > + goto out; > + } > + > + ret = btrfs_commit_transaction(trans, fs_root); > + if (ret) { > + error("transaction commit failed: %d", ret); > + goto out; > + } > + > +out: > + return ret; > +} > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 05/10] btrfs-progs: undelete-subvol: introduce recover_dead_root
On 2018年03月27日 15:06, Lu Fengqi wrote: > The function will find the root_item specified by the subvol_id, > clear the BTRFS_ROOT_SUBVOL_DEAD flag and set root_refs to one. Sorry I didn't point it out in btrfs_link_subvol() patch, but at least to me, refs should only be modified by btrfs_link_subvol(), just like btrfs_unlink(). Thanks, Qu > > Signed-off-by: Lu Fengqi> --- > ctree.h | 1 + > undelete-subvol.c | 55 > +++ > 2 files changed, 56 insertions(+) > > diff --git a/ctree.h b/ctree.h > index 4bff0b821472..7c0b8150bc4e 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -184,6 +184,7 @@ static int btrfs_csum_sizes[] = { 4 }; > #define BTRFS_FT_MAX 9 > > #define BTRFS_ROOT_SUBVOL_RDONLY (1ULL << 0) > +#define BTRFS_ROOT_SUBVOL_DEAD (1ULL << 48) > > /* > * the key defines the order in the tree, and so it also defines (optimal) > diff --git a/undelete-subvol.c b/undelete-subvol.c > index 00fcc4895778..781057df2b84 100644 > --- a/undelete-subvol.c > +++ b/undelete-subvol.c > @@ -12,6 +12,9 @@ > */ > > #include "ctree.h" > +#include "transaction.h" > +#include "disk-io.h" > +#include "messages.h" > > /* > * Determines whether the subvolume is intact, according to the drop_progress > @@ -51,3 +54,55 @@ out: > btrfs_release_path(); > return ret; > } > + > +/* > + * Clear BTRFS_ROOT_SUBVOL_DEAD flag and set the root_refs to one. > + * > + * @root the root of root tree. > + * @subvol_idspecify the root_item which will be modified. > + * > + * Return 0 if no error occurred. > + */ > +static int recover_dead_root(struct btrfs_trans_handle *trans, > + struct btrfs_root *root, u64 subvol_id) > +{ > + struct btrfs_key key; > + struct btrfs_path path; > + struct extent_buffer *leaf; > + struct btrfs_root_item root_item; > + u64 root_flags; > + u64 offset; > + int ret; > + > + key.objectid = subvol_id; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = 0; > + > + btrfs_init_path(); > + ret = btrfs_search_slot(trans, root, , , 0, 0); > + if (ret) { > + error("couldn't find ROOT_ITEM for %llu failed: %d", > + subvol_id, ret); > + goto out; > + } > + > + leaf = path.nodes[0]; > + > + offset = btrfs_item_ptr_offset(leaf, path.slots[0]); > + read_extent_buffer(leaf, _item, offset, sizeof(root_item)); > + > + /* Clear BTRFS_ROOT_SUBVOL_DEAD */ > + root_flags = btrfs_root_flags(_item); > + btrfs_set_root_flags(_item, > + root_flags & ~BTRFS_ROOT_SUBVOL_DEAD); > + > + /* Increase the refs */ > + btrfs_set_root_refs(_item, 1); > + > + write_extent_buffer(leaf, _item, offset, sizeof(root_item)); > + btrfs_mark_buffer_dirty(leaf); > + > +out: > + btrfs_release_path(); > + return ret; > +} > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 04/10] btrfs-progs: undelete-subvol: introduce is_subvol_intact
On 2018年03月27日 15:06, Lu Fengqi wrote: > The function is used to determine whether the subvolume is intact. > > Signed-off-by: Lu Fengqi> --- > Makefile | 3 ++- > undelete-subvol.c | 53 + > undelete-subvol.h | 17 + > 3 files changed, 72 insertions(+), 1 deletion(-) > create mode 100644 undelete-subvol.c > create mode 100644 undelete-subvol.h > > diff --git a/Makefile b/Makefile > index 6065522a615f..cb38984c7386 100644 > --- a/Makefile > +++ b/Makefile > @@ -116,7 +116,8 @@ objects = ctree.o disk-io.o kernel-lib/radix-tree.o > extent-tree.o print-tree.o \ > qgroup.o free-space-cache.o kernel-lib/list_sort.o props.o \ > kernel-shared/ulist.o qgroup-verify.o backref.o string-table.o > task-utils.o \ > inode.o file.o find-root.o free-space-tree.o help.o send-dump.o \ > - fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o > + fsfeatures.o kernel-lib/tables.o kernel-lib/raid56.o transaction.o \ > + undelete-subvol.o > cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o > \ > cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ > cmds-quota.o cmds-qgroup.o cmds-replace.o check/main.o \ > diff --git a/undelete-subvol.c b/undelete-subvol.c > new file mode 100644 > index ..00fcc4895778 > --- /dev/null > +++ b/undelete-subvol.c > @@ -0,0 +1,53 @@ > +/* > + * Copyright (C) 2018 Fujitsu. All rights reserved. IIRC David will remove all such copy right line. Is there some principle about this, David? > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + */ > + > +#include "ctree.h" > + > +/* > + * Determines whether the subvolume is intact, according to the drop_progress > + * of the root item specified by subvol_id. > + * > + * Return true if the subvolume is intact, otherwise return false. > + */ > +static bool is_subvol_intact(struct btrfs_root *root, u64 subvol_id) Here we have 2 parameters which can represent a root, so please explain the meaning of each parameter. And after looking into the code, @root should be tree root, and in that case, I prefer passing @fs_info here to get rid of any confusion. > +{ > + struct btrfs_path path; > + struct btrfs_key key; > + struct extent_buffer *leaf; > + struct btrfs_root_item root_item; > + u64 offset; > + int ret; > + > + key.objectid = subvol_id; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = 0; > + > + btrfs_init_path(); > + ret = btrfs_search_slot(NULL, root, , , 0, 0); Instead of setting up keys and search manually, what about using btrfs_read_fs_root()? > + if (ret) { > + ret = false; > + goto out; > + } > + > + leaf = path.nodes[0]; > + offset = btrfs_item_ptr_offset(leaf, path.slots[0]); > + > + read_extent_buffer(leaf, _item, offset, sizeof(root_item)); > + > + /* the subvolume is intact if the objectid of drop_progress == 0 */ > + ret = btrfs_disk_key_objectid(_item.drop_progress) ? false : true; > + > +out: > + btrfs_release_path(); > + return ret; > +} > diff --git a/undelete-subvol.h b/undelete-subvol.h > new file mode 100644 > index ..7cfd100cce37 > --- /dev/null > +++ b/undelete-subvol.h > @@ -0,0 +1,17 @@ > +/* > + * Copyright (C) 2018 Fujitsu. All rights reserved. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public > + * License v2 as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + */ > + > +#ifndef __BTRFS_UNDELETE_SUBVOLUME_H__ I think the empty header is not really needed here. What about introducing it when we have something here? Thanks, Qu > +#define __BTRFS_UNDELETE_SUBVOLUME_H__ > + > +#endif > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 03/10] btrfs-progs: use btrfs_find_free_dir_index to find free inode index
On 2018年03月27日 15:06, Lu Fengqi wrote: > Since we have an existing function to find free inode index, we can > reuse it here. > > Signed-off-by: Lu FengqiLooks good. Reviewed-by: Qu Wenruo Thanks, Qu > --- > inode.c | 27 +++ > 1 file changed, 7 insertions(+), 20 deletions(-) > > diff --git a/inode.c b/inode.c > index 478036562652..86905365dfd8 100644 > --- a/inode.c > +++ b/inode.c > @@ -628,7 +628,7 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > struct btrfs_inode_item *inode_item; > struct extent_buffer *leaf; > struct btrfs_key key; > - u64 index = 2; > + u64 index; > char buf[BTRFS_NAME_LEN + 1]; /* for snprintf null */ > int len; > int i; > @@ -638,28 +638,15 @@ int btrfs_link_subvol(struct btrfs_trans_handle *trans, > struct btrfs_root *root, > if (len == 0 || len > BTRFS_NAME_LEN) > return -EINVAL; > > - /* find the free dir_index */ > - btrfs_init_path(); > - key.objectid = dirid; > - key.type = BTRFS_DIR_INDEX_KEY; > - key.offset = (u64)-1; > - > - ret = btrfs_search_slot(NULL, root, , , 0, 0); > - if (ret <= 0) { > - error("search for DIR_INDEX dirid %llu failed: %d", > - (unsigned long long)dirid, ret); > + /* add the dir_item/dir_index */ > + ret = btrfs_find_free_dir_index(root, dirid, ); > + if (ret < 0) { > + error("unable to find free dir index dirid %llu failed: %d", > + (unsigned long long)dirid, ret); > goto fail; > } > > - if (path.slots[0] > 0) { > - path.slots[0]--; > - btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); > - if (key.objectid == dirid && key.type == BTRFS_DIR_INDEX_KEY) > - index = key.offset + 1; > - } > - btrfs_release_path(); > - > - /* add the dir_item/dir_index */ > + btrfs_init_path(); > key.objectid = dirid; > key.offset = 0; > key.type = BTRFS_INODE_ITEM_KEY; > signature.asc Description: OpenPGP digital signature
Re: [PATCH v2 02/10] btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol
On 2018年03月27日 15:06, Lu Fengqi wrote: > The original btrfs_mksubvol is too specific to specify the directory that > the subvolume will link to. Furthermore, in this transaction, we don't only > need to create root_ref/dir-item, but also update the refs or flags of > root_item. Extract a generic btrfs_link_subvol that allow the caller pass a > trans argument for later subvolume undelete. The idea makes sense. Although some small nitpicks inlined below. > > No functional changes for the btrfs_mksubvol. > > Signed-off-by: Lu Fengqi> --- > convert/main.c | 57 + > ctree.h| 5 +++-- > inode.c| 46 +- > 3 files changed, 81 insertions(+), 27 deletions(-) > > diff --git a/convert/main.c b/convert/main.c > index 6bdfab40d0b0..dd6b42a1f2b1 100644 > --- a/convert/main.c > +++ b/convert/main.c > @@ -1002,6 +1002,63 @@ err: > return ret; > } > > +/* > + * Link the subvolume specified by @root_objectid to the root_dir of @root. However the function name is btrfs_mksubvol(), not btrfs_link_subvol(). After reading the code, it turns out that btrfs_mksubvol() is just an less generic btrfs_link_subvol(). > + * @root the root of the file tree which the subvolume will > + * be linked to. Here you're using btrfs_root for source subvolume. > + * @subvol_name the name of the subvolume which will be linked. > + * @root_objectidspecify the subvolume which will be linked. But here you're specifying subvolume id as destination. It would be much better to unify both parameter to the same structure. And personally I prefer both btrfs_root. > + * @convert the flag to determine whether to try to resolve > + * the name conflict. This parameter is not used in this function, and the name "convert" doesn't reflect the name conflict check. Personally speaking, I would like the parameters to be something like btrfs_mksubolv(struct btrfs_root *dst_root, struct btrfs_root *src_root) > + * > + * Return the root of the subvolume if success, otherwise return NULL. > + */ > +static struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, > + const char *subvol_name, > + u64 root_objectid, > + bool convert) > +{ > + struct btrfs_trans_handle *trans; > + struct btrfs_root *subvol_root = NULL; > + struct btrfs_key key; > + u64 dirid = btrfs_root_dirid(>root_item); > + int ret; > + > + > + trans = btrfs_start_transaction(root, 1); > + if (IS_ERR(trans)) { > + error("unable to start transaction"); > + goto fail; > + } > + > + ret = btrfs_link_subvol(trans, root, subvol_name, root_objectid, dirid, > + true); > + if (ret) { > + error("unable to link subvolume %s", subvol_name); > + goto fail; > + } > + > + ret = btrfs_commit_transaction(trans, root); > + if (ret) { > + error("transaction commit failed: %d", ret); > + goto fail; > + } > + > + key.objectid = root_objectid; > + key.offset = (u64)-1; > + key.type = BTRFS_ROOT_ITEM_KEY; > + > + subvol_root = btrfs_read_fs_root(root->fs_info, ); > + if (!subvol_root) { > + error("unable to link subvolume %s", subvol_name); > + goto fail; > + } > + > +fail: > + return subvol_root; > +} > + > /* > * Migrate super block to its default position and zero 0 ~ 16k > */ > diff --git a/ctree.h b/ctree.h > index 0fc31dd8d998..4bff0b821472 100644 > --- a/ctree.h > +++ b/ctree.h > @@ -2797,8 +2797,9 @@ int btrfs_del_orphan_item(struct btrfs_trans_handle > *trans, > struct btrfs_root *root, u64 offset); > int btrfs_mkdir(struct btrfs_trans_handle *trans, struct btrfs_root *root, > char *name, int namelen, u64 parent_ino, u64 *ino, int mode); > -struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, const char *base, > - u64 root_objectid, bool convert); > +int btrfs_link_subvol(struct btrfs_trans_handle *trans, struct btrfs_root > *root, > + const char *base, u64 root_objectid, u64 dirid, > + bool convert); > > /* file.c */ > int btrfs_get_extent(struct btrfs_trans_handle *trans, > diff --git a/inode.c b/inode.c > index 8d0812c7cf50..478036562652 100644 > --- a/inode.c > +++ b/inode.c > @@ -606,19 +606,28 @@ out: > return ret; > } > > -struct btrfs_root *btrfs_mksubvol(struct btrfs_root *root, > - const char *base, u64 root_objectid, > - bool convert) > +/* > + * Link the subvolume specified by @root_objectid to the directory specified > by > + * @dirid on the
Re: [PATCH v2 00/10] undelete subvolume offline version
On Tue, Mar 27, 2018 at 03:06:48PM +0800, Lu Fengqi wrote: >This patchset will add undelete-subvol subcommand for btrfs rescue. This >enhancement allows undeleting a subvolume on an unmounted filesystem. Gentle ping. Any advice about this patchset will be welcomed. -- Thanks, Lu > >Patchset can be fetched from github: >https://github.com/littleroad/btrfs-progs.git undelete > >v1->v2: add -s option to allow user specify the subvolume which will be >recovered. > >The first six patches are not modified. >7th-8th add the -s option to specify subvol_id instead of recovering all >subvolumes. >9th add shell quoting to test script. >10th just add the -s option documentation. > >Lu Fengqi (10): > btrfs-progs: copy btrfs_del_orphan_item from kernel > btrfs-progs: extract btrfs_link_subvol from btrfs_mksubvol > btrfs-progs: use btrfs_find_free_dir_index to find free inode index > btrfs-progs: undelete-subvol: introduce is_subvol_intact > btrfs-progs: undelete-subvol: introduce recover_dead_root > btrfs-progs: undelete-subvol: introduce link_subvol_to_lostfound > btrfs-progs: undelete-subvol: introduce btrfs_undelete_intact_subvols > btrfs-progs: undelete-subvol: add undelete-subvol subcommand > btrfs-progs: tests: add testcase for undelete-subvol > btrfs-progs: undelete-subvol: update completion and documentation > > Documentation/btrfs-rescue.asciidoc| 12 + > Makefile | 3 +- > btrfs-completion | 2 +- > cmds-rescue.c | 70 ++ > convert/main.c | 57 + > ctree.h| 8 +- > inode.c| 99 > .../030-undelete-subvol/deleted_subvolume.img | Bin 0 -> 4096 bytes > .../030-undelete-subvol/drop_progress.raw.xz | Bin 0 -> 23452 bytes > tests/misc-tests/030-undelete-subvol/test.sh | 34 +++ > undelete-subvol.c | 254 + > undelete-subvol.h | 19 ++ > 12 files changed, 511 insertions(+), 47 deletions(-) > create mode 100644 tests/misc-tests/030-undelete-subvol/deleted_subvolume.img > create mode 100644 tests/misc-tests/030-undelete-subvol/drop_progress.raw.xz > create mode 100755 tests/misc-tests/030-undelete-subvol/test.sh > create mode 100644 undelete-subvol.c > create mode 100644 undelete-subvol.h > >-- >2.16.2 > > > >-- >To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >the body of a message to majord...@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 3/3] btrfs: Allow rmdir(2) to delete an empty subvolume
Change the behavior of rmdir(2) and allow it to delete an empty subvolume by using btrfs_delete_subvolume() which is used by btrfs_ioctl_snap_destroy(). The required lock for @dir and inode of @dentry is already acquired in vfs layer. We need some check before deleting a subvolume. Permission check is done in vfs layer, emptiness check is in btrfs_rmdir() and additional check (i.e. neither the subvolume is a default subvolume nor send is in progress) is in btrfs_delete_subvolume(). Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in vfs layer later. Tested-by: Goffredo BaroncelliSigned-off-by: Tomohiro Misono --- fs/btrfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b421aa8dd1a4..231e4c2ea1b4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4522,7 +4522,7 @@ static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) if (inode->i_size > BTRFS_EMPTY_DIR_SIZE) return -ENOTEMPTY; if (btrfs_ino(BTRFS_I(inode)) == BTRFS_FIRST_FREE_OBJECTID) - return -EPERM; + return btrfs_delete_subvolume(dir, dentry); trans = __unlink_start_trans(dir); if (IS_ERR(trans)) -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/3] btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy()
Factor out the secand half of btrfs_ioctl_snap_destroy() as btrfs_delete_subvolume(), which performes some subvolume specific checks before deletion (1. send is not in progress. 2. the subvolume is not the default subvolume. 3. the subvolume does not contain other subvolumes) and actual deletion process. btrfs_delete_subvolume() requires inode_lock for both @dir and inode of @dentry. The remaining part of btrfs_ioctl_snap_destroy() is mainly permission checks. Note that call of d_delete() is not included in btrfs_delete_subvolume() as this function will also be used by btrfs_rmdir() to delete an empty subvolume and in that case d_delete() is called in vfs layer. As a result, btrfs_unlink_subvol() and may_destroy_subvol() become static functions. No functional change happens. Signed-off-by: Tomohiro Misono--- fs/btrfs/ctree.h | 6 +-- fs/btrfs/inode.c | 143 ++- fs/btrfs/ioctl.c | 131 +- 3 files changed, 144 insertions(+), 136 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 64c119349f98..34c22026e905 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3166,11 +3166,7 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans, int btrfs_add_link(struct btrfs_trans_handle *trans, struct btrfs_inode *parent_inode, struct btrfs_inode *inode, const char *name, int name_len, int add_backref, u64 index); -int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - struct inode *dir, u64 objectid, - const char *name, int name_len); -noinline int may_destroy_subvol(struct btrfs_root *root); +int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry); int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, int front); int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 804a1913dabf..b421aa8dd1a4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4237,7 +4237,7 @@ static int btrfs_unlink(struct inode *dir, struct dentry *dentry) return ret; } -int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, +static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *dir, u64 objectid, const char *name, int name_len) @@ -4321,7 +4321,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, /* * helper to check if the subvolume references other subvolumes */ -noinline int may_destroy_subvol(struct btrfs_root *root) +static noinline int may_destroy_subvol(struct btrfs_root *root) { struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_path *path; @@ -4372,6 +4372,145 @@ noinline int may_destroy_subvol(struct btrfs_root *root) return ret; } +int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb); + struct btrfs_root *root = BTRFS_I(dir)->root; + struct inode *inode = d_inode(dentry); + struct btrfs_root *dest = BTRFS_I(inode)->root; + struct btrfs_trans_handle *trans; + struct btrfs_block_rsv block_rsv; + u64 root_flags; + u64 qgroup_reserved; + int ret; + int err; + + /* +* Don't allow to delete a subvolume with send in progress. This is +* inside the i_mutex so the error handling that has to drop the bit +* again is not run concurrently. +*/ + spin_lock(>root_item_lock); + root_flags = btrfs_root_flags(>root_item); + if (dest->send_in_progress == 0) { + btrfs_set_root_flags(>root_item, + root_flags | BTRFS_ROOT_SUBVOL_DEAD); + spin_unlock(>root_item_lock); + } else { + spin_unlock(>root_item_lock); + btrfs_warn(fs_info, + "Attempt to delete subvolume %llu during send", + dest->root_key.objectid); + err = -EPERM; + return err; + } + + down_write(_info->subvol_sem); + + err = may_destroy_subvol(dest); + if (err) + goto out_up_write; + + btrfs_init_block_rsv(_rsv, BTRFS_BLOCK_RSV_TEMP); + /* +* One for dir inode, two for dir entries, two for root +* ref/backref. +*/ + err = btrfs_subvolume_reserve_metadata(root, _rsv, + 5, _reserved, true); + if (err) + goto out_up_write; + + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + err = PTR_ERR(trans); + goto out_release; + } +
[PATCH v5 1/3] btrfs: Move may_destroy_subvol() from ioctl.c to inode.c
This is a preparation work to refactor btrfs_ioctl_snap_destroy() and to allow rmdir(2) to delete an empty subvolume. Signed-off-by: Tomohiro Misono--- fs/btrfs/ctree.h | 1 + fs/btrfs/inode.c | 54 ++ fs/btrfs/ioctl.c | 54 -- 3 files changed, 55 insertions(+), 54 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5474ef14d6e6..64c119349f98 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3170,6 +3170,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *dir, u64 objectid, const char *name, int name_len); +noinline int may_destroy_subvol(struct btrfs_root *root); int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, int front); int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e064c49c9a9a..804a1913dabf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4318,6 +4318,60 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, return ret; } +/* + * helper to check if the subvolume references other subvolumes + */ +noinline int may_destroy_subvol(struct btrfs_root *root) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_path *path; + struct btrfs_dir_item *di; + struct btrfs_key key; + u64 dir_id; + int ret; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + /* Make sure this root isn't set as the default subvol */ + dir_id = btrfs_super_root_dir(fs_info->super_copy); + di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path, + dir_id, "default", 7, 0); + if (di && !IS_ERR(di)) { + btrfs_dir_item_key_to_cpu(path->nodes[0], di, ); + if (key.objectid == root->root_key.objectid) { + ret = -EPERM; + btrfs_err(fs_info, + "deleting default subvolume %llu is not allowed", + key.objectid); + goto out; + } + btrfs_release_path(path); + } + + key.objectid = root->root_key.objectid; + key.type = BTRFS_ROOT_REF_KEY; + key.offset = (u64)-1; + + ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0); + if (ret < 0) + goto out; + BUG_ON(ret == 0); + + ret = 0; + if (path->slots[0] > 0) { + path->slots[0]--; + btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); + if (key.objectid == root->root_key.objectid && + key.type == BTRFS_ROOT_REF_KEY) + ret = -ENOTEMPTY; + } +out: + btrfs_free_path(path); + return ret; +} + static int btrfs_rmdir(struct inode *dir, struct dentry *dentry) { struct inode *inode = d_inode(dentry); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 632e26d6f7ce..abd998247d64 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1832,60 +1832,6 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, return ret; } -/* - * helper to check if the subvolume references other subvolumes - */ -static noinline int may_destroy_subvol(struct btrfs_root *root) -{ - struct btrfs_fs_info *fs_info = root->fs_info; - struct btrfs_path *path; - struct btrfs_dir_item *di; - struct btrfs_key key; - u64 dir_id; - int ret; - - path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; - - /* Make sure this root isn't set as the default subvol */ - dir_id = btrfs_super_root_dir(fs_info->super_copy); - di = btrfs_lookup_dir_item(NULL, fs_info->tree_root, path, - dir_id, "default", 7, 0); - if (di && !IS_ERR(di)) { - btrfs_dir_item_key_to_cpu(path->nodes[0], di, ); - if (key.objectid == root->root_key.objectid) { - ret = -EPERM; - btrfs_err(fs_info, - "deleting default subvolume %llu is not allowed", - key.objectid); - goto out; - } - btrfs_release_path(path); - } - - key.objectid = root->root_key.objectid; - key.type = BTRFS_ROOT_REF_KEY; - key.offset = (u64)-1; - - ret = btrfs_search_slot(NULL, fs_info->tree_root, , path, 0, 0); - if (ret < 0) - goto out; - BUG_ON(ret == 0); - - ret = 0; - if (path->slots[0] > 0) { - path->slots[0]--; -
[PATCH v5 0/3] Allow rmdir(2) to delete a subvolume
cangelog: v4 -> v5 ... Merge 2nd and 3rd patches and update commit log No code change in total v3 -> v4 ... Reorganize patches and update commit log No code change in total v2 -> v3 ... Use if-else block instead of two if blocks and add Tested-by tag in 2nd patch v1 -> v2 ... Split the patch to hopefully make review easier Note: I will send a xfstest if this series is merged. This series changes the behavior of rmdir(2) and allow it to delete an empty subvolume. In order so that, 1st and 2nd patch refactor btrfs_ioctl_snap_destroy() and extract the actual deletion process as btrfs_delete_subvolume() (remaining part in btrfs_ioctl_snap_destroy() is mainly permission checks). Then, 3rd patch changes btrfs_rmdir() to call this function. The required permission check is already done in vfs layer. Tomohiro Misono (3): btrfs: Move may_destroy_subvol() from ioctl.c to inode.c btrfs: Factor out the main deletion process from btrfs_ioctl_snap_destroy() btrfs: Allow rmdir(2) to delete an empty subvolume fs/btrfs/ctree.h | 5 +- fs/btrfs/inode.c | 197 ++- fs/btrfs/ioctl.c | 185 +-- 3 files changed, 198 insertions(+), 189 deletions(-) -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
On 2018年04月17日 23:18, David Sterba wrote: > On Tue, Apr 17, 2018 at 04:52:45PM +0800, Qu Wenruo wrote: >> Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for >> delayed inode and item") merged into mainline is not the updated version >> submitted to the mail list in Dec 2017. >> >> Which lacks the following fixes: >> >> 1) Remove btrfs_qgroup_convert_reserved_meta() call in >>btrfs_delayed_item_release_metadata() >> 2) Remove btrfs_qgroup_reserve_meta_prealloc() call in >>btrfs_delayed_inode_reserve_metadata() >> >> Those fixes will resolve unexpected EDQUOT problems. >> >> Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for >> delayed inode and item") >> Signed-off-by: Qu Wenruo> > Added to 4.17 queue, thanks. > >> @@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct >> btrfs_trans_handle *trans, >> dst_rsv = _info->delayed_block_rsv; >> >> num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1); >> + >> +/* >> + * Here we migrate space rsv from transaction rsv, since have >> + * already reserved space when starting a transaction. >> + * So no need to reserve qgroup space here. >> + */ > > Please format the comments to the full line width. Right, the already can go previous line without exceeding 80 chars. But the "So no need to..." line is a new line so it will not take up any space of previous line anyway. > >> @@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata( >>"delayed_inode", >>btrfs_ino(inode), >>num_bytes, 1); >> -} >> +} else >> +btrfs_qgroup_free_meta_prealloc(root, >> +fs_info->nodesize); > > Please don't diverge from the coding style, I'm fixing such issues but, > you know. For this, did you mean the bracket for else branch? Thanks, Qu signature.asc Description: OpenPGP digital signature
Re: Hard link not persisted on fsync
Hi, A gentle reminder on the crash consistency bug that we found on btrfs: Link count of a file is not persisted even after a fsync. We believe a filesystem that ensures strictly ordered metadata behaviour should be able to persist the hard link after a fsync on the original file. Could you comment on why btrfs does not exhibit this behavior, and if it's something you'd want to fix? Thanks, Jayashree Mohan On Mon, Apr 16, 2018 at 9:35 AM, Jayashree Mohanwrote: > Hi, > > The following seems to be a crash consistency bug on btrfs, where in > the link count is not persisted even after a fsync on the original > file. > > Consider the following workload : > creat foo > link (foo, A/bar) > fsync(foo) > ---Crash--- > > Now, on recovery we expect the metadata of foo to be persisted i.e > have a link count of 2. However in btrfs, the link count is 1 and file > A/bar is not persisted. The expected behaviour would be to persist the > dependencies of inode foo. That is to say, shouldn't fsync of foo > persist A/bar and correctly update the link count? > Note that ext4, xfs and f2fs recover to the correct link count of 2 > for the above workload. > > Let us know what you think about this behavior. > > Thanks, > Jayashree Mohan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: add chattr support for send/receive
On 04/17/2018 04:39 PM, Howard McLauchlan wrote: > Presently btrfs send/receive does not propagate inode attribute flags; > all chattr operations are effectively discarded upon transmission. > > This patch adds kernel support for inode attribute flags. Userspace > support can be found under the commit: > > btrfs-progs: add chattr support for send/receive > > An associated xfstest can be found at: > > btrfs: add verify chattr support for send/receive test > > A caveat is that a user with an updated kernel (aware of chattrs) and an > older version of btrfs-progs (unaware of chattrs) will fail to receive > if a chattr is included in the send stream. > > Signed-off-by: Howard McLauchlan> --- > Based on 4.17-rc1 > > fs/btrfs/ctree.h | 2 + > fs/btrfs/ioctl.c | 2 +- > fs/btrfs/send.c | 176 +++ > fs/btrfs/send.h | 2 + > 4 files changed, 154 insertions(+), 28 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 5474ef14d6e6..a0dc6a8a37eb 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1436,6 +1436,8 @@ struct btrfs_map_token { > unsigned long offset; > }; > > +unsigned int btrfs_flags_to_ioctl(unsigned int flags); > + > #define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \ > ((bytes) >> (fs_info)->sb->s_blocksize_bits) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 632e26d6f7ce..36ce1e589f9e 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -106,7 +106,7 @@ static unsigned int btrfs_mask_flags(umode_t mode, > unsigned int flags) > /* > * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl. > */ > -static unsigned int btrfs_flags_to_ioctl(unsigned int flags) > +unsigned int btrfs_flags_to_ioctl(unsigned int flags) > { > unsigned int iflags = 0; > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c > index 221e5cdb060b..da521a5a1843 100644 > --- a/fs/btrfs/send.c > +++ b/fs/btrfs/send.c > @@ -101,6 +101,13 @@ struct send_ctx { > u64 cur_inode_last_extent; > u64 cur_inode_next_write_offset; > > + /* > + * state for chattr purposes > + */ > + u64 cur_inode_flip_flags; > + u64 cur_inode_receive_flags; > + int receive_flags_valid; > + > u64 send_progress; > > struct list_head new_refs; > @@ -798,7 +805,7 @@ static int send_rmdir(struct send_ctx *sctx, struct > fs_path *path) > */ > static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path, > u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid, > - u64 *gid, u64 *rdev) > + u64 *gid, u64 *rdev, u64 *flags) > { > int ret; > struct btrfs_inode_item *ii; > @@ -828,6 +835,8 @@ static int __get_inode_info(struct btrfs_root *root, > struct btrfs_path *path, > *gid = btrfs_inode_gid(path->nodes[0], ii); > if (rdev) > *rdev = btrfs_inode_rdev(path->nodes[0], ii); > + if (flags) > + *flags = btrfs_inode_flags(path->nodes[0], ii); > > return ret; > } > @@ -835,7 +844,7 @@ static int __get_inode_info(struct btrfs_root *root, > struct btrfs_path *path, > static int get_inode_info(struct btrfs_root *root, > u64 ino, u64 *size, u64 *gen, > u64 *mode, u64 *uid, u64 *gid, > - u64 *rdev) > + u64 *rdev, u64 *flags) > { > struct btrfs_path *path; > int ret; > @@ -844,7 +853,7 @@ static int get_inode_info(struct btrfs_root *root, > if (!path) > return -ENOMEM; > ret = __get_inode_info(root, path, ino, size, gen, mode, uid, gid, > -rdev); > +rdev, flags); > btrfs_free_path(path); > return ret; > } > @@ -1233,7 +1242,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 > root, void *ctx_) >* accept clones from these extents. >*/ > ret = __get_inode_info(found->root, bctx->path, ino, _size, NULL, > NULL, > -NULL, NULL, NULL); > +NULL, NULL, NULL, NULL); > btrfs_release_path(bctx->path); > if (ret < 0) > return ret; > @@ -1593,7 +1602,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, > u64 ino, u64 gen) > u64 right_gen; > > ret = get_inode_info(sctx->send_root, ino, NULL, _gen, NULL, NULL, > - NULL, NULL); > + NULL, NULL, NULL); > if (ret < 0 && ret != -ENOENT) > goto out; > left_ret = ret; > @@ -1602,7 +1611,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, > u64 ino, u64 gen) > right_ret = -ENOENT; > } else { > ret = get_inode_info(sctx->parent_root, ino, NULL, _gen, > - NULL, NULL, NULL, NULL); > +
[PATCH] btrfs-progs: add chattr support for send/receive
Presently, btrfs send/receive does not propagate inode attribute flags; all chattr operations are effectively discarded upon transmission. This patch adds userspace support for inode attribute flags. Kernel support can be found under the commit: btrfs: add chattr support for send/receive An associated xfstest can also be found at: btrfs: verify chattr support for send/receive test A caveat is that a user with an updated kernel (aware of chattrs) and an older version of btrfs-progs (unaware of chattrs) will fail to receive if a chattr is included in the send stream. Signed-off-by: Howard McLauchlan--- cmds-receive.c | 31 +++ send-dump.c| 8 +++- send-stream.c | 5 + send-stream.h | 1 + send.h | 2 ++ 5 files changed, 46 insertions(+), 1 deletion(-) diff --git a/cmds-receive.c b/cmds-receive.c index 68123a31..883aee4e 100644 --- a/cmds-receive.c +++ b/cmds-receive.c @@ -38,6 +38,7 @@ #include #include #include +#include #include "ctree.h" #include "ioctl.h" @@ -1059,6 +1060,35 @@ static int process_update_extent(const char *path, u64 offset, u64 len, return 0; } +static int process_chattr(const char *path, u64 flags, void *user) +{ + int ret = 0; + int fd = 0; + int _flags = flags; + struct btrfs_receive *rctx = user; + char full_path[PATH_MAX]; + + ret = path_cat_out(full_path, rctx->full_subvol_path, path); + if (ret < 0) { + error("chattr: path invalid: %s", path); + goto out; + } + + if (g_verbose >= 2) + fprintf(stderr, "chattr %s - flags=0%o\n", path, (int)flags); + + fd = open(full_path, O_RDONLY); + ret = ioctl(fd, FS_IOC_SETFLAGS, &_flags); + + if (ret < 0) { + ret = -errno; + error("chattr %s failed: %s", path, strerror(-ret)); + goto out; + } + +out: + return ret; +} static struct btrfs_send_ops send_ops = { .subvol = process_subvol, .snapshot = process_snapshot, @@ -1081,6 +,7 @@ static struct btrfs_send_ops send_ops = { .chown = process_chown, .utimes = process_utimes, .update_extent = process_update_extent, + .chattr = process_chattr, }; static int do_receive(struct btrfs_receive *rctx, const char *tomnt, diff --git a/send-dump.c b/send-dump.c index 1591e0cc..757ed5ec 100644 --- a/send-dump.c +++ b/send-dump.c @@ -316,6 +316,11 @@ static int print_update_extent(const char *path, u64 offset, u64 len, offset, len); } +static int print_chattr(const char *path, u64 flags, void *user) +{ + return PRINT_DUMP(user, path, "chattr", "flags=%llu", flags); +} + struct btrfs_send_ops btrfs_print_send_ops = { .subvol = print_subvol, .snapshot = print_snapshot, @@ -337,5 +342,6 @@ struct btrfs_send_ops btrfs_print_send_ops = { .chmod = print_chmod, .chown = print_chown, .utimes = print_utimes, - .update_extent = print_update_extent + .update_extent = print_update_extent, + .chattr = print_chattr, }; diff --git a/send-stream.c b/send-stream.c index 78f2571a..2c927af2 100644 --- a/send-stream.c +++ b/send-stream.c @@ -453,6 +453,11 @@ static int read_and_process_cmd(struct btrfs_send_stream *sctx) TLV_GET_U64(sctx, BTRFS_SEND_A_SIZE, ); ret = sctx->ops->update_extent(path, offset, tmp, sctx->user); break; + case BTRFS_SEND_C_CHATTR: + TLV_GET_STRING(sctx, BTRFS_SEND_A_PATH, ); + TLV_GET_U64(sctx, BTRFS_SEND_A_CHATTR, ); + ret = sctx->ops->chattr(path, tmp, sctx->user); + break; case BTRFS_SEND_C_END: ret = 1; break; diff --git a/send-stream.h b/send-stream.h index 39901f86..2446f22d 100644 --- a/send-stream.h +++ b/send-stream.h @@ -66,6 +66,7 @@ struct btrfs_send_ops { struct timespec *mt, struct timespec *ct, void *user); int (*update_extent)(const char *path, u64 offset, u64 len, void *user); + int (*chattr)(const char *path, u64 flags, void *user); }; int btrfs_read_and_process_send_stream(int fd, diff --git a/send.h b/send.h index fe613cbb..fd07495f 100644 --- a/send.h +++ b/send.h @@ -94,6 +94,7 @@ enum btrfs_send_cmd { BTRFS_SEND_C_END, BTRFS_SEND_C_UPDATE_EXTENT, + BTRFS_SEND_C_CHATTR, __BTRFS_SEND_C_MAX, }; #define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1) @@ -131,6 +132,7 @@ enum { BTRFS_SEND_A_CLONE_PATH, BTRFS_SEND_A_CLONE_OFFSET, BTRFS_SEND_A_CLONE_LEN, + BTRFS_SEND_A_CHATTR, __BTRFS_SEND_A_MAX, }; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at
[PATCH] btrfs: add chattr support for send/receive
Presently btrfs send/receive does not propagate inode attribute flags; all chattr operations are effectively discarded upon transmission. This patch adds kernel support for inode attribute flags. Userspace support can be found under the commit: btrfs-progs: add chattr support for send/receive An associated xfstest can be found at: btrfs: add verify chattr support for send/receive test A caveat is that a user with an updated kernel (aware of chattrs) and an older version of btrfs-progs (unaware of chattrs) will fail to receive if a chattr is included in the send stream. Signed-off-by: Howard McLauchlan--- Based on 4.17-rc1 fs/btrfs/ctree.h | 2 + fs/btrfs/ioctl.c | 2 +- fs/btrfs/send.c | 176 +++ fs/btrfs/send.h | 2 + 4 files changed, 154 insertions(+), 28 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5474ef14d6e6..a0dc6a8a37eb 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1436,6 +1436,8 @@ struct btrfs_map_token { unsigned long offset; }; +unsigned int btrfs_flags_to_ioctl(unsigned int flags); + #define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \ ((bytes) >> (fs_info)->sb->s_blocksize_bits) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 632e26d6f7ce..36ce1e589f9e 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -106,7 +106,7 @@ static unsigned int btrfs_mask_flags(umode_t mode, unsigned int flags) /* * Export inode flags to the format expected by the FS_IOC_GETFLAGS ioctl. */ -static unsigned int btrfs_flags_to_ioctl(unsigned int flags) +unsigned int btrfs_flags_to_ioctl(unsigned int flags) { unsigned int iflags = 0; diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 221e5cdb060b..da521a5a1843 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -101,6 +101,13 @@ struct send_ctx { u64 cur_inode_last_extent; u64 cur_inode_next_write_offset; + /* +* state for chattr purposes +*/ + u64 cur_inode_flip_flags; + u64 cur_inode_receive_flags; + int receive_flags_valid; + u64 send_progress; struct list_head new_refs; @@ -798,7 +805,7 @@ static int send_rmdir(struct send_ctx *sctx, struct fs_path *path) */ static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path, u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid, - u64 *gid, u64 *rdev) + u64 *gid, u64 *rdev, u64 *flags) { int ret; struct btrfs_inode_item *ii; @@ -828,6 +835,8 @@ static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path, *gid = btrfs_inode_gid(path->nodes[0], ii); if (rdev) *rdev = btrfs_inode_rdev(path->nodes[0], ii); + if (flags) + *flags = btrfs_inode_flags(path->nodes[0], ii); return ret; } @@ -835,7 +844,7 @@ static int __get_inode_info(struct btrfs_root *root, struct btrfs_path *path, static int get_inode_info(struct btrfs_root *root, u64 ino, u64 *size, u64 *gen, u64 *mode, u64 *uid, u64 *gid, - u64 *rdev) + u64 *rdev, u64 *flags) { struct btrfs_path *path; int ret; @@ -844,7 +853,7 @@ static int get_inode_info(struct btrfs_root *root, if (!path) return -ENOMEM; ret = __get_inode_info(root, path, ino, size, gen, mode, uid, gid, - rdev); + rdev, flags); btrfs_free_path(path); return ret; } @@ -1233,7 +1242,7 @@ static int __iterate_backrefs(u64 ino, u64 offset, u64 root, void *ctx_) * accept clones from these extents. */ ret = __get_inode_info(found->root, bctx->path, ino, _size, NULL, NULL, - NULL, NULL, NULL); + NULL, NULL, NULL, NULL); btrfs_release_path(bctx->path); if (ret < 0) return ret; @@ -1593,7 +1602,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen) u64 right_gen; ret = get_inode_info(sctx->send_root, ino, NULL, _gen, NULL, NULL, - NULL, NULL); + NULL, NULL, NULL); if (ret < 0 && ret != -ENOENT) goto out; left_ret = ret; @@ -1602,7 +1611,7 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen) right_ret = -ENOENT; } else { ret = get_inode_info(sctx->parent_root, ino, NULL, _gen, - NULL, NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); if (ret < 0 && ret != -ENOENT) goto out; right_ret = ret; @@ -1771,7 +1780,7 @@ static int
[PATCH] btrfs: optimize free space tree bitmap conversion
Presently, convert_free_space_to_extents() does a linear scan of the bitmap. We can speed this up with find_next_{bit,zero_bit}_le(). This patch replaces the linear scan with find_next_{bit,zero_bit}_le(). Testing shows a 20-33% decrease in execution time for convert_free_space_to_extents(). Suggested-by: Omar SandovalSigned-off-by: Howard McLauchlan --- Based on 4.17-rc1 fs/btrfs/extent_io.c | 12 - fs/btrfs/extent_io.h | 4 +-- fs/btrfs/free-space-tree.c | 54 ++ 3 files changed, 27 insertions(+), 43 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index e99b329002cf..1c0e7ce49556 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5620,12 +5620,12 @@ void copy_extent_buffer(struct extent_buffer *dst, struct extent_buffer *src, } } -void le_bitmap_set(u8 *map, unsigned int start, int len) +void le_bitmap_set(unsigned long *map, unsigned int start, int len) { - u8 *p = map + BIT_BYTE(start); + char *p = ((char *)map) + BIT_BYTE(start); const unsigned int size = start + len; int bits_to_set = BITS_PER_BYTE - (start % BITS_PER_BYTE); - u8 mask_to_set = BITMAP_FIRST_BYTE_MASK(start); + char mask_to_set = BITMAP_FIRST_BYTE_MASK(start); while (len - bits_to_set >= 0) { *p |= mask_to_set; @@ -5640,12 +5640,12 @@ void le_bitmap_set(u8 *map, unsigned int start, int len) } } -void le_bitmap_clear(u8 *map, unsigned int start, int len) +void le_bitmap_clear(unsigned long *map, unsigned int start, int len) { - u8 *p = map + BIT_BYTE(start); + char *p = ((char *)map) + BIT_BYTE(start); const unsigned int size = start + len; int bits_to_clear = BITS_PER_BYTE - (start % BITS_PER_BYTE); - u8 mask_to_clear = BITMAP_FIRST_BYTE_MASK(start); + char mask_to_clear = BITMAP_FIRST_BYTE_MASK(start); while (len - bits_to_clear >= 0) { *p &= ~mask_to_clear; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index a53009694b16..a6db233f4a40 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -84,8 +84,8 @@ static inline int le_test_bit(int nr, const u8 *addr) return 1U & (addr[BIT_BYTE(nr)] >> (nr & (BITS_PER_BYTE-1))); } -void le_bitmap_set(u8 *map, unsigned int start, int len); -void le_bitmap_clear(u8 *map, unsigned int start, int len); +void le_bitmap_set(unsigned long *map, unsigned int start, int len); +void le_bitmap_clear(unsigned long *map, unsigned int start, int len); struct extent_state; struct btrfs_root; diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c index 32a0f6cb5594..0ddf96b01a3a 100644 --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -138,9 +138,9 @@ static inline u32 free_space_bitmap_size(u64 size, u32 sectorsize) return DIV_ROUND_UP((u32)div_u64(size, sectorsize), BITS_PER_BYTE); } -static u8 *alloc_bitmap(u32 bitmap_size) +static unsigned long *alloc_bitmap(u32 bitmap_size) { - u8 *ret; + unsigned long *ret; unsigned int nofs_flag; /* @@ -166,7 +166,8 @@ int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans, struct btrfs_free_space_info *info; struct btrfs_key key, found_key; struct extent_buffer *leaf; - u8 *bitmap, *bitmap_cursor; + unsigned long *bitmap; + char *bitmap_cursor; u64 start, end; u64 bitmap_range, i; u32 bitmap_size, flags, expected_extent_count; @@ -255,7 +256,7 @@ int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans, goto out; } - bitmap_cursor = bitmap; + bitmap_cursor = (char *)bitmap; bitmap_range = fs_info->sectorsize * BTRFS_FREE_SPACE_BITMAP_BITS; i = start; while (i < end) { @@ -304,13 +305,10 @@ int convert_free_space_to_extents(struct btrfs_trans_handle *trans, struct btrfs_free_space_info *info; struct btrfs_key key, found_key; struct extent_buffer *leaf; - u8 *bitmap; + unsigned long *bitmap; u64 start, end; - /* Initialize to silence GCC. */ - u64 extent_start = 0; - u64 offset; u32 bitmap_size, flags, expected_extent_count; - int prev_bit = 0, bit, bitnr; + unsigned long nrbits, start_bit, end_bit; u32 extent_count = 0; int done = 0, nr; int ret; @@ -348,7 +346,7 @@ int convert_free_space_to_extents(struct btrfs_trans_handle *trans, break; } else if (found_key.type == BTRFS_FREE_SPACE_BITMAP_KEY) { unsigned long ptr; - u8 *bitmap_cursor; + char *bitmap_cursor; u32 bitmap_pos, data_size;
[RFC] Add support for BTRFS raid5/6 to GRUB
Hi All, Below you can find a patch to add support for accessing files from grub in a RAID5/6 btrfs filesystem. This is a RFC because it is missing the support for recovery (i.e. if some devices are missed). In the next days (weeks ?) I will extend this patch to support also this case. Comments are welcome. BR G.Baroncelli --- commit 8c80a1b7c913faf50f95c5c76b4666ed17685666 Author: Goffredo BaroncelliDate: Tue Apr 17 21:40:31 2018 +0200 Add initial support for btrfs raid5/6 chunk diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c index be195448d..4c5632acb 100644 --- a/grub-core/fs/btrfs.c +++ b/grub-core/fs/btrfs.c @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED0x20 #define GRUB_BTRFS_CHUNK_TYPE_RAID100x40 +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 +#define GRUB_BTRFS_CHUNK_TYPE_RAID60x100 grub_uint8_t dummy2[0xc]; grub_uint16_t nstripes; grub_uint16_t nsubstripes; @@ -764,6 +766,39 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, stripe_offset = low + chunk_stripe_length * high; csize = chunk_stripe_length - low; + break; + } + case GRUB_BTRFS_CHUNK_TYPE_RAID5: + case GRUB_BTRFS_CHUNK_TYPE_RAID6: + { + grub_uint64_t nparities; + grub_uint64_t parity_pos; + grub_uint64_t stripe_nr, high; + grub_uint64_t low; + + redundancy = 1; /* no redundancy for now */ + + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) + { + grub_dprintf ("btrfs", "RAID5\n"); + nparities = 1; + } + else + { + grub_dprintf ("btrfs", "RAID6\n"); + nparities = 2; + } + + stripe_nr = grub_divmod64 (off, chunk_stripe_length, ); + + high = grub_divmod64 (stripe_nr, nstripes - nparities, ); + grub_divmod64 (high+nstripes-nparities, nstripes, _pos); + grub_divmod64 (parity_pos+nparities+stripen, nstripes, ); + + stripe_offset = low + chunk_stripe_length * high; + csize = chunk_stripe_length - low; + break; } default: -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to dump/find parity of RAID-5 file?
On 02/06/2017 09:40 PM, Goffredo Baroncelli wrote: > On 2017-02-03 11:44, Lakshmipathi.G wrote: >> Hi. >> >> Came across this thread >> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg55161.html >> Exploring possibility of adding test-scripts around these area using >> dump-tree & corrupt-block.But >> unable to figure-out how to get parity of file or find its location. >> dump-tree output gave, >> >> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 145096704) itemoff 15557 >> itemsize 144 >> length 134217728 owner 2 stripe_len 65536 type DATA|RAID5 >> io_align 65536 io_width 65536 sector_size 4096 >> num_stripes 3 sub_stripes 0 >> stripe 0 devid 3 offset 6368 # Is this >> parity?Seems empty? >> dev_uuid f62df114-186c-4e48-8152-9ed15aa078b4 >> stripe 1 devid 2 offset 6368 # Contains file >> data-stripe-1 >> dev_uuid c0aeaab0-e57e-4f7a-9356-db1878876d9f >> stripe 2 devid 1 offset 83034112 # Contains file >> data-stripe-2 >> dev_uuid 637b3666-9d8f-4ec4-9969-53b0b933b9b1 >> thanks. > > IIRC, the parity is spread across the disk stripes of the chunk. > > So first you have to find the logical-offset [LO] where the the file begins. > Then you have to map this offset to the chunk which holds the data. The chunk > has the following info: > - chunk start [CS], chunk length [CL] > - for each stripe: > where the stripe starts > > If you subtract the chunk-start from the logical-offset [ CO == LO-CS], you > will find the offset where the data belongs in the chunk. > > As stated above, the PARITY is spread across the chunk stripes. So (supposing > that the stripe size is 64K, the raid level is 5, the disks are three), > > - the first 64k of stripe 0, is data [0..64K) > - the first 64k of stripe 1, is data [64..128K) > - the first 64k of stripe 2 is parity, > > - the 2nd 64k of stripe 0 is parity, > - the 2nd 64k of stripe 1, is data [128..196K) > - the 2nd 64k of stripe 2, is data [192..256K) > > - the 3rd 64k of stripe 0, is data [256..320K) > - the 3rd 64k of stripe 1 is parity, > - the 3rd 64k of stripe 2, is data [320..384K) > and so on, I was wrong ! - the 3rd 64k of stripe 0, is data [320..384K) - the 3rd 64k of stripe 1 is parity, - the 3rd 64k of stripe 2, is data [256..320K) Basically stripe 0 and 2 were swapped. The idea is that after parity there is stripe 1, then stripe 2, and so on DDD III SSS KKK 123 01P P23 5P4 67P Where P = parity 0...7 are the slices of data So 'P' parity move from left to right starting from the last disk; the data are in the increasing address from left to right *starting* from parity in a circular buffer. I am trying to implement raid5/6 in grub, so I went deep in the raid5 layout BR G.Baroncelli > > To find the data, You have to compare the CO to the data [...) range. > > If you look to an my old patch (unfinished :-( ), you can find some example > to dump the different stripe > > [BTRFS-PROGS][PATCH][V2] Add two new commands: 'btrfs insp physical-find' and > 'btrfs insp physical-dump' > > > > > > >> Cheers. >> Lakshmipathi.G >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: push relocation recovery into a helper thread
On a file system with many snapshots and qgroups enabled, an interrupted balance can end up taking a long time to mount due to recovering the relocations during mount. It does this in the task performing the mount, which can't be interrupted and may prevent mounting (and systems booting) for a long time as well. The thing is that as part of balance, this runs in the background all the time. This patch pushes the recovery into a helper thread and allows the mount to continue normally. We hold off on resuming any paused balance operation until after the relocation has completed, disallow any new balance operations if it's running, and wait for it on umount and remounting read-only. This doesn't do anything to address the relocation recovery operation taking a long time but does allow the file system to mount. Signed-off-by: Jeff Mahoney--- fs/btrfs/ctree.h |7 +++ fs/btrfs/disk-io.c|7 ++- fs/btrfs/relocation.c | 92 +- fs/btrfs/super.c |5 +- fs/btrfs/volumes.c|6 +++ 5 files changed, 97 insertions(+), 20 deletions(-) --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1052,6 +1052,10 @@ struct btrfs_fs_info { struct btrfs_work qgroup_rescan_work; bool qgroup_rescan_running; /* protected by qgroup_rescan_lock */ + /* relocation recovery items */ + bool relocation_recovery_started; + struct completion relocation_recovery_completion; + /* filesystem state */ unsigned long fs_state; @@ -3671,7 +3675,8 @@ int btrfs_init_reloc_root(struct btrfs_t struct btrfs_root *root); int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, struct btrfs_root *root); -int btrfs_recover_relocation(struct btrfs_root *root); +int btrfs_recover_relocation(struct btrfs_fs_info *fs_info); +void btrfs_wait_for_relocation_completion(struct btrfs_fs_info *fs_info); int btrfs_reloc_clone_csums(struct inode *inode, u64 file_pos, u64 len); int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf, --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2999,7 +2999,7 @@ retry_root_backup: goto fail_qgroup; mutex_lock(_info->cleaner_mutex); - ret = btrfs_recover_relocation(tree_root); + ret = btrfs_recover_relocation(fs_info); mutex_unlock(_info->cleaner_mutex); if (ret < 0) { btrfs_warn(fs_info, "failed to recover relocation: %d", @@ -3017,7 +3017,8 @@ retry_root_backup: if (IS_ERR(fs_info->fs_root)) { err = PTR_ERR(fs_info->fs_root); btrfs_warn(fs_info, "failed to read fs tree: %d", err); - goto fail_qgroup; + close_ctree(fs_info); + return err; } if (sb_rdonly(sb)) @@ -3778,6 +3779,8 @@ void close_ctree(struct btrfs_fs_info *f /* wait for the qgroup rescan worker to stop */ btrfs_qgroup_wait_for_completion(fs_info, false); + btrfs_wait_for_relocation_completion(fs_info); + /* wait for the uuid_scan task to finish */ down(_info->uuid_tree_rescan_sem); /* avoid complains from lockdep et al., set sem back to initial state */ --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "transaction.h" @@ -4492,14 +4493,61 @@ static noinline_for_stack int mark_garba } /* - * recover relocation interrupted by system crash. - * * this function resumes merging reloc trees with corresponding fs trees. * this is important for keeping the sharing of tree blocks */ -int btrfs_recover_relocation(struct btrfs_root *root) +static int +btrfs_resume_relocation(void *data) { - struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_fs_info *fs_info = data; + struct btrfs_trans_handle *trans; + struct reloc_control *rc = fs_info->reloc_ctl; + int err, ret; + + btrfs_info(fs_info, "resuming relocation"); + + BUG_ON(!rc); + + mutex_lock(_info->cleaner_mutex); + + merge_reloc_roots(rc); + + unset_reloc_control(rc); + + trans = btrfs_join_transaction(rc->extent_root); + if (IS_ERR(trans)) + err = PTR_ERR(trans); + else { + ret = btrfs_commit_transaction(trans); + if (ret < 0) + err = ret; + } + + kfree(rc); + + if (err == 0) { + struct btrfs_root *fs_root; + + /* cleanup orphan inode in data relocation tree */ + fs_root = read_fs_root(fs_info, BTRFS_DATA_RELOC_TREE_OBJECTID); + if (IS_ERR(fs_root)) + err = PTR_ERR(fs_root); +
Re: [PATCH 13/16] btrfs: remove redundant read-only check from btrfs_cancel_balance
On Mon, Apr 16, 2018 at 05:43:46PM +0800, Anand Jain wrote: > On 04/04/2018 02:34 AM, David Sterba wrote: > > Balance cannot be started on a read-only filesystem and will have to > > finish/exit before eg. going to read-only via remount. Cancelling does > > not need to check for that. > > > > In case the filesystem is forcibly set to read-only after an error, > > balance will finish anyway and if the cancel call is too fast it will > > just wait for that to happen. Again does not have to check. > > What if there is a power recycle and mounted as readonly > after the reboot? So the balance item would be stored on disk from previous run, mount will set it up but balance will not be resumed (sb_rdonly check in open_ctree happens earlier). Calling btrfs_cancel_balance by ioctl would reset the in-memory state and also would want to delete the balance item on disk, so yes, the check is needed. I'll add a comment explaining why the read-only check is there instead. Thanks for catching it. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
On Mon, Apr 16, 2018 at 02:10:08PM +0800, Anand Jain wrote: > > > On 04/04/2018 02:34 AM, David Sterba wrote: > > Replace a WARN_ON with a proper check and message in case something goes > > really wrong and resumed balance cannot set up its exclusive status. > > > The check is a user friendly assertion, I don't expect to ever happen > > under normal circumstances. > > > > Also document that the paused balance starts here and owns the exclusive > > op status. > > > > Signed-off-by: David Sterba> > --- > > fs/btrfs/volumes.c | 16 ++-- > > 1 file changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > > index eb78c1d0ce2b..843982a2cbdb 100644 > > --- a/fs/btrfs/volumes.c > > +++ b/fs/btrfs/volumes.c > > @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info > > *fs_info) > > struct btrfs_key key; > > int ret; > > > > + /* > > +* This should never happen, as the paused balance state is recovered > > +* during mount without any chance of other exclusive ops to collide. > > +* Let it fail early and do not continue mount. > > +* > > +* Otherwise, this gives the exclusive op status to balance and keeps > > +* in paused state until user intervention (cancel or umount). > > +*/ > > + if (test_and_set_bit(BTRFS_FS_EXCL_OP, _info->flags)) { > > + btrfs_err(fs_info, > > + "cannot set exclusive op status to resume balance"); > > + return -EINVAL; > > + } > > > > We need the test_and_set_bit(BTRFS_FS_EXCL_OP) only if we confirm that > there is a pending balance. Its better to test and set at the same > place as WARN_ON before. Right. And the patch is wrong, because we need to set up fs_info::balance_ctl in all cases, otherwise unpausing balance would not work as expected. The only change should be a better message and maybe not even that, as it's just preparing the state but the exclusive op is claimed later in btrfs_resume_balance_async. I'll revisit how the error handling is done after resuming balance or dev-replace, this is considered a hard failure (mount, rw remount) but I don't think it's necessary. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
On Tue, Apr 17, 2018 at 03:34:04PM +0200, David Sterba wrote: > On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote: > > Currently if some fatal errors occur, like all IO get -EIO, resources > > would be cleaned up when > > a) transaction is being committed or > > b) BTRFS_FS_STATE_ERROR is set > > > > However, in some rare cases, resources may be left alone after transaction > > gets aborted and umount may run into some ASSERT(), e.g. > > ASSERT(list_empty(_group->dirty_list)); > > > > For case a), in btrfs_commit_transaciton(), there're several places at the > > beginning where we just call btrfs_end_transaction() without cleaning up > > resources. For case b), it is possible that the trans handle doesn't have > > any dirty stuff, then only trans hanlde is marked as aborted while > > BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. > > I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added > to avoid duplicate warnings when the transaction is aborted for the > first time. And nothing else. > > BTRFS_FS_STATE_ERROR should track the overall state and is checked in > several places if a post-error cleanup is necessary. > > Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller > up the callchaing can potentially cleanup. > > > This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that > > all resources won't stay in memory after umount. > > The question is why BTRFS_FS_STATE_ERROR is not set but we still got > here and the filesystem is in an error state. > > You're not specific about the rare cases where the resources are left > after abort and the umount is about to start. I think the rare cases > need to be identified and understood otherwise this seems like papering > over the problem. Yes, that makes sense. btrfs_commit_transaction() -> btrfs_run_delayed_refs() # the very 1st one -> __btrfs_run_delayed_refs() # bail out with errors e.g. -EIO, -ENOSPC -> btrfs_abort_transaction() -> # set BTRFS_FS_STATE_TRANS_ABORTED -> __btrfs_abort_transaction() # !trans->dirty and no new_bgs, then return without setting BTRFS_FS_STATA_ERROR While trans handle is aborted, cur_trans is not aborted, so that other threads would join this @cur_trans and it's likely that they would go thru the same process. Eventually umount is called, close_ctree() -> btrfs_commit_super() -> join_transaction() -> btrfs_commit_transaction() # bail out due to the same reason -> ... # btrfs_error_commit_super is not called due to STATE_ERROR not being set. So as we can see, if btrfs_commit_transaction() enters COMMIT phrase, then cleanup_transaction would handle the rest well, otherwise we'd get this. thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: add comment about BTRFS_FS_EXCL_OP
On Mon, Apr 16, 2018 at 06:28:31PM +0800, Anand Jain wrote: > Adds comments about BTRFS_FS_EXCL_OP to existing comments > about the device locks. Thanks, a few comments below. > Signed-off-by: Anand Jain> --- > fs/btrfs/volumes.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 3a3912a9dec9..18306165ffaa 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -202,6 +202,22 @@ static int __btrfs_map_block(struct btrfs_fs_info > *fs_info, > * device_list_mutex > * chunk_mutex > * balance_mutex > + * > + * BTRFS_FS_EXCL_OP > + * > + * Maintains the exclusivity of the following device operations. > + * > + * - Balance (*) > + * - Add device add > + * - Remove device remove > + * - Replace (*) device replace > + * - Resize > + * > + * The device operation with (*) indicate it can continue after a CLI pause > + * (only balance can be paused using CLI) or remount or unmount-mount > sequence > + * or a power-recycle, as long as the FS is still writeable. And during the this would be more readable if is' a bullet list like above > + * course of the pause, the BTRFS_FS_EXCL_OP remains set. BTRFS_FS_EXCL_OP > flag > + * is set and cleared using atomic functions. Please rephrase that without the CLI and use 'ioctl' instead, as this is the interface. > */ > > DEFINE_MUTEX(uuid_mutex); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/4] btrfs: Refactor btrfs_ioctl_snap_destroy() by using btrfs_delete_subvolume()
On Tue, Apr 17, 2018 at 09:21:16AM +0900, Misono Tomohiro wrote: > On 2018/04/17 2:53, David Sterba wrote: > > On Wed, Apr 11, 2018 at 02:20:49PM +0900, Misono Tomohiro wrote: > >> Use btrfs_delete_subvolume() to refactor btrfs_ioctl_snap_destroy(). > >> The permission check is still done in btrfs_ioctl_snap_destroy(). Also, > >> call of d_delete() is still required since btrfs_delete_subvolume() > >> does not call it. > >> > >> As a result, btrfs_unlink_subvol() and may_destroy_subvol() > >> become static functions. No functional change happens. > >> > >> Signed-off-by: Tomohiro Misono> > > > Why is this still split into two patches? Factoring out a function > > should happen in one patch, ie 2 and 3 in one go. Do you have a reason > > not to do it like that? > > I thought this reduces code change in one patch and might be good, but > I'm completely fine with merging 2nd and 3rd patches. This makes the rewiew harder, if the code is moved the diff can be long but we have both pieces to compare. Moving code is considered one logical change and should not introduce any other changes, besides renaming or formatting fixups. > Should I send v5? Yes please, this is not a trivial change and the changelogs need to be adjusted. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[no subject]
unsubscribe linux-kernel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Filesystem full with unallocated space?
Hello! I am getting ENOSPC on my root filesystem with plenty of unallocated space according to "fi usage". Any idea why that may be? This is a root partition for Debian Stable. Not doing anything unusual that I'm aware of. No snapshots. Thanks! Matthew uname -a: Linux bigfoot 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux btrfs --version: btrfs-progs v4.7.3 btrfs fi show: Label: none uuid: 2364c63f-e20c-410f-90b4-05f722ee1c77 Total devices 1 FS bytes used 176.27GiB devid 1 size 246.33GiB used 185.01GiB path /dev/sda2 btrfs fi df /: Data, single: total=183.00GiB, used=175.78GiB System, single: total=4.00MiB, used=48.00KiB Metadata, single: total=2.01GiB, used=504.81MiB GlobalReserve, single: total=211.39MiB, used=0.00B dmesg (I had tried balancing - "balance start -dusage=95 /" completed successfully and didn't help): 62411.020348] usb 2-1.2.4: USB disconnect, device number 12 [62656.848845] usb 2-1.2: new high-speed USB device number 14 using ehci-pci [62656.958066] usb 2-1.2: New USB device found, idVendor=0451, idProduct=8044 [62656.958071] usb 2-1.2: New USB device strings: Mfr=0, Product=0, SerialNumber=1 [62656.958073] usb 2-1.2: SerialNumber: A80108717AA4 [62656.958692] hub 2-1.2:1.0: USB hub found [62656.958828] hub 2-1.2:1.0: 4 ports detected [62657.244860] usb 2-1.2.2: new high-speed USB device number 15 using ehci-pci [62657.360966] usb 2-1.2.2: New USB device found, idVendor=2109, idProduct=2813 [62657.360970] usb 2-1.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [62657.360972] usb 2-1.2.2: Product: USB2.0 Hub [62657.360974] usb 2-1.2.2: Manufacturer: VIA Labs, Inc. [62657.361955] hub 2-1.2.2:1.0: USB hub found [62657.362734] hub 2-1.2.2:1.0: 4 ports detected [62657.456890] usb 2-1.2.3: new high-speed USB device number 16 using ehci-pci [62657.692872] usb 2-1.2.2.1: new full-speed USB device number 17 using ehci-pci [62657.815204] usb 2-1.2.2.1: New USB device found, idVendor=1050, idProduct=0211 [62657.815209] usb 2-1.2.2.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [62657.815227] usb 2-1.2.2.1: Product: Yubico WinUSB Gnubby (gnubby1) [62657.815228] usb 2-1.2.2.1: Manufacturer: Yubico [62657.908881] usb 2-1.2.2.2: new full-speed USB device number 18 using ehci-pci [62658.032771] usb 2-1.2.2.2: New USB device found, idVendor=046d, idProduct=c52b [62658.032775] usb 2-1.2.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [62658.032778] usb 2-1.2.2.2: Product: USB Receiver [62658.032780] usb 2-1.2.2.2: Manufacturer: Logitech [62658.040715] logitech-djreceiver 0003:046D:C52B.000D: hiddev0,hidraw4: USB HID v1.11 Device [Logitech USB Receiver] on usb-:00:1d.0-1.2.2.2/input2 [62658.188708] input: Logitech M705 as /devices/pci:00/:00:1d.0/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2.2/2-1.2.2.2:1.2/0003:046D:C52B.000D/0003:046D:101B.000E/input/input15 [62658.189329] logitech-hidpp-device 0003:046D:101B.000E: input,hidraw5: USB HID v1.11 Mouse [Logitech M705] on usb-:00:1d.0-1.2.2.2:2 [62658.248907] usb 2-1.2.2.3: new full-speed USB device number 19 using ehci-pci [62658.370856] usb 2-1.2.2.3: New USB device found, idVendor=2a03, idProduct=0042 [62658.370859] usb 2-1.2.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=220 [62658.370861] usb 2-1.2.2.3: Product: Arduino Mega [62658.370862] usb 2-1.2.2.3: Manufacturer: Arduino Srl [62658.370863] usb 2-1.2.2.3: SerialNumber: 75533353437351C07182 [62658.371497] cdc_acm 2-1.2.2.3:1.0: ttyACM0: USB ACM device [62659.062915] input: Logitech K270 as /devices/pci:00/:00:1d.0/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2.2/2-1.2.2.2:1.2/0003:046D:C52B.000D/0003:046D:4003.000F/input/input16 [62659.063686] logitech-hidpp-device 0003:046D:4003.000F: input,hidraw6: USB HID v1.11 Keyboard [Logitech K270] on usb-:00:1d.0-1.2.2.2:3 [62659.807183] usb 2-1.2.3: New USB device found, idVendor=046d, idProduct=082d [62659.807188] usb 2-1.2.3: New USB device strings: Mfr=0, Product=2, SerialNumber=1 [62659.807190] usb 2-1.2.3: Product: HD Pro Webcam C920 [62659.807192] usb 2-1.2.3: SerialNumber: C034F41F [62659.807647] uvcvideo: Found UVC 1.00 device HD Pro Webcam C920 (046d:082d) [62659.808304] uvcvideo 2-1.2.3:1.0: Entity type for entity Processing 3 was not initialized! [62659.808308] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 6 was not initialized! [62659.808311] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 12 was not initialized! [62659.808314] uvcvideo 2-1.2.3:1.0: Entity type for entity Camera 1 was not initialized! [62659.808316] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 8 was not initialized! [62659.808319] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 9 was not initialized! [62659.808321] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 10 was not initialized! [62659.808324] uvcvideo 2-1.2.3:1.0: Entity type for entity Extension 11 was not initialized! [62659.808551] input:
Re: [PATCH] btrfs: Fix wrong btrfs_delalloc_release_extents parameter
On Tue, Apr 17, 2018 at 06:43:58PM +0800, Qu Wenruo wrote: > Commit 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type > for delalloc") merged into mainline is not the latest version submitted > to mail list in Dec 2017. > > It has a fatal wrong @qgroup_free parameter, which results increasing > qgroup metadata pertrans reserved space, and causing a lot of early EDQUOT. > > Fix it by applying the correct diff on top of current branch. > > Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for > delalloc") > Signed-off-by: Qu WenruoAdded to 4.17 queue, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
On Tue, Apr 17, 2018 at 04:52:45PM +0800, Qu Wenruo wrote: > Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for > delayed inode and item") merged into mainline is not the updated version > submitted to the mail list in Dec 2017. > > Which lacks the following fixes: > > 1) Remove btrfs_qgroup_convert_reserved_meta() call in >btrfs_delayed_item_release_metadata() > 2) Remove btrfs_qgroup_reserve_meta_prealloc() call in >btrfs_delayed_inode_reserve_metadata() > > Those fixes will resolve unexpected EDQUOT problems. > > Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for > delayed inode and item") > Signed-off-by: Qu WenruoAdded to 4.17 queue, thanks. > @@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct > btrfs_trans_handle *trans, > dst_rsv = _info->delayed_block_rsv; > > num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1); > + > + /* > + * Here we migrate space rsv from transaction rsv, since have > + * already reserved space when starting a transaction. > + * So no need to reserve qgroup space here. > + */ Please format the comments to the full line width. > @@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata( > "delayed_inode", > btrfs_ino(inode), > num_bytes, 1); > - } > + } else > + btrfs_qgroup_free_meta_prealloc(root, > + fs_info->nodesize); Please don't diverge from the coding style, I'm fixing such issues but, you know. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fstests: btrfs/130 make it workable on small systems
This test case takes a long time to complete at the default LOAD_FACTOR=1, so reduce the nr_extents to 256, so for larger systems it can still use higher LOAD_FACTOR. Signed-off-by: Anand Jain--- tests/btrfs/130 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/btrfs/130 b/tests/btrfs/130 index 96c8f9d9d526..05f92c11a998 100755 --- a/tests/btrfs/130 +++ b/tests/btrfs/130 @@ -60,7 +60,7 @@ _require_scratch_reflink _scratch_mkfs > /dev/null 2>&1 _scratch_mount -nr_extents=$((4096 * $LOAD_FACTOR)) +nr_extents=$((256 * $LOAD_FACTOR)) # Use 128K blocksize, the default value of both deduperemove or # inband dedupe -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] fstests: btrfs/130 fix Invalid argument
btrfs-progs patch[1] replaced read(2) write(2) with splice(2) and caused the append-redirect to stop working. Before: btrfs send /btrfs/ro_send > /dev/null At subvol /btrfs/ro_snap btrfs send /btrfs/ro_send >> /dev/null At subvol /btrfs/ro_snap After: btrfs send /btrfs/ro_send > /dev/null At subvol /btrfs/ro_snap btrfs send /btrfs/ro_send >> /dev/null At subvol /btrfs/ro_snap ERROR: failed to read stream from kernel: Invalid argument Further in the test case the line.. btrfs/130 :: _run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1 which intended to redirect send output to /dev/null, but ended up append redirect to the $seqres.full file. And so this test case failed as 'Invalid argument' for sometime now. Still as append of a btrfs send output doesn't make sense, so fix the fstests. Also adds logs going into $seqres.full. [1] ba23855cdc8961bbaef1fcad4854d494cdb3afd3 btrfs-progs: send: use splice syscall instead of read/write to transfer buffer Signed-off-by: Anand Jain--- tests/btrfs/130 | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tests/btrfs/130 b/tests/btrfs/130 index edb7397a1ceb..96c8f9d9d526 100755 --- a/tests/btrfs/130 +++ b/tests/btrfs/130 @@ -67,12 +67,14 @@ nr_extents=$((4096 * $LOAD_FACTOR)) blocksize=$((128 * 1024)) file=$SCRATCH_MNT/foobar +echo LOAD_FACTOR=$LOAD_FACTOR nr_extents=$nr_extents blocksize=$blocksize >> $seqres.full + # create the initial file, whose file extents are all point to one extent _pwrite_byte 0xcdcdcdcd 0 $blocksize $file | _filter_xfs_io for i in $(seq 1 $(($nr_extents - 1))); do _reflink_range $file 0 $file $(($i * $blocksize)) $blocksize \ - > /dev/null 2>&1 + >> $seqres.full 2>&1 done # create a RO snapshot, so we can send out the snapshot @@ -82,7 +84,8 @@ _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/ro_snap # 1) OOM since memory is allocated inside a O(n^3) loop # 2) Softlock since time consuming backref walk is called without scheduling. # the send destination is not important, just send will cause the problem -_run_btrfs_util_prog send $SCRATCH_MNT/ro_snap > /dev/null 2>&1 +echo "# $BTRFS_UTIL_PROG send $SCRATCH_MNT/ro_snap > /dev/null" >> $seqres.full +$BTRFS_UTIL_PROG send $SCRATCH_MNT/ro_snap > /dev/null 2>>$seqres.full # success, all done status=0 -- 2.15.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Do super block verification before writing it to disk
On 2018年04月17日 22:32, Anand Jain wrote: > > > On 04/17/2018 05:58 PM, Qu Wenruo wrote: >> >> >> On 2018年04月17日 17:05, Anand Jain wrote: >>> v3: Update commit message to show the corruption in details. Modify the kernel error message to show corruption is detected before transaction commitment. >>> Nice. Thanks. more below. >>> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, btrfs_set_super_bytenr(sb, bytenr); + /* check the validation of the primary sb before writing */ + if (i == 0) { + ret = btrfs_check_super_valid(device->fs_info, sb); + if (ret) { + btrfs_err(device->fs_info, +"superblock corruption detected before transaction commitment for device %llu", + device->devid); + return -EUCLEAN; + } >>> >>> Why not move this entire check further below, after we have the ready >>> crc and use btrfs_check_super_csum(), instead of >>> btrfs_check_super_valid()? so that we verify only what is known to be >>> corrupted that is .. >> >> The problem is, we don't know the cause yet, so we must check the whole >> superblock. >> >> For example, if the corruption is caused by some wild pointer of other >> kernel module, and we're just unlucky that one day it corrupts nodesize, >> then we can't detect it if we only check certain members. > > Right I notice that. > > But without btrfs_check_super_csum(), it leaves out checking one of the > member (csum_type) which is know to be corrupted at the two instances, > so it can also include btrfs_check_super_csum(). > > There were two cases, both of them corrupted the same offset, its not > just a coincidence that both of these reported corrupted the same > offset. Yep, but since we're here to do extra verification, checking everything is never a bad idea. By this we don't need to bother checking other members when new corruption pops out. > > Also the incompatible features flags (169) are still valid in both the > cases. It looks as if we wrote u32 to a u64. I notice that we provide > the options to write the incompatible flags through mount option, sysfs > and ioctl. While I don't think that's the cause of these reported corruption. I'd prefer some under/over flow of memory which corrupted fs_info->super_copy somehow. It may be btrfs or it may not. It's pretty hard to determine with just 2 reports. Especially for ben's report, he is using latest vega graphics IIRC, who knows what could went wrong with latest amd drm codes. > > >>> btrfs_super_block { >>> :: >>> __le64 incompat_flags; >>> __le16 csum_type; >>> :: >>> } >>> >>> And also can you dump contents of incompat_flags and csum_type at both >>> fs_info->super_copy >>> and >>> fs_info->super_for_commit >> >> Not really needed, as when corruption happens, it's super_copy >> corrupted, not something went wrong after we called memcpy() > > As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread, > did you check if btrfs_sync_log() can not be the last person to write > at umount? I checked the dump super output, where log_tree output is all 0, means no log tree, hence not btrfs_sync_log() caused the problem. From Ben's: -- chunk_root 5518540881920 chunk_root_level1 log_root0 log_root_transid0 log_root_level 0 -- And from Ken's -- chunk_root21004288 chunk_root_level1 log_root0 log_root_transid0 log_root_level0 -- So at least for current only reports, it's not btrfs_sync_log() causing the problem. Thanks, Qu > > Thanks, Anand > >> Thanks, >> Qu >> >>> >>> Because at each commit transaction we >>> >>> btrfs_commit_transaction() >>> { >>> :: >>> memcpy(fs_info->super_for_commit, fs_info->super_copy, >>> sizeof(*fs_info->super_copy)); >>> :: >>> ret = write_all_supers(fs_info, 0); >>> >>> } >>> >>> And also the sync log can write the >>> >>> btrfs_sync_log() >>> { >>> :: >>> ret = write_all_supers(fs_info, 1); >>> >>> >>> Finally locks between these two threads needs a review as well. >>> >>> Thanks, Anand >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> linux-btrfs" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> signature.asc Description: OpenPGP digital signature
Re: [PATCH] btrfs: Do super block verification before writing it to disk
On 04/17/2018 05:58 PM, Qu Wenruo wrote: On 2018年04月17日 17:05, Anand Jain wrote: v3: Update commit message to show the corruption in details. Modify the kernel error message to show corruption is detected before transaction commitment. Nice. Thanks. more below. @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, btrfs_set_super_bytenr(sb, bytenr); + /* check the validation of the primary sb before writing */ + if (i == 0) { + ret = btrfs_check_super_valid(device->fs_info, sb); + if (ret) { + btrfs_err(device->fs_info, +"superblock corruption detected before transaction commitment for device %llu", + device->devid); + return -EUCLEAN; + } Why not move this entire check further below, after we have the ready crc and use btrfs_check_super_csum(), instead of btrfs_check_super_valid()? so that we verify only what is known to be corrupted that is .. The problem is, we don't know the cause yet, so we must check the whole superblock. For example, if the corruption is caused by some wild pointer of other kernel module, and we're just unlucky that one day it corrupts nodesize, then we can't detect it if we only check certain members. Right I notice that. But without btrfs_check_super_csum(), it leaves out checking one of the member (csum_type) which is know to be corrupted at the two instances, so it can also include btrfs_check_super_csum(). There were two cases, both of them corrupted the same offset, its not just a coincidence that both of these reported corrupted the same offset. Also the incompatible features flags (169) are still valid in both the cases. It looks as if we wrote u32 to a u64. I notice that we provide the options to write the incompatible flags through mount option, sysfs and ioctl. btrfs_super_block { :: __le64 incompat_flags; __le16 csum_type; :: } And also can you dump contents of incompat_flags and csum_type at both fs_info->super_copy and fs_info->super_for_commit Not really needed, as when corruption happens, it's super_copy corrupted, not something went wrong after we called memcpy() As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread, did you check if btrfs_sync_log() can not be the last person to write at umount? Thanks, Anand Thanks, Qu Because at each commit transaction we btrfs_commit_transaction() { :: memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); :: ret = write_all_supers(fs_info, 0); } And also the sync log can write the btrfs_sync_log() { :: ret = write_all_supers(fs_info, 1); Finally locks between these two threads needs a review as well. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote: > Currently if some fatal errors occur, like all IO get -EIO, resources > would be cleaned up when > a) transaction is being committed or > b) BTRFS_FS_STATE_ERROR is set > > However, in some rare cases, resources may be left alone after transaction > gets aborted and umount may run into some ASSERT(), e.g. > ASSERT(list_empty(_group->dirty_list)); > > For case a), in btrfs_commit_transaciton(), there're several places at the > beginning where we just call btrfs_end_transaction() without cleaning up > resources. For case b), it is possible that the trans handle doesn't have > any dirty stuff, then only trans hanlde is marked as aborted while > BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. I have some doubts here. The flag BTRFS_FS_STATE_TRANS_ABORTED was added to avoid duplicate warnings when the transaction is aborted for the first time. And nothing else. BTRFS_FS_STATE_ERROR should track the overall state and is checked in several places if a post-error cleanup is necessary. Calling abort should set BTRFS_FS_STATE_ERROR immediatelly so any caller up the callchaing can potentially cleanup. > This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that > all resources won't stay in memory after umount. The question is why BTRFS_FS_STATE_ERROR is not set but we still got here and the filesystem is in an error state. You're not specific about the rare cases where the resources are left after abort and the umount is about to start. I think the rare cases need to be identified and understood otherwise this seems like papering over the problem. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Add udev-md-raid-safe-timeouts.rules
On 2018-04-16 13:10, Chris Murphy wrote: Adding linux-usb@ and linux-scsi@ (This email does contain the thread initiating email, but some replies are on the other lists.) On Mon, Apr 16, 2018 at 5:43 AM, Austin S. Hemmelgarnwrote: On 2018-04-15 21:04, Chris Murphy wrote: I just ran into this: https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec This solution is inadequate, can it be made more generic? This isn't an md specific problem, it affects Btrfs and LVM as well. And in fact raid0, and even none raid setups. There is no good reason to prevent deep recovery, which is what happens with the default command timer of 30 seconds, with this class of drive. Basically that value is going to cause data loss for the single device and also raid0 case, where the reset happens before deep recovery has a chance. And even if deep recovery fails to return user data, what we need to see is the proper error message: read error UNC, rather than a link reset message which just obfuscates the problem. This has been discussed at least once here before (probably more times, hard to be sure since it usually comes up as a side discussion in an only marginally related thread). Last I knew, the consensus here was that it needs to be changed upstream in the kernel, not by adding a udev rule because while the value is technically system policy, the default policy is brain-dead for anything but the original disks it was i9ntended for (30 seconds works perfectly fine for actual SCSI devices because they behave sanely in the face of media errors, but it's horribly inadequate for ATA devices). To re-iterate what I've said before on the subject: For ATA drives it should probably be 150 seconds. That's 30 seconds beyond the typical amount of time most consumer drives will keep retrying a sector, so even if it goes the full time to try and recover a sector this shouldn't trigger. The only people this change should negatively impact are those who have failing drives which support SCT ERC and have it enabled, but aren't already adjusting this timeout. For physical SCSI devices, it should continue to be 30 seconds. SCSI disks are sensible here and don't waste your time trying to recover a sector. For PV-SCSI devices, it should probably be adjusted too, but I don't know what a reasonable value is. For USB devices it should probably be higher than 30 seconds, but again I have no idea what a reasonable value is. I don't know how all of this is designed but it seems like there's only one location for the command timer, and the SCSI driver owns it, and then everyone else (ATA and USB and for all I know SAN) are on top of that and lack any ability to have separate timeouts. On the note of SAN, iSCSI is part of the SCSI subsystem, so it gets applied directly there. I'm pretty sure NBD has it's own thing, and I think the same is true of ATAoE. As far as USB, UMS is essentially a stripped down version of SCSI with it's own limitations, and UAS _is_ SCSI, with both of those having pretty much always been routed through the SCSI subsystem. The nice thing about the udev rule is that it tests for SCT ERC before making a change. There certainly are enterprise and almost enterprise "NAS" SATA drives that have short SCT ERC times enabled out of the box - and the udev method makes them immune to the change. The kernel could just as easily look for that too though. From what I've seen however, other failure sources that wouldn't trigger SCT ERC on SATA drives are really rare, usually it means a bad cable, bad drive electronics, or a bad storage controller, so i don't think having it set really high for SCT ERC enabled drives is likely to be much of an issue most of the time. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Add udev-md-raid-safe-timeouts.rules
On 2018-04-16 11:02, Wol's lists wrote: On 16/04/18 12:43, Austin S. Hemmelgarn wrote: On 2018-04-15 21:04, Chris Murphy wrote: I just ran into this: https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec This solution is inadequate, can it be made more generic? This isn't an md specific problem, it affects Btrfs and LVM as well. And in fact raid0, and even none raid setups. There is no good reason to prevent deep recovery, which is what happens with the default command timer of 30 seconds, with this class of drive. Basically that value is going to cause data loss for the single device and also raid0 case, where the reset happens before deep recovery has a chance. And even if deep recovery fails to return user data, what we need to see is the proper error message: read error UNC, rather than a link reset message which just obfuscates the problem. This has been discussed at least once here before (probably more times, hard to be sure since it usually comes up as a side discussion in an only marginally related thread). Sorry, but where is "here"? This message is cross-posted to about three lists at least ... Oops, didn't see the extra lists listed. In this case, discussed previously on the BTRFS ML. Last I knew, the consensus here was that it needs to be changed upstream in the kernel, not by adding a udev rule because while the value is technically system policy, the default policy is brain-dead for anything but the original disks it was i9ntended for (30 seconds works perfectly fine for actual SCSI devices because they behave sanely in the face of media errors, but it's horribly inadequate for ATA devices). To re-iterate what I've said before on the subject: imho (and it's probably going to be a pain to implement :-) there should be a soft time-out and a hard time-out. The soft time-out should trigger "drive is taking too long to respond" messages that end up in a log - so that people who actually care can keep a track of this sort of thing. The hard timeout should be the current set-up, where the kernel just gives up. Agreed, although as pointed out by Roger in his reply to this, it kind of already works this way in some cases. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: Fix wrong btrfs_delalloc_release_extents parameter
Commit 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc") merged into mainline is not the latest version submitted to mail list in Dec 2017. It has a fatal wrong @qgroup_free parameter, which results increasing qgroup metadata pertrans reserved space, and causing a lot of early EDQUOT. Fix it by applying the correct diff on top of current branch. Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc") Signed-off-by: Qu Wenruo--- fs/btrfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index dd1b79750d6a..9701aa7361f3 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1763,7 +1763,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, unlock_extent_cached(_I(inode)->io_tree, lockstart, lockend, _state); btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, - (ret != 0)); + true); if (ret) { btrfs_drop_pages(pages, num_pages); break; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel unaligned access at ... btrfs_real_readdir+0x51c/0x718 [btrfs]
Hi, On 16 Apr 2018, at 21:15, David Sterbawrote: > On Mon, Apr 16, 2018 at 09:55:45PM +0200, René Rebe wrote: >> Hi, >> >> On 04/16/2018 06:48 PM, David Sterba wrote: >>> The warnings are valid, there's unaligned access introduced by patch >>> >>> 23b5ec74943f44378b68c0edd8e210a86318ea5e >>> btrfs: fix readdir deadlock with pagefault >>> >>> The directory entries (struct dir_entry) are copied to a temporary >>> buffer as they fit, ie. no alignment, and the members accessed in >>> several places. >>> >>> The following patch adds the proper unaligned access, only compile-tested. >>> Please test and let me know, thanks! >> Would have loved to immediately give it a try, however, sorry, >> I forgot to mention I'm on the latest stable release -4.16.2- >> on a first glance this does not look like it does just apply. >> >> I would re-base myself if I would not also have a glibc initialization >> bug to hunt and debug, too :-/ >> >> If you happen to also rebase it for current -stable, ... ;-) > > Sure, attached a 4.16.2 version. > <0001-test-readdir-unaligned-access.patch> Cool, thanks, built and so far it works, the warnings are gone. https://www.youtube.com/watch?v=XYsKct4T2xk Greetings form Berlin, René -- ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: Do super block verification before writing it to disk
On 2018年04月17日 17:05, Anand Jain wrote: > >> v3: >> Update commit message to show the corruption in details. >> Modify the kernel error message to show corruption is detected before >> transaction commitment. > Nice. Thanks. more below. > >> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device >> *device, >> btrfs_set_super_bytenr(sb, bytenr); >> + /* check the validation of the primary sb before writing */ >> + if (i == 0) { >> + ret = btrfs_check_super_valid(device->fs_info, sb); >> + if (ret) { >> + btrfs_err(device->fs_info, >> +"superblock corruption detected before transaction commitment for >> device %llu", >> + device->devid); >> + return -EUCLEAN; >> + } > > Why not move this entire check further below, after we have the ready > crc and use btrfs_check_super_csum(), instead of > btrfs_check_super_valid()? so that we verify only what is known to be > corrupted that is .. The problem is, we don't know the cause yet, so we must check the whole superblock. For example, if the corruption is caused by some wild pointer of other kernel module, and we're just unlucky that one day it corrupts nodesize, then we can't detect it if we only check certain members. > > btrfs_super_block { > :: > __le64 incompat_flags; > __le16 csum_type; > :: > } > > And also can you dump contents of incompat_flags and csum_type at both > fs_info->super_copy > and > fs_info->super_for_commit Not really needed, as when corruption happens, it's super_copy corrupted, not something went wrong after we called memcpy() Thanks, Qu > > Because at each commit transaction we > > btrfs_commit_transaction() > { > :: > memcpy(fs_info->super_for_commit, fs_info->super_copy, > sizeof(*fs_info->super_copy)); > :: > ret = write_all_supers(fs_info, 0); > > } > > And also the sync log can write the > > btrfs_sync_log() > { > :: > ret = write_all_supers(fs_info, 1); > > > Finally locks between these two threads needs a review as well. > > Thanks, Anand > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html signature.asc Description: OpenPGP digital signature
Re: [PATCH] btrfs: Do super block verification before writing it to disk
v3: Update commit message to show the corruption in details. Modify the kernel error message to show corruption is detected before transaction commitment. Nice. Thanks. more below. @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, btrfs_set_super_bytenr(sb, bytenr); + /* check the validation of the primary sb before writing */ + if (i == 0) { + ret = btrfs_check_super_valid(device->fs_info, sb); + if (ret) { + btrfs_err(device->fs_info, +"superblock corruption detected before transaction commitment for device %llu", + device->devid); + return -EUCLEAN; + } Why not move this entire check further below, after we have the ready crc and use btrfs_check_super_csum(), instead of btrfs_check_super_valid()? so that we verify only what is known to be corrupted that is .. btrfs_super_block { :: __le64 incompat_flags; __le16 csum_type; :: } And also can you dump contents of incompat_flags and csum_type at both fs_info->super_copy and fs_info->super_for_commit Because at each commit transaction we btrfs_commit_transaction() { :: memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); :: ret = write_all_supers(fs_info, 0); } And also the sync log can write the btrfs_sync_log() { :: ret = write_all_supers(fs_info, 1); Finally locks between these two threads needs a review as well. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs: delayed-inode: Remove wrong qgroup meta reservation calls
Commit 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item") merged into mainline is not the updated version submitted to the mail list in Dec 2017. Which lacks the following fixes: 1) Remove btrfs_qgroup_convert_reserved_meta() call in btrfs_delayed_item_release_metadata() 2) Remove btrfs_qgroup_reserve_meta_prealloc() call in btrfs_delayed_inode_reserve_metadata() Those fixes will resolve unexpected EDQUOT problems. Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item") Signed-off-by: Qu Wenruo--- fs/btrfs/delayed-inode.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 86ec2edc05e8..ac4fa7218c57 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -569,6 +569,12 @@ static int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans, dst_rsv = _info->delayed_block_rsv; num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1); + + /* +* Here we migrate space rsv from transaction rsv, since have +* already reserved space when starting a transaction. +* So no need to reserve qgroup space here. +*/ ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1); if (!ret) { trace_btrfs_space_reservation(fs_info, "delayed_item", @@ -590,7 +596,10 @@ static void btrfs_delayed_item_release_metadata(struct btrfs_root *root, return; rsv = _info->delayed_block_rsv; - btrfs_qgroup_convert_reserved_meta(root, item->bytes_reserved); + /* +* Check btrfs_delayed_item_reserve_metadata() to see why we don't need +* to release/reserve qgroup space. +*/ trace_btrfs_space_reservation(fs_info, "delayed_item", item->key.objectid, item->bytes_reserved, 0); @@ -615,9 +624,6 @@ static int btrfs_delayed_inode_reserve_metadata( num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1); - ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true); - if (ret < 0) - return ret; /* * btrfs_dirty_inode will update the inode under btrfs_join_transaction * which doesn't reserve space for speed. This is a problem since we @@ -629,6 +635,10 @@ static int btrfs_delayed_inode_reserve_metadata( */ if (!src_rsv || (!trans->bytes_reserved && src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) { + ret = btrfs_qgroup_reserve_meta_prealloc(root, + fs_info->nodesize, true); + if (ret < 0) + return ret; ret = btrfs_block_rsv_add(root, dst_rsv, num_bytes, BTRFS_RESERVE_NO_FLUSH); /* @@ -647,7 +657,9 @@ static int btrfs_delayed_inode_reserve_metadata( "delayed_inode", btrfs_ino(inode), num_bytes, 1); - } + } else + btrfs_qgroup_free_meta_prealloc(root, + fs_info->nodesize); return ret; } -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: libbtrfsutil: python: use top=0 as default value in SubvolumeIterator
Document of SubvolumeIterator says: SubvolumeIterator(path, top=0, info=False, post_order=False) -> new subvolume iterator ... top -- if not zero, instead of only listing subvolumes beneath the given path, list subvolumes beneath the subvolume with this ID; passing BTRFS_FS_TREE_OBJECTID (5) here lists all subvolumes. The subvolumes are listed relative to the subvolume with this ID. ... So, change the default value of @top from 5 to 0 accordingly. Signed-off-by: Tomohiro Misono--- libbtrfsutil/python/subvolume.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libbtrfsutil/python/subvolume.c b/libbtrfsutil/python/subvolume.c index 069e606b..6ecde1f6 100644 --- a/libbtrfsutil/python/subvolume.c +++ b/libbtrfsutil/python/subvolume.c @@ -525,7 +525,7 @@ static int SubvolumeIterator_init(SubvolumeIterator *self, PyObject *args, static char *keywords[] = {"path", "top", "info", "post_order", NULL}; struct path_arg path = {.allow_fd = true}; enum btrfs_util_error err; - unsigned long long top = 5; + unsigned long long top = 0; int info = 0; int post_order = 0; int flags = 0; -- 2.14.3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html