Re: cause of dmesg call traces?
On 26.08.2017 23:30, Adam Bahe wrote:
> Hello all. Recently I added another 10TB sas drive to my btrfs array
> and I have received the following messages in dmesg during the
> balance. I was hoping someone could clarify what seems to be causing
> this.
>
> Some additional info, I did a smartctl long test and one of my brand
> new 8TB drives warned me with this:
>
> 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
> # 5 Extended offline Completed: servo/seek failure 90% 474 0
>
> Are the messages in dmesg caused by the issues with the hard drive, or
> something else entirely? A few months ago I had a total failure
> requiring a complete nuke and pave so I am trying to track down any
> potential issues aggressively and appreciate any help. Thanks!
>
> Also, how many current_pending_sectors do you tolerate before you swap
> a drive? I am going to pull this drive as soon as this current balance
> finishes. But for future reference it would be good to keep an eye on.
>
> [Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
> [Sat Aug 26 03:01:53 2017] Modules linked in: dm_mod rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul ghash_clmulni_intel pcbc ext4 aesni_intel jbd2 crypto_simd mbcache glue_helper cryptd intel_cstate intel_rapl_perf ses enclosure pcspkr mei_me lpc_ich input_leds i2c_i801 joydev mfd_core mei sg ioatdma shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl 8021q lockd garp grace mrp sunrpc ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast i2c_algo_bit ata_generic
> [Sat Aug 26 03:01:53 2017] pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ata_piix mdio mpt3sas ptp raid_class pps_core libata scsi_transport_sas dca fjes
> [Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5 Tainted: G W 4.10.6-1.el7.elrepo.x86_64 #1

You are not even using an upstream kernel, but some RedHat-like derivative. If you'd like to get support on this list, please test with an upstream kernel; otherwise all bets are off as to what kind of code you might be running.

> [Sat Aug 26 03:01:53 2017] Hardware name: Supermicro Super Server/X10DRi-T4+, BIOS 2.0 12/17/2015
> [Sat Aug 26 03:01:53 2017] Workqueue: writeback wb_workfn (flush-btrfs-2)
> [Sat Aug 26 03:01:53 2017] Call Trace:
> [Sat Aug 26 03:01:53 2017]  dump_stack+0x63/0x87
> [Sat Aug 26 03:01:53 2017]  __warn+0xd1/0xf0
> [Sat Aug 26 03:01:53 2017]  warn_slowpath_null+0x1d/0x20
> [Sat Aug 26 03:01:53 2017]  btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
> [Sat Aug 26 03:01:53 2017]  run_delalloc_nocow+0x6e7/0xc00 [btrfs]
> [Sat Aug 26 03:01:53 2017]  ? test_range_bit+0xd0/0x160 [btrfs]
> [Sat Aug 26 03:01:53 2017]  run_delalloc_range+0x7d/0x3a0 [btrfs]
> [Sat Aug 26 03:01:53 2017]  ? find_lock_delalloc_range.constprop.56+0x1d1/0x200 [btrfs]
> [Sat Aug 26 03:01:53 2017]  writepage_delalloc.isra.48+0x10c/0x170 [btrfs]
> [Sat Aug 26 03:01:53 2017]  __extent_writepage+0xd6/0x2e0 [btrfs]
> [Sat Aug 26 03:01:53 2017]  extent_write_cache_pages.isra.44.constprop.59+0x2c4/0x480 [btrfs]
> [Sat Aug 26 03:01:53 2017]  extent_writepages+0x5c/0x90 [btrfs]
> [Sat Aug 26 03:01:53 2017]  ? btrfs_submit_direct+0x8b0/0x8b0 [btrfs]
> [Sat Aug 26 03:01:53 2017]  btrfs_writepages+0x28/0x30 [btrfs]
> [Sat Aug 26 03:01:53 2017]  do_writepages+0x1e/0x30
> [Sat Aug 26 03:01:53 2017]  __writeback_single_inode+0x45/0x330
> [Sat Aug 26 03:01:53 2017]  writeback_sb_inodes+0x280/0x570
> [Sat Aug 26 03:01:53 2017]  __writeback_inodes_wb+0x8c/0xc0
> [Sat Aug 26 03:01:53 2017]  wb_writeback+0x276/0x310
> [Sat Aug 26 03:01:53 2017]  wb_workfn+0x2e1/0x410
> [Sat Aug 26 03:01:53 2017]  process_one_work+0x165/0x410
> [Sat Aug 26 03:01:53 2017]  worker_thread+0x137/0x4c0
> [Sat Aug 26 03:01:53 2017]  kthread+0x101/0x140
> [Sat Aug 26 03:01:53 2017]  ? rescuer_thread+0x3b0/0x3b0
> [Sat Aug 26 03:01:53 2017]  ? kthread_park+0x90/0x90
> [Sat Aug 26 03:01:53 2017]  ret_from_fork+0x2c/0x40
> [Sat Aug 26 03:01:53 2017] ---[ end trace 7ba8e3b5c60c322d ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: deleted subvols don't go away?
On 28.08.2017 06:43, Janos Toth F. wrote:
> ID=5 is the default, "root" or "toplevel" subvolume which can't be
> deleted anyway (at least normally, I am not sure if some debug-magic
> can achieve that).
> I just checked this (out of curiosity) and all my Btrfs filesystems
> report something very similar to yours (I thought DELETED was a made
> up example but I see it was literal...):
>
> ~ # btrfs sub list -a /
> ID 303 gen 172881 top level 5 path /gentoo
> ~ # btrfs sub list -ad /
> ID 5 gen 172564 top level 0 path /DELETED

This seems to be coming from the userspace tools, specifically the filter_and_sort_subvol() function. That function calls resolve_root(), and if it returns -ENOENT, meaning the root could not be resolved, "DELETED" is printed instead. On a quick inspection of the code it seems that even for deleted subvolumes btrfs still retains the ROOT_ITEM for the subvolume, but since all ROOT_BACKREF items are deleted, the name of the tree can no longer be resolved (since it's stored in the root backref).

For example I did:

  btrfs subvolume create /media/scratch/subvol1 && sync
  btrfs inspect-internal dump-tree -t root /dev/vdc

  item 14 key (258 ROOT_ITEM 0) itemoff 12972 itemsize 439
          generation 11 root_dirid 256 bytenr 29949952 level 0 refs 1
          lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x0(none)
          uuid 217fd861-4606-1146-b5ee-59fba8d37f8c
          ctransid 11 otransid 10 stransid 0 rtransid 0
          drop key (0 UNKNOWN.0 0) level 0
  item 15 key (258 ROOT_BACKREF 5) itemoff 12947 itemsize 25
          root backref key dirid 256 sequence 4 name subvol1

Afterwards, I deleted the subvolume:

  btrfs subvolume delete -v /media/scratch/subvol1/ && sync

  item 13 key (258 ROOT_ITEM 0) itemoff 12997 itemsize 439
          generation 11 root_dirid 256 bytenr 29949952 level 0 refs 0
          lastsnap 0 byte_limit 0 bytes_used 16384 flags 0x1(none)
          uuid 217fd861-4606-1146-b5ee-59fba8d37f8c
          ctransid 11 otransid 10 stransid 0 rtransid 0
          drop key (0 UNKNOWN.0 0) level 0

> I guess this entry is some placeholder, like a hidden "trash"
> directory on some filesystems. I don't think this means all Btrfs
> filesystems forever hold on to their last deleted subvolumes (and only
> one).
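For anyone who wants to repeat the experiment above without risking a real filesystem, here is a sketch that does the whole thing on a throwaway loop device. It is destructive only to the image file it creates, but it needs root and btrfs-progs, so it is gated behind an environment variable; all paths are illustrative, not prescribed by this thread.

```shell
# Reproduce the transient DELETED entry on a scratch loop device.
repro() {
    img=/tmp/btrfs-demo.img
    mnt=/tmp/btrfs-demo.mnt
    truncate -s 1G "$img"
    dev=$(losetup -f --show "$img")
    mkfs.btrfs -f "$dev" >/dev/null
    mkdir -p "$mnt" && mount "$dev" "$mnt"

    btrfs subvolume create "$mnt/subvol1" && sync
    btrfs subvolume delete "$mnt/subvol1" && sync
    # Right after deletion the ROOT_ITEM may still be present with no
    # backref, which is when the DELETED placeholder shows up:
    btrfs subvolume list -ad "$mnt"

    umount "$mnt" && losetup -d "$dev" && rm -f "$img"
}

if [ -n "${RUN_BTRFS_REPRO:-}" ] && [ "$(id -u)" = 0 ]; then
    repro
else
    echo "skipped; set RUN_BTRFS_REPRO=1 and run as root"
fi
```

Whether the DELETED line actually appears depends on how quickly the cleaner thread runs after the delete, so the output may already be empty by the time the list command executes.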
Re: status of inline deduplication in btrfs
On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski wrote:
> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>> The second has to do with btrfs scaling issues due to reflinking, which
>> of course is the operational mechanism for both snapshotting and dedup.
>> Snapshotting of course reflinks the entire subvolume, so it's reflinking
>> on a /massive/ scale. While normal file operations aren't affected much,
>> btrfs maintenance operations such as balance and check scale badly enough
>> with snapshotting (due to the reflinking) that keeping the number of
>> snapshots per subvolume under 250 or so is strongly recommended, and
>> keeping them to double-digits or even single-digits is recommended if
>> possible.
>>
>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>> will be far more variable, depending of course on how effective the
>> deduping, and thus the reflinking, is. But considering that snapshotting
>> is effectively 100% effective deduping of the entire subvolume (until the
>> snapshot and active copy begin to diverge, at least), that tends to be
>> the worst case, so figuring a full two-copy dedup as equivalent to one
>> snapshot is a reasonable estimate of effect. If dedup only catches 10%,
>> only once, then it would be 10% of a snapshot's effect. If it's 10% but
>> there's 10 duplicated instances, that's the effect of a single snapshot.
>> Assuming of course that the dedup domain is the same as the subvolume
>> that's being snapshotted.

This looks to me like a debate between using inline dedup and snapshotting, or more precisely, doing dedupe via snapshots. Did I understand that correctly? If yes, does it mean people are still undecided whether the current design and proposal for inline dedup is the right way to go?

> Nope, snapshotting is not anywhere near the worst case of dedup:
>
> [/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' + | cut -d' ' -f1 | sort | uniq -c | sort -nr | head
>
> Even on the system parts (ie, ignoring my data) of my desktop, top files
> have the following dup counts: 532 384 373 164 123 122 101. On this small
> SSD, the system parts are reflinked by snapshots with 10 dailies, and by
> deduping with 10 regular chroots, 11 sbuild chroots and 3 full-system lxc
> containers (chroots are mostly a zoo of different architectures).
>
> This is nothing compared to the backup server, which stores backups of 46
> machines (only system/user and small data, bulky stuff is backed up
> elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
> yearly). This worked well enough until I made the mistake of deduping the
> whole thing.
>
> But, this is still not the worst horror imaginable. I'd recommend using
> whole-file dedup only, as this avoids this pitfall: take two VM images, run
> block dedup on them. Identical blocks in them will be cross-reflinked. And
> there's _many_. The vast majority of duplicate blocks are all-zero: I just
> ran fallocate -d on a 40G win10 VM and it shrank to 19G. AFAIK
> file_extent_same is not yet smart enough to dedupe them to a hole instead.

I am a bit confused here: is your description based on offline dedupe, or is it with inline deduplication?

Thanks
Shally

> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
> ⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
> ⠈⠳⣄
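Adam's `fallocate -d` observation is easy to try on any file: it punches holes into the already-allocated all-zero blocks, shrinking disk usage without changing the apparent file size. A small self-contained sketch (the file and sizes are just for illustration):

```shell
# Create a 16 MiB file that is mostly zeros, with a little real data in
# the middle, then ask fallocate to dig holes in the all-zero blocks.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=16 status=none
printf 'some real data' | dd of="$f" bs=1M seek=8 conv=notrunc status=none

before=$(du -k "$f" | cut -f1)
# -d / --dig-holes: needs a filesystem with punch-hole support
fallocate -d "$f" 2>/dev/null || echo "punch-hole not supported on this filesystem"
after=$(du -k "$f" | cut -f1)

# The apparent size is untouched; only the allocated blocks may shrink.
echo "apparent size: $(stat -c %s "$f") bytes"
echo "allocated: ${before} KiB -> ${after} KiB"
rm -f "$f"
```

On btrfs, ext4, or xfs the allocated-blocks count should drop to roughly the size of the non-zero data, which is exactly the 40G-to-19G effect described above.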
Re: deleted subvols don't go away?
Thanks...

Still a bit strange that it displays that entry... especially with a generation that seems newer than what I thought was the actual last generation on the fs.

Cheers,
Chris.
Re: status of inline deduplication in btrfs
On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
> Am bit confused over here, is your description based on offline-dedupe
> here Or its with inline deduplication?

It doesn't matter _how_ you get to excessive reflinking, the resulting slowdown is the same.

By the way, you can try "bees": it does nearline dedupe, which is for practical purposes as good as fully online, and unlike the latter, has no way to damage your data in case of bugs (mistaken userland dedupe can at most make the kernel pointlessly read and compare data). I haven't tried it myself, but what it does is dedupe using FILE_EXTENT_SAME asynchronously right after a write gets put into the page cache, which in most cases is quick enough to avoid writeout.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄
Re: status of inline deduplication in btrfs
On 2017-08-28 06:32, Adam Borowski wrote:
> On Mon, Aug 28, 2017 at 12:49:10PM +0530, shally verma wrote:
>> Am bit confused over here, is your description based on offline-dedupe
>> here Or its with inline deduplication?
>
> It doesn't matter _how_ you get to excessive reflinking, the resulting
> slowdown is the same.
>
> By the way, you can try "bees", it does nearline-dedupe which is for
> practical purposes as good as fully online, and unlike the latter, has no
> way to damage your data in case of bugs (mistaken userland dedupe can at
> most make the kernel pointlessly read and compare data). I haven't tried
> it myself, but what it does is dedupe using FILE_EXTENT_SAME
> asynchronously right after a write gets put into the page cache, which in
> most cases is quick enough to avoid writeout.

I would also recommend looking at 'bees'. If you absolutely _must_ have online or near-online deduplication, then this is your best option currently from a data safety perspective.

That said, it's worth pointing out that in-line deduplication is not always the best answer. In fact, it's quite often a sub-optimal answer compared to a combination of compression, sparse files, and batch deduplication. Compression and usage of sparse files will get you about the same space savings most of the time as in-line deduplication (I've tested this on ZFS on FreeBSD using native in-line deduplication, and with BTRFS on Linux using bees) while using much less memory, and about the same amount of processor time. In the event that you need better space savings than that, you're better off using batch deduplication, because it gives you better control over when you're using more system resources and will often get better overall results than in-line deduplication.
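The batch-deduplication approach recommended above is typically run from a scheduler during idle hours rather than on every write. A minimal sketch, using duperemove as the batch tool (duperemove itself and the path are my assumptions — the thread only names bees):

```shell
# Scheduled batch dedupe, as an alternative to inline dedupe.
# /mnt/data is a placeholder path; duperemove is one possible tool.
batch_dedupe() {
    dir=$1
    # -d: actually submit the dedupe ioctls (otherwise it only reports)
    # -r: recurse into subdirectories
    # -h: human-readable sizes in the report
    duperemove -dhr "$dir"
}

# Typically invoked from cron during idle hours, e.g.:
#   0 3 * * 0  root  duperemove -dhr /mnt/data
if [ -n "${RUN_DEDUPE:-}" ]; then
    batch_dedupe /mnt/data
else
    echo "dry run only; set RUN_DEDUPE=1 to dedupe /mnt/data"
fi
```

The point of batching is exactly the control mentioned above: the memory and CPU cost is paid once, at a time you choose, instead of on the write path.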
Re: deleted subvols don't go away?
On 28.08.2017 11:07, Christoph Anton Mitterer wrote:
> Thanks...
>
> Still a bit strange that it displays that entry... especially with a
> generation that seems newer than what I thought was the actually last
> generation on the fs.

Snapshot destroy is a 2-phase process. The first phase deletes just the root references; after it you see what you've described. Then, later, when the cleaner thread runs again, the snapshot's root item is deleted for good and you will no longer see it.
Re: deleted subvols don't go away?
On Mon, Aug 28, 2017 at 03:03:47PM +0300, Nikolay Borisov wrote:
> On 28.08.2017 11:07, Christoph Anton Mitterer wrote:
>> Thanks...
>>
>> Still a bit strange that it displays that entry... especially with a
>> generation that seems newer than what I thought was the actually last
>> generation on the fs.
>
> Snapshot destroy is a 2-phase process. The first phase deletes just the
> root references. After it you see what you've described. Then, later,
> when the cleaner thread runs again the snapshot's root item is going to
> be deleted for good and you no longer will see it.

It's worth noting also that if the subvol is still used in some way (still mounted, nested subvol, processes with CWD in it, open files), then it won't be cleaned up until the usage stops. Basically the same behaviour as deleting a file. This could also explain the more-recent-than-expected generation values.

Hugo.

--
Hugo Mills             | "Big data" doesn't just mean increasing the font size.
hugo@... carfax.org.uk |
http://carfax.org.uk/  | PGP: E2AB1DE4
Re: deleted subvols don't go away?
On Mon, 28 Aug 2017 15:03:47 +0300 Nikolay Borisov wrote:
> when the cleaner thread runs again the snapshot's root item is going to
> be deleted for good and you no longer will see it.

Oh, that's pretty sweet -- it means there's actually a way to reliably wait for cleaner work to be done on all deleted snapshots before unmounting the FS. I was wondering about that recently for some transient filesystems (which get mounted, synced to, snapshot-created/removed, then unmounted). Now one can just loop with a few-second sleeps until `btrfs sub list -d $PATH` comes up empty.

--
With respect,
Roman
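Roman's polling idea can be written as a small helper: poll `btrfs subvolume list -d` until it produces no output, then it is safe to unmount. A sketch; the mount point, timeout, and poll interval are all illustrative:

```shell
# Wait until btrfs has fully cleaned all deleted subvolumes on a mount
# point, so the filesystem can be unmounted without leaving cleaner work
# pending. Returns 0 on success, 1 on timeout.
wait_for_subvol_cleanup() {
    mnt=$1
    tries=${2:-60}    # ~2 minutes with the 2-second sleep below
    while [ "$tries" -gt 0 ]; do
        # `list -d` prints one line per deleted-but-not-yet-cleaned subvol
        if [ -z "$(btrfs subvolume list -d "$mnt" 2>/dev/null)" ]; then
            return 0
        fi
        sleep 2
        tries=$((tries - 1))
    done
    echo "timed out waiting for subvolume cleanup on $mnt" >&2
    return 1
}

# Assumed flow for the transient-filesystem case described above:
#   btrfs subvolume delete /mnt/backup/snap-2017-08-28
#   sync
#   wait_for_subvol_cleanup /mnt/backup && umount /mnt/backup
```

Note Hugo's caveat applies: a subvolume that is still mounted or otherwise in use won't be cleaned, so a loop like this needs a timeout rather than waiting forever.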
Re: [PATCH 1/1] btrfs-progs: mkfs: add subvolume support to mkfs
Hi All,

unfortunately, your patch crashes on my PC:

  $ truncate -s 100G /tmp/disk.img
  $ sudo losetup -f /tmp/disk.img
  $ # good case
  $ sudo ./mkfs.btrfs -f -r /tmp/empty/ /dev/loop0
  btrfs-progs v4.12.1-1-gf80d059c
  See http://btrfs.wiki.kernel.org for more information.

  Making image is completed.
  Label:              (null)
  UUID:               7cb4927c-d24a-41b3-8151-277ad9064008
  Node size:          16384
  Sector size:        4096
  Filesystem size:    28.00MiB
  Block group profiles:
    Data:             single   10.75MiB
    System:           DUP       4.00MiB
  SSD detected:       no
  Incompat features:  extref, skinny-metadata
  Number of devices:  1
  Devices:
     ID     SIZE      PATH
      1     28.00MiB  /dev/loop0

  $ # bad case
  $ sudo ./mkfs.btrfs -f -S prova -r /tmp/empty/ /dev/loop0
  btrfs-progs v4.12.1-1-gf80d059c
  See http://btrfs.wiki.kernel.org for more information.

  ERROR: failed to create subvolume: -17
  transaction.h:42: btrfs_start_transaction: BUG_ON `fs_info->running_transaction` triggered, value 884442943152
  ./mkfs.btrfs(+0x15674)[0xcdeb52c674]
  ./mkfs.btrfs(close_ctree_fs_info+0x313)[0xcdeb52e80f]
  ./mkfs.btrfs(main+0x1028)[0xcdeb52381e]
  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f85a5d1f2e1]
  ./mkfs.btrfs(_start+0x2a)[0xcdeb520e9a]
  Aborted

Below some further comments.

On 08/28/2017 01:39 AM, Qu Wenruo wrote:
> On 2017年08月26日 07:21, Yingyi Luo wrote:
>> From: yingyil
>>
>> Add -S/--subvol [NAME] option to configure. It enables users to create a
>> subvolume under the toplevel volume and populate the created subvolume
>> with files from the rootdir specified by -r/--rootdir option.
>>
>> Two functions link_subvol() and create_subvol() are moved from
>> convert/main.c to utils.c to enable code reuse.
>
> What about splitting the patch, as the code move of link/create_subvol()
> makes review a little difficult?
>
> BTW, if exporting link/create_subvol(), what about adding a "btrfs_" prefix?
>
> Thanks,
> Qu
>
>> Signed-off-by: yingyil
>> ---

[...]
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -365,6 +365,7 @@ static void print_usage(int ret)
>>      printf(" creation:\n");
>>      printf("\t-b|--byte-count SIZE    set filesystem size to SIZE (on the first device)\n");
>>      printf("\t-r|--rootdir DIR        copy files from DIR to the image root directory\n");
>> +    printf("\t-S|--subvol NAME        create a sunvolume with NAME and copy files from ROOTDIR to the subvolume\n");
>>      printf("\t-K|--nodiscard          do not perform whole device TRIM\n");
>>      printf("\t-f|--force              force overwrite of existing filesystem\n");
>>      printf(" general:\n");
>> @@ -413,6 +414,18 @@ static char *parse_label(const char *input)
>>      return strdup(input);
>>  }
>> +static char *parse_subvol_name(const char *input)
>> +{
>> +    int len = strlen(input);
>> +
>> +    if (len >= BTRFS_SUBVOL_NAME_MAX) {
>> +        error("subvolume name %s is too long (max %d)",
>> +            input, BTRFS_SUBVOL_NAME_MAX - 1);
>> +        exit(1);
>> +    }
>> +    return strdup(input);

why use strdup ?

>> +}
>> +

[...]

>> @@ -1517,6 +1533,10 @@ int main(int argc, char **argv)
>>              PACKAGE_STRING);
>>          exit(0);
>>          break;
>> +    case 'S':
>> +        subvol_name = parse_subvol_name(optarg);
>> +        subvol_name_set = 1;
>> +        break;
>>      case 'r':
>>          source_dir = optarg;
>>          source_dir_set = 1;
>> @@ -1537,6 +1557,11 @@ int main(int argc, char **argv)
>>          }
>>      }
>> +    if (subvol_name_set && !source_dir_set) {
>> +        error("root directory needs to be set");
>> +        exit(1);
>> +    }
>> +

To me it seems reasonable to create an empty subvolume (below more comments)

>>      if (verbose) {
>>          printf("%s\n", PACKAGE_STRING);
>>          printf("See %s for more information.\n\n", PACKAGE_URL);
>> @@ -1876,10 +1901,48 @@ raid_groups:
>>          goto out;
>>      }

[...]
>>      ret = cleanup_temp_chunks(fs_info, , data_profile,
>> diff --git a/utils.c b/utils.c
>> index bb04913..c9bbbed 100644
>> --- a/utils.c
>> +++ b/utils.c
>> @@ -2574,3 +2574,164 @@ u8 rand_u8(void)
>>  void btrfs_config_init(void)
>>  {
>>  }
>> +
>> +struct btrfs_root *link_subvol(struct btrfs_root *root,
>> +        const char *base, u64 root_objectid)
>> +{
>> +    struct btrfs_trans_handle *trans;

[...]

>> +
>> +    memcpy(buf, base, len);
>> +    for (i = 0; i < 1024; i++) {
>> +        ret = btrfs_insert_dir_item(trans, root, buf, len,
>> +                dirid, , BTRFS_FT_DIR, index);
>> +        if (ret != -EEXIST)
>> +            break;
>> +        len = snprintf(buf, ARRAY_SIZE(buf), "%s%d", base, i);
>> +        if (len < 1 || len > BTRFS_NAME_LEN) {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
Re: slow btrfs with a single kworker process using 100% CPU
Hello,

a trace of the kworker looks like this:

  kworker/u24:4-13405 [003] 344186.202535: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202535: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202535: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202535: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202535: btrfs_find_space_for_alloc <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202535: _raw_spin_lock <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202536: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202554: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202554: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202554: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202554: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202555: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202555: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202555: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202555: btrfs_find_space_for_alloc <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202555: _raw_spin_lock <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202556: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202560: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202560: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202561: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202561: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202561: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202561: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202561: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202562: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202562: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202562: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202562: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202562: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202563: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202564: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202564: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202564: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202564: btrfs_find_space_for_alloc <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202564: _raw_spin_lock <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202565: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202566: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202567: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202567: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202567: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202567: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202568: block_group_cache_done.isra.27 <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202568: _raw_spin_lock <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202568: btrfs_find_space_for_alloc <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202568: _raw_spin_lock <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202569: tree_search_offset.isra.25 <-btrfs_find_space_for_alloc
  kworker/u24:4-13405 [003] 344186.202576: __get_raid_index <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202576: up_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202577: btrfs_put_block_group <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202577: _cond_resched <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202577: down_read <-find_free_extent
  kworker/u24:4-13405 [003] 344186.202577:
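For reference, a trace like the one above can be captured with the kernel's function tracer. This is a sketch, not what the poster necessarily ran: it needs root, a kernel with CONFIG_FUNCTION_TRACER, and the tracefs path may be /sys/kernel/tracing on newer setups. It is gated behind an environment variable so it does nothing by accident.

```shell
# Capture a function trace of the hot btrfs allocator paths via ftrace.
capture_trace() {
    t=/sys/kernel/debug/tracing
    # Trace only the functions seen spinning in the report above;
    # wildcards match compiler-suffixed names like tree_search_offset.isra.25
    echo 'find_free_extent btrfs_find_space_for_alloc tree_search_offset*' > "$t/set_ftrace_filter"
    echo function > "$t/current_tracer"
    echo 1 > "$t/tracing_on"
    sleep 10                      # let the busy kworker run for a while
    echo 0 > "$t/tracing_on"
    cat "$t/trace" > /tmp/kworker-trace.txt
    echo nop > "$t/current_tracer" # restore the default tracer
}

if [ -n "${RUN_FTRACE:-}" ] && [ -w /sys/kernel/debug/tracing/tracing_on ]; then
    capture_trace
else
    echo "skipping: set RUN_FTRACE=1 and run as root to capture"
fi
```

The resulting /tmp/kworker-trace.txt has the same `comm-pid [cpu] timestamp: func <-caller` format shown above, which makes it easy to see how much time each find_free_extent loop iteration burns.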
Re: status of inline deduplication in btrfs
shally verma posted on Mon, 28 Aug 2017 12:49:10 +0530 as excerpted:

> On Sat, Aug 26, 2017 at 9:45 PM, Adam Borowski wrote:
>> On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
>>> The second has to do with btrfs scaling issues due to reflinking,
>>> which of course is the operational mechanism for both snapshotting and
>>> dedup.
>>> Snapshotting of course reflinks the entire subvolume, so it's
>>> reflinking on a /massive/ scale. While normal file operations aren't
>>> affected much, btrfs maintenance operations such as balance and check
>>> scale badly enough with snapshotting (due to the reflinking) that
>>> keeping the number of snapshots per subvolume under 250 or so is
>>> strongly recommended, and keeping them to double-digits or even
>>> single-digits is recommended if possible.
>>>
>>> Dedup works by reflinking as well, but its effect on btrfs maintenance
>>> will be far more variable, depending of course on how effective the
>>> deduping, and thus the reflinking, is. But considering that
>>> snapshotting is effectively 100% effective deduping of the entire
>>> subvolume (until the snapshot and active copy begin to diverge, at
>>> least), that tends to be the worst case, so figuring a full two-copy
>>> dedup as equivalent to one snapshot is a reasonable estimate of
>>> effect. If dedup only catches 10%, only once, than it would be 10% of
>>> a snapshot's effect. If it's 10% but there's 10 duplicated instances,
>>> that's the effect of a single snapshot. Assuming of course that the
>>> dedup domain is the same as the subvolume that's being snapshotted.
>
> This looks to me a debate between using inline dedup Vs snapshotting or
> more precisely, doing a dedupe via snapshots?
> Did I understand it correct? if yes, does it mean people are still in
> thoughts if current design and proposal to inline dedup is right way to
> go for?

Not that I'm aware of, and it wasn't my intent to leave that impression.

What I'm saying is that btrfs uses the same underlying mechanism, reflinking, for both snapshotting and dedup.

A rather limited but perhaps useful analogy from an /entirely/ different area might be that both single-person bicycles and full-size truck/trailer rigs use the same underlying mechanism, wheels with tires turning against the ground, to move, while they have vastly different uses and neither one can replace the other. And just as the tire common to both cases has the limitation that it can be punctured and go flat, a limitation shared because of the common mechanism used to move, so reflinking has certain limitations that apply to both snapshotting and dedup, due to the common mechanism used in the implementation.

Of course taking the analogy much further than that will likely result in comically absurd conclusions, but hopefully when kept within its limits it's useful to convey my point: two technologies with very different usage at the surface level, taking advantage of a common implementation mechanism underneath. And because the underlying mechanism is the same, its limits become the limits of both overlying solutions, however they otherwise differ.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH 0/1] btrfs-progs: mkfs: add subvolume support to mkfs
> Add -S/--subvol [NAME] option to configure. It enables users to create a
> subvolume under the toplevel volume and populate the created subvolume
> with files from the rootdir specified by -r/--rootdir option.

This brings two enhancements. They might be good ideas, but stating a specific use case would add the required clarity.

Thanks, Anand
Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
>> Dear Chris and other developers,
>>
>> Can you look at this bug, which has been happening since 2012 on
>> apparently all kernels between at least 3.4 and 4.11?
>> I didn't look in detail at each thread (it took long enough to even find
>> them all and paste them here), but they seem pretty similar, although the
>> reasons how they got there may be different, or at least not as benign as
>> a race condition between snapshot creation and deletion for those who do
>> hourly snapshot rotations like me.
>
> I just finished 2 check repairs, one with each mode, and they both came
> back clean. Yet my FS still remounts read-only with the same:
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30

So this still happens pseudo-randomly, maybe every 2 weeks? The last one is below. It did not happen during a btrfs snapshot, although I'm not entirely sure what else was running at the time.

Any update on this problem?
------------[ cut here ]------------
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 btrfs_run_delayed_refs+0xbd/0x1be
BTRFS: Transaction aborted (error -17)
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas [last unloaded: asix]
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G U 4.9.36-amd64-preempt-sysrq-20170406 #1
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 b7eb67affc98 ae39b00b b7eb67affce8 b7eb67affcd8
 ae066769 0b9767affd58 974f736da960 9756319df000
 ffef 975302da7a50
Call Trace:
 [] dump_stack+0x61/0x7d
 [] __warn+0xc2/0xdd
 [] warn_slowpath_fmt+0x5a/0x76
 [] btrfs_run_delayed_refs+0xbd/0x1be
 [] commit_cowonly_roots+0x10d/0x2b2
 [] ? btrfs_qgroup_account_extents+0x131/0x181
 [] ? btrfs_run_delayed_refs+0x1a6/0x1be
 [] btrfs_commit_transaction+0x46b/0x8fb
 [] transaction_kthread+0xf5/0x1a1
 [] ? btrfs_cleanup_transaction+0x436/0x436
 [] kthread+0xd1/0xd9
 [] ? init_completion+0x24/0x24
 [] ? do_fast_syscall_32+0xb7/0xfe
 [] ret_from_fork+0x25/0x30
---[ end trace 4c5fcb9daa07c11a ]---
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object already exists
BTRFS info (device dm-2): forced readonly
BTRFS warning (device dm-2): Skipping commit of aborted transaction.
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object already exists
BTRFS error (device dm-2): pending csums is 131072

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/