Re: [PATCH 2/2] Btrfs: resize all devices when we dont assign a specific device id
Hi,

On 05/17/2012 02:08 PM, Liu Bo wrote:
> This patch fixes two bugs:
>
> When we do not assigne a device id for the resizer,
> - it will only take one device to resize, which is supposed to apply on
>   all available devices.
> - it will take 'id 1' device as default, and this will cause a bug as we
>   may have removed the 'id 1' device from the filesystem.
>
> After this patch, we can find all available devices by searching the
> chunk tree and resize them:

I am not sure that this is a sane default for all resizing. If the user wants to resize to MAX, I agree that it is a sane default, but when the user wants to shrink or enlarge by a fixed quantity, the user should specify the dev id, because shrinking and enlarging should be evaluated case by case.

My suggestion is to change the code at the kernel level so that, in the case of a multi-volume filesystem, the user *has* to specify the device to shrink and/or enlarge. It should be the user-space btrfs tool that handles the check and the growing (i.e. if the new size is max, automatically grow all the devices up to max; otherwise the user should specify the device to shrink and/or enlarge).

BR Goffredo > > $ mkfs.btrfs /dev/sdb7 > $ mount /dev/sdb7 /mnt/btrfs/ > $ btrfs dev add /dev/sdb8 /mnt/btrfs/ > > $ btrfs fi resize -100m /mnt/btrfs/ > then we can get from dmesg: > btrfs: new size for /dev/sdb7 is 980844544 > btrfs: new size for /dev/sdb8 is 980844544 > > $ btrfs fi resize max /mnt/btrfs > then we can get from dmesg: > btrfs: new size for /dev/sdb7 is 1085702144 > btrfs: new size for /dev/sdb8 is 1085702144 > > $ btrfs fi resize 1:-100m /mnt/btrfs > then we can get from dmesg: > btrfs: resizing devid 1 > btrfs: new size for /dev/sdb7 is 980844544 > > $ btrfs fi resize 1:-100m /mnt/btrfs > then we can get from dmesg: > btrfs: resizing devid 2 > btrfs: new size for /dev/sdb8 is 980844544 > > Signed-off-by: Liu Bo > --- > fs/btrfs/ioctl.c | 101 > -- > 1 files changed, 83 insertions(+), 18 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index ec2245d..d9a4fa8 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -1250,12 +1250,51 @@ out_ra: > return ret; > } > > +static struct btrfs_device *get_avail_device(struct btrfs_root *root, u64 > devid) > +{ > + struct btrfs_key key; > + struct btrfs_path *path; > + struct btrfs_dev_item *dev_item; > + struct btrfs_device *device = NULL; > + int ret; > + > + path = btrfs_alloc_path(); > + if (!path) > + return ERR_PTR(-ENOMEM); > + > + key.objectid = BTRFS_DEV_ITEMS_OBJECTID; > + key.offset = devid; > + key.type = BTRFS_DEV_ITEM_KEY; > + > + ret = btrfs_search_slot(NULL, root->fs_info->chunk_root, &key, > + path, 0, 0); > + if (ret < 0) { > + device = ERR_PTR(ret); > + goto out; > + } > + btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]); > + if (key.objectid != BTRFS_DEV_ITEMS_OBJECTID || > + key.type != BTRFS_DEV_ITEM_KEY) { > + device = NULL; > + goto out; > + } > + dev_item = btrfs_item_ptr(path->nodes[0], path->slots[0], > + struct btrfs_dev_item); > + devid = btrfs_device_id(path->nodes[0], dev_item); > + > + device = btrfs_find_device(root, devid, NULL, NULL); > 
+out: > + btrfs_free_path(path); > + return device; > +} > + > static noinline int btrfs_ioctl_resize(struct btrfs_root *root, > void __user *arg) > { > - u64 new_size; > + u64 new_size = 0; > u64 old_size; > - u64 devid = 1; > + u64 orig_new_size = 0; > + u64 devid = (-1ULL); > struct btrfs_ioctl_vol_args *vol_args; > struct btrfs_trans_handle *trans; > struct btrfs_device *device = NULL; > @@ -1263,6 +1302,8 @@ static noinline int btrfs_ioctl_resize(struct > btrfs_root *root, > char *devstr = NULL; > int ret = 0; > int mod = 0; > + int scan_all = 1; > + int use_max = 0; > > if (root->fs_info->sb->s_flags & MS_RDONLY) > return -EROFS; > @@ -1295,8 +1336,31 @@ static noinline int btrfs_ioctl_resize(struct > btrfs_root *root, > devid = simple_strtoull(devstr, &end, 10); > printk(KERN_INFO "btrfs: resizing devid %llu\n", > (unsigned long long)devid); > + scan_all = 0; > } > - device = btrfs_find_device(root, devid, NULL, NULL); > + > + if (!strcmp(sizestr, "max")) { > + use_max = 1; > + } else { > + if (sizestr[0] == '-') { > + mod = -1; > + sizestr++; > + } else if (sizestr[0] == '+') { > + mod = 1; > + sizestr++; > + } > + orig_new_size = memparse(sizestr, NULL); > + if (orig_
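
The policy suggested in the reply above (grow every device only for "max", otherwise require an explicit device id on a multi-device filesystem) could be sketched roughly as follows. This is only an illustration in user-space Python, not the btrfs tool's or the kernel's actual logic; the function names `parse_resize_arg` and `devices_to_resize` are hypothetical, and only the `devid:size` argument form shown in the patch description is assumed.

```python
def parse_resize_arg(arg):
    """Split a resize argument such as '1:-100m' into (devid, size).

    devid is None when no 'devid:' prefix was given (hypothetical helper).
    """
    devid_str, sep, size = arg.partition(":")
    if not sep:
        return None, arg
    return int(devid_str), size


def devices_to_resize(arg, devids):
    """Return the device ids a resize request should apply to.

    Assumed policy, following the suggestion in this thread:
    - an explicit devid selects exactly that device;
    - 'max' without a devid applies to every device;
    - a relative or absolute size without a devid is only accepted
      when the filesystem has a single device.
    """
    devid, size = parse_resize_arg(arg)
    if devid is not None:
        if devid not in devids:
            raise ValueError("no such device id: %d" % devid)
        return [devid]
    if size == "max" or len(devids) == 1:
        return sorted(devids)
    raise ValueError("multi-device filesystem: use devid:size")
```

With devices 1 and 2 present, `devices_to_resize("max", [1, 2])` selects both, `devices_to_resize("2:-100m", [1, 2])` selects only device 2, and a bare `-100m` is rejected instead of silently picking a default device.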
Re: SSD erase state and reducing SSD wear
On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
> I've got two recent examples of SSDs. Their pristine state from the
> manufacturer shows:

> Device Model: OCZ-VERTEX3
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

> Device Model: OCZ VERTEX PLUS
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

> What's a good way to test what state they get erased to from a TRIM
> operation?

This pristine state probably matches up with the result of a trim command on the drive. In particular, a freshly erased flash block is in a state where the bits are all 1, so the Vertex Plus drive is showing you the flash contents directly. The Vertex 3 has substantially more processing, and the 0s are effectively generated on the fly for unmapped flash blocks (similar to how the missing portions of a sparse file contain 0s).

> Can btrfs detect the erase state and pad unused space in filesystem
> writes with the same value so as to reduce SSD wear?

On the Vertex 3, this wouldn't actually do what you'd hope. The firmware in that drive actually compresses, deduplicates, and encrypts all the data prior to writing it to flash - and as a result the data that hits the flash looks nothing like what the filesystem wrote. (For best performance, it might make sense to disable btrfs's built-in compression on the Vertex 3 drive to allow the drive's compression to kick in. Let us know if you benchmark it either way.)

The benefit of doing this on the Vertex Plus is probably fairly small, since rewriting a block - even if the block is partially unwritten - is still likely to require a read-modify-write cycle with an erase step. The granularity of the erase blocks is just too big for the savings to be very meaningful.

--
Calvin Walton
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 3/3] Btrfs: read device stats on mount, write modified ones during commit
On 05/22/2012 06:53 PM, Stefan Behrens wrote: > The device statistics are written into the device tree with each > transaction commit. Only modified statistics are written. > When a filesystem is mounted, the device statistics for each involved > device are read from the device tree and used to initialize the > counters. > > Signed-off-by: Stefan Behrens > --- > fs/btrfs/ctree.h | 51 > fs/btrfs/disk-io.c |7 ++ > fs/btrfs/print-tree.c |3 + > fs/btrfs/transaction.c |4 + > fs/btrfs/volumes.c | 205 > > fs/btrfs/volumes.h |9 +++ > 6 files changed, 279 insertions(+) > [...] > +static int update_device_stat_item(struct btrfs_trans_handle *trans, > +struct btrfs_root *dev_root, > +struct btrfs_device *device) > +{ > + struct btrfs_path *path; > + struct btrfs_key key; > + struct extent_buffer *eb; > + struct btrfs_device_stats_item *ptr; > + int ret; > + > + key.objectid = 0; > + key.type = BTRFS_DEVICE_STATS_KEY; > + key.offset = device->devid; > + > + path = btrfs_alloc_path(); > + BUG_ON(!path); > + ret = btrfs_search_slot(trans, dev_root, &key, path, 0, 1); Since we may delete this item, I prefer cow: -1, btrfs_search_slot(trans, dev_root, &key, path, 0, -1); thanks, liubo > + if (ret < 0) { > + printk(KERN_WARNING "btrfs: error %d while searching for > device_stats item for device %s!\n", > +ret, device->name); > + goto out; > + } > + > + if (ret == 0 && > + btrfs_item_size_nr(path->nodes[0], path->slots[0]) < sizeof(*ptr)) { > + /* need to delete old one and insert a new one */ > + ret = btrfs_del_item(trans, dev_root, path); > + if (ret != 0) { > + printk(KERN_WARNING "btrfs: delete too small > device_stats item for device %s failed %d!\n", > +device->name, ret); > + goto out; > + } > + ret = 1; > + } > + > + if (ret == 1) { > + /* need to insert a new item */ > + btrfs_release_path(path); > + ret = btrfs_insert_empty_item(trans, dev_root, path, > + &key, sizeof(*ptr)); > + if (ret < 0) { > + printk(KERN_WARNING "btrfs: insert device_stats item > for device %s 
failed %d!\n", > +device->name, ret); > + goto out; > + } > + } > + > + eb = path->nodes[0]; > + ptr = btrfs_item_ptr(eb, path->slots[0], > + struct btrfs_device_stats_item); > + btrfs_set_device_stats_cnt_write_io_errs(eb, ptr, > + btrfs_device_stat_read(&device->cnt_write_io_errs)); > + btrfs_set_device_stats_cnt_read_io_errs(eb, ptr, > + btrfs_device_stat_read(&device->cnt_read_io_errs)); > + btrfs_set_device_stats_cnt_flush_io_errs(eb, ptr, > + btrfs_device_stat_read(&device->cnt_flush_io_errs)); > + btrfs_set_device_stats_cnt_corruption_errs(eb, ptr, > + btrfs_device_stat_read(&device->cnt_corruption_errs)); > + btrfs_set_device_stats_cnt_generation_errs(eb, ptr, > + btrfs_device_stat_read(&device->cnt_generation_errs)); > + btrfs_mark_buffer_dirty(eb); > + > +out: > + btrfs_free_path(path); > + return ret; > +} > + > +/* > + * called from commit_transaction. Writes all changed device stats to disk. > + */ > +int btrfs_run_device_stats(struct btrfs_trans_handle *trans, > +struct btrfs_fs_info *fs_info) > +{ > + struct btrfs_root *dev_root = fs_info->dev_root; > + struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; > + struct btrfs_device *device; > + int ret = 0; > + > + mutex_lock(&fs_devices->device_list_mutex); > + list_for_each_entry(device, &fs_devices->devices, dev_list) { > + if (!device->device_stats_valid || !device->device_stats_dirty) > + continue; > + > + ret = update_device_stat_item(trans, dev_root, device); > + if (!ret) > + device->device_stats_dirty = 0; > + } > + mutex_unlock(&fs_devices->device_list_mutex); > + > + return ret; > +} > + > void btrfs_device_stat_print_on_error(struct btrfs_device *device) > { > + if (!device->device_stats_valid) > + return; > printk_ratelimited(KERN_ERR > "btrfs: bdev %s errs: wr %u, rd %u, flush %u, > corrupt %u, gen %u\n", > device->name, > @@ -4639,6 +4828,18 @@ void btrfs_device_stat_print_on_error(struct > btrfs_device *device) > &device->cnt_generation_errs)); > } > > +static void 
btrfs_device_stat_print_on_load(struct btrfs_devic
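
The commit-time behaviour described in the patch above ("Only modified statistics are written") is a dirty-flag write-back loop. A rough model of that loop, with made-up names (`Device`, `run_device_stats`, `write_item`) standing in for the kernel structures and for `update_device_stat_item()`, might look like this; it is a sketch of the pattern, not the kernel code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Toy stand-in for the per-device state the patch touches."""
    devid: int
    stats_valid: bool = True
    stats_dirty: bool = False


def run_device_stats(devices, write_item):
    """Write stats only for devices that are both valid and dirty.

    write_item(device) is a caller-supplied function that returns 0 on
    success. As in the quoted patch, the dirty flag is cleared only when
    the write succeeded and the last status is what gets returned.
    """
    ret = 0
    for device in devices:
        if not device.stats_valid or not device.stats_dirty:
            continue
        ret = write_item(device)
        if ret == 0:
            device.stats_dirty = False
    return ret
```

A device whose write fails keeps its dirty flag, so the next commit tries again.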
Re: warnings met in introduce extent buffer cache for each i-node patch
On Tue, 22 May 2012 09:54:54 -0700, Tim Chen wrote: > Miao, > > I was trying out your patch on scalability testing for BTRFS on v3.3 > kernel. > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14930.html > > However, I ran into a lot of warnings (see the dmesg below). Wonder if > you have a more up to date version of this patch? > > In addition, I have to do this modification to fix a warning in your > original patch. Thanks for your test, This patch still has some problem, I'm improve it now. I will send the new one soon. Thanks again Miao > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 892b347..e0210c9 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4608,7 +4613,7 @@ fail_dir_item: > int err; > > err = btrfs_del_inode_ref(trans, root, name, name_len, > - ino, parent_ino, &local_index); > + inode, parent_ino, &local_index); > } > return ret; > } > > Thanks. > Tim > > > May 22 09:23:57 bigbox kernel: [56455.532138] [ cut here > ] > May 22 09:23:57 bigbox kernel: [56455.532146] NG: at > fs/btrfs/extent_io.c:3795 free_extent_buffer+0x31/6455.532189] Hardware name: > PRIMEQUEST 1800E2 > May 22 09:23:57 bigbox kernel: [56455.53nked in: scsi_ram lockd > nf_conntrack_ipv4 nf_defrag_ipv4 xtfat ioatdma i2c_i801 i7core_edac e1000e > microcode edac_core i2c_core igb iTCO_wdt iTCO_vendor_support dca uinput > sunrpc usb_stortsas mptscsih mptbase scsi_transport_sas [last unloaded: > scsi_wait_sca55.532431] Pid: 4399, comm: btrfs-endio-wri Tainted: GW > 3.3.0c-scsiram-btrfs2+ #30 > May 22 09:23:57 bigbox kernel: [56455.532486] Call Trace: > May 22 09:23:57 bigbox kernel: [56455. [] > warn_slowpath_common+0x7f/0xc0 > May 22 09:23:57 bigbox kernel: [5645] [] > warn_slowpath_null+0x1a/0x20 > May 22 09:23:57 bigbox kernel: [56455.53220] [] > free_extent_buffer+0x31/0x40 > May 22 09:23:57 bigbox kernel: ] [] > read_block_for_search+0x117/0x3d0 > May 22 09:23:57 bigbox kernel: 32559] [] ? > generic_bin_search.constprop.4+0[] ? 
unlock_up+0x15d/0x190 > May 22 09:23:57 bigbox kernel: [56455.5812c31c1>] > btrfs_search_slot+0x241/0x720 > May 22 09:23:57 bigbox kernel: [56455.5326fff812c3adc>] > btrfs_search_slot_for_inode+0x43c/0x910 > May 22 09:23:57 bigbox kernel: [56455.532fff812d5f04>] > btrfs_lookup_file_extent+0x54/0x70 > May 22 09:23:57 bigbox kernel: [56455.532646812f097c>] > btrfs_drop_extents+0xec/0x940 > May 22 09:23:57 bigbox kernel: [56455.532662] fff81084eec>] ? > try_to_wake_up+0x1bc/0x2b0 > May 22 09:23:57 bigbox kernel: [56455.53268 [] ? > set_state_bits+0x3f/0x80 > May 22 09:23:57 bigbox kernel: [56455fff8116228c>] ? > kmem_cache_alloc+0x10c/0x140 > May 22 09:23:57 bigbox kernel: [56455.532713] [] ? > btrfs_alloc_path+0x1a/0x20 > May 22 09:23:57 bigbox kernel: [5645532] [] > insert_reserved_file_extent.constpr13+0x73/0x270 > May 22 09:23:57 bigbox kernel: [56455.532746] [] ? > join_transactio0x2b/0x2b0 > May 22 09:23:57 bigbox kernel: [56455.532759] [] ? > start_transaction+0x94/0x320 > May 22 09:23:57 bigbox kernel: [56455.532774] [] > btrfinish_ordered_io+0x2ca/0x320 > May 22 09:23:57 bigbox kernel: [56455.532793] > [age_end_io_hook+0x4d/0xc0 > May 22 09:23:57 bigbox kernel: [56455.532813] [] > ] ? bio_free+0x5f/0x70 > May 22 09:23:57 bigbox kernel: [56455.532837] fff811a817d>] > bio_endio+0x1d/0x40 > May 22 09:23:57 bigbox kernel: [56455.532869] [] > end_workqueue_fn+0x56/0x140 > May 22 09:23:57 bigbox kernel: [56455.532886] [] > worker_loop+0x148/0x580 > May 22 09:23:57 bigbox kernel: [56455.532898] [] ? 
> btrfs_queue_worker+0x2e0/0x2e0 > May 22 09:23:57 bigbox kernel: [56455.532915] [] > kthread+0x93/0xa0 > May 22 09:23:57 bigbox kernel: [56455.532929] [] > kernel_thread_helper+0x4/0x10 > May 22 09:23:57 bigbox kernel: [56455.532944] [ May 22 09:23:57 bigbox kernel: [56455.532972] ---[ end trace a7919e7f17c42adb > ]--- > May 22 09:23:57 bigbox kernel: [56455.532985] [ cut here > ] > > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Newbie questions on some of btrfs code...
On 22.05.2012 10:07, Alex Lyakas wrote:
>>> # If my understanding in the previous bullet is correct: Is that the
>>> reason that in btrfs_prev_leaf() it is assumed that if there is a
>>> lesser key, btrfs_search_slot() will never bring us to the slot==0 of
>>> the current leaf?
>>
>> It's quite straight: We look for a key smaller than the first (slot 0)
>> of the current leaf. If we find the current leaf again (because
>> btrfs_search_slot returns the place where such a key would have be
>> inserted), then there's no previous leaf. No preconditions or
>> assumptions on nodes in levels needed.
>
> Let's say that slot[0] of the current leaf (A) has key=10. And let's
> say that its parent node (N) has key=5 (and not 10). Let's say we have
> a previous leaf (B), whose last slot has key=2.
> If such tree is valid, then: btrfs_prev_leaf() will search for key==9.
> Then btrfs_search_slot() would bring us node N and leaf A again,
> wouldn't it? Because key(N)<=9. So we will receive leaf A back, and
> will think that there is no previous leaf, while there is.
> What am I missing here?

It wouldn't. btrfs_search_slot always sets up the path such that it points to the position where such a key would be inserted. And we never insert at the beginning of a leaf. So in your example, this would be at the end of leaf B: your path object will have nodes[1] = N, nodes[0] = B and slots[0] = number_of_slots_used_in_B + 1.

Your example sounds like a good explanation of why the key in the parent node should really be an exact match. It sounds reasonable that it's not allowed to be smaller than the first key of its child. If it was, extra lookups would be required to set up the path correctly for your example above (which I haven't seen so far).

-Jan
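
The argument above can be replayed on a toy two-level tree. The sketch below is a deliberately simplified model, not btrfs code: leaves are sorted Python lists, the parent holds one key per child, slots are 0-indexed, and `search_slot` is a hypothetical stand-in that returns the position where a key is, or would be inserted.

```python
from bisect import bisect_left, bisect_right


def search_slot(parent_keys, leaves, key):
    """Return (leaf_index, slot) where `key` is or would be inserted.

    parent_keys[i] is the key the parent node stores for leaves[i].
    The descent picks the last child whose parent key is <= key
    (or child 0 if there is none), then searches inside that leaf.
    """
    leaf_index = max(bisect_right(parent_keys, key) - 1, 0)
    slot = bisect_left(leaves[leaf_index], key)
    return leaf_index, slot


leaf_b = [1, 2]      # previous leaf, last key is 2
leaf_a = [10, 20]    # current leaf, first key is 10
leaves = [leaf_b, leaf_a]

# Parent keys exactly match each child's first key.
exact_parent = [leaf[0] for leaf in leaves]
# The hypothetical from the question: the parent stores 5 for leaf A.
loose_parent = [1, 5]

with_exact = search_slot(exact_parent, leaves, leaf_a[0] - 1)
with_loose = search_slot(loose_parent, leaves, leaf_a[0] - 1)
```

In this model `with_exact` points just past the last used slot of leaf B, so a different leaf is found and a previous leaf is known to exist. `with_loose` lands on slot 0 of leaf A again, which is exactly the wrong conclusion the question worried about, and why an exact parent key matters.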
Could btrfs-restore be extended to also restore file dates?
Any possibility of getting btrfs-restore to also restore the files' timestamps?

I'm doing a restore right now as I had one btrfs partition blow up, and I'm noticing that all the restored files are timestamped as new. It would be nice to be able to do a quick compare of file dates to determine any changed files that may be newer on the restore vs the backup. (I can save full file compares for when the server is not being actively used.) I do realize it is possible that there could be other issues, but for quickly determining potential issues this could be useful.

I do realize that there may be technical reasons for the current behavior, so at the very least this is a suggestion for future functionality even if it doesn't help me.
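
For illustration, the "quick compare of file dates" mentioned above could be something like the following sketch. It assumes two ordinary directory trees (a restore directory and a backup directory, both hypothetical paths supplied by the caller) and is only useful once the restored files carry their original modification times; `newer_in_restore` is a made-up helper name.

```python
import os


def newer_in_restore(restore_root, backup_root):
    """List relative paths whose mtime is newer in the restore tree.

    Only files that exist under both roots are compared; anything
    missing from the backup is skipped rather than reported.
    """
    newer = []
    for dirpath, _dirnames, filenames in os.walk(restore_root):
        for name in filenames:
            restored = os.path.join(dirpath, name)
            rel = os.path.relpath(restored, restore_root)
            backup = os.path.join(backup_root, rel)
            if not os.path.exists(backup):
                continue
            if os.stat(restored).st_mtime > os.stat(backup).st_mtime:
                newer.append(rel)
    return sorted(newer)
```

With timestamps reset to the restore time, every common file would be reported as newer, which is the problem described in this message.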
SSD erase state and reducing SSD wear
I've got two recent examples of SSDs. Their pristine state from the manufacturer shows:

Device Model: OCZ-VERTEX3
# hexdump -C /dev/sdd
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ||
*
1bf2976000

Device Model: OCZ VERTEX PLUS (OCZ VERTEX 2E)
# hexdump -C /dev/sdd
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ||
*
df99e6000

What's a good way to test what state they get erased to from a TRIM operation?

Can btrfs detect the erase state and pad unused space in filesystem writes with the same value so as to reduce SSD wear?

Regards,
Martin
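
One possible way to check the erase state, sketched here under assumptions: write known data to a region, TRIM it with whatever tool is available, then read the region back and classify what the drive returns. The helpers below only cover the read-back and classification step; `erase_state` and `read_sample` are hypothetical names, and reading a raw device node normally needs root.

```python
def erase_state(block):
    """Classify a block of bytes read back from the drive.

    Returns "zeros" for all 0x00, "ones" for all 0xff, and "data" for
    anything else (for example, stale contents that were not erased).
    """
    if not block:
        raise ValueError("empty block")
    if block == b"\x00" * len(block):
        return "zeros"
    if block == b"\xff" * len(block):
        return "ones"
    return "data"


def read_sample(path, offset, length=4096):
    """Read `length` bytes at `offset` from a device node or image file."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        return dev.read(length)
```

Going by the hexdumps above, a sample from the Vertex 3 would be expected to classify as "zeros" and one from the Vertex Plus as "ones"; whether a trimmed region reads the same way is what the test would show.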
Re: SSD format/mount parameters questions
On 19/05/12 18:36, Martin Steigerwald wrote:
> On Friday, 18 May 2012, Sander wrote:
>> Martin wrote (ao):
>>> Are there any format/mount parameters that should be set for using
>>> btrfs on SSDs (other than the "ssd" mount option)?
>>
>> If possible, format the whole device, do not partition the ssd. This
>> will guarantee proper allignment.
>
> Current partitioning tools align at 1 MiB unless otherwise specified.
>
> And then thats only the alignment of the start of the filesystem.
>
> Not the granularity that the filesystem itself uses to align its writes.
>
> And then its not clear to me what effect proper alignment will actually
> have given the intelligent nature of SSD firmwares.

That's what I'm trying to untangle rather than just trusting to "magic". I'm also not so convinced about the "SSD firmwares" being quite so "intelligent"...

So far, the only clear indications are that a number of SSDs have a performance 'sweet spot' when you use 16kByte blocks for data transfer.

Practicalities for the SSD internal structure strongly suggest that they work in chunks of data greater than 4kBytes.

4kByte operation is a strong driver for SSD manufacturers, but what compromises do they make to accommodate that?

And for btrfs:

Extents are aligned to "sector size" boundaries (4kBytes default).

And there is a comment that setting larger sector sizes increases the CPU overhead in btrfs due to the larger memory moves needed for making inserts into the trees.

If the SSD is going to do a read-modify-write on anything smaller than 16kBytes in any case, might btrfs just as well use that chunk size to good advantage in the first place?

So, what is most significant?

Also:

btrfs has a big advantage of using checksumming and COW. However, ext4 is more mature, similarly uses extents, and also allows specifying a large "delayed allocation" time to merge multiple writes if you're happy your system is safely on a UPS...

I'm not too worried about this for MLC SSDs, but it is something that is of concern for the yet shorter modify-erase count lifespan of TLC SSDs.

Regards,
Martin
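
The alignment question above comes down to simple arithmetic, which the sketch below spells out. The 16 KiB figure is the 'sweet spot' mentioned in this message and is treated as an assumed internal chunk size, not a known property of any particular drive; `is_aligned` and `chunks_touched` are hypothetical helper names.

```python
KIB = 1024
MIB = 1024 * KIB


def is_aligned(offset, granularity):
    """True when a byte offset falls on a multiple of the granularity."""
    return offset % granularity == 0


def chunks_touched(offset, length, chunk=16 * KIB):
    """Number of internal chunks a write of `length` bytes overlaps."""
    if length <= 0:
        return 0
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1
```

A partition starting at 1 MiB is aligned to 16 KiB, but that says nothing about individual writes: a 4 KiB extent always covers only part of one 16 KiB chunk, and a 16 KiB write starting 4 KiB into a chunk overlaps two chunks, so both would need a read-modify-write if the drive really works at that size.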
Re: Which is the maximum files size in BTRFS ? [was Re: btrfs: Probably the larger filesystem I will see for a long time]
On 05/22/2012 07:17 PM, Goffredo Baroncelli wrote:
> Hi all,
>
> From the specification [1] the btrfs maximum file size limit should be
> 1<<64 bytes. However I was never able to create a file >= 1<<63 bytes.
>
> ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ ls -l giantfile2
> -rw-r--r-- 1 ghigo ghigo 9223372036854775807 May 22 18:55 giantfile2
> ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ ls -lh giantfile2
> -rw-r--r-- 1 ghigo ghigo 8.0E May 22 18:55 giantfile2
> ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ echo -n x >>giantfile2
> bash: echo: write error: File too large
> ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ python -c "print 1<<63"
> 9223372036854775808
>
> Could be a kernel limit ?

Yes, it seems to be a kernel limit: the generic_file_llseek() function checks the lseek "offset" argument against superblock->s_maxbytes, which is set to MAX_LFS_FILESIZE in btrfs (see fs/read_write.c and fs/btrfs/super.c).

MAX_LFS_FILESIZE is defined in include/linux/fs.h as

/* Page cache limit. The filesystems should put that into their s_maxbytes
   limits, otherwise bad things can happen in VM. */
#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE \
	(((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE 0x7fffffffffffffffUL
#endif

Which means that in btrfs under Linux (on 64-bit) there is a file size limit of 8 EiB (0x7fffffffffffffff + 1).

Goffredo

>
> Goffredo
>
> [1] https://btrfs.wiki.kernel.org/index.php/Main_Page
>
> P.S.
> I am asking about this un-useful question because I want to create a
> loop based btrfs filesystem on a file greater than 8E. But I was unable
> to create a such big file. I got success up to 8E-1
>
> On 05/19/2012 05:03 AM, Christian Robert wrote:
>> Probably the larger filesystem I will ever see. Tryed 8 Exabytes but it
>> failed.
>>
>> [root@CentOS6-A:/root] # df
>> Filesystem                    1K-blocks     Used        Available Use% Mounted
>> /dev/mapper/vg01-root          17915884 11533392          5513572  68% /
>> /dev/sda1                        508745   140314           342831  30% /boot
>> /dev/mapper/data_0             66993872  1644372         61994060   3% /mnt/data_0
>> /dev/mapper/data_1     7881299347898368   508360 7881248224091896   1% /mnt/data_1
>>
>> [root@CentOS6-A:/root] # df -h
>> Filesystem             Size  Used Avail Use% Mounted
>> /dev/mapper/vg01-root   18G   11G  5.3G  68% /
>> /dev/sda1              497M  138M  335M  30% /boot
>> /dev/mapper/data_0      64G  1.6G   60G   3% /mnt/data_0
>> /dev/mapper/data_1     7.0E  497M  7.0E   1% /mnt/data_1
>>
>> [root@CentOS6-A:/root] # df -Th
>> Filesystem            Type   Size  Used Avail Use%
>> /dev/mapper/vg01-root ext4    18G   11G  5.3G  68%
>> /dev/sda1             ext4   497M  138M  335M  30%
>> /dev/mapper/data_0    ext4    64G  1.6G   60G   3%
>> /dev/mapper/data_1    btrfs  7.0E  499M  7.0E   1%
>> [root@CentOS6-A:/root] #
>>
>> [root@CentOS6-A:/root] # uname -rv
>> 3.4.0-rc7+ #23 SMP Wed May 16 20:20:47 EDT 2012
>>
>> made with a dm-thin device sitting on a device pair composed of
>> (metadata 256Megs and data 23 Gigs)
>>
>> running on my laptop at home.
>>
>> yes, this is 7 Exabytes or 7,168 Petabytes or ( 7,340,032 Terabytes ) or
>> 7,516,192,768 Gigabytes.
>>
>> please do not answer, it is just a statement of a fact at 3.4-rc7 (was
>> not working at 3.4-rc3 if I remember).
>>
>> Xtian.
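
The arithmetic behind the MAX_LFS_FILESIZE explanation in this reply can be checked with a few lines of Python. This mirrors the macro quoted from include/linux/fs.h; the 4096-byte page size is an assumption (the usual value on x86) and `max_lfs_filesize` is just an illustrative name.

```python
def max_lfs_filesize(bits_per_long, page_cache_size=4096):
    """Largest file offset allowed by the quoted MAX_LFS_FILESIZE macro."""
    if bits_per_long == 32:
        return (page_cache_size << (bits_per_long - 1)) - 1
    if bits_per_long == 64:
        return 0x7FFFFFFFFFFFFFFF
    raise ValueError("BITS_PER_LONG must be 32 or 64")


limit_64 = max_lfs_filesize(64)   # 2**63 - 1, the size ls -l reported
limit_32 = max_lfs_filesize(32)   # 2**43 - 1 with 4 KiB pages
```

The 64-bit value is one byte short of 1<<63, which matches the 9223372036854775807-byte file in the quoted session and explains why appending a single byte failed with "File too large".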
Re: Ceph on btrfs 3.4rc
On Tue, May 22, 2012 at 12:29:59PM +0200, Christian Brunner wrote: > 2012/5/21 Miao Xie : > > Hi Josef, > > > > On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote: > >> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > >> index 9b9b15f..492c74f 100644 > >> --- a/fs/btrfs/btrfs_inode.h > >> +++ b/fs/btrfs/btrfs_inode.h > >> @@ -57,9 +57,6 @@ struct btrfs_inode { > >> /* used to order data wrt metadata */ > >> struct btrfs_ordered_inode_tree ordered_tree; > >> > >> - /* for keeping track of orphaned inodes */ > >> - struct list_head i_orphan; > >> - > >> /* list of all the delalloc inodes in the FS. There are times we > >> need > >> * to write all the delalloc pages to disk, and this list is used > >> * to walk them all. > >> @@ -156,6 +153,8 @@ struct btrfs_inode { > >> unsigned dummy_inode:1; > >> unsigned in_defrag:1; > >> unsigned delalloc_meta_reserved:1; > >> + unsigned has_orphan_item:1; > >> + unsigned doing_truncate:1; > > > > I think the problem is we should not use the different lock to protect the > > bit fields which > > are stored in the same machine word. Or some bit fields may be covered by > > the others when > > someone change those fields. Could you try to declare > > ->delalloc_meta_reserved and ->has_orphan_item > > as a integer? > > I have tried changing it to: > > struct btrfs_inode { > unsigned orphan_meta_reserved:1; > unsigned dummy_inode:1; > unsigned in_defrag:1; > - unsigned delalloc_meta_reserved:1; > + int delalloc_meta_reserved; > + int has_orphan_item; > + int doing_truncate; > > The strange thing is, that I'm no longer hitting the BUG_ON, but the > old WARNING (no additional messages): > Yeah you would also need to change orphan_meta_reserved. I fixed this by just taking the BTRFS_I(inode)->lock when messing with these since we don't want to take up all that space in the inode just for a marker. I ran this patch for 3 hours with no issues, let me know if it works for you. 
Thanks, Josef diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 3771b85..559e716 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -57,9 +57,6 @@ struct btrfs_inode { /* used to order data wrt metadata */ struct btrfs_ordered_inode_tree ordered_tree; - /* for keeping track of orphaned inodes */ - struct list_head i_orphan; - /* list of all the delalloc inodes in the FS. There are times we need * to write all the delalloc pages to disk, and this list is used * to walk them all. @@ -153,6 +150,7 @@ struct btrfs_inode { unsigned dummy_inode:1; unsigned in_defrag:1; unsigned delalloc_meta_reserved:1; + unsigned has_orphan_item:1; /* * always compress this one file diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ba8743b..72cdf98 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1375,7 +1375,7 @@ struct btrfs_root { struct list_head root_list; spinlock_t orphan_lock; - struct list_head orphan_list; + atomic_t orphan_inodes; struct btrfs_block_rsv *orphan_block_rsv; int orphan_item_inserted; int orphan_cleanup_state; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 19f5b45..25dba7a 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, root->orphan_block_rsv = NULL; INIT_LIST_HEAD(&root->dirty_list); - INIT_LIST_HEAD(&root->orphan_list); INIT_LIST_HEAD(&root->root_list); spin_lock_init(&root->orphan_lock); spin_lock_init(&root->inode_lock); @@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, atomic_set(&root->log_commit[0], 0); atomic_set(&root->log_commit[1], 0); atomic_set(&root->log_writers, 0); + atomic_set(&root->orphan_inodes, 0); root->log_batch = 0; root->log_transid = 0; root->last_log_commit = 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 54ae3df..54f1b30 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2104,12 +2104,12 @@ void 
btrfs_orphan_commit_root(struct btrfs_trans_handle *trans, struct btrfs_block_rsv *block_rsv; int ret; - if (!list_empty(&root->orphan_list) || + if (atomic_read(&root->orphan_inodes) || root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE) return; spin_lock(&root->orphan_lock); - if (!list_empty(&root->orphan_list)) { + if (atomic_read(&root->orphan_inodes)) { spin_unlock(&root->orphan_lock); return; } @@ -2166,8 +2166,9 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode) block_rsv = NULL; } - if
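
The bit-field problem discussed in this thread (flags sharing one machine word but protected by different locks) is a lost-update race on a read-modify-write. The following is a deterministic simulation of one bad interleaving, written as a sketch; the flag names only stand in for `delalloc_meta_reserved` and `has_orphan_item`, and nothing here models the real compiler or CPU behaviour.

```python
FLAG_DELALLOC = 1 << 0   # stands in for delalloc_meta_reserved
FLAG_ORPHAN = 1 << 1     # stands in for has_orphan_item


def unsynchronized_update(word):
    """Two writers each load the whole word, set one bit, and store it.

    Writer B loads before writer A stores, so B's store overwrites
    the bit that A just set.
    """
    copy_a = word              # writer A loads the word
    copy_b = word              # writer B loads the same old value
    copy_a |= FLAG_DELALLOC    # A sets its bit in its private copy
    copy_b |= FLAG_ORPHAN      # B sets its bit in its private copy
    word = copy_a              # A stores
    word = copy_b              # B stores and drops A's bit
    return word


def serialized_update(word):
    """Same two updates, but each load/modify/store completes in turn,
    as it would when both writers take the same lock."""
    word |= FLAG_DELALLOC
    word |= FLAG_ORPHAN
    return word
```

Starting from 0, the unsynchronized version ends with only the orphan bit set, while the serialized version ends with both. Taking a single lock around every update of the shared word, as the patch in this message does with the inode lock, removes the bad interleaving without spending a full integer on each flag.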
Which is the maximum files size in BTRFS ? [was Re: btrfs: Probably the larger filesystem I will see for a long time]
Hi all,

From the specification [1] the btrfs maximum file size limit should be 1<<64 bytes. However, I was never able to create a file >= 1<<63 bytes.

ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ ls -l giantfile2
-rw-r--r-- 1 ghigo ghigo 9223372036854775807 May 22 18:55 giantfile2
ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ ls -lh giantfile2
-rw-r--r-- 1 ghigo ghigo 8.0E May 22 18:55 giantfile2
ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ echo -n x >>giantfile2
bash: echo: write error: File too large
ghigo@venice:/mnt/old-btrfs/home/ghigo/gianfile$ python -c "print 1<<63"
9223372036854775808

Could it be a kernel limit?

Goffredo

[1] https://btrfs.wiki.kernel.org/index.php/Main_Page

P.S.
I am asking this not very useful question because I want to create a loop-based btrfs filesystem on a file larger than 8E, but I was unable to create such a big file. I succeeded up to 8E-1.

On 05/19/2012 05:03 AM, Christian Robert wrote:
> Probably the larger filesystem I will ever see. Tryed 8 Exabytes but it
> failed.
>
> [root@CentOS6-A:/root] # df
> Filesystem                    1K-blocks     Used        Available Use% Mounted
> /dev/mapper/vg01-root          17915884 11533392          5513572  68% /
> /dev/sda1                        508745   140314           342831  30% /boot
> /dev/mapper/data_0             66993872  1644372         61994060   3% /mnt/data_0
> /dev/mapper/data_1     7881299347898368   508360 7881248224091896   1% /mnt/data_1
>
> [root@CentOS6-A:/root] # df -h
> Filesystem             Size  Used Avail Use% Mounted
> /dev/mapper/vg01-root   18G   11G  5.3G  68% /
> /dev/sda1              497M  138M  335M  30% /boot
> /dev/mapper/data_0      64G  1.6G   60G   3% /mnt/data_0
> /dev/mapper/data_1     7.0E  497M  7.0E   1% /mnt/data_1
>
> [root@CentOS6-A:/root] # df -Th
> Filesystem            Type   Size  Used Avail Use%
> /dev/mapper/vg01-root ext4    18G   11G  5.3G  68%
> /dev/sda1             ext4   497M  138M  335M  30%
> /dev/mapper/data_0    ext4    64G  1.6G   60G   3%
> /dev/mapper/data_1    btrfs  7.0E  499M  7.0E   1%
> [root@CentOS6-A:/root] #
>
> [root@CentOS6-A:/root] # uname -rv
> 3.4.0-rc7+ #23 SMP Wed May 16 20:20:47 EDT 2012
>
> made with a dm-thin device sitting on a device pair composed of
> (metadata 256Megs and data 23 Gigs)
>
> running on my laptop at home.
>
> yes, this is 7 Exabytes or 7,168 Petabytes or ( 7,340,032 Terabytes ) or
> 7,516,192,768 Gigabytes.
>
> please do not answer, it is just a statement of a fact at 3.4-rc7 (was
> not working at 3.4-rc3 if I remember).
>
> Xtian.
warnings met in introduce extent buffer cache for each i-node patch
Miao,

I was trying out your patch for scalability testing of BTRFS on a v3.3 kernel.

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg14930.html

However, I ran into a lot of warnings (see the dmesg below). Do you have a more up-to-date version of this patch?

In addition, I had to make this modification to fix a warning in your original patch.

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 892b347..e0210c9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4608,7 +4613,7 @@ fail_dir_item:
 		int err;
 		err = btrfs_del_inode_ref(trans, root, name, name_len,
-				ino, parent_ino, &local_index);
+				inode, parent_ino, &local_index);
 	}
 	return ret;
 }

Thanks.

Tim

May 22 09:23:57 bigbox kernel: [56455.532138] ------------[ cut here ]------------
May 22 09:23:57 bigbox kernel: [56455.532146] NG: at fs/btrfs/extent_io.c:3795 free_extent_buffer+0x31/6455.532189] Hardware name: PRIMEQUEST 1800E2
May 22 09:23:57 bigbox kernel: [56455.53nked in: scsi_ram lockd nf_conntrack_ipv4 nf_defrag_ipv4 xtfat ioatdma i2c_i801 i7core_edac e1000e microcode edac_core i2c_core igb iTCO_wdt iTCO_vendor_support dca uinput sunrpc usb_stortsas mptscsih mptbase scsi_transport_sas [last unloaded: scsi_wait_sca55.532431] Pid: 4399, comm: btrfs-endio-wri Tainted: GW 3.3.0c-scsiram-btrfs2+ #30
May 22 09:23:57 bigbox kernel: [56455.532486] Call Trace:
May 22 09:23:57 bigbox kernel: [56455. [] warn_slowpath_common+0x7f/0xc0
May 22 09:23:57 bigbox kernel: [5645] [] warn_slowpath_null+0x1a/0x20
May 22 09:23:57 bigbox kernel: [56455.53220] [] free_extent_buffer+0x31/0x40
May 22 09:23:57 bigbox kernel: ] [] read_block_for_search+0x117/0x3d0
May 22 09:23:57 bigbox kernel: 32559] [] ? generic_bin_search.constprop.4+0[] ? unlock_up+0x15d/0x190
May 22 09:23:57 bigbox kernel: [56455.5812c31c1>] btrfs_search_slot+0x241/0x720
May 22 09:23:57 bigbox kernel: [56455.5326fff812c3adc>] btrfs_search_slot_for_inode+0x43c/0x910
May 22 09:23:57 bigbox kernel: [56455.532fff812d5f04>] btrfs_lookup_file_extent+0x54/0x70
May 22 09:23:57 bigbox kernel: [56455.532646812f097c>] btrfs_drop_extents+0xec/0x940
May 22 09:23:57 bigbox kernel: [56455.532662] fff81084eec>] ? try_to_wake_up+0x1bc/0x2b0
May 22 09:23:57 bigbox kernel: [56455.53268 [] ? set_state_bits+0x3f/0x80
May 22 09:23:57 bigbox kernel: [56455fff8116228c>] ? kmem_cache_alloc+0x10c/0x140
May 22 09:23:57 bigbox kernel: [56455.532713] [] ? btrfs_alloc_path+0x1a/0x20
May 22 09:23:57 bigbox kernel: [5645532] [] insert_reserved_file_extent.constpr13+0x73/0x270
May 22 09:23:57 bigbox kernel: [56455.532746] [] ? join_transactio0x2b/0x2b0
May 22 09:23:57 bigbox kernel: [56455.532759] [] ? start_transaction+0x94/0x320
May 22 09:23:57 bigbox kernel: [56455.532774] [] btrfinish_ordered_io+0x2ca/0x320
May 22 09:23:57 bigbox kernel: [56455.532793] [age_end_io_hook+0x4d/0xc0
May 22 09:23:57 bigbox kernel: [56455.532813] [] ] ? bio_free+0x5f/0x70
May 22 09:23:57 bigbox kernel: [56455.532837] fff811a817d>] bio_endio+0x1d/0x40
May 22 09:23:57 bigbox kernel: [56455.532869] [] end_workqueue_fn+0x56/0x140
May 22 09:23:57 bigbox kernel: [56455.532886] [] worker_loop+0x148/0x580
May 22 09:23:57 bigbox kernel: [56455.532898] [] ? btrfs_queue_worker+0x2e0/0x2e0
May 22 09:23:57 bigbox kernel: [56455.532915] [] kthread+0x93/0xa0
May 22 09:23:57 bigbox kernel: [56455.532929] [] kernel_thread_helper+0x4/0x10
May 22 09:23:57 bigbox kernel: [56455.532944] [
Re: 3.4.0-rc6: WARNING: at fs/btrfs/super.c:219 __btrfs_abort_transaction+0xae/0xc0 [btrfs]()
Hi,

I just got the same warning on a fresh 3.4.0 final while booting. This time on /usr/share (different filesystem from last time):

arnd@kallisto:~$ ls -l /dev/mapper/vg0-usr_share
lrwxrwxrwx 1 root root 7 Mai 22 17:59 /dev/mapper/vg0-usr_share -> ../dm-4
arnd@kallisto:~$ grep usr_share /proc/mounts
/dev/mapper/vg0-usr_share /usr/share btrfs rw,relatime,compress=zlib,ssd,nospace_cache 0 0

[ 12.326239] ------------[ cut here ]------------
[ 12.326264] WARNING: at /home/arnd/Projekte/kernel/linux-2.6/fs/btrfs/super.c:219 __btrfs_abort_transaction+0xae/0xc0 [btrfs]()
[ 12.326266] Hardware name: 4384GEG
[ 12.326267] btrfs: Transaction aborted
[ 12.326268] Modules linked in: joydev bridge stp llc kvm_intel kvm dm_crypt bnep rfcomm bluetooth binfmt_misc arc4 coretemp snd_hda_codec_hdmi snd_hda_codec_conexant thinkpad_acpi microcode snd_seq_midi psmouse snd_rawmidi serio_raw iwlwifi intel_ips qcserial usb_wwan usbserial mac80211 snd_hda_intel snd_seq_midi_event snd_hda_codec snd_seq snd_hwdep cfg80211 snd_seq_device snd_pcm snd_timer snd_page_alloc snd soundcore tpm_tis nvram mei(C) btrfs zlib_deflate libcrc32c mxm_wmi ghash_clmulni_intel aesni_intel cryptd aes_x86_64 i915 ahci libahci drm_kms_helper drm e1000e sdhci_pci sdhci firewire_ohci firewire_core i2c_algo_bit crc_itu_t video wmi
[ 12.326297] Pid: 1471, comm: hybrid-detect Tainted: G C 3.4.0aha+ #11
[ 12.326298] Call Trace:
[ 12.326305] [] warn_slowpath_common+0x7f/0xc0
[ 12.326307] [] warn_slowpath_fmt+0x46/0x50
[ 12.326315] [] ? do_chunk_alloc.isra.71+0x31c/0x3f0 [btrfs]
[ 12.326322] [] __btrfs_abort_transaction+0xae/0xc0 [btrfs]
[ 12.326329] [] find_free_extent+0xbe5/0xc70 [btrfs]
[ 12.326334] [] ? __switch_to+0x17a/0x410
[ 12.326341] [] btrfs_reserve_extent+0xed/0x250 [btrfs]
[ 12.326350] [] btrfs_alloc_free_block+0x177/0x370 [btrfs]
[ 12.326357] [] __btrfs_cow_block+0x135/0x4d0 [btrfs]
[ 12.326363] [] btrfs_cow_block+0xfc/0x220 [btrfs]
[ 12.326370] [] btrfs_search_slot+0x454/0x910 [btrfs]
[ 12.326377] [] ? reserve_metadata_bytes.isra.72+0x207/0x740 [btrfs]
[ 12.326384] [] btrfs_insert_empty_items+0x7c/0xe0 [btrfs]
[ 12.326390] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 12.326401] [] btrfs_insert_orphan_item+0x5f/0x90 [btrfs]
[ 12.326429] [] btrfs_orphan_add+0xc5/0x1c0 [btrfs]
[ 12.326443] [] btrfs_truncate+0x146/0x650 [btrfs]
[ 12.326449] [] ? security_inode_alloc+0x1e/0x20
[ 12.326461] [] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 12.326464] [] notify_change+0x1aa/0x340
[ 12.326467] [] do_truncate+0x5e/0xa0
[ 12.326470] [] do_last+0x581/0x8f0
[ 12.326472] [] path_openat+0xd2/0x400
[ 12.326474] [] do_filp_open+0x42/0xa0
[ 12.326476] [] ? alloc_fd+0xd1/0x120
[ 12.326478] [] do_sys_open+0xf8/0x1d0
[ 12.326480] [] ? filp_close+0x66/0x90
[ 12.326482] [] sys_open+0x21/0x30
[ 12.326485] [] system_call_fastpath+0x16/0x1b
[ 12.326487] ---[ end trace 4479826ac6de5588 ]---
[ 12.326489] BTRFS warning (device dm-4): Aborting unused transaction.

Best regards
Arnd
Re: [PATCH 2/4] Btrfs: fix deadlock on sb->s_umount when doing umount
On Wed, May 09, 2012 at 11:24:28AM +0800, Miao Xie wrote:
> Did you apply the trylock patches I sent before?

20120429 [PATCH 1/2] vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them
20120429 [PATCH 2/2] Btrfs: flush all the dirty pages if try_to_writeback_inodes_sb_nr() fails

With these on top of 3.4, no deadlocks occurred with looped 269, 264, 254, 276. Mounted with space_cache,autodefrag.

david
Re: Ceph on btrfs 3.4rc
On Mon, May 21, 2012 at 11:59:54AM +0800, Miao Xie wrote:
> Hi Josef,
>
> On Fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote:
> > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> > index 9b9b15f..492c74f 100644
> > --- a/fs/btrfs/btrfs_inode.h
> > +++ b/fs/btrfs/btrfs_inode.h
> > @@ -57,9 +57,6 @@ struct btrfs_inode {
> >  	/* used to order data wrt metadata */
> >  	struct btrfs_ordered_inode_tree ordered_tree;
> >
> > -	/* for keeping track of orphaned inodes */
> > -	struct list_head i_orphan;
> > -
> >  	/* list of all the delalloc inodes in the FS. There are times we need
> >  	 * to write all the delalloc pages to disk, and this list is used
> >  	 * to walk them all.
> > @@ -156,6 +153,8 @@ struct btrfs_inode {
> >  	unsigned dummy_inode:1;
> >  	unsigned in_defrag:1;
> >  	unsigned delalloc_meta_reserved:1;
> > +	unsigned has_orphan_item:1;
> > +	unsigned doing_truncate:1;
>
> I think the problem is that we should not use different locks to protect
> bit fields which are stored in the same machine word. Otherwise some bit
> fields may be overwritten by the others when someone changes those fields.
> Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item
> as integers?
>

Oh freaking duh, thank you Miao, I'm an idiot.

Josef
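To make the point about bit fields sharing a machine word concrete, here is a minimal userspace sketch. It is not the kernel code: the two structs are hypothetical stand-ins that borrow the field names from the patch, and lost_update_demo() is a single-threaded simulation of one possible interleaving of two writers that each hold a different lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: both flags live in one storage unit. */
struct flags_packed {
	unsigned delalloc_meta_reserved:1;
	unsigned has_orphan_item:1;
};

/* Hypothetical stand-in: each flag gets its own int. */
struct flags_split {
	int delalloc_meta_reserved;
	int has_orphan_item;
};

/* On common ABIs the two bit fields share a single unsigned word. */
size_t packed_size(void)
{
	return sizeof(struct flags_packed);
}

/* With plain ints the second flag starts at its own offset. */
size_t split_second_offset(void)
{
	return offsetof(struct flags_split, has_orphan_item);
}

/*
 * Simulate two writers doing an unsynchronized read-modify-write of the
 * same word. Writer A sets bit 0, writer B sets bit 1, but B's store is
 * based on a stale copy and wipes out A's update.
 */
unsigned lost_update_demo(void)
{
	unsigned word = 0;
	unsigned a = word;	/* writer A loads the word */
	unsigned b = word;	/* writer B loads the same word */

	a |= 1u << 0;		/* A sets its flag in its copy */
	b |= 1u << 1;		/* B sets its flag in its copy */
	word = a;		/* A stores */
	word = b;		/* B stores, A's bit is lost */
	return word;
}
```

The simulated result has only bit 1 set, which is the kind of overwrite Miao describes. With separate integers, as in flags_split, each writer touches only its own word.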
[PATCH v4 1/3] Btrfs-progs: move open_file_or_dir() to utils.c
This is a preparation step to add support for device stats. The definition of the function open_file_or_dir() is moved from common.c to utils.c in order to be able to share some common code between scrub and the device stats in the following step. That common code uses open_file_or_dir(). Since open_file_or_dir() makes use of the function dirfd(3), the required XOPEN version was raised from 6 to 7. Signed-off-by: Stefan Behrens --- Makefile |4 ++-- btrfsctl.c | 28 cmds-balance.c |1 + cmds-inspect.c |1 + cmds-subvolume.c |1 + commands.h |3 --- common.c | 46 -- utils.c | 31 +-- utils.h |3 +++ 9 files changed, 37 insertions(+), 81 deletions(-) diff --git a/Makefile b/Makefile index 79818e6..fe2b432 100644 --- a/Makefile +++ b/Makefile @@ -39,8 +39,8 @@ all: version $(progs) manpages version: bash version.sh -btrfs: $(objects) btrfs.o help.o common.o $(cmds_objects) - $(CC) $(CFLAGS) -o btrfs btrfs.o help.o common.o $(cmds_objects) \ +btrfs: $(objects) btrfs.o help.o $(cmds_objects) + $(CC) $(CFLAGS) -o btrfs btrfs.o help.o $(cmds_objects) \ $(objects) $(LDFLAGS) $(LIBS) -lpthread calc-size: $(objects) calc-size.o diff --git a/btrfsctl.c b/btrfsctl.c index d45e2a7..f0584f3 100644 --- a/btrfsctl.c +++ b/btrfsctl.c @@ -63,34 +63,6 @@ static void print_usage(void) exit(1); } -static int open_file_or_dir(const char *fname) -{ - int ret; - struct stat st; - DIR *dirstream; - int fd; - - ret = stat(fname, &st); - if (ret < 0) { - perror("stat:"); - exit(1); - } - if (S_ISDIR(st.st_mode)) { - dirstream = opendir(fname); - if (!dirstream) { - perror("opendir"); - exit(1); - } - fd = dirfd(dirstream); - } else { - fd = open(fname, O_RDWR); - } - if (fd < 0) { - perror("open"); - exit(1); - } - return fd; -} int main(int ac, char **av) { char *fname = NULL; diff --git a/cmds-balance.c b/cmds-balance.c index 38a7426..5793b5c 100644 --- a/cmds-balance.c +++ b/cmds-balance.c @@ -26,6 +26,7 @@ #include "ctree.h" #include "ioctl.h" #include "volumes.h" +#include "utils.h" #include 
"commands.h" diff --git a/cmds-inspect.c b/cmds-inspect.c index 2f0228f..7a8785b 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -22,6 +22,7 @@ #include "kerncompat.h" #include "ioctl.h" +#include "utils.h" #include "commands.h" diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 950fa8f..8ecd3f4 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -26,6 +26,7 @@ #include "kerncompat.h" #include "ioctl.h" +#include "utils.h" #include "commands.h" diff --git a/commands.h b/commands.h index a303a50..aea4cb1 100644 --- a/commands.h +++ b/commands.h @@ -79,9 +79,6 @@ void help_ambiguous_token(const char *arg, const struct cmd_group *grp); void help_command_group(const struct cmd_group *grp, int argc, char **argv); -/* common.c */ -int open_file_or_dir(const char *fname); - extern const struct cmd_group subvolume_cmd_group; extern const struct cmd_group filesystem_cmd_group; extern const struct cmd_group balance_cmd_group; diff --git a/common.c b/common.c deleted file mode 100644 index 03f6570..000 --- a/common.c +++ /dev/null @@ -1,46 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, - * Boston, MA 021110-1307, USA. 
- */ - -#include -#include -#include -#include - -int open_file_or_dir(const char *fname) -{ - int ret; - struct stat st; - DIR *dirstream; - int fd; - - ret = stat(fname, &st); - if (ret < 0) { - return -1; - } - if (S_ISDIR(st.st_mode)) { - dirstream = opendir(fname); - if (!dirstream) { - return -2; - } - fd = dirfd(dirstream); - } else { - fd = open(fname, O_RDWR); - } - if (fd < 0) { - return -3; - } - return fd; -} diff --git a/utils.c b/utils.c index ee7fa1b..6157115 100644 --- a/utils.c +++ b/utils.c @@ -16,8 +16,9
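For readers who want to try the helper outside the tree, here is a standalone sketch of open_file_or_dir() that follows the variant removed from common.c above (returning -1, -2 or -3 instead of exiting). The _XOPEN_SOURCE value of 700 reflects the commit message's remark that dirfd(3) required raising the XOPEN version to 7; the include list is my assumption, since the archive stripped the header names from the diff.

```c
#define _XOPEN_SOURCE 700	/* dirfd(3) is specified in POSIX.1-2008 */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Open a regular file (read/write) or a directory and return a file
 * descriptor. Returns -1 if stat() fails, -2 if opendir() fails and
 * -3 if no valid descriptor could be obtained.
 */
int open_file_or_dir(const char *fname)
{
	struct stat st;
	DIR *dirstream;
	int fd;

	if (stat(fname, &st) < 0)
		return -1;

	if (S_ISDIR(st.st_mode)) {
		dirstream = opendir(fname);
		if (!dirstream)
			return -2;
		/* The DIR stream is never closed, as in the original. */
		fd = dirfd(dirstream);
	} else {
		fd = open(fname, O_RDWR);
	}

	if (fd < 0)
		return -3;
	return fd;
}
```

Callers such as cmd_dev_stats() only test for a negative return value, so the distinct error codes are informational.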
[PATCH v4 3/3] Btrfs-progs: add command to get/reset device stats via ioctl
"btrfs device stats" is used to retrieve and print the device stats. "btrfs device stats -z" is used to atomically retrieve, reset and print the stats. Signed-off-by: Stefan Behrens --- cmds-device.c | 113 ctree.h|6 +++ ioctl.h| 28 ++ man/btrfs.8.in | 14 +++ print-tree.c |6 +++ 5 files changed, 167 insertions(+) diff --git a/cmds-device.c b/cmds-device.c index db625a6..3417f03 100644 --- a/cmds-device.c +++ b/cmds-device.c @@ -246,11 +246,124 @@ static int cmd_scan_dev(int argc, char **argv) return 0; } +static const char * const cmd_dev_stats_usage[] = { + "btrfs device stats [-z] |", + "Show current device IO stats. -z to reset stats afterwards.", + NULL +}; + +static int cmd_dev_stats(int argc, char **argv) +{ + char *path; + struct btrfs_ioctl_fs_info_args fi_args; + struct btrfs_ioctl_dev_info_args *di_args = NULL; + int ret; + int fdmnt; + int i; + char c; + int fdres = -1; + int err = 0; + int cmd = BTRFS_IOC_GET_DEVICE_STATS; + + optind = 1; + while ((c = getopt(argc, argv, "z")) != -1) { + switch (c) { + case 'z': + cmd = BTRFS_IOC_GET_AND_RESET_DEVICE_STATS; + break; + case '?': + default: + fprintf(stderr, "ERROR: device stat args invalid.\n" + " device stat [-z] |\n" + " -z to reset stats after reading.\n"); + return 1; + } + } + + if (optind + 1 != argc) { + fprintf(stderr, "ERROR: device stat needs path|device as single" + " argument\n"); + return 1; + } + + path = argv[optind]; + + fdmnt = open_file_or_dir(path); + if (fdmnt < 0) { + fprintf(stderr, "ERROR: can't access '%s'\n", path); + return 12; + } + + ret = get_fs_info(fdmnt, path, &fi_args, &di_args); + if (ret) { + fprintf(stderr, "ERROR: getting dev info for devstats failed: " + "%s\n", strerror(-ret)); + err = 1; + goto out; + } + if (!fi_args.num_devices) { + fprintf(stderr, "ERROR: no devices found\n"); + err = 1; + goto out; + } + + for (i = 0; i < fi_args.num_devices; i++) { + struct btrfs_ioctl_get_device_stats args = {0}; + __u8 path[BTRFS_DEVICE_PATH_NAME_MAX + 1]; + + strncpy((char 
*)path, (char *)di_args[i].path, + BTRFS_DEVICE_PATH_NAME_MAX); + path[BTRFS_DEVICE_PATH_NAME_MAX] = '\0'; + + args.devid = di_args[i].devid; + args.nr_items = BTRFS_IOCTL_GET_DEVICE_STATS_MAX_NR_ITEMS; + + if (ioctl(fdmnt, cmd, &args) < 0) { + fprintf(stderr, "ERROR: ioctl(%s) on %s failed: %s\n", + BTRFS_IOC_GET_AND_RESET_DEVICE_STATS == cmd ? +"BTRFS_IOC_GET_AND_RESET_DEVICE_STATS" : +"BTRFS_IOC_GET_DEVICE_STATS", + path, strerror(errno)); + err = 1; + } else { + if (args.nr_items >= 1) + printf("[%s].cnt_write_io_errs %llu\n", + path, (unsigned long long) +args.cnt_write_io_errs); + if (args.nr_items >= 2) + printf("[%s].cnt_read_io_errs%llu\n", + path, (unsigned long long) +args.cnt_read_io_errs); + if (args.nr_items >= 3) + printf("[%s].cnt_flush_io_errs %llu\n", + path, (unsigned long long) +args.cnt_flush_io_errs); + if (args.nr_items >= 4) + printf("[%s].cnt_corruption_errs %llu\n", + path, (unsigned long long) +args.cnt_corruption_errs); + if (args.nr_items >= 5) + printf("[%s].cnt_generation_errs %llu\n", + path, (unsigned long long) +args.cnt_generation_errs); + } + } + +out: + free(di_args); + close(fdmnt); + if (fdres > -1) + close(fdres); + + return err; +} + const struct cmd_group device_cmd_group = { de
[PATCH v4 0/3] Btrfs-progs: support get/reset device stats via ioctl
"btrfs device stats" is used to retrieve and print the device stats. "btrfs device stats -z" is used to atomically retrieve, reset and print the stats.

In order to share two utility functions between scrub and the dev stats code, these two functions are moved to utils.c and renamed. Since these functions use open_file_or_dir(), and since the linking against utils.o and common.o was different, open_file_or_dir() was moved from common.c to utils.c. And since that function makes use of the function dirfd(3), the required XOPEN version was raised from 6 to 7.

Changes v1->v2:
- Remove a verbose printf()
- Cast u64 to unsigned long long for printf()
- Update the man page

Changes v2->v3:
- Rebase on Chris' current master branch
- Split the patch into three separate patches because, after rebasing, open_file_or_dir() was moved and additional changes became necessary

Changes v3->v4:
- Add padding at end of ioctl structure

Stefan Behrens (3):
  Btrfs-progs: move open_file_or_dir() to utils.c
  Btrfs-progs: make two utility functions globally available
  Btrfs-progs: add command to get/reset device stats via ioctl

 Makefile         |   4 +-
 btrfsctl.c       |  28 --
 cmds-balance.c   |   1 +
 cmds-device.c    | 113 ++
 cmds-inspect.c   |   1 +
 cmds-scrub.c     |  72 +-
 cmds-subvolume.c |   1 +
 commands.h       |   3 --
 common.c         |  46 --
 ctree.h          |   6 +++
 ioctl.h          |  28 ++
 man/btrfs.8.in   |  14 +++
 print-tree.c     |   6 +++
 utils.c          |  97 +-
 utils.h          |   7
 15 files changed, 276 insertions(+), 151 deletions(-)
 delete mode 100644 common.c

--
1.7.10.2
[PATCH v4 2/3] Btrfs-progs: make two utility functions globally available
Two convenient utility functions that have so far been local to scrub are moved to utils.c. They will be used in the device stats code in a following commit. Signed-off-by: Stefan Behrens --- cmds-scrub.c | 72 ++ utils.c | 66 + utils.h |4 3 files changed, 72 insertions(+), 70 deletions(-) diff --git a/cmds-scrub.c b/cmds-scrub.c index c4503f4..37a9890 100644 --- a/cmds-scrub.c +++ b/cmds-scrub.c @@ -967,74 +967,6 @@ static struct scrub_file_record *last_dev_scrub( return NULL; } -static int scrub_device_info(int fd, u64 devid, -struct btrfs_ioctl_dev_info_args *di_args) -{ - int ret; - - di_args->devid = devid; - memset(&di_args->uuid, '\0', sizeof(di_args->uuid)); - - ret = ioctl(fd, BTRFS_IOC_DEV_INFO, di_args); - return ret ? -errno : 0; -} - -static int scrub_fs_info(int fd, char *path, - struct btrfs_ioctl_fs_info_args *fi_args, - struct btrfs_ioctl_dev_info_args **di_ret) -{ - int ret = 0; - int ndevs = 0; - int i = 1; - struct btrfs_fs_devices *fs_devices_mnt = NULL; - struct btrfs_ioctl_dev_info_args *di_args; - char mp[BTRFS_PATH_NAME_MAX + 1]; - - memset(fi_args, 0, sizeof(*fi_args)); - - ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args); - if (ret && errno == EINVAL) { - /* path is no mounted btrfs. 
try if it's a device */ - ret = check_mounted_where(fd, path, mp, sizeof(mp), - &fs_devices_mnt); - if (!ret) - return -EINVAL; - if (ret < 0) - return ret; - fi_args->num_devices = 1; - fi_args->max_id = fs_devices_mnt->latest_devid; - i = fs_devices_mnt->latest_devid; - memcpy(fi_args->fsid, fs_devices_mnt->fsid, BTRFS_FSID_SIZE); - close(fd); - fd = open_file_or_dir(mp); - if (fd < 0) - return -errno; - } else if (ret) { - return -errno; - } - - if (!fi_args->num_devices) - return 0; - - di_args = *di_ret = malloc(fi_args->num_devices * sizeof(*di_args)); - if (!di_args) - return -errno; - - for (; i <= fi_args->max_id; ++i) { - BUG_ON(ndevs >= fi_args->num_devices); - ret = scrub_device_info(fd, i, &di_args[ndevs]); - if (ret == -ENODEV) - continue; - if (ret) - return ret; - ++ndevs; - } - - BUG_ON(ndevs == 0); - - return 0; -} - int mkdir_p(char *path) { int i; @@ -1155,7 +1087,7 @@ static int scrub_start(int argc, char **argv, int resume) return 12; } - ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args); + ret = get_fs_info(fdmnt, path, &fi_args, &di_args); if (ret) { ERR(!do_quiet, "ERROR: getting dev info for scrub failed: " "%s\n", strerror(-ret)); @@ -1621,7 +1553,7 @@ static int cmd_scrub_status(int argc, char **argv) return 12; } - ret = scrub_fs_info(fdmnt, path, &fi_args, &di_args); + ret = get_fs_info(fdmnt, path, &fi_args, &di_args); if (ret) { fprintf(stderr, "ERROR: getting dev info for scrub failed: " "%s\n", strerror(-ret)); diff --git a/utils.c b/utils.c index 6157115..037f64b 100644 --- a/utils.c +++ b/utils.c @@ -1233,3 +1233,69 @@ int open_file_or_dir(const char *fname) return fd; } +int get_device_info(int fd, u64 devid, + struct btrfs_ioctl_dev_info_args *di_args) +{ + int ret; + + di_args->devid = devid; + memset(&di_args->uuid, '\0', sizeof(di_args->uuid)); + + ret = ioctl(fd, BTRFS_IOC_DEV_INFO, di_args); + return ret ? 
-errno : 0; +} + +int get_fs_info(int fd, char *path, struct btrfs_ioctl_fs_info_args *fi_args, + struct btrfs_ioctl_dev_info_args **di_ret) +{ + int ret = 0; + int ndevs = 0; + int i = 1; + struct btrfs_fs_devices *fs_devices_mnt = NULL; + struct btrfs_ioctl_dev_info_args *di_args; + char mp[BTRFS_PATH_NAME_MAX + 1]; + + memset(fi_args, 0, sizeof(*fi_args)); + + ret = ioctl(fd, BTRFS_IOC_FS_INFO, fi_args); + if (ret && (errno == EINVAL || errno == ENOTTY)) { + /* path is not a mounted btrfs. Try if it's a device */ + ret = check_mounted_where(fd, path, mp, sizeof(mp), + &fs_devices_mnt); + if (!ret) + return -EINVAL; + if (ret < 0) +
[PATCH v4 1/3] Btrfs: add device counters for detected IO and checksum errors
The goal is to detect when drives start to get an increased error rate, when drives should be replaced soon. Therefore statistic counters are added that count IO errors (read, write and flush). Additionally, the software detected errors like checksum errors and corrupted blocks are counted. Signed-off-by: Stefan Behrens --- fs/btrfs/disk-io.c | 18 ++--- fs/btrfs/extent_io.c | 27 +-- fs/btrfs/scrub.c | 72 +++--- fs/btrfs/volumes.c | 61 +++--- fs/btrfs/volumes.h | 21 +++ 5 files changed, 174 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a7ffc88..e123629 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2556,18 +2556,21 @@ recovery_tree_root: static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate) { - char b[BDEVNAME_SIZE]; - if (uptodate) { set_buffer_uptodate(bh); } else { + struct btrfs_device *device = (struct btrfs_device *) + bh->b_private; + printk_ratelimited(KERN_WARNING "lost page write due to " - "I/O error on %s\n", - bdevname(bh->b_bdev, b)); + "I/O error on %s\n", device->name); /* note, we dont' set_buffer_write_io_error because we have * our own ways of dealing with the IO errors */ clear_buffer_uptodate(bh); + btrfs_device_stat_inc(&device->cnt_write_io_errs); + device->device_stats_dirty = 1; + btrfs_device_stat_print_on_error(device); } unlock_buffer(bh); put_bh(bh); @@ -2682,6 +2685,7 @@ static int write_dev_supers(struct btrfs_device *device, set_buffer_uptodate(bh); lock_buffer(bh); bh->b_end_io = btrfs_end_buffer_write_sync; + bh->b_private = device; } /* @@ -2740,6 +2744,12 @@ static int write_dev_flush(struct btrfs_device *device, int wait) } if (!bio_flagged(bio, BIO_UPTODATE)) { ret = -EIO; + if (!bio_flagged(bio, BIO_EOPNOTSUPP)) { + btrfs_device_stat_inc( + &device->cnt_flush_io_errs); + device->device_stats_dirty = 1; + btrfs_device_stat_print_on_error(device); + } } /* drop the reference from the wait == 0 run */ diff --git a/fs/btrfs/extent_io.c 
b/fs/btrfs/extent_io.c index 2fb52c2..6cd9a55 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1923,6 +1923,9 @@ int repair_io_failure(struct btrfs_mapping_tree *map_tree, u64 start, if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) { /* try to remap that extent elsewhere? */ bio_put(bio); + btrfs_device_stat_inc(&dev->cnt_write_io_errs); + dev->device_stats_dirty = 1; + btrfs_device_stat_print_on_error(dev); return -EIO; } @@ -2347,10 +2350,30 @@ static void end_bio_extent_readpage(struct bio *bio, int err) if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) { ret = tree->ops->readpage_end_io_hook(page, start, end, state, mirror); - if (ret) + if (ret) { + /* no IO indicated but software detected errors +* in the block, either checksum errors or +* issues with the contents */ + int failed_mirror = (int)(uintptr_t) + bio->bi_bdev; + struct btrfs_root *root = + BTRFS_I(page->mapping->host)->root; + struct btrfs_device *device; + uptodate = 0; - else + device = btrfs_find_device_for_logical( + root, start, + (int)failed_mirror); + if (device) { + btrfs_device_stat_inc( + &device->cnt_corruption_errs); + device->device_stats_dirty = 1; + btrfs_device_stat_print_on_error( +
[PATCH v4 2/3] Btrfs: add ioctl to get and reset the device stats
An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: Stefan Behrens --- fs/btrfs/ioctl.c | 26 fs/btrfs/ioctl.h | 28 + fs/btrfs/volumes.c | 69 fs/btrfs/volumes.h | 13 ++ 4 files changed, 136 insertions(+) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 14f8e1f..19d2244 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3042,6 +3042,28 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root, return ret; } +static long btrfs_ioctl_get_device_stats(struct btrfs_root *root, +void __user *arg, int reset_after_read) +{ + struct btrfs_ioctl_get_device_stats *sa; + int ret; + + if (reset_after_read && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + sa = memdup_user(arg, sizeof(*sa)); + if (IS_ERR(sa)) + return PTR_ERR(sa); + + ret = btrfs_get_device_stats(root, sa, reset_after_read); + + if (copy_to_user(arg, sa, sizeof(*sa))) + ret = -EFAULT; + + kfree(sa); + return ret; +} + static long btrfs_ioctl_ino_to_path(struct btrfs_root *root, void __user *arg) { int ret = 0; @@ -3424,6 +3446,10 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_balance_ctl(root, arg); case BTRFS_IOC_BALANCE_PROGRESS: return btrfs_ioctl_balance_progress(root, argp); + case BTRFS_IOC_GET_DEVICE_STATS: + return btrfs_ioctl_get_device_stats(root, argp, 0); + case BTRFS_IOC_GET_AND_RESET_DEVICE_STATS: + return btrfs_ioctl_get_device_stats(root, argp, 1); } return -ENOTTY; diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 086e6bd..f1c1196 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -266,6 +266,30 @@ struct btrfs_ioctl_logical_ino_args { __u64 inodes; }; +#define BTRFS_IOCTL_GET_DEVICE_STATS_MAX_NR_ITEMS 5 +struct btrfs_ioctl_get_device_stats { + __u64 devid;/* in */ + __u64 nr_items; /* in/out */ + + /* out values: */ + + /* disk I/O failure stats */ + __u64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ + __u64 cnt_read_io_errs; /* EIO 
or EREMOTEIO from lower layers */ + __u64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ + + /* stats for indirect indications for I/O failures */ + __u64 cnt_corruption_errs; /* checksum error, bytenr error or + * contents is illegal: this is an + * indication that the block was damaged + * during read or write, or written to + * wrong location or read from wrong + * location */ + __u64 cnt_generation_errs; /* an indication that blocks have not + * been written */ + __u64 unused[121]; /* pad to 1k */ +}; + #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \ struct btrfs_ioctl_vol_args) #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \ @@ -330,5 +354,9 @@ struct btrfs_ioctl_logical_ino_args { struct btrfs_ioctl_ino_path_args) #define BTRFS_IOC_LOGICAL_INO _IOWR(BTRFS_IOCTL_MAGIC, 36, \ struct btrfs_ioctl_ino_path_args) +#define BTRFS_IOC_GET_DEVICE_STATS _IOWR(BTRFS_IOCTL_MAGIC, 52, \ +struct btrfs_ioctl_get_device_stats) +#define BTRFS_IOC_GET_AND_RESET_DEVICE_STATS _IOWR(BTRFS_IOCTL_MAGIC, 53, \ +struct btrfs_ioctl_get_device_stats) #endif diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c458c74..5f5a6ce 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4638,3 +4638,72 @@ void btrfs_device_stat_print_on_error(struct btrfs_device *device) btrfs_device_stat_read( &device->cnt_generation_errs)); } + +int btrfs_get_device_stats(struct btrfs_root *root, + struct btrfs_ioctl_get_device_stats *stats, + int reset_after_read) +{ + struct btrfs_device *dev; + struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices; + + mutex_lock(&fs_devices->device_list_mutex); + dev = btrfs_find_device(root, stats->devid, NULL, NULL); + mutex_unlock(&fs_devices->device_list_mutex); + + if (!dev) { + printk(KERN_WARNING + "btrfs: get device_stats failed, device not found\n"); + return -ENODEV; +
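The "pad to 1k" comment in the ioctl structure above can be checked with a little arithmetic: seven named __u64 fields plus 121 unused ones give 128 * 8 = 1024 bytes. The sketch below mirrors the structure in userspace with uint64_t substituted for __u64 (my substitution, so that it builds without kernel headers); device_stats_args_size() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct from the patch, uint64_t for __u64. */
struct btrfs_ioctl_get_device_stats {
	uint64_t devid;			/* in */
	uint64_t nr_items;		/* in/out */

	/* out values */
	uint64_t cnt_write_io_errs;
	uint64_t cnt_read_io_errs;
	uint64_t cnt_flush_io_errs;
	uint64_t cnt_corruption_errs;
	uint64_t cnt_generation_errs;

	uint64_t unused[121];		/* pad to 1k */
};

/* Hypothetical helper: size of the ioctl argument in bytes. */
size_t device_stats_args_size(void)
{
	return sizeof(struct btrfs_ioctl_get_device_stats);
}
```

Because every member is a 64-bit integer there is no implicit padding, so the size is the same for 32-bit and 64-bit callers, and up to 121 further counters could be added without changing it.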
[PATCH v4 3/3] Btrfs: read device stats on mount, write modified ones during commit
The device statistics are written into the device tree with each transaction commit. Only modified statistics are written. When a filesystem is mounted, the device statistics for each involved device are read from the device tree and used to initialize the counters. Signed-off-by: Stefan Behrens --- fs/btrfs/ctree.h | 51 fs/btrfs/disk-io.c |7 ++ fs/btrfs/print-tree.c |3 + fs/btrfs/transaction.c |4 + fs/btrfs/volumes.c | 205 fs/btrfs/volumes.h |9 +++ 6 files changed, 279 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ec42a24..1dd7651 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -823,6 +823,26 @@ struct btrfs_csum_item { u8 csum; } __attribute__ ((__packed__)); +struct btrfs_device_stats_item { + /* +* grow this item struct at the end for future enhancements and keep +* the existing values unchanged +*/ + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */ + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */ + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */ + + /* stats for indirect indications for I/O failures */ + __le64 cnt_corruption_errs; /* checksum error, bytenr error or +* contents is illegal: this is an +* indication that the block was damaged +* during read or write, or written to +* wrong location or read from wrong +* location */ + __le64 cnt_generation_errs; /* an indication that blocks have not +* been written */ +} __attribute__ ((__packed__)); + /* different types of block groups (and chunks) */ #define BTRFS_BLOCK_GROUP_DATA (1ULL << 0) #define BTRFS_BLOCK_GROUP_SYSTEM (1ULL << 1) @@ -1508,6 +1528,12 @@ struct btrfs_ioctl_defrag_range_args { #define BTRFS_BALANCE_ITEM_KEY 248 /* + * Persistantly stores the io stats in the device tree. + * One key for all stats, (0, BTRFS_DEVICE_STATS_KEY, devid). + */ +#define BTRFS_DEVICE_STATS_KEY 249 + +/* * string items are for debugging. 
They just store a short string of
  * data in the FS
  */
@@ -2415,6 +2441,31 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
 	return btrfs_item_size(eb, e) - offset;
 }
 
+/* btrfs_device_stats_item */
+BTRFS_SETGET_FUNCS(device_stats_cnt_write_io_errs,
+		   struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_read_io_errs,
+		   struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_flush_io_errs,
+		   struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_corruption_errs,
+		   struct btrfs_device_stats_item, cnt_corruption_errs, 64);
+BTRFS_SETGET_FUNCS(device_stats_cnt_generation_errs,
+		   struct btrfs_device_stats_item, cnt_generation_errs, 64);
+
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_write_io_errs,
+			 struct btrfs_device_stats_item, cnt_write_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_read_io_errs,
+			 struct btrfs_device_stats_item, cnt_read_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_flush_io_errs,
+			 struct btrfs_device_stats_item, cnt_flush_io_errs, 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_corruption_errs,
+			 struct btrfs_device_stats_item, cnt_corruption_errs,
+			 64);
+BTRFS_SETGET_STACK_FUNCS(stack_device_stats_cnt_generation_errs,
+			 struct btrfs_device_stats_item, cnt_generation_errs,
+			 64);
+
 static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
 {
 	return sb->s_fs_info;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e123629..7ba08f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2353,6 +2353,13 @@ retry_root_backup:
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
 
+	ret = btrfs_init_device_stats(fs_info);
+	if (ret) {
+		printk(KERN_ERR "btrfs: failed to init device_stats: %d\n",
+		       ret);
+		goto fail_block_groups;
+	}
+
 	ret = btrfs_init_space_info(fs_info);
 	if (ret) {
 		printk(KERN_ERR "Failed to initial space info: %d\n", ret);
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index f38e452..a9e45e4 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -294,6 +294,9 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l)
[PATCH v4 0/3] Btrfs: add IO error device stats
Changes v1-v2:
- Remove restriction that BTRFS_IOC_GET_DEVICE_STATS is a privileged
  operation
- Cast u64 to unsigned long long for printf()

Changes v2-v3:
- Rebased on Chris' current master

Changes v3-v4:
- Add padding at end of ioctl structure

The goal is to detect when drives start to get an increased error rate,
when drives should be replaced soon. Therefore statistic counters are
added that count IO errors (read, write and flush). Additionally, the
software detected errors like checksum errors and corrupted blocks are
counted.

An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.

The device statistics are written into the device tree with each
transaction commit. Only modified statistics are written.
When a filesystem is mounted, the device statistics for each involved
device are read from the device tree and used to initialize the
counters.

A patch for the btrfs-progs world will also be sent.

Stefan Behrens (3):
  Btrfs: add device counters for detected IO and checksum errors
  Btrfs: add ioctl to get and reset the device stats
  Btrfs: read device stats on mount, write modified ones during commit

 fs/btrfs/ctree.h       |  51
 fs/btrfs/disk-io.c     |  25
 fs/btrfs/extent_io.c   |  27
 fs/btrfs/ioctl.c       |  26
 fs/btrfs/ioctl.h       |  28
 fs/btrfs/print-tree.c  |   3
 fs/btrfs/scrub.c       |  72
 fs/btrfs/transaction.c |   4
 fs/btrfs/volumes.c     | 335
 fs/btrfs/volumes.h     |  43
 10 files changed, 589 insertions(+), 25 deletions(-)

-- 
1.7.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
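The three ideas in the cover letter (per-device error counters, an atomic
get-and-reset, and writing back only modified statistics) can be sketched in
user space roughly as follows. This is only an illustration under stated
assumptions: every name here (toy_dev_stats, toy_stat_inc, and so on) is
hypothetical and not taken from the patches, and C11 atomics stand in for
whatever primitives the kernel code actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical indices for the five error classes the series counts. */
enum toy_stat {
	TOY_STAT_WRITE_ERRS,
	TOY_STAT_READ_ERRS,
	TOY_STAT_FLUSH_ERRS,
	TOY_STAT_CORRUPTION_ERRS,
	TOY_STAT_GENERATION_ERRS,
	TOY_STAT_MAX
};

/* Hypothetical per-device state: one counter per class plus a dirty flag. */
struct toy_dev_stats {
	_Atomic uint64_t cnt[TOY_STAT_MAX];
	atomic_bool dirty;	/* a counter changed since the last commit */
};

static void toy_stat_init(struct toy_dev_stats *s)
{
	for (int i = 0; i < TOY_STAT_MAX; i++)
		atomic_init(&s->cnt[i], 0);
	atomic_init(&s->dirty, false);
}

/* Count one error and remember that the stats need writing. */
static void toy_stat_inc(struct toy_dev_stats *s, enum toy_stat idx)
{
	assert(idx < TOY_STAT_MAX);
	atomic_fetch_add(&s->cnt[idx], 1);
	atomic_store(&s->dirty, true);
}

/* Plain "get": read the counter without changing it. */
static uint64_t toy_stat_read(struct toy_dev_stats *s, enum toy_stat idx)
{
	return atomic_load(&s->cnt[idx]);
}

/* "Get and reset" as one atomic exchange, so no increment can slip in
 * between reading the old value and storing zero. */
static uint64_t toy_stat_read_and_reset(struct toy_dev_stats *s,
					enum toy_stat idx)
{
	uint64_t old = atomic_exchange(&s->cnt[idx], 0);

	if (old)
		atomic_store(&s->dirty, true);
	return old;
}

/* Stand-in for the commit path: returns true only if something changed
 * since the previous call, i.e. only then would the stats be written. */
static bool toy_stat_commit(struct toy_dev_stats *s)
{
	return atomic_exchange(&s->dirty, false);
}
```

In this sketch a commit that follows no error and no reset returns false, which
corresponds to "only modified statistics are written".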
Re: Ceph on btrfs 3.4rc
2012/5/21 Miao Xie:
> Hi Josef,
>
> On fri, 18 May 2012 15:01:05 -0400, Josef Bacik wrote:
>> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
>> index 9b9b15f..492c74f 100644
>> --- a/fs/btrfs/btrfs_inode.h
>> +++ b/fs/btrfs/btrfs_inode.h
>> @@ -57,9 +57,6 @@ struct btrfs_inode {
>>  	/* used to order data wrt metadata */
>>  	struct btrfs_ordered_inode_tree ordered_tree;
>>
>> -	/* for keeping track of orphaned inodes */
>> -	struct list_head i_orphan;
>> -
>>  	/* list of all the delalloc inodes in the FS. There are times we need
>>  	 * to write all the delalloc pages to disk, and this list is used
>>  	 * to walk them all.
>> @@ -156,6 +153,8 @@ struct btrfs_inode {
>>  	unsigned dummy_inode:1;
>>  	unsigned in_defrag:1;
>>  	unsigned delalloc_meta_reserved:1;
>> +	unsigned has_orphan_item:1;
>> +	unsigned doing_truncate:1;
>
> I think the problem is that we should not use different locks to protect
> bit fields which are stored in the same machine word. Otherwise some bit
> fields may be overwritten by the others when someone changes those fields.
> Could you try to declare ->delalloc_meta_reserved and ->has_orphan_item
> as integers?
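The concern about bit fields sharing one machine word can be illustrated with
a small single-threaded replay of one bad interleaving. This is a sketch, not
btrfs code: the TOY_* names and both functions are hypothetical, and it
assumes a bit-field store is compiled as a read-modify-write of the containing
word, which is typical but not something the C language spells out. What C11
does say is that adjacent bit fields form a single memory location, while
separate int members are distinct ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions for two flags packed into one 32-bit word. */
#define TOY_DELALLOC_META_RESERVED (1u << 0)
#define TOY_HAS_ORPHAN_ITEM        (1u << 1)

/*
 * Replay by hand what can happen when two CPUs update different bit
 * fields of the same word while holding different locks: both load the
 * word before either stores, so the second store wipes out the first.
 */
static uint32_t toy_packed_interleaving(void)
{
	uint32_t word = 0;	/* storage unit shared by both flags */
	uint32_t cpu_a = word;	/* CPU A loads the word under lock A */
	uint32_t cpu_b = word;	/* CPU B loads the word under lock B */

	cpu_a |= TOY_DELALLOC_META_RESERVED;
	word = cpu_a;		/* A stores: bit 0 is now set */
	cpu_b |= TOY_HAS_ORPHAN_ITEM;
	word = cpu_b;		/* B stores its stale copy: bit 0 is lost */
	return word;
}

/* The suggested alternative: each flag in its own int. */
struct toy_split_flags {
	int delalloc_meta_reserved;
	int has_orphan_item;
};

/* The same two updates touch different memory locations, so neither
 * store can overwrite the other, whatever the interleaving. */
static struct toy_split_flags toy_split_interleaving(void)
{
	struct toy_split_flags f = { 0, 0 };

	f.delalloc_meta_reserved = 1;	/* CPU A, under lock A */
	f.has_orphan_item = 1;		/* CPU B, under lock B */
	return f;
}
```

In the packed replay only the has_orphan_item bit survives; with separate
integers both flags end up set.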
I have tried changing it to:

 struct btrfs_inode {
 	unsigned orphan_meta_reserved:1;
 	unsigned dummy_inode:1;
 	unsigned in_defrag:1;
-	unsigned delalloc_meta_reserved:1;
+	int delalloc_meta_reserved;
+	int has_orphan_item;
+	int doing_truncate;

The strange thing is, that I'm no longer hitting the BUG_ON, but the old
WARNING (no additional messages):

[351021.157124] ------------[ cut here ]------------
[351021.162400] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf7/0x100 [btrfs]()
[351021.171812] Hardware name: ProLiant DL180 G6
[351021.176867] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs]
[351021.200236] Pid: 9837, comm: btrfs-transacti Tainted: PW O 3.3.5-1.fits.1.el6.x86_64 #1
[351021.210126] Call Trace:
[351021.212957] [] warn_slowpath_common+0x7f/0xc0
[351021.219758] [] warn_slowpath_null+0x1a/0x20
[351021.226385] [] btrfs_orphan_commit_root+0xf7/0x100 [btrfs]
[351021.234461] [] commit_fs_roots+0xc6/0x1c0 [btrfs]
[351021.241669] [] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[351021.249841] [] btrfs_commit_transaction+0x584/0xa50 [btrfs]
[351021.258006] [] ? start_transaction+0x92/0x310 [btrfs]
[351021.265580] [] ? wake_up_bit+0x40/0x40
[351021.271719] [] transaction_kthread+0x26b/0x2e0 [btrfs]
[351021.279405] [] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[351021.288934] [] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[351021.298449] [] kthread+0x9e/0xb0
[351021.303989] [] kernel_thread_helper+0x4/0x10
[351021.310691] [] ? kthread_freezable_should_stop+0x70/0x70
[351021.318555] [] ? gs_change+0x13/0x13
[351021.324479] ---[ end trace 9adc7b36a3e66833 ]---
[351710.339482] ------------[ cut here ]------------
[351710.344754] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf7/0x100 [btrfs]()
[351710.354165] Hardware name: ProLiant DL180 G6
[351710.359222] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(PO) hpsa squashfs [last unloaded: btrfs]
[351710.382569] Pid: 9797, comm: kworker/5:0 Tainted: PW O 3.3.5-1.fits.1.el6.x86_64 #1
[351710.392075] Call Trace:
[351710.394901] [] warn_slowpath_common+0x7f/0xc0
[351710.401750] [] warn_slowpath_null+0x1a/0x20
[351710.408414] [] btrfs_orphan_commit_root+0xf7/0x100 [btrfs]
[351710.416528] [] commit_fs_roots+0xc6/0x1c0 [btrfs]
[351710.423775] [] btrfs_commit_transaction+0x584/0xa50 [btrfs]
[351710.431983] [] ? __switch_to+0x153/0x440
[351710.438352] [] ? wake_up_bit+0x40/0x40
[351710.444529] [] ? btrfs_commit_transaction+0xa50/0xa50 [btrfs]
[351710.452894] [] do_async_commit+0x1f/0x30 [btrfs]
[351710.459979] [] process_one_work+0x129/0x450
[351710.466576] [] worker_thread+0x17b/0x3c0
[351710.472884] [] ? manage_workers+0x220/0x220
[351710.479472] [] kthread+0x9e/0xb0
[351710.485029] [] kernel_thread_helper+0x4/0x10
[351710.491731] [] ? kthread_freezable_should_stop+0x70/0x70
[351710.499640] [] ? gs_change+0x13/0x13
[351710.505590] ---[ end trace 9adc7b36a3e66834 ]---

Regards,
Christian
Re: Newbie questions on some of btrfs code...
Hi Jan,

>> # I saw that slot==0 is special. My understanding is that btrfs
>> maintains the property that the parent of each node/leaf has a key
>> pointing to that node/leaf, which must be equal to the key in the
>> slot==0 of this node/leaf. That's what fixup_low_keys() tries to
>> maintain. Is this correct?
>
> Yes. I'm not 100% sure if the key in the parent node must match exactly
> the first key of the child node. It is probably allowed that it's less
> than or equal to the first key. It is guaranteed to be larger than the
> largest of the previous (left) node, though.
>
> And yes, that's what fixup_low_keys is correcting.
>
>> # If my understanding in the previous bullet is correct: Is that the
>> reason that in btrfs_prev_leaf() it is assumed that if there is a
>> lesser key, btrfs_search_slot() will never bring us to the slot==0 of
>> the current leaf?
>
> It's quite straight: We look for a key smaller than the first (slot 0)
> of the current leaf. If we find the current leaf again (because
> btrfs_search_slot returns the place where such a key would have been
> inserted), then there's no previous leaf. No preconditions or
> assumptions on nodes in levels needed.

Let's say that slot[0] of the current leaf (A) has key=10. And let's say
that its parent node (N) has key=5 (and not 10). Let's say we have a
previous leaf (B), whose last slot has key=2. If such a tree is valid, then:
btrfs_prev_leaf() will search for key==9. Then btrfs_search_slot()
would bring us to node N and leaf A again, wouldn't it? Because
key(N)<=9. So we will receive leaf A back, and will think that there
is no previous leaf, while there is.

What am I missing here?

Thanks for your help,
Alex.
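The scenario in the question can be replayed on a toy two-level tree. This is
a sketch only: toy_node, toy_descend and toy_prev_leaf_lands_in are
hypothetical names, the descent rule (follow the last slot whose key is less
than or equal to the target) is the usual B-tree convention rather than a copy
of btrfs_search_slot(), and the parent key of 1 for leaf B is an assumption.
It does not settle whether btrfs permits a parent key smaller than the child's
first key; it only shows what the search would do in either case.

```c
#include <assert.h>

/* Toy parent node N with two slots, pointing at leaves B and A. */
struct toy_node {
	int keys[2];	/* key stored in the parent for each child */
	char child[2];	/* 'B' = previous leaf, 'A' = current leaf */
};

/* Descend as a B-tree lookup does: take the last slot whose key is
 * less than or equal to the target. */
static char toy_descend(const struct toy_node *n, int target)
{
	int slot = 0;

	for (int i = 0; i < 2; i++)
		if (n->keys[i] <= target)
			slot = i;
	return n->child[slot];
}

/* Leaf A starts at key 10, so looking for the previous leaf means
 * searching for key 9. The argument is the key N stores for A. */
static char toy_prev_leaf_lands_in(int parent_key_for_a)
{
	struct toy_node n = { { 1, parent_key_for_a }, { 'B', 'A' } };

	return toy_descend(&n, 10 - 1);
}
```

With a parent key of 5 for A, the search for 9 lands in A again, which is the
case the question worries about. With a parent key of 10 (equal to A's first
key), the same search lands in B.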
Re: Newbie questions on some of btrfs code...
Thanks, Liu, that clarifies.

Alex.

On Tue, May 22, 2012 at 4:42 AM, Liu Bo wrote:
> On 05/21/2012 06:05 PM, Alex Lyakas wrote:
>
>> Hi Liu,
>> thanks for the clarifications.
>>
>> I did not understand the dd example of yours, though.
>>
>>> So for the following situation:
>>>     item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
>>>         extent data disk byte 0 nr 0
>>>         extent data offset 0 nr 4096 ram 8192
>>>         extent compression 0
>>> As your case, after the first 'size 5' inline extent is written,
>>> "nr 4096 < ram 8192" could come from:
>>> 1) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=12 count=4 conv=notrunc;sync
>>> 2) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=8 count=4 conv=notrunc;sync
>>>
>>> 1) makes
>>>     item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
>>>         extent data disk byte 0 nr 0
>>>         extent data offset 0 nr 8192 ram 8192
>>>         extent compression 0
>>> 2) makes
>>>     item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
>>>         extent data disk byte 0 nr 0
>>>         extent data offset 0 nr 4096 ram 8192
>>>         extent compression 0
>>
>> You talk about the "ram_bytes" field. But do I need to look at it, if
>> I don't use compression or another encoding? Shouldn't I always look
>> at btrfs_file_extent_item::offset/num_bytes for the real data, and at
>> btrfs_file_extent_item::disk_bytenr/disk_num_bytes for finding
>> CHUNK_ITEM? Any reason I should be aware of "ram_bytes" field?
>>
>> The first dd created a 4k extent at offset 12k. How did we end up with
>> "nr 8192 ram 8192" and offset 4k?
>> The second dd added a 4k extent at 8k offset. But still EXTENT_DATA
>> has 4k offset.
>> So now we should have two 4k extents or one 8k extent. What am I
>> missing?
>>
>> Alex.
>>
>
> As I mentioned, disk_bytenr == 0 means dummy extents, which we have not yet
> allocated a range of space for.
>
> After your first 'size=5' inline extent, we'll start allocating extents from
> _4096_, cause it is _4k aligned_.
>
>>> 1) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=12 count=4 conv=notrunc;sync
> : we need a dummy extent for [4k, 12k], which starts from 4096, and nr is 8192
>
>>> 2) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=8 count=4 conv=notrunc;sync
> : we break [4k, 12k] into a dummy one [4k, 8k] and a real one [8k, 12k].
>
> More details, plz refer to btrfs_drop_extents();
>
> thanks,
> liubo
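The arithmetic behind the two dd cases can be checked with a small toy model.
It is a sketch under assumptions, not what btrfs_drop_extents() does
internally: toy_extent and the helper names are hypothetical, a 4096-byte
alignment is assumed, and leaving "ram" untouched when the dummy extent is
shortened is inferred only from the dumps quoted above (nr 4096, ram 8192).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR 4096u	/* assumed alignment, as in "4k aligned" */

/* Hypothetical stand-in for the fields printed in the item dumps. */
struct toy_extent {
	uint64_t file_off;	/* key offset: (266 EXTENT_DATA <file_off>) */
	uint64_t num_bytes;	/* "nr" */
	uint64_t ram_bytes;	/* "ram" */
};

/* Round up to the next multiple of the sector size. */
static uint64_t toy_align_up(uint64_t v)
{
	return (v + TOY_SECTOR - 1) / TOY_SECTOR * TOY_SECTOR;
}

/* Dummy extent covering the gap between the aligned end of the existing
 * data and the start of a new write further into the file. */
static struct toy_extent toy_hole_before(uint64_t data_end, uint64_t write_off)
{
	uint64_t start = toy_align_up(data_end);
	struct toy_extent e = { start, write_off - start, write_off - start };

	return e;
}

/* A later write at split_off cuts the tail off the dummy extent.
 * Only num_bytes shrinks; ram_bytes keeps the original size. */
static struct toy_extent toy_truncate_hole(struct toy_extent hole,
					   uint64_t split_off)
{
	hole.num_bytes = split_off - hole.file_off;
	return hole;
}
```

Starting from the 5-byte inline extent, a write at 12k yields a dummy extent at
offset 4096 with nr 8192 and ram 8192 (case 1). A following write at 8k
shortens it to nr 4096 while ram stays 8192 (case 2).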