convert raid-10 to raid-0?
Please CC me on any replies as I am not subscribed to this list. Thanks.

Is it possible to convert an existing 4-disk btrfs volume created as raid-10 to a btrfs raid-0/striped volume?

i've got a btrfs raid-10 volume made with 4x1TB drives that's running out of space, and i'd prefer not to reformat it if i don't have to.

craig

--
craig sanders

BOFH excuse #372:

Forced to support NT servers; sysadmins quit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Safe fsck / consistent backup while mounted
On Sat, 2011-06-04 at 12:25 +0200, Martin Steigerwald wrote:
> Hi!
>
> Now I thought about a way to safely backup a MySQL or other database -
> without long service interruption:
>
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint
> - Tell DB to continue business as usual
>
> My questions are:

> 2) Is the sync needed?

I'm not sure. In some cases it might not be: e.g. if the database uses fsync() to save the data when you tell it to go into a consistent state, there would be no need to have a separate sync. It shouldn't hurt, however.

> 3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the
> filesystem prior to the snapshot? The manpage doesn't tell it.

The fsfreeze should not be needed. The btrfs subvolume snapshot command takes an atomic snapshot of the current subvolume state.

--
Calvin Walton
[GIT PULL] Btrfs updates
Hi everyone,

The for-linus branch of the btrfs unstable repo:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git for-linus

Has our collection of fixes. It's a little bigger than usual for rc2 because it includes Josef's queue of Btrfs changes. It seemed best to split them so we could concentrate on looking for any issues in the new btrfs rc1 code from Fujitsu. His tree is bug fixes and journal lock reduction.

Some people have reported the initial caching of the free inode number map (which happens only once when it is first enabled) is sucking down too much CPU and IO time on their systems. We don't have that one fixed yet, but this pull does clean up a few other problems in the new inode number allocator. It also turns it off by default (mount -o inode_cache to enable).

I was on the fence for turning this on by default, but we've already kicked out three bugs so it seems best to keep it optional until 3.1.

Josef Bacik (15) commits (+478/-386):
    Btrfs: don't try to allocate from a block group that doesn't have enough space (+8/-0)
    Btrfs: take away the num_items argument from btrfs_join_transaction (+42/-48)
    Btrfs: make sure to use the delalloc reserve when filling delalloc (+2/-0)
    Btrfs: don't save the inode cache if we are deleting this root (+5/-0)
    Btrfs: don't look at the extent buffer level 3 times in a row (+0/-3)
    Btrfs: map the node block when looking for readahead targets (+21/-2)
    Btrfs: set range_start to the right start in count_range_bits (+1/-1)
    Btrfs: if we've already started a trans handle, use that one (+19/-0)
    Btrfs: check for duplicate entries in the free space cache (+24/-3)
    Btrfs: try not to sleep as much when doing slow caching (+11/-8)
    Btrfs: fix how we do space reservation for truncate (+123/-37)
    Btrfs: leave spinning on lookup and map the leaf (+12/-0)
    Btrfs: kill BTRFS_I(inode)->block_group (+13/-110)
    Btrfs: don't always do readahead (+20/-5)
    Btrfs: kill trans_mutex (+177/-169)

Chris Mason (3) commits (+54/-9):
    Btrfs: make sure we don't overflow the free space cache crc page (+19/-8)
    Btrfs: fix uninit variable in the delayed inode code (+1/-0)
    Btrfs: add mount -o inode_cache (+34/-1)

David Sterba (3) commits (+26/-21):
    btrfs: use btrfs_ino to access inode number (+5/-4)
    btrfs: fix uninitialized variable warning (+1/-1)
    btrfs: add helper for fs_info->closing (+20/-16)

Arne Jansen (3) commits (+70/-53):
    btrfs: scrub: don't reuse bios and pages (+65/-49)
    btrfs: scrub: add explicit plugging (+4/-3)
    btrfs: false BUG_ON when degraded (+1/-1)

liubo (1) commits (+6/-0):
    Btrfs: don't save the inode cache in non-FS roots

Total: (25) commits

 fs/btrfs/btrfs_inode.h      |    3 -
 fs/btrfs/ctree.c            |   28 +++-
 fs/btrfs/ctree.h            |   22 +++-
 fs/btrfs/delayed-inode.c    |    8 +-
 fs/btrfs/disk-io.c          |   36 +++---
 fs/btrfs/extent-tree.c      |  103 ++-
 fs/btrfs/extent_io.c        |    2 +-
 fs/btrfs/file.c             |   10 +-
 fs/btrfs/free-space-cache.c |   70 ---
 fs/btrfs/inode-map.c        |   34 +-
 fs/btrfs/inode.c            |  261 +++--
 fs/btrfs/ioctl.c            |   26 ++---
 fs/btrfs/relocation.c       |   34 +++--
 fs/btrfs/scrub.c            |  123 ++
 fs/btrfs/super.c            |    8 +-
 fs/btrfs/transaction.c      |  302 +++
 fs/btrfs/transaction.h      |   29 +---
 fs/btrfs/volumes.c          |    2 +-
 fs/btrfs/xattr.c            |    2 -
 19 files changed, 635 insertions(+), 468 deletions(-)
Re: Safe fsck / consistent backup while mounted
On 04.06.2011 12:25, Martin Steigerwald wrote:
> Hi!
>
> In mailing list debian-user-german we are discussing safe ways to do a
> fsck when mounted. I tested with Ext4 that fsck -nf works either with
> mount -o remount,ro or fsfreeze -f while writing with:
>
> I=0; while true ; let I=I+1 ; do touch /boot/test$I ; sleep 0.2 ; done
>
> In the read only mount case the write application returns errors, in the
> fsfreeze case Linux kernel stacks the changes in memory, but the fsck
> reports no errors like it should.

For online fsck you can use scrub; it checks the consistency at least partially.

> Now I thought about a way to safely backup a MySQL or other database -
> without long service interruption:
>
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint
> - Tell DB to continue business as usual

I'd just take a snapshot and back up from there. As a snapshot is a consistent image of the filesystem at the time the snapshot is taken, and every database is required to always have an at least recoverable state on disk, the snapshot represents a state your DB can recover from.

> My questions are:
>
> 1) Would this work?
>
> 2) Is the sync needed? And if so how to avoid the race condition between
> the sync and the fsfreeze invocation? Reading from the fsfreeze manpage I
> understand that fsfreeze allows all ongoing transactions to complete. But
> does that include everything what sync would bring to disk?
>
> 3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the
> filesystem prior to the snapshot? The manpage doesn't tell it.
>
> Thanks,
Re: Safe fsck / consistent backup while mounted
> Now I thought about a way to safely backup a MySQL or other database -
> without long service interruption:
>
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint

Hmm, I don't think fsfreeze works properly with btrfs?

--
Tomasz Chmielewski
http://wpkg.org
kernel BUG at fs/btrfs/extent-tree.c:1418!
Hi,

On kernel 2.6.39 I encountered the following kernel BUG (see below). The btrfs filesystem (just application data) is 1.4TB big with several subvolumes, was created with -m raid1 -d raid0, and reports 108G free (via df -h) at the moment. The system has a dual core cpu. The load average is constantly increasing (reached 43 within 2 days), one core is 100% busy with kernel time and the other core is doing maybe up to 5% of kernel and user time while it spends the other 95% with io-wait. All processes which tried to access the btrfs filesystem (for writing I guess) are stuck in D state. When the bug occurred the vdr was doing a tv recording and noad was rereading another recording for marking all the ads.

Unfortunately, I am not at the site of the machine until Sunday evening and the machine did not react to a "shutdown -r", so I think I will have to push the power button then. Is there anything I should take care of before hard rebooting?

Thanks,
Andreas Philipp

[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1418!
invalid opcode: [#1] SMP
last sysfs file: /sys/devices/pci:00/:00:1f.2/host2/target2:0:0/2:0:0:0/model
CPU 0
Modules linked in: xt_TCPMSS ipt_LOG ipt_REDIRECT xt_tcpudp ipt_MASQUERADE iptable_raw xt_comment iptable_nat ipt_REJECT bridge stp llc iptable_mangle nf_nat_tftp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp iptable_filter xt_DSCP xt_dscp xt_string xt_NFQUEUE xt_multiport xt_mark xt_hashlimit xt_conntrack xt_connmark nf_conntrack ip_tables x_tables coretemp snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss nfsd tun btrfs zlib_deflate lzo_compress cpufreq_ondemand cpufreq_stats acpi_cpufreq freq_table mperf zl10353 em28xx_dvb snd_hda_codec_hdmi tda826x tda10086 lnbp21 stb6100 stb0899 tuner_xc2028 nvidia(P) tuner tvp5150 snd_hda_codec_realtek snd_hda_intel snd_hda_codec budget budget_core saa7146 uvcvideo mantis mantis_core snd_usb_audio em28xx snd_hwdep snd_usbmidi_lib snd_pcm ttpci_eeprom snd_rawmidi snd_timer rtc_cmos snd_seq_device rtc_core tpm_tis dvb_core v4l2_common i2c_i801 videodev ir_lirc_codec lirc_dev tpm videobuf_vmalloc videobuf_core snd rc_core snd_page_alloc joydev tveeprom serio_raw rtc_lib tpm_bios v4l2_compat_ioctl32 processor fuse xfs nfs lockd sunrpc reiserfs raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid0 dm_snapshot dm_mirror dm_region_hash dm_log scsi_wait_scan usbhid uhci_hcd usb_storage ehci_hcd usbcore sg ata_piix ahci libahci pata_jmicron

Pid: 5359, comm: btrfs-endio-wri Tainted: PW 2.6.39 #2/965P-DQ6
RIP: 0010:[] [] lookup_inline_extent_backref+0xec/0x3fd [btrfs]
RSP: 0018:88012d7db9d0 EFLAGS: 00010202
RAX: 0001 RBX: 880059572a30 RCX: 0019
RDX: 0001 RSI: 8800 RDI: 880136a29ef8
RBP: 00b2 R08: 00800020 R09:
R10: 0034 R11: 880059572a30 R12: 88013d018920
R13: 0001 R14: 001d R15: 0001
FS: () GS:88013fc0() knlGS:
CS: 0010 DS: ES: CR0: 8005003b
CR2: 7faadc4c0c50 CR3: 00013419 CR4: 06f0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Process btrfs-endio-wri (pid: 5359, threadinfo 88012d7da000, task 88013bc80100)
Stack:
 0240 88012d7dbb60 00322d7da000 880059572a30
 001d052e 88013afdd800 00352d7dbc41 03b84e69e000
 880136a29828 88012d7dbab8 03b84e69e000 0bf000a8
Call Trace:
 [] ? insert_inline_extent_backref+0x63/0xec [btrfs]
 [] ? update_block_group+0x1d4/0x1f1 [btrfs]
 [] ? __btrfs_inc_extent_ref+0xb1/0x1e3 [btrfs]
 [] ? run_clustered_refs+0x69d/0x768 [btrfs]
 [] ? btrfs_run_delayed_refs+0xcd/0x1c0 [btrfs]
 [] ? __btrfs_end_transaction+0x66/0x1c1 [btrfs]
 [] ? btrfs_finish_ordered_io+0x2b3/0x2d8 [btrfs]
 [] ? end_bio_extent_writepage+0xa0/0x14a [btrfs]
 [] ? worker_loop+0x17f/0x47d [btrfs]
 [] ? btrfs_queue_worker+0x248/0x248 [btrfs]
 [] ? btrfs_queue_worker+0x248/0x248 [btrfs]
 [] ? kthread+0x7a/0x82
 [] ? kernel_thread_helper+0x4/0x10
 [] ? kthread_worker_fn+0x139/0x139
 [] ? gs_change+0xb/0xb
Code: 24 50 41 b9 01 00 00 00 44 8b 44 24 24 48 89 d9 48 8b 74 24 28 4c 89 e7 e8 7e 63 ff ff 41 89 c5 83 f8 00 0f 8c e0 02 00 00 74 04 <0f> 0b eb fe 4c 8b 2b 48 63 73 40 4c 89 ef 48 6b f6 19 48 83 c6
RIP [] lookup_inline_extent_backref+0xec/0x3fd [btrfs]
 RSP
---[ end trace 1dac9e78db79cc
Safe fsck / consistent backup while mounted
Hi!

In mailing list debian-user-german we are discussing safe ways to do a fsck when mounted. I tested with Ext4 that fsck -nf works either with mount -o remount,ro or fsfreeze -f while writing with:

I=0; while true ; let I=I+1 ; do touch /boot/test$I ; sleep 0.2 ; done

In the read only mount case the write application returns errors, in the fsfreeze case Linux kernel stacks the changes in memory, but the fsck reports no errors like it should.

Now I thought about a way to safely backup a MySQL or other database - without long service interruption:

- Tell DB to turn itself into consistent state and freeze there
- sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
- btrfs subvolume snapshot
- fsfreeze -u /mountpoint
- Tell DB to continue business as usual

My questions are:

1) Would this work?

2) Is the sync needed? And if so how to avoid the race condition between the sync and the fsfreeze invocation? Reading from the fsfreeze manpage I understand that fsfreeze allows all ongoing transactions to complete. But does that include everything what sync would bring to disk?

3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the filesystem prior to the snapshot? The manpage doesn't tell it.

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[PATCH v2 9/9] mkfs.btrfs: fix error text in '-r' mode
Smart gcc noticed use of an uninitialized variable when compiling with -O0 flags:

mkfs.c:1291: error: 'file' may be used uninitialized in this function

Signed-off-by: Sergei Trofimovich
---
 mkfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index a65fb4d..44a05e8 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1272,47 +1272,47 @@ int main(int ac, char **av)
 			fprintf(stderr, "error checking %s mount status\n",
 				file);
 			exit(1);
 		}
 		if (ret == 1) {
 			fprintf(stderr, "%s is mounted\n", file);
 			exit(1);
 		}
 		ac--;
 		fd = open(file, O_RDWR);
 		if (fd < 0) {
 			fprintf(stderr, "unable to open %s\n", file);
 			exit(1);
 		}
 		first_fd = fd;
 		first_file = file;
 		ret = btrfs_prepare_device(fd, file, zero_end,
 				&dev_block_count, &mixed);
 		if (block_count == 0)
 			block_count = dev_block_count;
 	} else {
 		ac = 0;
+		file = output;
 		fd = open_target(output);
 		if (fd < 0) {
 			fprintf(stderr, "unable to open the %s\n", file);
 			exit(1);
 		}
-		file = output;
 		first_fd = fd;
 		first_file = file;
 		block_count = size_sourcedir(source_dir, sectorsize,
 				&num_of_meta_chunks, &size_of_data);
 		ret = zero_output_file(fd, block_count, sectorsize);
 		if (ret) {
 			fprintf(stderr, "unable to zero the output file\n");
 			exit(1);
 		}
 	}
 	if (mixed) {
 		if (!metadata_profile_opt)
 			metadata_profile = 0;
 		if (!data_profile_opt)
 			data_profile = 0;
 		if (metadata_profile != data_profile) {
 			fprintf(stderr, "With mixed block groups data and metadata "
 				"profiles must be the same\n");
 			exit(1);
--
1.7.3.4
[PATCH v2 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls
Signed-off-by: Sergei Trofimovich
---
 mkfs.c |   16
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index c8b19c1..a65fb4d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -451,53 +451,67 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
 			blocks += 1;
 			blocks *= sectorsize;
 			btrfs_set_stack_inode_nbytes(dst, blocks);
 		}
 	}
 	if (S_ISLNK(src->st_mode))
 		btrfs_set_stack_inode_nbytes(dst, src->st_size + 1);
 
 	return 0;
 }
 
 static int directory_select(const struct direct *entry)
 {
 	if ((strncmp(entry->d_name, ".", entry->d_reclen) == 0) ||
 		(strncmp(entry->d_name, "..", entry->d_reclen) == 0))
 		return 0;
 	else
 		return 1;
 }
 
+static void free_namelist(struct direct **files, int count)
+{
+	int i;
+
+	if (count < 0)
+		return;
+
+	for (i = 0; i < count; ++i)
+		free(files[i]);
+	free (files);
+}
+
 static u64 calculate_dir_inode_size(char *dirname)
 {
 	int count, i;
 	struct direct **files, *cur_file;
 	u64 dir_inode_size = 0;
 
 	count = scandir(dirname, &files, directory_select, NULL);
 
 	for (i = 0; i < count; i++) {
 		cur_file = files[i];
 		dir_inode_size += strlen(cur_file->d_name);
 	}
 
+	free_namelist(files, count);
+
 	dir_inode_size *= 2;
 	return dir_inode_size;
 }
 
 static int add_inode_items(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct stat *st, char *name,
 			   u64 self_objectid, ino_t parent_inum,
 			   int dir_index_cnt, struct btrfs_inode_item *inode_ret)
 {
 	int ret;
 	struct btrfs_key inode_key;
 	struct btrfs_inode_item btrfs_inode;
 	u64 objectid;
 	u64 inode_size = 0;
 	int name_len;
 
 	name_len = strlen(name);
 	fill_inode_item(trans, root, &btrfs_inode, st);
 	objectid = self_objectid;
@@ -954,49 +968,51 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
 				dir_entry->inum = cur_inum;
 				list_add_tail(&dir_entry->list, &dir_head->list);
 			} else if (S_ISREG(st.st_mode)) {
 				ret = add_file_items(trans, root, &cur_inode,
 						     cur_inum, parent_inum, &st,
 						     cur_file->d_name, out_fd);
 				if (ret) {
 					fprintf(stderr, "add_file_items failed\n");
 					goto fail;
 				}
 			} else if (S_ISLNK(st.st_mode)) {
 				ret = add_symbolic_link(trans, root,
 						        cur_inum, cur_file->d_name);
 				if (ret) {
 					fprintf(stderr, "add_symbolic_link failed\n");
 					goto fail;
 				}
 			}
 		}
 
+		free_namelist(files, count);
 		free(parent_dir_entry->path);
 		free(parent_dir_entry);
 
 		index_cnt = 2;
 
 	} while (!list_empty(&dir_head->list));
 
 	return 0;
 fail:
+	free_namelist(files, count);
 	free(parent_dir_entry->path);
 	free(parent_dir_entry);
 	return -1;
 }
 
 static int open_target(char *output_name)
 {
 	int output_fd;
 	output_fd = open(output_name, O_CREAT | O_RDWR | O_TRUNC,
 		         S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH);
 
 	return output_fd;
 }
 
 static int create_chunks(struct btrfs_trans_handle *trans,
 			 struct btrfs_root *root, u64 num_of_meta_chunks,
 			 u64 size_of_data)
 {
 	u64 chunk_start;
 	u64 chunk_size;
--
1.7.3.4
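The ownership rule behind this patch is that scandir() allocates both the array and every entry in it, so the caller has to free each entry and then the array. The following is a minimal standalone sketch of that pattern, not the mkfs.c code: it uses the POSIX `struct dirent` rather than `struct direct`, and `skip_dot_entries`, `free_name_list` and `total_name_length` are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Filter for scandir(): keep everything except "." and "..". */
static int skip_dot_entries(const struct dirent *entry)
{
	return strcmp(entry->d_name, ".") != 0 &&
	       strcmp(entry->d_name, "..") != 0;
}

/* Free what scandir() allocated: every entry first, then the array. */
static void free_name_list(struct dirent **names, int count)
{
	int i;

	if (count < 0)
		return;	/* scandir() failed, so nothing was allocated */

	for (i = 0; i < count; i++)
		free(names[i]);
	free(names);
}

/*
 * Hypothetical helper: sum of the entry name lengths in dirname,
 * or -1 if the directory cannot be scanned.
 */
static long total_name_length(const char *dirname)
{
	struct dirent **names;
	long total = 0;
	int count, i;

	assert(dirname != NULL);

	count = scandir(dirname, &names, skip_dot_entries, NULL);
	if (count < 0)
		return -1;

	for (i = 0; i < count; i++)
		total += (long)strlen(names[i]->d_name);

	free_name_list(names, count);
	return total;
}
```

Calling `total_name_length(".")` scans the current directory and releases the list before returning, which is the same shape as calculate_dir_inode_size() after the patch.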
[PATCH v2 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes
found by valgrind:

==2559== 16 bytes in 1 blocks are definitely lost in loss record 3 of 19
==2559==    at 0x4C2720E: malloc (vg_replace_malloc.c:236)
==2559==    by 0x412F7E: pretty_sizes (utils.c:1054)
==2559==    by 0x4179E9: main (mkfs.c:1395)

Signed-off-by: Sergei Trofimovich
---
 mkfs.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 32f25f5..c8b19c1 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1159,40 +1159,41 @@ int main(int ac, char **av)
 	u64 data_profile = BTRFS_BLOCK_GROUP_RAID0;
 	u32 leafsize = getpagesize();
 	u32 sectorsize = 4096;
 	u32 nodesize = leafsize;
 	u32 stripesize = 4096;
 	int zero_end = 1;
 	int option_index = 0;
 	int fd;
 	int first_fd;
 	int ret;
 	int i;
 	int mixed = 0;
 	int data_profile_opt = 0;
 	int metadata_profile_opt = 0;
 
 	char *source_dir = NULL;
 	int source_dir_set = 0;
 	char *output = "output.img";
 	u64 num_of_meta_chunks = 0;
 	u64 size_of_data = 0;
+	char * pretty_buf;
 
 	while(1) {
 		int c;
 		c = getopt_long(ac, av, "A:b:l:n:s:m:d:L:r:VM", long_options,
 				&option_index);
 		if (c < 0)
 			break;
 		switch(c) {
 			case 'A':
 				alloc_start = parse_size(optarg);
 				break;
 			case 'd':
 				data_profile = parse_profile(optarg);
 				data_profile_opt = 1;
 				break;
 			case 'l':
 				leafsize = parse_size(optarg);
 				break;
 			case 'L':
 				label = parse_label(optarg);
@@ -1378,41 +1379,42 @@ raid_groups:
 	if (!source_dir_set) {
 		ret = create_raid_groups(trans, root, data_profile,
 				 metadata_profile, mixed);
 		BUG_ON(ret);
 	}
 
 	ret = create_data_reloc_tree(trans, root);
 	BUG_ON(ret);
 
 	if (mixed) {
 		struct btrfs_super_block *super = &root->fs_info->super_copy;
 		u64 flags = btrfs_super_incompat_flags(super);
 
 		flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;
 		btrfs_set_super_incompat_flags(super, flags);
 	}
 
 	printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
 	    "sectorsize %u size %s\n",
 	    label, first_file, nodesize, leafsize, sectorsize,
-	    pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	    pretty_buf = pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+	free (pretty_buf);
 
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	btrfs_commit_transaction(trans, root);
 
 	if (source_dir_set) {
 		trans = btrfs_start_transaction(root, 1);
 		ret = create_chunks(trans, root,
 				    num_of_meta_chunks, size_of_data);
 		BUG_ON(ret);
 		btrfs_commit_transaction(trans, root);
 
 		ret = make_image(source_dir, root, fd);
 		BUG_ON(ret);
 	}
 
 	ret = close_ctree(root);
 	BUG_ON(ret);
 
 	free(label);
 	return 0;
--
1.7.3.4
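The leak reported by valgrind comes from passing a heap-allocated string straight into printf() with no pointer left to free. A minimal sketch of the keep-the-pointer-then-free pattern follows. `format_size` is a hypothetical stand-in for pretty_sizes() (its unit table and rounding are assumptions, not the utils.c implementation) and `print_size` is an illustrative caller.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for pretty_sizes(): returns a malloc()ed
 * string that the caller owns and must free().
 */
static char *format_size(unsigned long long bytes)
{
	static const char *units[] = { "B", "KB", "MB", "GB", "TB" };
	double value = (double)bytes;
	size_t unit = 0;
	char *buf = malloc(32);

	if (!buf)
		return NULL;

	while (value >= 1024.0 && unit < 4) {
		value /= 1024.0;
		unit++;
	}
	snprintf(buf, 32, "%.2f%s", value, units[unit]);
	return buf;
}

/*
 * Print "size <pretty>" and release the buffer afterwards.
 * Returns the number of characters written, or -1 on failure.
 */
static int print_size(FILE *out, unsigned long long bytes)
{
	char *text = format_size(bytes);
	int written;

	if (!text)
		return -1;

	written = fprintf(out, "size %s\n", text);
	free(text);	/* the caller owns the buffer, so it frees it */
	return written;
}
```

Holding the result in a named local before printing keeps the allocation and the free() visibly paired, at the cost of one extra line compared with assigning inside the printf() argument list.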
[PATCH v2 6/9] mkfs.btrfs: write zeroes instead of uninitialized data.
Found by valgrind:

==8968== Use of uninitialised value of size 8
==8968==    at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==    by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==    by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==    by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==    by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==    by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==    by 0x42CF66: make_image (mkfs.c:1061)
==8968==    by 0x42DE63: main (mkfs.c:1410)
==8968== Uninitialised value was created by a stack allocation
==8968==    at 0x42B5FB: add_inode_items (mkfs.c:493)

1. On-disk inode format has reserved (and thus, random at alloc time) fields:
   btrfs_inode_item: __le64 reserved[4]
2. Sometimes extents are created on disk without writing data there.
   (Or at least not all data is written there). Kernel code always had
   it kzalloc'ed.

Zero them all.

Signed-off-by: Sergei Trofimovich
---
 extent_io.c |    1 +
 mkfs.c      |    7 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index 069c199..a93d4d6 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -555,40 +555,41 @@ static int free_some_buffers(struct extent_io_tree *tree)
 		} else {
 			list_move_tail(&eb->lru, &tree->lru);
 		}
 		if (nrscan++ > 64)
 			break;
 	}
 	return 0;
 }
 
 static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
 						   u64 bytenr, u32 blocksize)
 {
 	struct extent_buffer *eb;
 	int ret;
 
 	eb = malloc(sizeof(struct extent_buffer) + blocksize);
 	if (!eb) {
 		BUG();
 		return NULL;
 	}
 
+	memset (eb, 0, sizeof(struct extent_buffer) + blocksize);
 	eb->start = bytenr;
 	eb->len = blocksize;
 	eb->refs = 2;
 	eb->flags = 0;
 	eb->tree = tree;
 	eb->fd = -1;
 	eb->dev_bytenr = (u64)-1;
 	eb->cache_node.start = bytenr;
 	eb->cache_node.size = blocksize;
 
 	free_some_buffers(tree);
 	ret = insert_existing_cache_extent(&tree->cache, &eb->cache_node);
 	if (ret) {
 		free(eb);
 		return NULL;
 	}
 	list_add_tail(&eb->lru, &tree->lru);
 	tree->cache_size += blocksize;
 	return eb;
diff --git a/mkfs.c b/mkfs.c
index 8ff2b1e..32f25f5 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -394,40 +394,47 @@ static int add_directory_items(struct btrfs_trans_handle *trans,
 	if (S_ISLNK(st->st_mode))
 		filetype = BTRFS_FT_SYMLINK;
 
 	ret = btrfs_insert_dir_item(trans, root, name, name_len,
 				    parent_inum, &location,
 				    filetype, index_cnt);
 
 	*dir_index_cnt = index_cnt;
 	index_cnt++;
 
 	return ret;
 }
 
 static int fill_inode_item(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct btrfs_inode_item *dst, struct stat *src)
 {
 	u64 blocks = 0;
 	u64 sectorsize = root->sectorsize;
 
+	/*
+	 * btrfs_inode_item has some reserved fields
+	 * and represents on-disk inode entry, so
+	 * zero everything to prevent information leak
+	 */
+	memset (dst, 0, sizeof (*dst));
+
 	btrfs_set_stack_inode_generation(dst, trans->transid);
 	btrfs_set_stack_inode_size(dst, src->st_size);
 	btrfs_set_stack_inode_nbytes(dst, 0);
 	btrfs_set_stack_inode_block_group(dst, 0);
 	btrfs_set_stack_inode_nlink(dst, src->st_nlink);
 	btrfs_set_stack_inode_uid(dst, src->st_uid);
 	btrfs_set_stack_inode_gid(dst, src->st_gid);
 	btrfs_set_stack_inode_mode(dst, src->st_mode);
 	btrfs_set_stack_inode_rdev(dst, 0);
 	btrfs_set_stack_inode_flags(dst, 0);
 	btrfs_set_stack_timespec_sec(&dst->atime, src->st_atime);
 	btrfs_set_stack_timespec_nsec(&dst->atime, 0);
 	btrfs_set_stack_timespec_sec(&dst->ctime, src->st_ctime);
 	btrfs_set_stack_timespec_nsec(&dst->ctime, 0);
 	btrfs_set_stack_timespec_sec(&dst->mtime, src->st_mtime);
 	btrfs_set_stack_timespec_nsec(&dst->mtime, 0);
 	btrfs_set_stack_timespec_sec(&dst->otime, 0);
 	btrfs_set_stack_timespec_nsec(&dst->otime, 0);
 
 	if (S_ISDIR(src->st_mode)) {
--
1.7.3.4
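Point 1 of the commit message can be shown in isolation: a stack-allocated record with reserved fields carries whatever was on the stack unless it is cleared before the named fields are set. The sketch below illustrates that idea only. `struct disk_record` is a hypothetical layout, not the real btrfs_inode_item, and `fill_record` and `reserved_is_zero` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record; NOT the real btrfs_inode_item layout. */
struct disk_record {
	uint64_t size;
	uint32_t mode;
	uint64_t reserved[4];	/* callers never assign these */
};

/* Zero the whole record first, then set the fields we actually know. */
static void fill_record(struct disk_record *dst, uint64_t size, uint32_t mode)
{
	memset(dst, 0, sizeof(*dst));
	dst->size = size;
	dst->mode = mode;
}

/* Returns 1 when every reserved word is zero, 0 otherwise. */
static int reserved_is_zero(const struct disk_record *rec)
{
	size_t i;

	for (i = 0; i < sizeof(rec->reserved) / sizeof(rec->reserved[0]); i++)
		if (rec->reserved[i] != 0)
			return 0;
	return 1;
}
```

Without the memset() in `fill_record`, the reserved words would keep whatever the caller's storage held, which is what the checksum code in the valgrind trace ended up reading.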
[PATCH v2 5/9] mkfs.btrfs: fix symlink names writing
Found by valgrind:

==8968== Use of uninitialised value of size 8
==8968==    at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==    by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==    by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==    by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==    by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==    by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==    by 0x42CF66: make_image (mkfs.c:1061)
==8968==    by 0x42DE63: main (mkfs.c:1410)
==8968== Uninitialised value was created by a stack allocation
==8968==    at 0x42B5FB: add_inode_items (mkfs.c:493)

readlink(2) does not write '\0' for us, so add it manually.

Signed-off-by: Sergei Trofimovich
---
 mkfs.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 9d7b792..8ff2b1e 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -692,45 +692,47 @@ static int record_file_extent(struct btrfs_trans_handle *trans,
 					root->root_key.objectid,
 					objectid, 0);
 fail:
 	btrfs_release_path(root, &path);
 	return ret;
 }
 
 static int add_symbolic_link(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root,
 			     u64 objectid, const char *path_name)
 {
 	int ret;
 	u64 sectorsize = root->sectorsize;
 	char *buf = malloc(sectorsize);
 
 	ret = readlink(path_name, buf, sectorsize);
 	if (ret <= 0) {
 		fprintf(stderr, "readlink failed for %s\n", path_name);
 		goto fail;
 	}
-	if (ret > sectorsize) {
+	if (ret >= sectorsize) {
 		fprintf(stderr, "symlink too long for %s", path_name);
 		ret = -1;
 		goto fail;
 	}
+
+	buf[ret] = '\0'; /* readlink does not do it for us */
 	ret = btrfs_insert_inline_extent(trans, root, objectid, 0,
 					 buf, ret + 1);
 fail:
 	free(buf);
 	return ret;
 }
 
 static int add_file_items(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
 			  struct btrfs_inode_item *btrfs_inode, u64 objectid,
 			  ino_t parent_inum, struct stat *st,
 			  const char *path_name, int out_fd)
 {
 	int ret;
 	ssize_t ret_read;
 	u64 bytes_read = 0;
 	char *buffer = NULL;
 	struct btrfs_key key;
 	int blocks;
 	u32 sectorsize = root->sectorsize;
--
1.7.3.4
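Both halves of this fix follow from how readlink(2) behaves: it returns the number of bytes placed in the buffer and never appends a terminator, so a result equal to the buffer size leaves no room for '\0' and may also mean the target was truncated. A minimal sketch of a wrapper that applies the same two checks is shown below; `read_link_target` is a hypothetical helper name, not a function from mkfs.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: read the target of the symlink at path into
 * buf as a NUL-terminated string. Returns the target length, or -1 on
 * error or when the target does not fit together with its terminator.
 */
static ssize_t read_link_target(const char *path, char *buf, size_t bufsize)
{
	ssize_t len;

	if (bufsize == 0)
		return -1;

	len = readlink(path, buf, bufsize);
	if (len < 0)
		return -1;

	/* len == bufsize: possibly truncated, and no room for '\0' */
	if ((size_t)len >= bufsize)
		return -1;

	buf[len] = '\0';	/* readlink() does not terminate the string */
	return len;
}
```

The `>=` comparison matters for the same reason as in the patch: with `>` a target of exactly bufsize bytes would pass the check and the terminator would be written one byte past the end of the buffer.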
[PATCH v2 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum
==31873== Command: ./mkfs.btrfs -r /some/root/
==31873== Parent PID: 31872
==31873==
==31873== Conditional jump or move depends on uninitialised value(s)
==31873==    at 0x42C3D0: add_file_items (mkfs.c:792)
==31873==    by 0x42CAB3: traverse_directory (mkfs.c:948)
==31873==    by 0x42CF11: make_image (mkfs.c:1047)
==31873==    by 0x42DE53: main (mkfs.c:1401)
==31873== Uninitialised value was created by a stack allocation
==31873==    at 0x41B1B1: btrfs_csum_file_block (file-item.c:195)

'ret' value was not initialized for 'found' branch. The same fix sits in kernel:

> commit 639cb58675ce9b507eed9c3d6b3335488079b21a
> Author: Chris Mason
> Date: Thu Aug 28 06:15:25 2008 -0400
>
> Btrfs: Fix variable init during csum creation
>
> Signed-off-by: Chris Mason

Signed-off-by: Sergei Trofimovich
---
 file-item.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/file-item.c b/file-item.c
index 9732282..47f6ad2 100644
--- a/file-item.c
+++ b/file-item.c
@@ -201,40 +201,41 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
 	struct btrfs_path *path;
 	struct btrfs_csum_item *item;
 	struct extent_buffer *leaf = NULL;
 	u64 csum_offset;
 	u32 csum_result = ~(u32)0;
 	u32 nritems;
 	u32 ins_size;
 	u16 csum_size =
 		btrfs_super_csum_size(&root->fs_info->super_copy);
 
 	path = btrfs_alloc_path();
 	BUG_ON(!path);
 
 	file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
 	file_key.offset = bytenr;
 	file_key.type = BTRFS_EXTENT_CSUM_KEY;
 
 	item = btrfs_lookup_csum(trans, root, path, bytenr, 1);
 	if (!IS_ERR(item)) {
 		leaf = path->nodes[0];
+		ret = 0;
 		goto found;
 	}
 	ret = PTR_ERR(item);
 	if (ret == -EFBIG) {
 		u32 item_size;
 		/* we found one, but it isn't big enough yet */
 		leaf = path->nodes[0];
 		item_size = btrfs_item_size_nr(leaf, path->slots[0]);
 		if ((item_size / csum_size) >= MAX_CSUM_ITEMS(root, csum_size)) {
 			/* already at max size, make a new one */
 			goto insert;
 		}
 	} else {
 		int slot = path->slots[0] + 1;
 		/* we didn't find a csum item, insert one */
 		nritems = btrfs_header_nritems(path->nodes[0]);
 		if (path->slots[0] >= nritems - 1) {
 			ret = btrfs_next_leaf(root, path);
 			if (ret == 1)
 				found_next = 1;
--
1.7.3.4
[PATCH v2 3/9] mkfs.btrfs: fail on scandir error (-r mode)
mkfs.btrfs does not handle relative pathnames for now. When they are passed to it, it creates an empty image. So the first time, I thought it did not work at all.

This patch adds error handling for scandir(). With the patch it behaves this way:

$ mkfs.btrfs -r ./root
...
fs created label (null) on output.img
	nodesize 4096 leafsize 4096 sectorsize 4096 size 256.00MB
Btrfs v0.19-52-g438c5ff-dirty
scandir for ./root failed: No such file or directory
unable to traverse_directory
Making image is aborted.
mkfs.btrfs: mkfs.c:1402: main: Assertion `!(ret)' failed.

Signed-off-by: Sergei Trofimovich
---
 mkfs.c |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 57c88f9..9d7b792 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -878,40 +878,46 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(leaf);
 	btrfs_release_path(root, &path);
 
 	do {
 		parent_dir_entry = list_entry(dir_head->list.next,
 					      struct directory_name_entry,
 					      list);
 		list_del(&parent_dir_entry->list);
 
 		parent_inum = parent_dir_entry->inum;
 		parent_dir_name = parent_dir_entry->dir_name;
 		if (chdir(parent_dir_entry->path)) {
 			fprintf(stderr, "chdir error for %s\n",
 				parent_dir_name);
 			goto fail;
 		}
 
 		count = scandir(parent_dir_entry->path, &files,
 				directory_select, NULL);
+		if (count == -1)
+		{
+			fprintf(stderr, "scandir for %s failed: %s\n",
+				parent_dir_name, strerror (errno));
+			goto fail;
+		}
 
 		for (i = 0; i < count; i++) {
 			cur_file = files[i];
 
 			if (lstat(cur_file->d_name, &st) == -1) {
 				fprintf(stderr, "lstat failed for file %s\n",
 					cur_file->d_name);
 				goto fail;
 			}
 
 			cur_inum = ++highest_inum + BTRFS_FIRST_FREE_OBJECTID;
 			ret = add_directory_items(trans, root,
 						  cur_inum, parent_inum,
 						  cur_file->d_name,
 						  &st, &dir_index_cnt);
 			if (ret) {
 				fprintf(stderr, "add_directory_items failed\n");
 				goto fail;
 			}
--
1.7.3.4
[PATCH v2 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes'
Signed-off-by: Sergei Trofimovich
---
 convert.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/convert.c b/convert.c
index fbcf4a3..291dc27 100644
--- a/convert.c
+++ b/convert.c
@@ -1103,41 +1103,41 @@ static int copy_disk_extent(struct btrfs_root *root, u64 dst_bytenr,
 	char *buffer;
 	struct btrfs_fs_devices *fs_devs = root->fs_info->fs_devices;
 
 	buffer = malloc(num_bytes);
 	if (!buffer)
 		return -ENOMEM;
 	ret = pread(fs_devs->latest_bdev, buffer, num_bytes, src_bytenr);
 	if (ret != num_bytes)
 		goto fail;
 	ret = pwrite(fs_devs->latest_bdev, buffer, num_bytes, dst_bytenr);
 	if (ret != num_bytes)
 		goto fail;
 	ret = 0;
 fail:
 	free(buffer);
 	if (ret > 0)
 		ret = -1;
 	return ret;
 }
 /*
- * scan ext2's inode bitmap and copy all used inode.
+ * scan ext2's inode bitmap and copy all used inodes.
  */
 static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs,
 		       int datacsum, int packing, int noxattr)
 {
 	int ret;
 	errcode_t err;
 	ext2_inode_scan ext2_scan;
 	struct ext2_inode ext2_inode;
 	ext2_ino_t ext2_ino;
 	u64 objectid;
 	struct btrfs_trans_handle *trans;
 
 	trans = btrfs_start_transaction(root, 1);
 	if (!trans)
 		return -ENOMEM;
 	err = ext2fs_open_inode_scan(ext2_fs, 0, &ext2_scan);
 	if (err) {
 		fprintf(stderr, "ext2fs_open_inode_scan: %s\n", error_message(err));
 		return -1;
 	}
--
1.7.3.4
[PATCH v2 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case
From: Arne Jansen

When creating a mixed fs with mkfs, an extra metadata chunk got
allocated. This is because btrfs_reserve_extent calls do_chunk_alloc
for METADATA, which in turn wasn't able to find the proper space_info,
as __find_space_info did a hard compare of the flags. It is now
sufficient for the space_info to include the proper flag. This reflects
the change done to the kernel code to support mixed chunks.

Also, for a subsequent chunk allocation (which should not be hit in the
mkfs case), the chunk is now created with the flags from the space_info
instead of the requested flags. A better solution would be to pull the
full changeset for the mixed case from the kernel into the user mode
(or, even better, share the code).

The additional chunk probably confused the block_rsv calculation, which
in turn led to several ENOSPC Oopses.

Signed-off-by: Arne Jansen
---
 extent-tree.c |    7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..c6c77c6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1718,41 +1718,41 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 		clear_extent_bits(block_group_cache, start, end,
 				  BLOCK_GROUP_DIRTY, GFP_NOFS);
 		cache = (struct btrfs_block_group_cache *)(unsigned long)ptr;
 		ret = write_one_cache_group(trans, root, path, cache);
 		BUG_ON(ret);
 	}
 	btrfs_free_path(path);
 	return 0;
 }
 static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 						  u64 flags)
 {
 	struct list_head *head = &info->space_info;
 	struct list_head *cur;
 	struct btrfs_space_info *found;
 	list_for_each(cur, head) {
 		found = list_entry(cur, struct btrfs_space_info, list);
-		if (found->flags == flags)
+		if (found->flags & flags)
 			return found;
 	}
 	return NULL;
 }
 static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 			     u64 total_bytes, u64 bytes_used,
 			     struct btrfs_space_info **space_info)
 {
 	struct btrfs_space_info *found;
 	found = __find_space_info(info, flags);
 	if (found) {
 		found->total_bytes += total_bytes;
 		found->bytes_used += bytes_used;
 		WARN_ON(found->total_bytes < found->bytes_used);
 		*space_info = found;
 		return 0;
 	}
@@ -1795,49 +1795,50 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	u64 start;
 	u64 num_bytes;
 	int ret;
 	space_info = __find_space_info(extent_root->fs_info, flags);
 	if (!space_info) {
 		ret = update_space_info(extent_root->fs_info, flags,
 					0, 0, &space_info);
 		BUG_ON(ret);
 	}
 	BUG_ON(!space_info);
 	if (space_info->full)
 		return 0;
 	thresh = div_factor(space_info->total_bytes, 7);
 	if ((space_info->bytes_used + space_info->bytes_pinned + alloc_bytes) <
 	    thresh)
 		return 0;
-	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
+	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes,
+				space_info->flags);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return 0;
 	}
 	BUG_ON(ret);
-	ret = btrfs_make_block_group(trans, extent_root, 0, flags,
+	ret = btrfs_make_block_group(trans, extent_root, 0, space_info->flags,
 		     BTRFS_FIRST_CHUNK_TREE_OBJECTID, start, num_bytes);
 	BUG_ON(ret);
 	return 0;
 }
 static int update_block_group(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root,
 			      u64 bytenr, u64 num_bytes, int alloc,
 			      int mark_free)
 {
 	struct btrfs_block_group_cache *cache;
 	struct btrfs_fs_info *info = root->fs_info;
 	u64 total = num_bytes;
 	u64 old_val;
 	u64 byte_in_group;
 	u64 start;
 	u64 end;
 	/* block accounting for super block */
 	old_val = btrfs_super_bytes_used(&info->super_copy);
--
1.7.3.4
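To illustrate why the hard compare missed the mixed space_info, here is a small standalone sketch. A mixed space_info carries both the data and the metadata bit, so a request for metadata alone never equals its flags, while a bitwise test does match. The struct, the two lookup functions and the flag values are simplified stand-ins chosen for this example (assumptions), not the actual btrfs-progs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; names and values are assumptions. */
#define SKETCH_GROUP_DATA	(1ULL << 0)
#define SKETCH_GROUP_METADATA	(1ULL << 2)

/* Simplified stand-in for a space_info: only the flags matter here. */
struct sketch_space_info {
	uint64_t flags;
};

/* Old behaviour: the flags must be identical. */
static struct sketch_space_info *
find_exact(struct sketch_space_info *infos, size_t n, uint64_t flags)
{
	size_t i;

	assert(infos != NULL);
	for (i = 0; i < n; i++)
		if (infos[i].flags == flags)
			return &infos[i];
	return NULL;
}

/* New behaviour: it is enough that the space_info includes the flag. */
static struct sketch_space_info *
find_including(struct sketch_space_info *infos, size_t n, uint64_t flags)
{
	size_t i;

	assert(infos != NULL);
	for (i = 0; i < n; i++)
		if (infos[i].flags & flags)
			return &infos[i];
	return NULL;
}
```

With a single mixed entry whose flags are `SKETCH_GROUP_DATA | SKETCH_GROUP_METADATA`, `find_exact()` returns NULL for a metadata request, which in the real code led to a new space_info and an extra chunk, while `find_including()` returns the mixed entry.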
[PATCH v2 0/9] btrfs-progs: some fixes for bugs spotted by valgrind
The tmp branch recently got a very nice feature: 'mkfs.btrfs -r
/some/directory'. It's very useful when you need to create a minimal
root: /bin/sh and fs_mark.

But there is another hidden feature! As '-r' can create a whole
filesystem, we can effectively valgrind a lot of code paths in btrfs
and pick out bugs.

This patch series is mostly (with one exception) dumb, obvious hole
plugs (sometimes they are backports from the kernel).

Patchset based on
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git#tmp

    commit e6bd18d8938986c997c45f0ea95b221d4edec095
    Author: Christoph Hellwig
    Date:   Thu Apr 21 16:24:07 2011 -0400

First off, the exception: in order to make --mixed produce proper
filesystems with meta+data only blocks (and not meta+data/data ones,
which confused space_cache and led to an oops for me), I ask you to
consider pulling Arne's patch:

> Subject: [PATCH v2 1/9] btrfs progs: fix extra metadata chunk allocation in
> --mixed case

The rest of the patches should be obvious. They don't fix all of
valgrind's fair complaints, but they reduce them considerably.

Changes since v1:
- "[PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls":
  'free_namelist()' now works correctly if 'count == -1'. This happens
  when 'free_namelist()' is called right after 'scandir()' returns an
  error.

Some stats:

 convert.c     |    2 +-
 extent-tree.c |    7 ---
 extent_io.c   |    1 +
 file-item.c   |    1 +
 mkfs.c        |   39 ---
 5 files changed, 43 insertions(+), 7 deletions(-)

Arne Jansen (1):
  btrfs progs: fix extra metadata chunk allocation in --mixed case

Sergei Trofimovich (8):
  btrfs-convert: fix typo: 'all inode' -> 'all inodes'
  mkfs.btrfs: fail on scandir error (-r mode)
  mkfs.btrfs: return some defined value instead of garbage when lookup checksum
  mkfs.btrfs: fix symlink names writing
  mkfs.btrfs: write zeroes instead on uninitialized data.
  mkfs.btrfs: free buffers allocated by pretty_sizes
  mkfs.btrfs: fix memory leak caused by 'scandir()' calls
  mkfs.btrfs: fix error text in '-r' mode
Re: Quota Implementation
On 03.06.2011 18:47, Hugo Mills wrote:
> On Fri, Jun 03, 2011 at 06:24:41PM +0200, Arne Jansen wrote:
>> Hi,
>>
>> If no one is already working on it, I'd like to take the Quota lock
>> and see how far I get. Let me sketch out in short what I'm planning
>> to do:
>>
>> - Quota will be subvolume based. Only the FS-trees and data extents
>>   will be accounted.
>> - Quota Groups can be defined. Every quota group can comprise any
>>   number of subvolumes. A subvolume can be assigned to any number of
>>   quota groups.
>> - A Quota Group can account/limit the total amount of space that is
>>   referenced by it and/or the amount of space that is exclusively
>>   referenced (i.e. referenced by no other quota group).
>> - With this it is possible to define a hierarchical quota that need
>>   not necessarily reflect the filesystem hierarchy.
>> - It is also possible to decide for each snapshot if it should be
>>   accounted into the parent group. So in a scenario where each
>>   subvolume reflects a user home, it's possible to have some
>>   snapshots accounted to the user and others not (e.g. the ones
>>   needed for system backups).
>> - Quota information will be stored in new records, possibly in a
>>   separate tree.
>> - It should be possible to change the Quota config and group
>>   assignments online, though this might need a full re-scan of the
>>   fs.
>> - It does NOT include any kind of user/group (UID/GID) quota.
>>
>> Any addenda or arguments why it's impossible or insane welcome.
>
> There's a problem in that in some cases, it's possible to get into
> a situation where you can't *delete* files because you're going over
> quota. If I have two subvolumes that share most of their data (e.g.
> one is a snapshot of the other), and both subvolumes have a limit
> under the "exclusive use" clause, then deleting material from
> subvolume A could cause subvolume B to go over quota.

I wouldn't prevent the deletion in A, but let B go over quota instead.
Maybe a limit on exclusive use is of little practical use, but tracking
it is very useful, as it is the space that will get freed if this
subvol should get deleted. So it is an answer to the question 'how big
is this snapshot?'.
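The difference between "referenced" and "exclusively referenced" space, and the deletion scenario discussed in this thread, can be sketched with a toy model. Everything below is an assumption made for illustration and not the proposed on-disk format or any real btrfs API: extents carry a bitmask of the subvolumes referencing them, a quota group is a bitmask of subvolumes, and "exclusive" is approximated as "referenced by no subvolume outside the group".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy extent: a size plus the set of subvolumes that reference it. */
struct toy_extent {
	uint64_t bytes;
	uint32_t subvol_mask;	/* bit i set => subvolume i references it */
};

/* Space referenced by at least one subvolume of the group. */
static uint64_t referenced_bytes(const struct toy_extent *ext, size_t n,
				 uint32_t group)
{
	uint64_t total = 0;
	size_t i;

	assert(ext != NULL);
	for (i = 0; i < n; i++)
		if (ext[i].subvol_mask & group)
			total += ext[i].bytes;
	return total;
}

/* Space referenced by the group and by no subvolume outside of it. */
static uint64_t exclusive_bytes(const struct toy_extent *ext, size_t n,
				uint32_t group)
{
	uint64_t total = 0;
	size_t i;

	assert(ext != NULL);
	for (i = 0; i < n; i++)
		if ((ext[i].subvol_mask & group) &&
		    !(ext[i].subvol_mask & ~group))
			total += ext[i].bytes;
	return total;
}
```

Take subvolume A (bit 0) and its snapshot B (bit 1) sharing one 90-byte extent, plus a 10-byte extent only in A. B's exclusive usage starts at 0. Once A drops its reference to the shared extent, B's exclusive usage becomes 90 without B having written anything, which is the case where a limit on exclusive use would be exceeded by a deletion elsewhere. The same number also answers "how much is freed if B is deleted".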