convert raid-10 to raid-0?

2011-06-04 Thread Craig Sanders
Please CC me on any replies as I am not subscribed to this list. Thanks.

Is it possible to convert an existing 4-disk btrfs volume created as
raid-10 to a btrfs raid-0/striped volume?

i've got a btrfs raid-10 volume made with 4x1TB drives that's running
out of space, and i'd prefer not to reformat it if i don't have to.

craig

-- 
craig sanders 

BOFH excuse #372:

Forced to support NT servers; sysadmins quit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Safe fsck / consistent backup while mounted

2011-06-04 Thread Calvin Walton
On Sat, 2011-06-04 at 12:25 +0200, Martin Steigerwald wrote:
> Hi!
> 
> Now I thought about a way to safely backup a MySQL or other database - 
> without long service interruption:
> 
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint
> - Tell DB to continue business as usual
> 
> My questions are:
> 2) Is the sync needed?

I'm not sure. In some cases it might not be: for example, if the database
uses fsync() to save its data when you tell it to go into a consistent
state, there is no need for a separate sync. It shouldn't hurt, however.
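
As a rough illustration of the fsync() case, here is a minimal C sketch. It
assumes a hypothetical helper, write_durably(), which is not taken from MySQL
or any other real database; it only shows that a per-file fsync() can make
the data durable without a global sync:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): write len bytes
 * to path and return 0 only after fsync() has reported success.
 */
int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    size_t done = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Flush this one file to stable storage; no global sync involved. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would invoke write_durably("/path/to/file", buf, len) and treat any
non-zero return as "not safely on disk yet".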

> 3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the 
> filesystem prior to the snapshot? The manpage doesn´t tell it.

The fsfreeze should not be needed. The btrfs subvolume snapshot command
takes an atomic snapshot of the current subvolume state.

-- 
Calvin Walton 



[GIT PULL] Btrfs updates

2011-06-04 Thread Chris Mason
Hi everyone,

The for-linus branch of the btrfs unstable repo:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git for-linus

Has our collection of fixes.  It's a little bigger than usual for rc2
because it includes Josef's queue of Btrfs changes.  It seemed best to
split them so we could concentrate on looking for any issues in the new
btrfs rc1 code from Fujitsu.  His tree is bug fixes and journal lock
reduction.

Some people have reported the initial caching of the free inode number
map (which happens only once when it is first enabled) is sucking down
too much CPU and IO time on their systems.  We don't have that one fixed
yet, but this pull does clean up a few other problems in the new inode
number allocator.  It also turns it off by default (mount -o
inode_cache to enable).

I was on the fence for turning this on by default, but we've already
kicked out three bugs so it seems best to keep it optional until 3.1.

Josef Bacik (15) commits (+478/-386):
Btrfs: don't try to allocate from a block group that doesn't have enough space (+8/-0)
Btrfs: take away the num_items argument from btrfs_join_transaction (+42/-48)
Btrfs: make sure to use the delalloc reserve when filling delalloc (+2/-0)
Btrfs: don't save the inode cache if we are deleting this root (+5/-0)
Btrfs: don't look at the extent buffer level 3 times in a row (+0/-3)
Btrfs: map the node block when looking for readahead targets (+21/-2)
Btrfs: set range_start to the right start in count_range_bits (+1/-1)
Btrfs: if we've already started a trans handle, use that one (+19/-0)
Btrfs: check for duplicate entries in the free space cache (+24/-3)
Btrfs: try not to sleep as much when doing slow caching (+11/-8)
Btrfs: fix how we do space reservation for truncate (+123/-37)
Btrfs: leave spinning on lookup and map the leaf (+12/-0)
Btrfs: kill BTRFS_I(inode)->block_group (+13/-110)
Btrfs: don't always do readahead (+20/-5)
Btrfs: kill trans_mutex (+177/-169)

Chris Mason (3) commits (+54/-9):
Btrfs: make sure we don't overflow the free space cache crc page (+19/-8)
Btrfs: fix uninit variable in the delayed inode code (+1/-0)
Btrfs: add mount -o inode_cache (+34/-1)

David Sterba (3) commits (+26/-21):
btrfs: use btrfs_ino to access inode number (+5/-4)
btrfs: fix uninitialized variable warning (+1/-1)
btrfs: add helper for fs_info->closing (+20/-16)

Arne Jansen (3) commits (+70/-53):
btrfs: scrub: don't reuse bios and pages (+65/-49)
btrfs: scrub: add explicit plugging (+4/-3)
btrfs: false BUG_ON when degraded (+1/-1)

liubo (1) commits (+6/-0):
Btrfs: don't save the inode cache in non-FS roots

Total: (25) commits

 fs/btrfs/btrfs_inode.h      |    3 -
 fs/btrfs/ctree.c            |   28 +++-
 fs/btrfs/ctree.h            |   22 +++-
 fs/btrfs/delayed-inode.c    |    8 +-
 fs/btrfs/disk-io.c          |   36 +++---
 fs/btrfs/extent-tree.c      |  103 ++-
 fs/btrfs/extent_io.c        |    2 +-
 fs/btrfs/file.c             |   10 +-
 fs/btrfs/free-space-cache.c |   70 ---
 fs/btrfs/inode-map.c        |   34 +-
 fs/btrfs/inode.c            |  261 +++--
 fs/btrfs/ioctl.c            |   26 ++---
 fs/btrfs/relocation.c       |   34 +++--
 fs/btrfs/scrub.c            |  123 ++
 fs/btrfs/super.c            |    8 +-
 fs/btrfs/transaction.c      |  302 +++
 fs/btrfs/transaction.h      |   29 +---
 fs/btrfs/volumes.c          |    2 +-
 fs/btrfs/xattr.c            |    2 -
 19 files changed, 635 insertions(+), 468 deletions(-)


Re: Safe fsck / consistent backup while mounted

2011-06-04 Thread Arne Jansen

On 04.06.2011 12:25, Martin Steigerwald wrote:
> Hi!
>
> In mailing list debian-user-german we are discussing safe ways to do a
> fsck when mounted.
>
> I tested with Ext4 that fsck -nf works either with mount -o remount,ro or
> fsfreeze -f while writing with:
>
> I=0; while true ; let I=I+1 ; do touch /boot/test$I ; sleep 0.2 ; done
>
> In the read only mount case the write application returns errors, in the
> fsfreeze case Linux kernel stacks the changes in memory, but the fsck
> reports no errors like it should.

for online fsck you can use scrub, it checks at least partially the
consistency.

> Now I thought about a way to safely backup a MySQL or other database -
> without long service interruption:
>
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint
> - Tell DB to continue business as usual

I'd just take a snapshot and backup from there. As a snapshot is a
consistent image of the filesystem at the time the snapshot is taken,
and every database is required to always have an at least recoverable
state on disk, the snapshot represents a state where your DB can
recover from.

> My questions are:
>
> 1) Would this work?
>
> 2) Is the sync needed? And if so how to avoid the race condition between
> the sync and the fsfreeze invocation? Reading from the fsfreeze manpage I
> understand that fsfreeze allows all ongoing transactions to complete. But
> does that include everything what sync would bring to disk?
>
> 3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the
> filesystem prior to the snapshot? The manpage doesn´t tell it.
>
> Thanks,




Re: Safe fsck / consistent backup while mounted

2011-06-04 Thread Tomasz Chmielewski

> Now I thought about a way to safely backup a MySQL or other database -
> without long service interruption:
>
> - Tell DB to turn itself into consistent state and freeze there
> - sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
> - btrfs subvolume snapshot
> - fsfreeze -u /mountpoint

Hmm, I don't think fsfreeze works properly with btrfs?


--
Tomasz Chmielewski
http://wpkg.org



kernel BUG at fs/btrfs/extent-tree.c:1418!

2011-06-04 Thread Andreas Philipp
Hi,

On kernel 2.6.39 I encountered the following kernel BUG (see below). The btrfs
filesystem (just application data) is 1.4TB in size with several subvolumes,
was created with -m raid1 -d raid0, and currently reports 108G free (via
df -h). The system has a dual-core CPU. The load average is constantly
increasing (it reached 43 within 2 days); one core is 100% busy with kernel
time, while the other spends up to about 5% on kernel and user time and the
remaining 95% in io-wait. All processes that tried to access the btrfs
filesystem (for writing, I guess) are stuck in D state. When the bug occurred,
vdr was doing a TV recording and noad was rereading another recording to mark
all the ads.
Unfortunately, I will not be at the site of the machine until Sunday evening,
and the machine did not react to a "shutdown -r", so I think I will have to
push the power button then.
Is there anything I should take care of before hard rebooting?

Thanks,
Andreas Philipp

[ cut here ]
kernel BUG at fs/btrfs/extent-tree.c:1418!
invalid opcode:  [#1] SMP
last sysfs file: 
/sys/devices/pci:00/:00:1f.2/host2/target2:0:0/2:0:0:0/model
CPU 0
Modules linked in: xt_TCPMSS ipt_LOG ipt_REDIRECT xt_tcpudp ipt_MASQUERADE 
iptable_raw xt_comment iptable_nat ipt_REJECT bridge stp llc iptable_mangle 
nf_nat_tftp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 
nf_nat_ftp nf_nat_amanda nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ts_kmp 
nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip 
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre 
nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast 
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp iptable_filter xt_DSCP 
xt_dscp xt_string xt_NFQUEUE xt_multiport xt_mark xt_hashlimit xt_conntrack 
xt_connmark nf_conntrack ip_tables x_tables coretemp snd_seq_oss 
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss nfsd tun btrfs 
zlib_deflate lzo_compress cpufreq_ondemand cpufreq_stats acpi_cpufreq 
freq_table mperf zl10353 em28xx_dvb snd_hda_codec_hdmi tda826x tda10086 lnbp21 
stb6100 stb0899 tuner_xc2028 nvidia(P)
tuner tvp5150 snd_hda_codec_realtek snd_hda_intel snd_hda_codec budget 
budget_core saa7146 uvcvideo mantis mantis_core snd_usb_audio em28xx snd_hwdep 
snd_usbmidi_lib snd_pcm ttpci_eeprom snd_rawmidi snd_timer rtc_cmos 
snd_seq_device rtc_core tpm_tis dvb_core v4l2_common i2c_i801 videodev 
ir_lirc_codec lirc_dev tpm videobuf_vmalloc videobuf_core snd rc_core 
snd_page_alloc joydev tveeprom serio_raw rtc_lib tpm_bios v4l2_compat_ioctl32 
processor fuse xfs nfs lockd sunrpc reiserfs raid456 async_raid6_recov 
async_memcpy async_pq raid6_pq async_xor xor async_tx raid0 dm_snapshot 
dm_mirror dm_region_hash dm_log scsi_wait_scan usbhid uhci_hcd usb_storage 
ehci_hcd usbcore sg ata_piix ahci libahci pata_jmicron

Pid: 5359, comm: btrfs-endio-wri Tainted: PW   2.6.39 #2/965P-DQ6
RIP: 0010:[]  [] 
lookup_inline_extent_backref+0xec/0x3fd [btrfs]
RSP: 0018:88012d7db9d0  EFLAGS: 00010202
RAX: 0001 RBX: 880059572a30 RCX: 0019
RDX: 0001 RSI: 8800 RDI: 880136a29ef8
RBP: 00b2 R08: 00800020 R09: 
R10: 0034 R11: 880059572a30 R12: 88013d018920
R13: 0001 R14: 001d R15: 0001
FS:  () GS:88013fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 7faadc4c0c50 CR3: 00013419 CR4: 06f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process btrfs-endio-wri (pid: 5359, threadinfo 88012d7da000, task 
88013bc80100)
Stack:
 0240 88012d7dbb60 00322d7da000 880059572a30
 001d052e 88013afdd800 00352d7dbc41 03b84e69e000
 880136a29828 88012d7dbab8 03b84e69e000 0bf000a8
Call Trace:
 [] ? insert_inline_extent_backref+0x63/0xec [btrfs]
 [] ? update_block_group+0x1d4/0x1f1 [btrfs]
 [] ? __btrfs_inc_extent_ref+0xb1/0x1e3 [btrfs]
 [] ? run_clustered_refs+0x69d/0x768 [btrfs]
 [] ? btrfs_run_delayed_refs+0xcd/0x1c0 [btrfs]
 [] ? __btrfs_end_transaction+0x66/0x1c1 [btrfs]
 [] ? btrfs_finish_ordered_io+0x2b3/0x2d8 [btrfs]
 [] ? end_bio_extent_writepage+0xa0/0x14a [btrfs]
 [] ? worker_loop+0x17f/0x47d [btrfs]
 [] ? btrfs_queue_worker+0x248/0x248 [btrfs]
 [] ? btrfs_queue_worker+0x248/0x248 [btrfs]
 [] ? kthread+0x7a/0x82
 [] ? kernel_thread_helper+0x4/0x10
 [] ? kthread_worker_fn+0x139/0x139
 [] ? gs_change+0xb/0xb
Code: 24 50 41 b9 01 00 00 00 44 8b 44 24 24 48 89 d9 48 8b 74 24 28 4c 89 e7 
e8 7e 63 ff ff 41 89 c5 83 f8 00 0f 8c e0 02 00 00 74 04 <0f> 0b eb fe 4c 8b 2b 
48 63 73 40 4c 89 ef 48 6b f6 19 48 83 c6
RIP  [] lookup_inline_extent_backref+0xec/0x3fd [btrfs]
 RSP 
---[ end trace 1dac9e78db79cc

Safe fsck / consistent backup while mounted

2011-06-04 Thread Martin Steigerwald
Hi!

In mailing list debian-user-german we are discussing safe ways to do a 
fsck when mounted.

I tested with Ext4 that fsck -nf works either with mount -o remount,ro or 
fsfreeze -f while writing with:

I=0; while true ; let I=I+1 ; do touch /boot/test$I ; sleep 0.2 ; done

In the read only mount case the write application returns errors, in the 
fsfreeze case Linux kernel stacks the changes in memory, but the fsck 
reports no errors like it should.


Now I thought about a way to safely backup a MySQL or other database - 
without long service interruption:

- Tell DB to turn itself into consistent state and freeze there
- sync / btrfs filesystem sync ; fsfreeze -f /mountpoint
- btrfs subvolume snapshot
- fsfreeze -u /mountpoint
- Tell DB to continue business as usual

My questions are:

1) Would this work?

2) Is the sync needed? And if so, how do I avoid the race condition between
the sync and the fsfreeze invocation? Reading the fsfreeze manpage, I
understand that fsfreeze allows all ongoing transactions to complete. But
does that include everything that sync would bring to disk?

3) Is the fsfreeze needed at all? Does btrfs subvolume freeze the
filesystem prior to the snapshot? The manpage doesn't say.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7




[PATCH v2 9/9] mkfs.btrfs: fix error text in '-r' mode

2011-06-04 Thread Sergei Trofimovich
gcc warns about use of an uninitialized variable when compiling
with -O0:

mkfs.c:1291: error: 'file' may be used uninitialized in this function

Signed-off-by: Sergei Trofimovich 
---
 mkfs.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index a65fb4d..44a05e8 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1272,47 +1272,47 @@ int main(int ac, char **av)
fprintf(stderr, "error checking %s mount status\n", 
file);
exit(1);
}
if (ret == 1) {
fprintf(stderr, "%s is mounted\n", file);
exit(1);
}
ac--;
fd = open(file, O_RDWR);
if (fd < 0) {
fprintf(stderr, "unable to open %s\n", file);
exit(1);
}
first_fd = fd;
first_file = file;
ret = btrfs_prepare_device(fd, file, zero_end, 
&dev_block_count, &mixed);
if (block_count == 0)
block_count = dev_block_count;
} else {
ac = 0;
+   file = output;
fd = open_target(output);
if (fd < 0) {
fprintf(stderr, "unable to open the %s\n", file);
exit(1);
}
 
-   file = output;
first_fd = fd;
first_file = file;
block_count = size_sourcedir(source_dir, sectorsize,
 &num_of_meta_chunks, 
&size_of_data);
ret = zero_output_file(fd, block_count, sectorsize);
if (ret) {
fprintf(stderr, "unable to zero the output file\n");
exit(1);
}
}
if (mixed) {
if (!metadata_profile_opt)
metadata_profile = 0;
if (!data_profile_opt)
data_profile = 0;
 
if (metadata_profile != data_profile) {
fprintf(stderr, "With mixed block groups data and metadata "
"profiles must be the same\n");
exit(1);
-- 
1.7.3.4



[PATCH v2 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls

2011-06-04 Thread Sergei Trofimovich
Signed-off-by: Sergei Trofimovich 
---
 mkfs.c |   16 
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index c8b19c1..a65fb4d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -451,53 +451,67 @@ static int fill_inode_item(struct btrfs_trans_handle *trans,
blocks += 1;
blocks *= sectorsize;
btrfs_set_stack_inode_nbytes(dst, blocks);
}
}
if (S_ISLNK(src->st_mode))
btrfs_set_stack_inode_nbytes(dst, src->st_size + 1);
 
return 0;
 }
 
 static int directory_select(const struct direct *entry)
 {
if ((strncmp(entry->d_name, ".", entry->d_reclen) == 0) ||
(strncmp(entry->d_name, "..", entry->d_reclen) == 0))
return 0;
else
return 1;
 }
 
+static void free_namelist(struct direct **files, int count)
+{
+   int i;
+
+   if (count < 0)
+   return;
+
+   for (i = 0; i < count; ++i)
+   free(files[i]);
+   free (files);
+}
+
 static u64 calculate_dir_inode_size(char *dirname)
 {
int count, i;
struct direct **files, *cur_file;
u64 dir_inode_size = 0;
 
count = scandir(dirname, &files, directory_select, NULL);
 
for (i = 0; i < count; i++) {
cur_file = files[i];
dir_inode_size += strlen(cur_file->d_name);
}
 
+   free_namelist(files, count);
+
dir_inode_size *= 2;
return dir_inode_size;
 }
 
 static int add_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   struct stat *st, char *name,
   u64 self_objectid, ino_t parent_inum,
   int dir_index_cnt, struct btrfs_inode_item 
*inode_ret)
 {
int ret;
struct btrfs_key inode_key;
struct btrfs_inode_item btrfs_inode;
u64 objectid;
u64 inode_size = 0;
int name_len;
 
name_len = strlen(name);
fill_inode_item(trans, root, &btrfs_inode, st);
objectid = self_objectid;
@@ -954,49 +968,51 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
dir_entry->inum = cur_inum;
list_add_tail(&dir_entry->list, 
&dir_head->list);
} else if (S_ISREG(st.st_mode)) {
ret = add_file_items(trans, root, &cur_inode,
 cur_inum, parent_inum, &st,
 cur_file->d_name, out_fd);
if (ret) {
fprintf(stderr, "add_file_items failed\n");
goto fail;
}
} else if (S_ISLNK(st.st_mode)) {
ret = add_symbolic_link(trans, root,
cur_inum, 
cur_file->d_name);
if (ret) {
fprintf(stderr, "add_symbolic_link failed\n");
goto fail;
}
}
}
 
+   free_namelist(files, count);
free(parent_dir_entry->path);
free(parent_dir_entry);
 
index_cnt = 2;
 
} while (!list_empty(&dir_head->list));
 
return 0;
 fail:
+   free_namelist(files, count);
free(parent_dir_entry->path);
free(parent_dir_entry);
return -1;
 }
 
 static int open_target(char *output_name)
 {
int output_fd;
output_fd = open(output_name, O_CREAT | O_RDWR | O_TRUNC,
 S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH);
 
return output_fd;
 }
 
 static int create_chunks(struct btrfs_trans_handle *trans,
 struct btrfs_root *root, u64 num_of_meta_chunks,
 u64 size_of_data)
 {
u64 chunk_start;
u64 chunk_size;
-- 
1.7.3.4
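
The ownership rule behind this patch is that scandir() allocates both the
array and every entry in it, so the caller has to free all of them. A minimal
standalone sketch of that pattern follows; total_name_length() and
skip_dot_entries() are hypothetical names used only for illustration and are
not part of btrfs-progs:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Filter for scandir(): skip "." and ".." so only real entries remain. */
static int skip_dot_entries(const struct dirent *entry)
{
    return strcmp(entry->d_name, ".") != 0 &&
           strcmp(entry->d_name, "..") != 0;
}

/*
 * Hypothetical helper: sum of the name lengths of all entries in
 * dirname, or -1 if scandir() fails.
 */
long total_name_length(const char *dirname)
{
    struct dirent **entries;
    long total = 0;
    int count, i;

    assert(dirname != NULL);

    count = scandir(dirname, &entries, skip_dot_entries, NULL);
    if (count < 0)
        return -1;          /* nothing was handed to us, nothing to free */

    for (i = 0; i < count; i++) {
        total += (long)strlen(entries[i]->d_name);
        free(entries[i]);   /* each entry was allocated by scandir() */
    }
    free(entries);          /* and so was the array itself */

    return total;
}
```

Freeing each entry inside the loop keeps the cleanup next to the last use;
the patch instead frees everything in one helper, which suits code with
several exit paths.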



[PATCH v2 7/9] mkfs.btrfs: free buffers allocated by pretty_sizes

2011-06-04 Thread Sergei Trofimovich
found by valgrind:
==2559== 16 bytes in 1 blocks are definitely lost in loss record 3 of 19
==2559==at 0x4C2720E: malloc (vg_replace_malloc.c:236)
==2559==by 0x412F7E: pretty_sizes (utils.c:1054)
==2559==by 0x4179E9: main (mkfs.c:1395)

Signed-off-by: Sergei Trofimovich 
---
 mkfs.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 32f25f5..c8b19c1 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1159,40 +1159,41 @@ int main(int ac, char **av)
u64 data_profile = BTRFS_BLOCK_GROUP_RAID0;
u32 leafsize = getpagesize();
u32 sectorsize = 4096;
u32 nodesize = leafsize;
u32 stripesize = 4096;
int zero_end = 1;
int option_index = 0;
int fd;
int first_fd;
int ret;
int i;
int mixed = 0;
int data_profile_opt = 0;
int metadata_profile_opt = 0;
 
char *source_dir = NULL;
int source_dir_set = 0;
char *output = "output.img";
u64 num_of_meta_chunks = 0;
u64 size_of_data = 0;
+   char * pretty_buf;
 
while(1) {
int c;
c = getopt_long(ac, av, "A:b:l:n:s:m:d:L:r:VM", long_options,
&option_index);
if (c < 0)
break;
switch(c) {
case 'A':
alloc_start = parse_size(optarg);
break;
case 'd':
data_profile = parse_profile(optarg);
data_profile_opt = 1;
break;
case 'l':
leafsize = parse_size(optarg);
break;
case 'L':
label = parse_label(optarg);
@@ -1378,41 +1379,42 @@ raid_groups:
if (!source_dir_set) {
ret = create_raid_groups(trans, root, data_profile,
 metadata_profile, mixed);
BUG_ON(ret);
}
 
ret = create_data_reloc_tree(trans, root);
BUG_ON(ret);
 
if (mixed) {
struct btrfs_super_block *super = &root->fs_info->super_copy;
u64 flags = btrfs_super_incompat_flags(super);
 
flags |= BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS;
btrfs_set_super_incompat_flags(super, flags);
}
 
printf("fs created label %s on %s\n\tnodesize %u leafsize %u "
"sectorsize %u size %s\n",
label, first_file, nodesize, leafsize, sectorsize,
-   pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+   pretty_buf = pretty_sizes(btrfs_super_total_bytes(&root->fs_info->super_copy)));
+   free (pretty_buf);
 
printf("%s\n", BTRFS_BUILD_VERSION);
btrfs_commit_transaction(trans, root);
 
if (source_dir_set) {
trans = btrfs_start_transaction(root, 1);
ret = create_chunks(trans, root,
num_of_meta_chunks, size_of_data);
BUG_ON(ret);
btrfs_commit_transaction(trans, root);
 
ret = make_image(source_dir, root, fd);
BUG_ON(ret);
}
 
ret = close_ctree(root);
BUG_ON(ret);
 
free(label);
return 0;
-- 
1.7.3.4
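
The leak comes from passing a freshly allocated string straight to printf(),
which leaves no pointer to free afterwards. A minimal sketch of the "keep the
pointer, use it, free it" pattern is shown here; format_size() is a
hypothetical stand-in for pretty_sizes() and makes no claim to match its
exact output rules:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for pretty_sizes(): returns a malloc()ed string
 * that the caller owns and must free().
 */
char *format_size(uint64_t bytes)
{
    static const char *const units[] = { "B", "KB", "MB", "GB", "TB", "PB" };
    double value = (double)bytes;
    size_t unit = 0;
    char *buf = malloc(32);

    if (!buf)
        return NULL;

    /* Scale down by 1024 until the value fits the current unit. */
    while (value >= 1024.0 && unit < 5) {
        value /= 1024.0;
        unit++;
    }
    snprintf(buf, 32, "%.2f%s", value, units[unit]);
    return buf;
}

/* Print a size without leaking: keep the pointer, use it, free it. */
int print_size(FILE *out, uint64_t bytes)
{
    char *pretty;
    int ret;

    assert(out != NULL);

    pretty = format_size(bytes);
    if (!pretty)
        return -1;
    ret = fprintf(out, "size %s\n", pretty);
    free(pretty);
    return ret < 0 ? -1 : 0;
}
```

Wrapping the allocation, use and free in one small function keeps the
ownership in a single place, at the cost of one extra call.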



[PATCH v2 6/9] mkfs.btrfs: write zeroes instead on uninitialized data.

2011-06-04 Thread Sergei Trofimovich
Found by valgrind:
==8968== Use of uninitialised value of size 8
==8968==at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==by 0x42CF66: make_image (mkfs.c:1061)
==8968==by 0x42DE63: main (mkfs.c:1410)
==8968==  Uninitialised value was created by a stack allocation
==8968==at 0x42B5FB: add_inode_items (mkfs.c:493)

1. On-disk inode format has reserved (and thus, random at alloc time) fields:
   btrfs_inode_item: __le64 reserved[4]
2. Sometimes extents are created on disk without writing data there.
   (Or at least not all data is written there). Kernel code always had
   it kzalloc'ed.
Zero them all.

Signed-off-by: Sergei Trofimovich 
---
 extent_io.c |1 +
 mkfs.c  |7 +++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index 069c199..a93d4d6 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -555,40 +555,41 @@ static int free_some_buffers(struct extent_io_tree *tree)
} else {
list_move_tail(&eb->lru, &tree->lru);
}
if (nrscan++ > 64)
break;
}
return 0;
 }
 
 static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
   u64 bytenr, u32 blocksize)
 {
struct extent_buffer *eb;
int ret;
 
eb = malloc(sizeof(struct extent_buffer) + blocksize);
if (!eb) {
BUG();
return NULL;
}
+   memset (eb, 0, sizeof(struct extent_buffer) + blocksize);
 
eb->start = bytenr;
eb->len = blocksize;
eb->refs = 2;
eb->flags = 0;
eb->tree = tree;
eb->fd = -1;
eb->dev_bytenr = (u64)-1;
eb->cache_node.start = bytenr;
eb->cache_node.size = blocksize;
 
free_some_buffers(tree);
ret = insert_existing_cache_extent(&tree->cache, &eb->cache_node);
if (ret) {
free(eb);
return NULL;
}
list_add_tail(&eb->lru, &tree->lru);
tree->cache_size += blocksize;
return eb;
diff --git a/mkfs.c b/mkfs.c
index 8ff2b1e..32f25f5 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -394,40 +394,47 @@ static int add_directory_items(struct btrfs_trans_handle *trans,
if (S_ISLNK(st->st_mode))
filetype = BTRFS_FT_SYMLINK;
 
ret = btrfs_insert_dir_item(trans, root, name, name_len,
parent_inum, &location,
filetype, index_cnt);
 
*dir_index_cnt = index_cnt;
index_cnt++;
 
return ret;
 }
 
 static int fill_inode_item(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
   struct btrfs_inode_item *dst, struct stat *src)
 {
u64 blocks = 0;
u64 sectorsize = root->sectorsize;
 
+   /*
+* btrfs_inode_item has some reserved fields
+* and represents on-disk inode entry, so
+* zero everything to prevent information leak
+*/
+   memset (dst, 0, sizeof (*dst));
+
btrfs_set_stack_inode_generation(dst, trans->transid);
btrfs_set_stack_inode_size(dst, src->st_size);
btrfs_set_stack_inode_nbytes(dst, 0);
btrfs_set_stack_inode_block_group(dst, 0);
btrfs_set_stack_inode_nlink(dst, src->st_nlink);
btrfs_set_stack_inode_uid(dst, src->st_uid);
btrfs_set_stack_inode_gid(dst, src->st_gid);
btrfs_set_stack_inode_mode(dst, src->st_mode);
btrfs_set_stack_inode_rdev(dst, 0);
btrfs_set_stack_inode_flags(dst, 0);
btrfs_set_stack_timespec_sec(&dst->atime, src->st_atime);
btrfs_set_stack_timespec_nsec(&dst->atime, 0);
btrfs_set_stack_timespec_sec(&dst->ctime, src->st_ctime);
btrfs_set_stack_timespec_nsec(&dst->ctime, 0);
btrfs_set_stack_timespec_sec(&dst->mtime, src->st_mtime);
btrfs_set_stack_timespec_nsec(&dst->mtime, 0);
btrfs_set_stack_timespec_sec(&dst->otime, 0);
btrfs_set_stack_timespec_nsec(&dst->otime, 0);
 
if (S_ISDIR(src->st_mode)) {
-- 
1.7.3.4
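
The effect of zeroing before filling can be shown with a small standalone
sketch. struct demo_inode_item below is a hypothetical record, loosely
modelled on the reserved[4] field mentioned above; it is not the real btrfs
on-disk layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record with reserved space (illustration only). */
struct demo_inode_item {
    uint64_t size;
    uint32_t mode;
    uint32_t nlink;
    uint64_t reserved[4];
};

/*
 * Fill the known fields. The memset() first guarantees that the reserved
 * fields hold zeroes rather than whatever was in the buffer before.
 */
void fill_demo_inode(struct demo_inode_item *dst, uint64_t size,
                     uint32_t mode, uint32_t nlink)
{
    assert(dst != NULL);

    memset(dst, 0, sizeof(*dst));
    dst->size = size;
    dst->mode = mode;
    dst->nlink = nlink;
}
```

With the memset() in place, two records filled with the same values compare
equal byte for byte regardless of what their buffers held before, so a
checksum over them is reproducible and no stale stack data reaches the disk.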



[PATCH v2 5/9] mkfs.btrfs: fix symlink names writing

2011-06-04 Thread Sergei Trofimovich
Found by valgrind:
==8968== Use of uninitialised value of size 8
==8968==at 0x41CE7D: crc32c_le (crc32c.c:98)
==8968==by 0x40A1D0: csum_tree_block_size (disk-io.c:82)
==8968==by 0x40A2D4: csum_tree_block (disk-io.c:105)
==8968==by 0x40A7D6: write_tree_block (disk-io.c:241)
==8968==by 0x40ACEE: __commit_transaction (disk-io.c:354)
==8968==by 0x40AE9E: btrfs_commit_transaction (disk-io.c:385)
==8968==by 0x42CF66: make_image (mkfs.c:1061)
==8968==by 0x42DE63: main (mkfs.c:1410)
==8968==  Uninitialised value was created by a stack allocation
==8968==at 0x42B5FB: add_inode_items (mkfs.c:493)

readlink(2) does not write the terminating '\0' for us, so add it manually.

Signed-off-by: Sergei Trofimovich 
---
 mkfs.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 9d7b792..8ff2b1e 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -692,45 +692,47 @@ static int record_file_extent(struct btrfs_trans_handle *trans,
   root->root_key.objectid,
   objectid, 0);
 fail:
btrfs_release_path(root, &path);
return ret;
 }
 
 static int add_symbolic_link(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 objectid, const char *path_name)
 {
int ret;
u64 sectorsize = root->sectorsize;
char *buf = malloc(sectorsize);
 
ret = readlink(path_name, buf, sectorsize);
if (ret <= 0) {
fprintf(stderr, "readlink failed for %s\n", path_name);
goto fail;
}
-   if (ret > sectorsize) {
+   if (ret >= sectorsize) {
fprintf(stderr, "symlink too long for %s", path_name);
ret = -1;
goto fail;
}
+
+   buf[ret] = '\0'; /* readlink does not do it for us */
ret = btrfs_insert_inline_extent(trans, root, objectid, 0,
 buf, ret + 1);
 fail:
free(buf);
return ret;
 }
 
 static int add_file_items(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  struct btrfs_inode_item *btrfs_inode, u64 objectid,
  ino_t parent_inum, struct stat *st,
  const char *path_name, int out_fd)
 {
int ret;
ssize_t ret_read;
u64 bytes_read = 0;
char *buffer = NULL;
struct btrfs_key key;
int blocks;
u32 sectorsize = root->sectorsize;
-- 
1.7.3.4
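
The same two checks (leave room for the terminator, then write it) can be
sketched outside mkfs.c as follows. read_link_target() is a hypothetical
helper name chosen for illustration; only readlink(2) itself is a real
interface here:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a symlink target into buf as a C string.
 * Returns the target length, or -1 on error or if buf is too small.
 */
ssize_t read_link_target(const char *path, char *buf, size_t bufsize)
{
    ssize_t len;

    assert(path != NULL && buf != NULL);

    len = readlink(path, buf, bufsize);
    if (len < 0)
        return -1;

    /* A completely full buffer may be truncated and has no room for '\0'. */
    if ((size_t)len >= bufsize)
        return -1;

    buf[len] = '\0';    /* readlink() does not append the terminator */
    return len;
}
```

Treating "buffer exactly full" as an error mirrors the change from '>' to
'>=' in the patch: readlink() silently truncates, so a full buffer cannot be
told apart from a target that was cut short.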



[PATCH v2 4/9] mkfs.btrfs: return some defined value instead of garbage when lookup checksum

2011-06-04 Thread Sergei Trofimovich
==31873== Command: ./mkfs.btrfs -r /some/root/
==31873== Parent PID: 31872
==31873==
==31873== Conditional jump or move depends on uninitialised value(s)
==31873==at 0x42C3D0: add_file_items (mkfs.c:792)
==31873==by 0x42CAB3: traverse_directory (mkfs.c:948)
==31873==by 0x42CF11: make_image (mkfs.c:1047)
==31873==by 0x42DE53: main (mkfs.c:1401)
==31873==  Uninitialised value was created by a stack allocation
==31873==at 0x41B1B1: btrfs_csum_file_block (file-item.c:195)

The 'ret' value was not initialized on the 'found' branch.

The same fix is already in the kernel:
> commit 639cb58675ce9b507eed9c3d6b3335488079b21a
> Author: Chris Mason 
> Date:   Thu Aug 28 06:15:25 2008 -0400
>
> Btrfs: Fix variable init during csum creation
>
> Signed-off-by: Chris Mason 

Signed-off-by: Sergei Trofimovich 
---
 file-item.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/file-item.c b/file-item.c
index 9732282..47f6ad2 100644
--- a/file-item.c
+++ b/file-item.c
@@ -201,40 +201,41 @@ int btrfs_csum_file_block(struct btrfs_trans_handle *trans,
struct btrfs_path *path;
struct btrfs_csum_item *item;
struct extent_buffer *leaf = NULL;
u64 csum_offset;
u32 csum_result = ~(u32)0;
u32 nritems;
u32 ins_size;
u16 csum_size =
btrfs_super_csum_size(&root->fs_info->super_copy);
 
path = btrfs_alloc_path();
BUG_ON(!path);
 
file_key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;
file_key.offset = bytenr;
file_key.type = BTRFS_EXTENT_CSUM_KEY;
 
item = btrfs_lookup_csum(trans, root, path, bytenr, 1);
if (!IS_ERR(item)) {
leaf = path->nodes[0];
+   ret = 0;
goto found;
}
ret = PTR_ERR(item);
if (ret == -EFBIG) {
u32 item_size;
/* we found one, but it isn't big enough yet */
leaf = path->nodes[0];
item_size = btrfs_item_size_nr(leaf, path->slots[0]);
if ((item_size / csum_size) >= MAX_CSUM_ITEMS(root, csum_size)) {
/* already at max size, make a new one */
goto insert;
}
} else {
int slot = path->slots[0] + 1;
/* we didn't find a csum item, insert one */
nritems = btrfs_header_nritems(path->nodes[0]);
if (path->slots[0] >= nritems - 1) {
ret = btrfs_next_leaf(root, path);
if (ret == 1)
found_next = 1;
-- 
1.7.3.4



[PATCH v2 3/9] mkfs.btrfs: fail on scandir error (-r mode)

2011-06-04 Thread Sergei Trofimovich
mkfs.btrfs does not handle relative pathnames yet. When one is passed
to it, it silently creates an empty image, so at first I thought '-r'
did not work at all.

This patch adds error handling for scandir(). With the patch applied
it behaves this way:

$ mkfs.btrfs -r ./root
...
fs created label (null) on output.img
nodesize 4096 leafsize 4096 sectorsize 4096 size 256.00MB
Btrfs v0.19-52-g438c5ff-dirty
scandir for ./root failed: No such file or directory
unable to traverse_directory
Making image is aborted.
mkfs.btrfs: mkfs.c:1402: main: Assertion `!(ret)' failed.

Signed-off-by: Sergei Trofimovich 
---
 mkfs.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 57c88f9..9d7b792 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -878,40 +878,46 @@ static int traverse_directory(struct btrfs_trans_handle *trans,
btrfs_mark_buffer_dirty(leaf);
 
btrfs_release_path(root, &path);
 
do {
parent_dir_entry = list_entry(dir_head->list.next,
  struct directory_name_entry,
  list);
list_del(&parent_dir_entry->list);
 
parent_inum = parent_dir_entry->inum;
parent_dir_name = parent_dir_entry->dir_name;
if (chdir(parent_dir_entry->path)) {
fprintf(stderr, "chdir error for %s\n",
parent_dir_name);
goto fail;
}
 
count = scandir(parent_dir_entry->path, &files,
directory_select, NULL);
+   if (count == -1)
+   {
+   fprintf(stderr, "scandir for %s failed: %s\n",
+   parent_dir_name, strerror (errno));
+   goto fail;
+   }
 
for (i = 0; i < count; i++) {
cur_file = files[i];
 
if (lstat(cur_file->d_name, &st) == -1) {
fprintf(stderr, "lstat failed for file %s\n",
cur_file->d_name);
goto fail;
}
 
cur_inum = ++highest_inum + BTRFS_FIRST_FREE_OBJECTID;
ret = add_directory_items(trans, root,
  cur_inum, parent_inum,
  cur_file->d_name,
  &st, &dir_index_cnt);
if (ret) {
fprintf(stderr, "add_directory_items failed\n");
goto fail;
}
 
-- 
1.7.3.4



[PATCH v2 2/9] btrfs-convert: fix typo: 'all inode' -> 'all inodes'

2011-06-04 Thread Sergei Trofimovich
Signed-off-by: Sergei Trofimovich 
---
 convert.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/convert.c b/convert.c
index fbcf4a3..291dc27 100644
--- a/convert.c
+++ b/convert.c
@@ -1103,41 +1103,41 @@ static int copy_disk_extent(struct btrfs_root *root, u64 dst_bytenr,
char *buffer;
struct btrfs_fs_devices *fs_devs = root->fs_info->fs_devices;
 
buffer = malloc(num_bytes);
if (!buffer)
return -ENOMEM;
ret = pread(fs_devs->latest_bdev, buffer, num_bytes, src_bytenr);
if (ret != num_bytes)
goto fail;
ret = pwrite(fs_devs->latest_bdev, buffer, num_bytes, dst_bytenr);
if (ret != num_bytes)
goto fail;
ret = 0;
 fail:
free(buffer);
if (ret > 0)
ret = -1;
return ret;
 }
 /*
- * scan ext2's inode bitmap and copy all used inode.
+ * scan ext2's inode bitmap and copy all used inodes.
  */
 static int copy_inodes(struct btrfs_root *root, ext2_filsys ext2_fs,
   int datacsum, int packing, int noxattr)
 {
int ret;
errcode_t err;
ext2_inode_scan ext2_scan;
struct ext2_inode ext2_inode;
ext2_ino_t ext2_ino;
u64 objectid;
struct btrfs_trans_handle *trans;
 
trans = btrfs_start_transaction(root, 1);
if (!trans)
return -ENOMEM;
err = ext2fs_open_inode_scan(ext2_fs, 0, &ext2_scan);
if (err) {
fprintf(stderr, "ext2fs_open_inode_scan: %s\n", error_message(err));
return -1;
}
-- 
1.7.3.4



[PATCH v2 1/9] btrfs progs: fix extra metadata chunk allocation in --mixed case

2011-06-04 Thread Sergei Trofimovich
From: Arne Jansen 

When creating a mixed fs with mkfs, an extra metadata chunk got allocated.
This is because btrfs_reserve_extent calls do_chunk_alloc for METADATA,
which in turn wasn't able to find the proper space_info, as __find_space_info
did a hard compare of the flags. It is now sufficient for the space_info to
include the proper flag. This reflects the change done to the kernel code
to support mixed chunks.
Also for a subsequent chunk allocation (which should not be hit in the mkfs
case), the chunk is now created with the flags from the space_info instead
of the requested flags. A better solution would be to pull the full changeset
for the mixed case from the kernel into the user mode (or, even better, share
the code).

The additional chunk probably confused block_rsv calculation, which in turn
led to several ENOSPC oopses.
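
The difference between the hard compare and the new test can be shown with a small standalone sketch. This is not the btrfs-progs code: the struct is cut down to one field, the BG_* values are illustrative and only assumed to mirror the BTRFS_BLOCK_GROUP_* bits, and find_exact()/find_overlap() are hypothetical names for the old and new matching rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values, assumed to mirror BTRFS_BLOCK_GROUP_*. */
#define BG_DATA     (1ULL << 0)
#define BG_SYSTEM   (1ULL << 1)
#define BG_METADATA (1ULL << 2)

/* Cut-down stand-in for struct btrfs_space_info. */
struct space_info {
	uint64_t flags;
};

/* Old rule: the stored flags must equal the requested flags. */
static struct space_info *find_exact(struct space_info *infos, size_t n,
				     uint64_t flags)
{
	size_t i;

	assert(infos != NULL);
	for (i = 0; i < n; i++)
		if (infos[i].flags == flags)
			return &infos[i];
	return NULL;
}

/* Patched rule: it is enough that the stored flags share a bit. */
static struct space_info *find_overlap(struct space_info *infos, size_t n,
				       uint64_t flags)
{
	size_t i;

	assert(infos != NULL);
	for (i = 0; i < n; i++)
		if (infos[i].flags & flags)
			return &infos[i];
	return NULL;
}
```

With a single mixed space_info (BG_DATA | BG_METADATA), a METADATA request misses under the old rule and finds the mixed entry under the new one, and the entry's own flags are then what gets passed on for the chunk allocation.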

Signed-off-by: Arne Jansen 
---
 extent-tree.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..c6c77c6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1718,41 +1718,41 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 
clear_extent_bits(block_group_cache, start, end,
  BLOCK_GROUP_DIRTY, GFP_NOFS);
 
cache = (struct btrfs_block_group_cache *)(unsigned long)ptr;
ret = write_one_cache_group(trans, root, path, cache);
BUG_ON(ret);
}
btrfs_free_path(path);
return 0;
 }
 
 static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
  u64 flags)
 {
struct list_head *head = &info->space_info;
struct list_head *cur;
struct btrfs_space_info *found;
list_for_each(cur, head) {
found = list_entry(cur, struct btrfs_space_info, list);
-   if (found->flags == flags)
+   if (found->flags & flags)
return found;
}
return NULL;
 
 }
 
 static int update_space_info(struct btrfs_fs_info *info, u64 flags,
 u64 total_bytes, u64 bytes_used,
 struct btrfs_space_info **space_info)
 {
struct btrfs_space_info *found;
 
found = __find_space_info(info, flags);
if (found) {
found->total_bytes += total_bytes;
found->bytes_used += bytes_used;
WARN_ON(found->total_bytes < found->bytes_used);
*space_info = found;
return 0;
}
@@ -1795,49 +1795,50 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
u64 start;
u64 num_bytes;
int ret;
 
space_info = __find_space_info(extent_root->fs_info, flags);
if (!space_info) {
ret = update_space_info(extent_root->fs_info, flags,
0, 0, &space_info);
BUG_ON(ret);
}
BUG_ON(!space_info);
 
if (space_info->full)
return 0;
 
thresh = div_factor(space_info->total_bytes, 7);
if ((space_info->bytes_used + space_info->bytes_pinned + alloc_bytes) <
thresh)
return 0;
 
-   ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
+   ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes,
+   space_info->flags);
if (ret == -ENOSPC) {
space_info->full = 1;
return 0;
}
 
BUG_ON(ret);
 
-   ret = btrfs_make_block_group(trans, extent_root, 0, flags,
+   ret = btrfs_make_block_group(trans, extent_root, 0, space_info->flags,
 BTRFS_FIRST_CHUNK_TREE_OBJECTID, start, num_bytes);
BUG_ON(ret);
return 0;
 }
 
 static int update_block_group(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 bytenr, u64 num_bytes, int alloc,
  int mark_free)
 {
struct btrfs_block_group_cache *cache;
struct btrfs_fs_info *info = root->fs_info;
u64 total = num_bytes;
u64 old_val;
u64 byte_in_group;
u64 start;
u64 end;
 
/* block accounting for super block */
old_val = btrfs_super_bytes_used(&info->super_copy);
-- 
1.7.3.4



[PATCH v2 0/9] btrfs-progs: some fixes for bugs spotted by valgrind

2011-06-04 Thread Sergei Trofimovich
The tmp branch recently got a very nice feature: 'mkfs.btrfs -r /some/directory'.

It's very useful when you need to create a minimal root: /bin/sh and fs_mark.

But there is another hidden feature! As '-r' can create a whole filesystem,
we can effectively valgrind a lot of code paths in btrfs and pick out bugs.

This patch series is mostly (with one exception) dumb, obvious hole plugs
(sometimes they are backports from the kernel).

The patchset is based on

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git#tmp

commit e6bd18d8938986c997c45f0ea95b221d4edec095
Author: Christoph Hellwig 
Date:   Thu Apr 21 16:24:07 2011 -0400

First off, the exception:

In order to make --mixed produce proper filesystems with meta+data only
blocks (and not meta+data/data ones, which confused space_cache and led
to an oops for me), I ask you to consider pulling Arne's patch:
> Subject: [PATCH v2 1/9] btrfs progs: fix extra metadata chunk allocation in 
> --mixed case

The rest of the patches should be obvious. They don't fix all the legitimate
valgrind complaints, but they reduce them considerably.

Changes since v1:
  - "[PATCH 8/9] mkfs.btrfs: fix memory leak caused by 'scandir()' calls":
    'free_namelist()' now works correctly if 'count == -1'. This happens
    when 'free_namelist()' is called right after 'scandir()' returns
    an error.
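
Patch 8/9 itself is not quoted in this mail, so the following is only a guess at the shape of such a helper, under the assumption that it simply frees what scandir() returned. The point it illustrates is the 'count == -1' case: after a failed scandir() nothing was allocated, so the helper must not touch the list.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/*
 * Assumed shape of the helper; the real free_namelist() from patch 8/9
 * may differ. Safe to call with the return value of a failed scandir().
 */
static void free_namelist(struct dirent **files, int count)
{
	int i;

	/* scandir() failed: nothing was allocated, leave 'files' alone */
	if (count < 0)
		return;

	/* an empty result may legitimately come with a NULL array */
	assert(count == 0 || files != NULL);

	for (i = 0; i < count; i++)
		free(files[i]);
	free(files);
}
```

A loop written as 'for (i = 0; i < count; i++)' already skips the entries when count is -1; the early return additionally avoids calling free() on a pointer that scandir() never set.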

Some stats:

 convert.c     |    2 +-
 extent-tree.c |    7 ++++---
 extent_io.c   |    1 +
 file-item.c   |    1 +
 mkfs.c        |   39 ++++++++++++++++++++++++++++++++++++---
 5 files changed, 43 insertions(+), 7 deletions(-)

Arne Jansen (1):
  btrfs progs: fix extra metadata chunk allocation in --mixed case

Sergei Trofimovich (8):
  btrfs-convert: fix typo: 'all inode' -> 'all inodes'
  mkfs.btrfs: fail on scandir error (-r mode)
  mkfs.btrfs: return some defined value instead of garbage when lookup
    checksum
  mkfs.btrfs: fix symlink names writing
  mkfs.btrfs: write zeroes instead on uninitialized data.
  mkfs.btrfs: free buffers allocated by pretty_sizes
  mkfs.btrfs: fix memory leak caused by 'scandir()' calls
  mkfs.btrfs: fix error text in '-r' mode


Re: Quota Implementation

2011-06-04 Thread Arne Jansen

On 03.06.2011 18:47, Hugo Mills wrote:
> On Fri, Jun 03, 2011 at 06:24:41PM +0200, Arne Jansen wrote:
>> Hi,
>>
>> If no one is already working on it, I'd like to take the Quota lock and
>> see how far I get.
>> Let me sketch out in short what I'm planning to do:
>>
>>   - Quota will be subvolume based. Only the FS-trees and data extents
>>     will be accounted.
>>   - Quota Groups can be defined. Every quota group can comprise any
>>     number of subvolumes. A subvolume can be assigned to any number
>>     of quota groups.
>>   - A Quota Group can account/limit the total amount of space that is
>>     referenced by it and/or the amount of space that is exclusively
>>     referenced (i.e. referenced by no other quota group).
>>   - With this it is possible to define a hierarchical quota that need
>>     not necessarily reflect the filesystem hierarchy.
>>   - It is also possible to decide for each snapshot if it should be
>>     accounted into the parent group. So in a scenario where each
>>     subvolume reflects a user home, it's possible to have some snapshots
>>     accounted to the user and others not (e.g. the ones needed for system
>>     backups).
>>   - Quota information will be stored in new records, possibly in a
>>     separate tree.
>>   - It should be possible to change the Quota config and group
>>     assignments online, though this might need a full re-scan of the fs.
>>   - It does NOT include any kind of user/group (UID/GID) quota.
>>
>> Any addenda or arguments why it's impossible or insane welcome.
>
> There's a problem in that in some cases, it's possible to get into
> a situation where you can't *delete* files because you're going over
> quota. If I have two subvolumes that share most of their data
> (e.g. one is a snapshot of the other), and both subvolumes have a
> limit under the "exclusive use" clause, then deleting material from
> subvolume A could cause subvolume B to go over quota.

I wouldn't prevent the deletion in A, but would let B go over quota instead.
Maybe a limit on exclusive use is of little practical use, but tracking
it is very useful, as it is the space that will get freed if this subvol
is deleted. So it is an answer to the question
'how big is this snapshot?'.
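
To make the two counters concrete, here is a toy model, not a proposal for the on-disk format: each extent carries a bitmask of the subvolumes that reference it (struct extent and both function names are made up for illustration, and quota groups are left out, so "exclusive" here means "referenced by no other subvolume").

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: one record per extent. */
struct extent {
	uint64_t bytes;
	uint32_t owners;	/* bit i set: subvolume i references this extent */
};

/* Space referenced by 'subvol', shared or not. */
static uint64_t referenced_bytes(const struct extent *ext, size_t n,
				 unsigned int subvol)
{
	uint64_t total = 0;
	size_t i;

	assert(subvol < 32);
	for (i = 0; i < n; i++)
		if (ext[i].owners & (UINT32_C(1) << subvol))
			total += ext[i].bytes;
	return total;
}

/* Space referenced by 'subvol' and by nobody else. */
static uint64_t exclusive_bytes(const struct extent *ext, size_t n,
				unsigned int subvol)
{
	uint64_t total = 0;
	size_t i;

	assert(subvol < 32);
	for (i = 0; i < n; i++)
		if (ext[i].owners == (UINT32_C(1) << subvol))
			total += ext[i].bytes;
	return total;
}
```

Take a 100-byte extent shared by subvolumes 0 and 1, plus 30 bytes owned only by 0 and 20 bytes owned only by 1. The exclusive count of subvolume 1 is 20. Once subvolume 0 drops its reference to the shared extent, the exclusive count of subvolume 1 rises to 120 without subvolume 1 having written anything, which is the case Hugo describes.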
