Re: btrfs-show vs. btrfs different output

2013-03-22 Thread Eric Sandeen
On 3/22/13 8:59 AM, Jon Nelson wrote:
> On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen  wrote:
>> On 3/21/13 10:29 AM, Jon Nelson wrote:
>>> On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
>>>> On 3/21/13 10:04 AM, Jon Nelson wrote:
>>> ...
>>>>> 2. the current git btrfs-show and btrfs fi show both output
>>>>> *different* devices for device with UUID
>>>>> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
>>>>
>>>> does blkid output find that uuid anywhere?
>>>>
>>>> Since you're working in git, can you maybe do a little bisecting
>>>> to find out when it changed?  Should be a fairly quick test?
>>>
>>> blkid does /not/ report that uuid anywhere.
>>>
>>> git bisect, if I did it correctly, says:
>>>
>>>
>>> 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
>>> commit 6eba9002956ac40db87d42fb653a0524dc568810
>>> Author: Goffredo Baroncelli 
>>> Date:   Tue Sep 4 19:59:26 2012 +0200
>>>
>>> Correct un-initialized fsid variable
>>>
>>> :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
>>> 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c
>>
>> Ok, I think this is another case of greedily scanning stale
>> backup superblocks (did you ever have btrfs on the whole sda
>> or sdb?)
>>
>> btrfs_read_dev_super() currently tries to scan all 3 superblocks
>> (primary & 2 backups).  I'm guessing that you have some stale
>> backup superblocks on sda and/or sdb.
>>
>> Before the above commit, if the first sb didn't look valid,
>> it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
>> it'd compare its fsid to an uninitialized variable (fsid)
>> which would fail (since the "fsid" contents were random.)
>> Same for the 3rd backup if found, and eventually it'd return
>> -1 as failure and not report the device.
>>
>> After the commit, it'd skip the first invalid sb as well.
>> But this time, it takes the fsid from the 2nd superblock as
>> "good" and makes it through the loop thinking that it's found
>> something valid.  Hence the report of a device which you didn't
>> expect even though the first superblock is indeed wiped out.
>>
>> There are some patches floating around to stop this
>> backup superblock scanning altogether.
>>
>> This might fix it for you; it basically returns failure
>> if any superblock on the device is found to be bad.
>>
>> What we really need is the right bits in the right places
>> to let the administrator know if a device looks like it might
>> be corrupt & in need of fixing, vs. ignoring it altogether.
>>
>> Not sure if this is something we want upstream but you could
>> test it if you like.
> 
> I did test and it appears to resolve the issue for me.
> Thank you!

Thanks.  I need to get back to finding the right overall solution
here, but have been busy elsewhere.  It's on the list ;)

Anand is looking at it too and has some patches on the list.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No space left on device (28)

2013-03-22 Thread Stefan Priebe

Hi Josef,

thanks!

On 22.03.2013 21:49, Josef Bacik wrote:
> On Fri, Mar 22, 2013 at 01:10:05PM -0600, Stefan Priebe wrote:
>> Hi Josef,
>> On 22.03.2013 16:54, Josef Bacik wrote:
>>> On Fri, Mar 22, 2013 at 07:56:41AM -0600, Stefan Priebe - Profihost AG wrote:
>>>> Hi Josef,
>>>> On 22.03.2013 14:53, Josef Bacik wrote:
>>>>> On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Chris,
>>>>>>
>>>>>>> Which kernel are you running?
>>>>>>>
>>>>>>> -chris
>>>>>>
>>>>>> vanilla 3.8.3.
>>>>>
>>>>> Ok, with the 3.9 merge window Josef changed how we do the reservations.
>>>>> Are you able to try a slightly more experimental kernel?
>>>>>>
>>>>>> any ideas what i can check? 3.9-rc3 gives me same results.
>>>>>
>>>>> Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
>>>>> patch for you to run so I can narrow down what's going on.  Thanks,
>>>>
>>>> Great!
>>>>
>>>> Thanks - just wanted to know that it's not my fault. I'm happy to test
>>>> the patch and provide feedback.
>>>
>>> Ok I think we are way over-reserving for rename, can you give this patch a whirl
>>> and see what happens?  If it still fails can you capture dmesg and reply with
>>> that so I can see what's going on.  Thanks,
>>
>> Thanks for the patch. I was able to copy some files, and I still can, but
>> it copies some, then fails on some, then copies some, then fails on some.
>
> Ok so I've been working on another ENOSPC related problem that I think is
> affecting you as well.  So I'm going to nail that down and see if that helps you
> too, if not we'll poke around your problem some more and try and work out what's
> happening.  I'll get back to this fresh Monday morning.  Thanks,
>
> Josef




Re: No space left on device (28)

2013-03-22 Thread Josef Bacik
On Fri, Mar 22, 2013 at 01:10:05PM -0600, Stefan Priebe wrote:
> Hi Josef,
> On 22.03.2013 16:54, Josef Bacik wrote:
> > On Fri, Mar 22, 2013 at 07:56:41AM -0600, Stefan Priebe - Profihost AG wrote:
> >> Hi Josef,
> >> On 22.03.2013 14:53, Josef Bacik wrote:
> >>> On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
> >>>> Hi Chris,
> >>>>
> >>>>> Which kernel are you running?
> >>>>>
> >>>>> -chris
> >>>>
> >>>> vanilla 3.8.3.
> >>>
> >>> Ok, with the 3.9 merge window Josef changed how we do the reservations.
> >>> Are you able to try a slightly more experimental kernel?
> >>>>
> >>>> any ideas what i can check? 3.9-rc3 gives me same results.
> >>>>
> >>>
> >>> Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
> >>> patch for you to run so I can narrow down what's going on.  Thanks,
> >>
> >> Great!
> >>
> >> Thanks - just wanted to know that it's not my fault. I'm happy to test
> >> the patch and provide feedback.
> >>
> > Ok I think we are way over-reserving for rename, can you give this patch a whirl
> > and see what happens?  If it still fails can you capture dmesg and reply with
> > that so I can see what's going on.  Thanks,
>
> Thanks for the patch. I was able to copy some files, and I still can, but
> it copies some, then fails on some, then copies some, then fails on some.
>

Ok so I've been working on another ENOSPC related problem that I think is
affecting you as well.  So I'm going to nail that down and see if that helps you
too, if not we'll poke around your problem some more and try and work out what's
happening.  I'll get back to this fresh Monday morning.  Thanks,

Josef


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Chris Mason
Quoting Chris Mason (2013-03-22 14:07:05)
> [ mmap corruptions with leveldb and btrfs compression ]
> 
> I ran this a number of times with compression off and wasn't able to
> trigger problems.  With compress=lzo, I see errors on every run.
> 
> Compile: gcc -Wall -o mmap-trunc mmap-trunc.c
> Run: ./mmap-trunc file_name
> 
> The basic idea is to create a 256MB file in steps.  Each step ftruncates
> the file larger, and then mmaps a region for writing.  It dirties some
> unaligned bytes (a little more than 8K), and then munmaps.
> 
> Then a verify stage goes back through the file to make sure the data we
> wrote is really there.  I'm using a simple rotating pattern of chars
> that compress very well.

Going through the code here, when I change the test to truncate once in
the very beginning, I still get errors.  So, it isn't an interaction
between mmap and truncate.  It must be a problem between lzo and mmap.

-chris


Re: No space left on device (28)

2013-03-22 Thread Stefan Priebe

Hi Josef,
On 22.03.2013 16:54, Josef Bacik wrote:
> On Fri, Mar 22, 2013 at 07:56:41AM -0600, Stefan Priebe - Profihost AG wrote:
>> Hi Josef,
>> On 22.03.2013 14:53, Josef Bacik wrote:
>>> On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
>>>> Hi Chris,
>>>>
>>>>> Which kernel are you running?
>>>>>
>>>>> -chris
>>>>
>>>> vanilla 3.8.3.
>>>
>>> Ok, with the 3.9 merge window Josef changed how we do the reservations.
>>> Are you able to try a slightly more experimental kernel?
>>>>
>>>> any ideas what i can check? 3.9-rc3 gives me same results.
>>>
>>> Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
>>> patch for you to run so I can narrow down what's going on.  Thanks,
>>
>> Great!
>>
>> Thanks - just wanted to know that it's not my fault. I'm happy to test
>> the patch and provide feedback.
>
> Ok I think we are way over-reserving for rename, can you give this patch a whirl
> and see what happens?  If it still fails can you capture dmesg and reply with
> that so I can see what's going on.  Thanks,

Thanks for the patch. I was able to copy some files, and I still can, but
it copies some, then fails on some, then copies some, then fails on some.


Rsync output:
.software/kernel/linux-3.9-rc3/drivers/rtc/rtc-wm8350.c
       12156 100%   11.17kB/s    0:00:01 (xfer#21395, to-check=1008/52625)
.software/kernel/linux-3.9-rc3/drivers/rtc/rtc-x1205.c
       16460 100%   15.05kB/s    0:00:01 (xfer#21396, to-check=1007/52625)
.software/kernel/linux-3.9-rc3/drivers/rtc/systohc.c
        1223 100%    1.12kB/s    0:00:01 (xfer#21397, to-check=1006/52625)
.software/kernel/linux-3.9-rc3/drivers/s390/
.software/kernel/linux-3.9-rc3/drivers/s390/Makefile
         144 100%    0.13kB/s    0:00:01 (xfer#21398, to-check=1052/52672)
.software/kernel/linux-3.9-rc3/drivers/s390/block/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/block" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/s390/char/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/char" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/s390/cio/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/cio" failed: No space left on device (28)
rsync: mkstemp "/mnt/.software/kernel/linux-3.9-rc3/drivers/regulator/.88pm8607.c.YBi0Rz" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/s390/crypto/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/crypto" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/s390/kvm/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/kvm" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/s390/net/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/net" failed: No space left on device (28)
rsync: mkstemp "/mnt/.software/kernel/linux-3.9-rc3/drivers/regulator/.Kconfig.0lsfuE" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
rsync: mkstemp "/mnt/.software/kernel/linux-3.9-rc3/drivers/regulator/.Makefile.C9wI7I" failed: No space left on device (28)

.software/kernel/linux-3.9-rc3/drivers/s390/scsi/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/s390/scsi" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/sbus/
.software/kernel/linux-3.9-rc3/drivers/sbus/Makefile
          70 100%    0.06kB/s    0:00:01 (xfer#21399, to-check=1003/52824)
.software/kernel/linux-3.9-rc3/drivers/sbus/char/
rsync: recv_generator: mkdir "/mnt/.software/kernel/linux-3.9-rc3/drivers/sbus/char" failed: No space left on device (28)
rsync: mkstemp "/mnt/.software/kernel/linux-3.9-rc3/drivers/regulator/.aat2870-regulator.c.BGDwPN" failed: No space left on device (28)

*** Skipping any contents from this failed directory ***
.software/kernel/linux-3.9-rc3/drivers/scsi/
.software/kernel/linux-3.9-rc3/drivers/scsi/.3w-9xxx.ko.cmd
         208 100%    0.16kB/s    0:00:01 (xfer#21400, to-check=1407/53242)
.software/kernel/linux-3.9-rc3/drivers/scsi/.3w-9xxx.mod.o.cmd
       27987 100%   21.54kB/s    0:00:01 (xfer#21401, to-check=1406/53242)
.software/kernel/linux-3.9-rc3/drivers/scsi/.3w-9xxx.o.cmd
       43340 100%   32.63kB/s    0:00:01 (xfer#21402, to-check=1405/53242)
.software/kernel/linux-3.9-rc3/drivers/scsi/.3w-sas.ko.cmd
         204 100%   15.32kB/s    0:00:00 (xfer#21403, to-check=1404/53242)
.software/kernel/linux-3.9-rc3/drivers/scsi/.3w-sas.mod.o.cmd
       27975 100%    1.91MB/s    0:00:00 (xfer#21404, to-check=1403/53242)
.software/

[PATCH] btrfs: make subvol creation/deletion killable in the early stages

2013-03-22 Thread David Sterba
The subvolume ioctls block on the parent directory mutex that can be
held by other concurrent snapshot activity for a long time. Give the
user at least some chance to get out of this situation by allowing
to send a kill signal.

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 11c17a1..0911d01 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -723,7 +723,9 @@ static noinline int btrfs_mksubvol(struct path *parent,
 	struct dentry *dentry;
 	int error;
 
-	mutex_lock_nested(&dir->i_mutex, I_MUTEX_PARENT);
+	error = mutex_lock_killable_nested(&dir->i_mutex, I_MUTEX_PARENT);
+	if (error == -EINTR)
+		return error;
 
 	dentry = lookup_one_len(name, parent->dentry, namelen);
 	error = PTR_ERR(dentry);
@@ -2086,7 +2088,9 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 	if (err)
 		goto out;
 
-	mutex_lock_nested(&dir->i_mutex, I_MUTEX_PARENT);
+	err = mutex_lock_killable_nested(&dir->i_mutex, I_MUTEX_PARENT);
+	if (err == -EINTR)
+		goto out;
 	dentry = lookup_one_len(vol_args->name, parent, namelen);
 	if (IS_ERR(dentry)) {
 		err = PTR_ERR(dentry);
-- 
1.8.2



Re: btrfs "stuck" on

2013-03-22 Thread Roman Mamedov
On Thu, 21 Mar 2013 11:56:37 -0700
Ask Bjørn Hansen  wrote:

> Hello,
> 
> A few weeks ago I replaced a ZFS backup system with one backed by btrfs. A 
> script loops over a bunch of hosts rsyncing them to each their own subvolume. 
>  After each rsync I snapshot the "host-specific" subvolume.
> 
> The "disk" is an iscsi disk that in my benchmarks performs roughly like a 
> local raid with 2-3 SATA disks.

I think you should re-verify if this is still the case. Maybe your block
device performance suddenly plummeted for some other unrelated issue?

The simplest test would be "hdparm -t /dev/sdc".

Personally I use btrfs on top of an MD raid accessed over network via NBD (and
AoE before), without any major issues. Though my workload is perhaps somewhat
lighter than what you describe.

-- 
With respect,
Roman




Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Chris Mason
[ mmap corruptions with leveldb and btrfs compression ]

I ran this a number of times with compression off and wasn't able to
trigger problems.  With compress=lzo, I see errors on every run.

Compile: gcc -Wall -o mmap-trunc mmap-trunc.c
Run: ./mmap-trunc file_name

The basic idea is to create a 256MB file in steps.  Each step ftruncates
the file larger, and then mmaps a region for writing.  It dirties some
unaligned bytes (a little more than 8K), and then munmaps.

Then a verify stage goes back through the file to make sure the data we
wrote is really there.  I'm using a simple rotating pattern of chars
that compress very well.

I run it in batches of 100 with some memory pressure on the side:

for x in `seq 1 100` ; do (mmap-trunc f$x &) ; done

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define FILE_SIZE ((loff_t)256 * 1024 * 1024)
/* make a painfully unaligned chunk size */
#define CHUNK_SIZE (8192 + 932)

#define mmap_align(x) (((x) + 4095) & ~4095)

char *file_name = NULL;

/* extend the file by one chunk and fill that chunk through a shared mapping */
void mmap_one_chunk(int fd, loff_t *cur_size, unsigned char *file_buf)
{
	int ret;
	loff_t new_size = *cur_size + CHUNK_SIZE;
	loff_t pos = *cur_size;
	unsigned long map_size = mmap_align(CHUNK_SIZE) + 4096;
	char val = file_buf[0];
	char *p;
	int extra;

	/* step one, truncate out a hole */
	ret = ftruncate(fd, new_size);
	if (ret) {
		perror("truncate");
		exit(1);
	}

	/* rotate the fill character through 'a'..'z' */
	if (val == 0 || val == 'z')
		val = 'a';
	else
		val++;

	memset(file_buf, val, CHUNK_SIZE);

	/* mmap offsets must be page aligned, so map from the page boundary */
	extra = pos & 4095;
	p = mmap(0, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		 pos - extra);
	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	memcpy(p + extra, file_buf, CHUNK_SIZE);

	ret = munmap(p, map_size);
	if (ret) {
		perror("munmap");
		exit(1);
	}
	*cur_size = new_size;
}

/* walk the file chunk by chunk and verify the rotating pattern */
void check_chunks(int fd)
{
	char *p;
	loff_t checked = 0;
	char val = 'a';
	int i;
	int errors = 0;
	int ret;
	int extra;
	unsigned long map_size = mmap_align(CHUNK_SIZE) + 4096;

	fprintf(stderr, "checking chunks\n");
	while (checked < FILE_SIZE) {
		extra = checked & 4095;
		p = mmap(0, map_size, PROT_READ,
			 MAP_SHARED, fd, checked - extra);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (i = 0; i < CHUNK_SIZE; i++) {
			if (p[i + extra] != val) {
				fprintf(stderr, "%s: bad val %x wanted %x offset 0x%llx\n",
					file_name, p[i + extra], val,
					(unsigned long long)checked + i);
				errors++;
			}
		}
		if (val == 'z')
			val = 'a';
		else
			val++;
		ret = munmap(p, map_size);
		if (ret) {
			perror("munmap");
			exit(1);
		}
		checked += CHUNK_SIZE;
	}
	printf("%s found %d errors\n", file_name, errors);
	if (errors)
		exit(1);
}

int main(int ac, char **av)
{
	unsigned char *file_buf;
	loff_t pos = 0;
	int ret;
	int fd;

	if (ac < 2) {
		fprintf(stderr, "usage: mmap-trunc filename\n");
		exit(1);
	}

	/* posix_memalign returns the error code and does not set errno */
	ret = posix_memalign((void **)&file_buf, 4096, CHUNK_SIZE);
	if (ret) {
		fprintf(stderr, "cannot allocate memory: %s\n", strerror(ret));
		exit(1);
	}

	file_buf[0] = 0;

	file_name = av[1];

	fprintf(stderr, "running test on %s\n", file_name);

	unlink(file_name);
	fd = open(file_name, O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	fprintf(stderr, "writing chunks\n");
	while (pos < FILE_SIZE) {
		mmap_one_chunk(fd, &pos, file_buf);
	}
	check_chunks(fd);
	return 0;
}


Re: btrfs "stuck" on

2013-03-22 Thread Mitch Harder
On Thu, Mar 21, 2013 at 1:56 PM, Ask Bjørn Hansen  wrote:
> Hello,
>
> A few weeks ago I replaced a ZFS backup system with one backed by btrfs. A 
> script loops over a bunch of hosts rsyncing them to each their own subvolume. 
>  After each rsync I snapshot the "host-specific" subvolume.
>
> The "disk" is an iscsi disk that in my benchmarks performs roughly like a 
> local raid with 2-3 SATA disks.
>
> It worked fine for about a week (~150 snapshots from ~20 sub volumes) before 
> it "suddenly" exploded in disk io wait. Doing anything (in particular 
> changes) on the file system is just insanely slow, rsync basically can't 
> complete (an rsync that should take 10-20 minutes takes 24 hours; I have a 
> directory of 60k files I tried deleting and it's deleting one file every few 
> minutes, that sort of thing).
>
> I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting, it doesn't 
> make a difference. As soon as I boot "[btrfs-cleaner]" and 
> "[btrfs-transacti]" gets really busy.
>
> I wonder if it's because I deleted a few snapshots at some point?
>
> The file system is mounted with "-o compress=zlib,noatime"
>
> # mount | grep tank
> /dev/sdc on /tank type btrfs 
> (rw,noatime,seclabel,compress=zlib,space_cache,_netdev)
>
> I don't recall mounting it with space_cache; though I don't think that's the 
> default so I wonder if I did do that at some point. Could that be what's 
> messing me up?
>
> btrfs-cleaner stack:
>
> # cat /proc/1117/stack
> [] btrfs_commit_transaction+0x36a/0xa70 [btrfs]
> [] start_transaction+0x23f/0x460 [btrfs]
> [] btrfs_start_transaction+0x18/0x20 [btrfs]
> [] btrfs_drop_snapshot+0x3ef/0x5d0 [btrfs]
> [] btrfs_clean_old_snapshots+0x9f/0x120 [btrfs]
> [] cleaner_kthread+0xa9/0x120 [btrfs]
> [] kthread+0xc0/0xd0
> [] ret_from_fork+0x7c/0xb0
> [] 0x
>
>
> btrfs-transaction stack:
>
> #  cat /proc/1118/stack
> [] btrfs_tree_read_lock+0x95/0x110 [btrfs]
> [] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
> [] btrfs_search_slot+0x3f9/0x7a0 [btrfs]
> [] lookup_inline_extent_backref+0x8e/0x4d0 [btrfs]
> [] __btrfs_free_extent+0xc8/0x870 [btrfs]
> [] run_clustered_refs+0x459/0xb50 [btrfs]
> [] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
> [] btrfs_commit_transaction+0x86/0xa70 [btrfs]
> [] transaction_kthread+0x1a5/0x220 [btrfs]
> [] kthread+0xc0/0xd0
> [] ret_from_fork+0x7c/0xb0
> [] 0x
>
>
> Thank you for reading this far. Any suggestions would be most appreciated!
>

The space_cache option is probably not the issue.  It gets activated by
default, which would explain why you don't recall setting it.

The cleaner runs to remove deleted snapshots.  Responsiveness while
the cleaner is running has been an issue that has come up, but it is
usually just an inconvenience.  I can't recall hearing about a
slowdown of this degree while the cleaner is running.

I haven't noticed many discussions on the Btrfs mailing list where
Btrfs is used in the context of iSCSI, so you may be seeing new issues
in your use case.

If you can, it would be interesting to know how well the cleaner runs
across iSCSI if nothing else is running.  If you could delete a single
snapshot, and make note of the space used before and after the cleaner
finishes and the time required, this might help isolate the issue.

As a work-around, I would suggest using a script to delete the files
in the subvolume before removing the snapshot.  This way, you will
have more control over the priority given to the deletion process.
Once the subvolume is empty, the cleaner usually runs much better.  :)


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Sage Weil
On Fri, 22 Mar 2013, Chris Mason wrote:
> Quoting Alexandre Oliva (2013-03-22 10:17:30)
> > On Mar 22, 2013, Chris Mason  wrote:
> > 
> > > Are you using compression in btrfs or just in leveldb?
> > 
> > btrfs lzo compression.
> 
> Perfect, I'll focus on that part of things.
> 
> > 
> > > I'd like to take snapshots out of the picture for a minute.
> > 
> > That's understandable, I guess, but I don't know that anyone has ever
> > got the problem without snapshots.  I mean, even when the master copy of
> > the database got corrupted, snapshots of the subvol containing it were
> > being taken every now and again, because that's the way ceph works.
> 
> Hopefully Sage can comment, but the basic idea is that if you snapshot a
> database file the db must participate.  If it doesn't, it really is the
> same effect as crashing the box.
> 
> Something is definitely broken if we're corrupting the source files
> (either with or without snapshots), but avoiding incomplete writes in
> the snapshot files requires synchronization with the db.

In this case, we quiesce write activity, call leveldb's sync(), take the 
snapshot, and then continue.

(FWIW, this isn't the first time we've heard about leveldb corruption, but 
in each case we've looked into the user had the btrfs compression 
enabled so I suspect that's the right avenue of investigation!)

sage


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Chris Mason
In this case, I think Alexandre is scanning for zeros in the file.   The
incomplete writes will definitely show that.

-chris

Quoting Samuel Just (2013-03-22 13:06:41)
> Incomplete writes for leveldb should just result in lost updates, not
> corruption.  Also, we do stop writes before the snapshot is initiated
> so there should be no in-progress writes to leveldb other than leveldb
> compaction (though that might be something to investigate).
> -Sam
> 
> On Fri, Mar 22, 2013 at 7:26 AM, Chris Mason  wrote:
> > Quoting Alexandre Oliva (2013-03-22 10:17:30)
> >> On Mar 22, 2013, Chris Mason  wrote:
> >>
> >> > Are you using compression in btrfs or just in leveldb?
> >>
> >> btrfs lzo compression.
> >
> > Perfect, I'll focus on that part of things.
> >
> >>
> >> > I'd like to take snapshots out of the picture for a minute.
> >>
> >> That's understandable, I guess, but I don't know that anyone has ever
> >> got the problem without snapshots.  I mean, even when the master copy of
> >> the database got corrupted, snapshots of the subvol containing it were
> >> being taken every now and again, because that's the way ceph works.
> >
> > Hopefully Sage can comment, but the basic idea is that if you snapshot a
> > database file the db must participate.  If it doesn't, it really is the
> > same effect as crashing the box.
> >
> > Something is definitely broken if we're corrupting the source files
> > (either with or without snapshots), but avoiding incomplete writes in
> > the snapshot files requires synchronization with the db.
> >
> > -chris


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread David Sterba
On Fri, Mar 22, 2013 at 10:26:59AM -0400, Chris Mason wrote:
> Quoting Alexandre Oliva (2013-03-22 10:17:30)
> > On Mar 22, 2013, Chris Mason  wrote:
> > 
> > > Are you using compression in btrfs or just in leveldb?
> > 
> > btrfs lzo compression.
> 
> Perfect, I'll focus on that part of things.

> > > I'd like to take snapshots out of the picture for a minute.

I've reproduced this without compression, with autodefrag on. The test
was using snapshots (ie. the unmodified version) and ended with

1087 blocks, 4316779 total size
snaptest.268/ca snaptest.268/db differ: char 4245170, line 16

after a few minutes.

Before that, I was running the NOSNAPS mode for many minutes (up to 50k
rounds) without a reported problem.

There was the same 'make clean && make -j 32' kernel compilation running
in parallel, the box has 8 cpus, 4GB ram. Watching 'free' showed the
memory going up to a few gigs and down to ~130MB.


david


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Samuel Just
Incomplete writes for leveldb should just result in lost updates, not
corruption.  Also, we do stop writes before the snapshot is initiated
so there should be no in-progress writes to leveldb other than leveldb
compaction (though that might be something to investigate).
-Sam

On Fri, Mar 22, 2013 at 7:26 AM, Chris Mason  wrote:
> Quoting Alexandre Oliva (2013-03-22 10:17:30)
>> On Mar 22, 2013, Chris Mason  wrote:
>>
>> > Are you using compression in btrfs or just in leveldb?
>>
>> btrfs lzo compression.
>
> Perfect, I'll focus on that part of things.
>
>>
>> > I'd like to take snapshots out of the picture for a minute.
>>
>> That's understandable, I guess, but I don't know that anyone has ever
>> got the problem without snapshots.  I mean, even when the master copy of
>> the database got corrupted, snapshots of the subvol containing it were
>> being taken every now and again, because that's the way ceph works.
>
> Hopefully Sage can comment, but the basic idea is that if you snapshot a
> database file the db must participate.  If it doesn't, it really is the
> same effect as crashing the box.
>
> Something is definitely broken if we're corrupting the source files
> (either with or without snapshots), but avoiding incomplete writes in
> the snapshot files requires synchronization with the db.
>
> -chris


Re: No space left on device (28)

2013-03-22 Thread Josef Bacik
On Fri, Mar 22, 2013 at 07:56:41AM -0600, Stefan Priebe - Profihost AG wrote:
> Hi Josef,
> On 22.03.2013 14:53, Josef Bacik wrote:
> > On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
> >> Hi Chris,
> >>
> >>> Which kernel are you running?
> >>>
> >>> -chris
> >>
> >> vanilla 3.8.3.
> >
> > Ok, with the 3.9 merge window Josef changed how we do the reservations.
> > Are you able to try a slightly more experimental kernel?
> >>
> >> any ideas what i can check? 3.9-rc3 gives me same results.
> >>
> > 
> > Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
> > patch for you to run so I can narrow down what's going on.  Thanks,
> 
> Great!
> 
> Thanks - just wanted to know that it's not my fault. I'm happy to test
> the patch and provide feedback.
> 

Ok I think we are way over-reserving for rename, can you give this patch a whirl
and see what happens?  If it still fails can you capture dmesg and reply with
that so I can see what's going on.  Thanks,

Josef


diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 9ac2eca..aabaea6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4161,6 +4161,11 @@ out:
 		ret = 0;
 	}
 	if (flushing) {
+		if (ret == -ENOSPC) {
+			printk(KERN_ERR "returning enospc, dumping space info\n");
+			dump_space_info(space_info, 0, 0);
+		}
+
 		spin_lock(&space_info->lock);
 		space_info->flush = 0;
 		wake_up_all(&space_info->wait);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ca1b767..3980ae7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3679,11 +3679,9 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
 	 * 1 for the dir item
 	 * 1 for the dir index
 	 * 1 for the inode ref
-	 * 1 for the inode ref in the tree log
-	 * 2 for the dir entries in the log
 	 * 1 for the inode
 	 */
-	trans = btrfs_start_transaction(root, 8);
+	trans = btrfs_start_transaction(root, 5);
 	if (!IS_ERR(trans) || PTR_ERR(trans) != -ENOSPC)
 		return trans;
 
@@ -8127,7 +8125,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 * inodes.  So 5 * 2 is 10, plus 1 for the new link, so 11 total items
 	 * should cover the worst case number of items we'll modify.
 	 */
-	trans = btrfs_start_transaction(root, 20);
+	trans = btrfs_start_transaction(root, 11);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto out_notrans;
-- 
1.7.7.6



[PATCH] Btrfs-image: add the ability to sanitize file names when making an image

2013-03-22 Thread Josef Bacik
We've had a few users who wouldn't (or couldn't) provide us btrfs-images because
we maintain the file names when making an image.  So introduce a sanitize
option.  There are two uses, one that is fast and the other that is dog slow.
The fast way just generates garbage that's equal in length to the original name.
The slow way will try and find a crc32c collision for the file name that is also
the same length.  Finding a crc32c collision for the file name "btrfs-progs" on
my box without CPU crc32c support takes a little more than 3 minutes, and a
little less than 2 minutes for my box that has CPU crc32c support, so it's a
lengthy and CPU intensive process.

The idea is that we use -s for most cases, and then only use -ss when we need
the file system tree to be somewhat sane.  I could probably do a better job
of finding collisions, but I'll have to revisit that later.  Thanks,
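
To illustrate why the slow mode has to search for a collision: a substitute
name only keeps the tree "sane" if it has the same length and the same crc32c
value as the original.  Below is a minimal sketch of that check.  The helper
names (crc32c_raw, name_hash, is_valid_substitute) are hypothetical and not
part of the patch; the ~1 seed is taken from the crc32c(~1, ...) call in the
patch, and the bitwise CRC is a plain software stand-in for the library
routine, assumed to apply no extra inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
 * No pre- or post-inversion is applied here; the caller supplies the seed.
 */
static uint32_t crc32c_raw(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	assert(data != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical helper: hash a name with the seed used in the patch (~1). */
static uint32_t name_hash(const char *name, size_t len)
{
	return crc32c_raw((uint32_t)~1, name, len);
}

/*
 * Hypothetical helper: a same-length substitute is usable for -ss only if
 * it hashes to the same value as the original name.
 */
static int is_valid_substitute(const char *orig, const char *sub, size_t len)
{
	return name_hash(orig, len) == name_hash(sub, len);
}
```

The fast mode (-s) skips this check entirely, which is why its output is
cheap to produce but leaves name hashes that no longer match the keys.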

Signed-off-by: Josef Bacik 
---
 btrfs-image.c|  358 --
 man/btrfs-image.8.in |9 ++
 2 files changed, 358 insertions(+), 9 deletions(-)

diff --git a/btrfs-image.c b/btrfs-image.c
index 6befb94..39903a4 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -85,6 +85,7 @@ struct metadump_struct {
size_t num_threads;
pthread_mutex_t mutex;
pthread_cond_t cond;
+   struct rb_root name_tree;
 
struct list_head list;
struct list_head ordered;
@@ -97,6 +98,14 @@ struct metadump_struct {
int compress_level;
int done;
int data;
+   int sanitize_names;
+};
+
+struct name {
+   struct rb_node n;
+   char *val;
+   char *sub;
+   u32 len;
 };
 
 struct mdrestore_struct {
@@ -121,6 +130,8 @@ struct mdrestore_struct {
int old_restore;
 };
 
+static struct extent_buffer *alloc_dummy_eb(u64 bytenr, u32 size);
+
 static void csum_block(u8 *buf, size_t len)
 {
char result[BTRFS_CRC32_SIZE];
@@ -130,10 +141,309 @@ static void csum_block(u8 *buf, size_t len)
memcpy(buf, result, BTRFS_CRC32_SIZE);
 }
 
+static int has_name(struct btrfs_key *key)
+{
+   switch (key->type) {
+   case BTRFS_DIR_ITEM_KEY:
+   case BTRFS_DIR_INDEX_KEY:
+   case BTRFS_INODE_REF_KEY:
+   case BTRFS_INODE_EXTREF_KEY:
+   return 1;
+   default:
+   break;
+   }
+
+   return 0;
+}
+
+static char *generate_garbage(u32 name_len)
+{
+   char *buf = malloc(name_len);
+   int i;
+
+   if (!buf)
+   return NULL;
+
+   for (i = 0; i < name_len; i++) {
+   char c = rand() % 94 + 33;
+
+   if (c == '/')
+   c++;
+   buf[i] = c;
+   }
+
+   return buf;
+}
+
+static void tree_insert(struct rb_root *root, struct name *ins)
+{
+   struct rb_node ** p = &root->rb_node;
+   struct rb_node * parent = NULL;
+   struct name *entry;
+   u32 len;
+   int dir;
+
+   while(*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct name, n);
+
+   len = min(ins->len, entry->len);
+   dir = memcmp(ins->val, entry->val, len);
+
+   if (dir < 0)
+   p = &(*p)->rb_left;
+   else if (dir > 0)
+   p = &(*p)->rb_right;
+   else
+   BUG();
+   }
+
+   rb_link_node(&ins->n, parent, p);
+   rb_insert_color(&ins->n, root);
+}
+
+static struct name *name_search(struct rb_root *root, char *name, u32 name_len)
+{
+   struct rb_node *n = root->rb_node;
+   struct name *entry = NULL;
+   u32 len;
+   int dir;
+
+   while (n) {
+   entry = rb_entry(n, struct name, n);
+
+   len = min(entry->len, name_len);
+
+   dir = memcmp(name, entry->val, len);
+   if (dir < 0)
+   n = n->rb_left;
+   else if (dir > 0)
+   n = n->rb_right;
+   else
+   return entry;
+   }
+
+   return NULL;
+}
+
+static char *find_collision(struct metadump_struct *md, char *name,
+   u32 name_len)
+{
+   struct name *val;
+   unsigned long checksum;
+   int found = 0;
+   int i;
+
+   val = name_search(&md->name_tree, name, name_len);
+   if (val) {
+   free(name);
+   return val->sub;
+   }
+
+   val = malloc(sizeof(struct name));
+   if (!val) {
+   fprintf(stderr, "Couldn't sanitize name, enomem\n");
+   return NULL;
+   }
+
+   memset(val, 0, sizeof(*val));
+
+   val->val = name;
+   val->len = name_len;
+   val->sub = malloc(name_len);
+   if (!val->sub) {
+   fprintf(stderr, "Couldn't sanitize name, enomem\n");
+   free(val);
+   return NULL;
+   }
+
+   checksum = crc32c(~1, val->val, name_len);
+   memset(val->sub, ' ', n

Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Chris Mason
Quoting Alexandre Oliva (2013-03-22 10:17:30)
> On Mar 22, 2013, Chris Mason  wrote:
> 
> > Are you using compression in btrfs or just in leveldb?
> 
> btrfs lzo compression.

Perfect, I'll focus on that part of things.

> 
> > I'd like to take snapshots out of the picture for a minute.
> 
> That's understandable, I guess, but I don't know that anyone has ever
> got the problem without snapshots.  I mean, even when the master copy of
> the database got corrupted, snapshots of the subvol containing it were
> being taken every now and again, because that's the way ceph works.

Hopefully Sage can comment, but the basic idea is that if you snapshot a
database file the db must participate.  If it doesn't, it really is the
same effect as crashing the box.

Something is definitely broken if we're corrupting the source files
(either with or without snapshots), but avoiding incomplete writes in
the snapshot files requires synchronization with the db.

-chris


crash with 3.7.10 and balance.

2013-03-22 Thread Jon Nelson
I'm running openSUSE 12.3 on x86_64.
I was running a balance:
btrfs balance -dusage=5 -v /

using the latest btrfs tools code from git (as of this writing)
and got a crash:

[304158.496250] btrfs: found 75 extents
[304159.309289] btrfs: relocating block group 2303295684608 flags 17
[304159.839886] btrfs: found 1 extents
[304161.484616] ------------[ cut here ]------------
[304161.484668] WARNING: at
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/super.c:246
__btrfs_abort_transaction+0xc3/0xe0 [btrfs]()
[304161.484671] Hardware name: TA790GX XE
[304161.484673] btrfs: Transaction aborted
[304161.484675] Modules linked in: af_packet md5 xt_REDIRECT
xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache
lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables cpufreq_conservative cpufreq_userspace
cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek
acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg
via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode
snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp
edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug
wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid10 raid0 raid1 ohci_hcd ehci_hcd usbcore usb_common
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_mirror
dm_region_hash dm_log dm_mod edd fan thermal processor thermal_sys
[304161.484749] Pid: 22397, comm: btrfs Tainted: GW
3.7.10-1.1-default #1
[304161.484751] Call Trace:
[304161.484770]  [] dump_trace+0x78/0x2c0
[304161.484777]  [] dump_stack+0x69/0x6f
[304161.484785]  [] warn_slowpath_common+0x79/0xc0
[304161.484791]  [] warn_slowpath_fmt+0x45/0x50
[304161.484812]  []
__btrfs_abort_transaction+0xc3/0xe0 [btrfs]
[304161.484844]  [] __btrfs_inc_extent_ref+0x1ed/0x250 [btrfs]
[304161.484899]  [] run_clustered_refs+0x666/0xa90 [btrfs]
[304161.484954]  [] btrfs_run_delayed_refs+0xca/0x310 [btrfs]
[304161.485012]  [] __btrfs_end_transaction+0xf9/0x420 [btrfs]
[304161.485085]  [] merge_reloc_root+0x48d/0x520 [btrfs]
[304161.485214]  [] merge_reloc_roots+0x101/0x140 [btrfs]
[304161.485337]  [] relocate_block_group+0x25e/0x6b0 [btrfs]
[304161.485459]  []
btrfs_relocate_block_group+0x1a9/0x2e0 [btrfs]
[304161.485579]  []
btrfs_relocate_chunk.isra.53+0x5d/0x6e0 [btrfs]
[304161.485674]  [] btrfs_balance+0x826/0xd60 [btrfs]
[304161.485770]  [] btrfs_ioctl_balance+0x136/0x420 [btrfs]
[304161.485878]  [] btrfs_ioctl+0xe54/0x1870 [btrfs]
[304161.485967]  [] do_vfs_ioctl+0x8f/0x520
[304161.485973]  [] sys_ioctl+0xa0/0xc0
[304161.485979]  [] system_call_fastpath+0x1a/0x1f
[304161.485989]  [<7f03050aef27>] 0x7f03050aef26
[304161.485991] ---[ end trace d010cbea0d653c96 ]---
[304161.485995] BTRFS error (device sdd) in
__btrfs_inc_extent_ref:1952: Object already exists
[304161.485996] btrfs is forced readonly
[304161.486051] ------------[ cut here ]------------
[304161.486138] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.10/linux-3.7/fs/btrfs/relocation.c:2279!
[304161.486299] invalid opcode: 0000 [#1] SMP
[304161.486371] Modules linked in: af_packet md5 xt_REDIRECT
xt_pkttype xt_physdev xt_TCPMSS xt_tcpudp xt_LOG xt_limit iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle xt_mark nfsd nfs_acl nfs fscache
lockd auth_rpcgss ebt_ip sunrpc ebtable_filter ebtables bridge stp llc
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_ftp
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables cpufreq_conservative cpufreq_userspace
cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek
acpi_cpufreq snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer
snd sr_mod cdrom radeon mperf ttm soundcore ata_generic sg
via_velocity sp5100_tco drm_kms_helper kvm_amd kvm microcode
snd_page_alloc r8169 pcspkr crc_ccitt button i2c_piix4 k10temp
edac_core drm pata_atiixp i2c_algo_bit edac_mce_amd shpchp pci_hotplug
wmi tcp_htcp autofs4 btrfs zlib_deflate libcrc32c raid456
async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy
async_tx raid10 raid0 raid1 ohci_hcd ehci_hcd usbcore usb_common
scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh dm_mirror
dm_region_hash dm_log dm_mod edd fan thermal processor thermal_sys
[304161.488642] CPU 3
[304161.488678] Pid: 22397, comm: btrfs Tainted: GW
3.7.10-1.1-default #1 BIOST

Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Alexandre Oliva
On Mar 22, 2013, Chris Mason  wrote:

> Are you using compression in btrfs or just in leveldb?

btrfs lzo compression.

> I'd like to take snapshots out of the picture for a minute.

That's understandable, I guess, but I don't know that anyone has ever
got the problem without snapshots.  I mean, even when the master copy of
the database got corrupted, snapshots of the subvol containing it were
being taken every now and again, because that's the way ceph works.
Even back when I noticed corruption of firefox _CACHE_* files, snapshots
taken for archival were involved.  So, unless the program happens to
trigger the problem with the -DNOSNAPS option about as easily as it did
without it, I guess we may not have a choice but to keep snapshots in
the picture.

> We need some way to synchronize the leveldb with snapshotting

I purposefully refrained from doing that, because AFAICT ceph doesn't do
that.  Once I failed to trigger the problem with Sync calls, and
determined ceph only syncs the leveldb logs before taking its snapshots,
I went without syncing and finally succeeded in triggering the bug in
snapshots, by simulating very similar snapshotting and mmaping
conditions to those generated by ceph.  I haven't managed to trigger the
corruption of the master subvol yet with the test program, but I already
knew its corruption didn't occur as often as that of the snapshots, and
since it smells like two slightly different symptoms of the same bug, I
decided to leave the test program at that.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: btrfs-show vs. btrfs different output

2013-03-22 Thread Jon Nelson
On Thu, Mar 21, 2013 at 11:25 AM, Eric Sandeen  wrote:
> On 3/21/13 10:29 AM, Jon Nelson wrote:
>> On Thu, Mar 21, 2013 at 10:11 AM, Eric Sandeen  wrote:
>>> On 3/21/13 10:04 AM, Jon Nelson wrote:
>> ...
>>>> 2. the current git btrfs-show and btrfs fi show both output
>>>> *different* devices for device with UUID
>>>> b5dc52bd-21bf-4173-8049-d54d88c82240, and they're both wrong.
>>>
>>> does blkid output find that uuid anywhere?
>>>
>>> Since you're working in git, can you maybe do a little bisecting
>>> to find out when it changed?  Should be a fairly quick test?
>>
>> blkid does /not/ report that uuid anywhere.
>>
>> git bisect, if I did it correctly, says:
>>
>>
>> 6eba9002956ac40db87d42fb653a0524dc568810 is the first bad commit
>> commit 6eba9002956ac40db87d42fb653a0524dc568810
>> Author: Goffredo Baroncelli 
>> Date:   Tue Sep 4 19:59:26 2012 +0200
>>
>> Correct un-initialized fsid variable
>>
>> :100644 100644 b21a87f827a6250da45f2fb6a1c3a6b651062243
>> 03952051b5e25e0b67f0f910c84d93eb90de8480 M  disk-io.c
>
> Ok, I think this is another case of greedily scanning stale
> backup superblocks (did you ever have btrfs on the whole sda
> or sdb?)
>
> btrfs_read_dev_super() currently tries to scan all 3 superblocks
> (primary & 2 backups).  I'm guessing that you have some stale
> backup superblocks on sda and/or sdb.
>
> Before the above commit, if the first sb didn't look valid,
> it'd skip to the 2nd.  If the 2nd (stale) one looked OK,
> it'd compare its fsid to an uninitialized variable (fsid)
> which would fail (since the "fsid" contents were random.)
> Same for the 3rd backup if found, and eventually it'd return
> -1 as failure and not report the device.
>
> After the commit, it'd skip the first invalid sb as well.
> But this time, it takes the fsid from the 2nd superblock as
> "good" and makes it through the loop thinking that it's found
> something valid.  Hence the report of a device which you didn't
> expect even though the first superblock is indeed wiped out.
>
> There are some patches floating around to stop this
> backup superblock scanning altogether.
>
> This might fix it for you; it basically returns failure
> if any superblock on the device is found to be bad.
>
> What we really need is the right bits in the right places
> to let the administrator know if a device looks like it might
> be corrupt & in need of fixing, vs. ignoring it altogether.
>
> Not sure if this is something we want upstream but you could
> test it if you like.

I did test and it appears to resolve the issue for me.
Thank you!
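
For anyone following along, here is a rough model of the scanning behaviour
Eric describes above.  It is only a sketch: struct sb_copy and pick_super()
are made-up names, "valid" stands in for whatever per-superblock checks the
real code performs, and the strict mode is my reading of "returns failure if
any superblock on the device is found to be bad".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SUPER_MIRROR_MAX 3	/* primary + 2 backups, as described above */
#define FSID_SIZE 16

/* Simplified stand-in for one on-disk superblock copy (hypothetical). */
struct sb_copy {
	int valid;			/* 1 if this copy looks like a superblock */
	uint8_t fsid[FSID_SIZE];
};

/*
 * Returns the index of the first accepted copy, or -1 for "no btrfs here".
 *
 * strict == 0: invalid copies are skipped and the fsid of the first valid
 *              copy is trusted, so a stale backup alone is enough to make
 *              the device show up.
 * strict != 0: any invalid copy fails the whole device.
 */
static int pick_super(const struct sb_copy copies[SUPER_MIRROR_MAX], int strict)
{
	const uint8_t *fsid = NULL;
	int found = -1;

	assert(copies != NULL);
	for (int i = 0; i < SUPER_MIRROR_MAX; i++) {
		if (!copies[i].valid) {
			if (strict)
				return -1;
			continue;
		}
		if (!fsid)
			fsid = copies[i].fsid;	/* first valid copy sets the fsid */
		else if (memcmp(fsid, copies[i].fsid, FSID_SIZE) != 0)
			continue;		/* different filesystem, ignore */
		if (found < 0)
			found = i;
	}
	return found;
}
```

With a wiped primary and two stale backups, the non-strict scan returns index
1 (the stale copy), while the strict scan returns -1, which matches what I
saw before and after applying the patch.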

-- 
Jon
Software Blacksmith


Re: No space left on device (28)

2013-03-22 Thread Stefan Priebe - Profihost AG
Hi Josef,
Am 22.03.2013 14:53, schrieb Josef Bacik:
> On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
>> Hi Chris,
>>
>>> Which kernel are you running?
>>>
>>> -chris
>>
>> vanilla 3.8.3.
>
> Ok, with the 3.9 merge window Josef changed how we do the reservations.
> Are you able to try a slightly more experimental kernel?
>>
>> any ideas what i can check? 3.9-rc3 gives me same results.
>>
> 
> Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
> patch for you to run so I can narrow down what's going on.  Thanks,

Great!

Thanks - just wanted to know that it's not my fault. I'm happy to test
the patch and provide feedback.

Stefan



Re: No space left on device (28)

2013-03-22 Thread Josef Bacik
On Fri, Mar 22, 2013 at 06:11:56AM -0600, Stefan Priebe - Profihost AG wrote:
> Hi Chris,
> 
> > Which kernel are you running?
> >
> > -chris
> 
>  vanilla 3.8.3.
> >>>
> >>> Ok, with the 3.9 merge window Josef changed how we do the reservations.
> >>> Are you able to try a slightly more experimental kernel?
> 
> any ideas what i can check? 3.9-rc3 gives me same results.
> 

Sorry Stefan I'm almost done with what I'm working on and then I'll work up a
patch for you to run so I can narrow down what's going on.  Thanks,

Josef


Re: [PATCH] Btrfs-progs: fix overflow when printing qgroup info

2013-03-22 Thread Wang Shilong

Hello,

> Hello,
> 
>> 
>> On Fri, March 22, 2013 at 12:53 (+0100), Wang Shilong wrote:
>>> From: Wang Shilong 
>>> 
>>> Since btrfs quota rescan has not been implemented yet,
>>> a user complains that "btrfs qgroup show" lists qgroup
>>> referenced/exclusive be negative. However, this should
>>> not happen even if overflow happens,because the type for
>>> qgroup referenced/exclusive is u64,fix it.
>>> 
>>> Signed-off-by: Wang Shilong 
>>> Reported-by: Koen De Wit 
>>> ---
>>> cmds-qgroup.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/cmds-qgroup.c b/cmds-qgroup.c
>>> index 60ca33d..fc4cb13 100644
>>> --- a/cmds-qgroup.c
>>> +++ b/cmds-qgroup.c
>>> @@ -105,7 +105,7 @@ static int qgroup_create(int create, int argc, char **argv)
>>> 
>>> void print_qgroup_info(u64 objectid, struct btrfs_qgroup_info_item *info)
>>> {
>>> -   printf("%llu/%llu %lld %lld\n", objectid >> 48,
>>> +   printf("%llu/%llu %llu %llu\n", objectid >> 48,
>>> objectid & ((1ll << 48) - 1),
>>> btrfs_stack_qgroup_info_referenced(info),
>>> btrfs_stack_qgroup_info_exclusive(info));
>>> 
>> 
>> I don't like that change. Seeing negative numbers is what you should expect in
>> the current situation.
>> 
>> Once anyone come across negative numbers with a volume holding more data than
>> what can be tracked with 63 bit, I may come to agree to your change. For now, it
>> will confuse more than help.
> 
> Maybe, you are right.
> 
> But the type for referenced/exclusive is u64. Considering the following case:
> 
> overflow happens, referenced/exclusive changes into a big positive integer,
> so next time, when we doing accounting, it may return edquot. So i think the
> check in the kernel is necessary.

In the above case, from the user's point of view the referenced/exclusive value
is negative, yet the user cannot continue to write data. How strange that is!

So I think that adding a check in the kernel, which sets referenced/exclusive
to 0 and gives a warning, is better than the current situation.

If you agree with my approach, I will make the patch and send it next week. ~_~
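
To make the proposal concrete, the check I have in mind looks roughly like
the sketch below.  This is not kernel code: qgroup_sub_clamped() is a
hypothetical helper, and fprintf() stands in for a kernel warning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: subtract 'bytes' from a u64 qgroup counter.
 * If the counter would go below zero, clamp it to 0, warn, and report
 * the underflow through *underflowed instead of letting the value wrap.
 */
static uint64_t qgroup_sub_clamped(uint64_t counter, uint64_t bytes,
				   int *underflowed)
{
	assert(underflowed != NULL);
	if (bytes > counter) {
		*underflowed = 1;
		fprintf(stderr,
			"warning: qgroup counter %llu smaller than %llu, clamping to 0\n",
			(unsigned long long)counter,
			(unsigned long long)bytes);
		return 0;
	}
	*underflowed = 0;
	return counter - bytes;
}
```

With a counter of 53248 and a removal of 102400 bytes, this returns 0 and
sets the flag, instead of producing a value that prints as -49152.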

Thanks,
Wang

> 
> Or am i missing something ?
> 
> Thanks,
> Wang
> 
> 
>> 
>> -Jan
> 



Re: [PATCH] Btrfs-progs: fix overflow when printing qgroup info

2013-03-22 Thread Wang Shilong
Hello,

> 
> On Fri, March 22, 2013 at 12:53 (+0100), Wang Shilong wrote:
>> From: Wang Shilong 
>> 
>> Since btrfs quota rescan has not been implemented yet,
>> a user complains that "btrfs qgroup show" lists qgroup
>> referenced/exclusive be negative. However, this should
>> not happen even if overflow happens,because the type for
>> qgroup referenced/exclusive is u64,fix it.
>> 
>> Signed-off-by: Wang Shilong 
>> Reported-by: Koen De Wit 
>> ---
>> cmds-qgroup.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/cmds-qgroup.c b/cmds-qgroup.c
>> index 60ca33d..fc4cb13 100644
>> --- a/cmds-qgroup.c
>> +++ b/cmds-qgroup.c
>> @@ -105,7 +105,7 @@ static int qgroup_create(int create, int argc, char **argv)
>> 
>> void print_qgroup_info(u64 objectid, struct btrfs_qgroup_info_item *info)
>> {
>> -printf("%llu/%llu %lld %lld\n", objectid >> 48,
>> +printf("%llu/%llu %llu %llu\n", objectid >> 48,
>>  objectid & ((1ll << 48) - 1),
>>  btrfs_stack_qgroup_info_referenced(info),
>>  btrfs_stack_qgroup_info_exclusive(info));
>> 
> 
> I don't like that change. Seeing negative numbers is what you should expect in
> the current situation.
> 
> Once anyone come across negative numbers with a volume holding more data than
> what can be tracked with 63 bit, I may come to agree to your change. For now, it
> will confuse more than help.

Maybe, you are right.

But the type of referenced/exclusive is u64. Consider the following case:

an overflow happens and referenced/exclusive turns into a big positive integer,
so the next time we do accounting it may return EDQUOT. So I think a check in
the kernel is necessary.

Or am I missing something?

Thanks,
Wang


> 
> -Jan



Re: [PATCH] Btrfs-progs: fix overflow when printing qgroup info

2013-03-22 Thread Jan Schmidt

On Fri, March 22, 2013 at 12:53 (+0100), Wang Shilong wrote:
> From: Wang Shilong 
> 
> Since btrfs quota rescan has not been implemented yet,
> a user complains that "btrfs qgroup show" lists qgroup
> referenced/exclusive be negative. However, this should
> not happen even if overflow happens,because the type for
> qgroup referenced/exclusive is u64,fix it.
> 
> Signed-off-by: Wang Shilong 
> Reported-by: Koen De Wit 
> ---
>  cmds-qgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/cmds-qgroup.c b/cmds-qgroup.c
> index 60ca33d..fc4cb13 100644
> --- a/cmds-qgroup.c
> +++ b/cmds-qgroup.c
> @@ -105,7 +105,7 @@ static int qgroup_create(int create, int argc, char **argv)
>  
>  void print_qgroup_info(u64 objectid, struct btrfs_qgroup_info_item *info)
>  {
> - printf("%llu/%llu %lld %lld\n", objectid >> 48,
> + printf("%llu/%llu %llu %llu\n", objectid >> 48,
>   objectid & ((1ll << 48) - 1),
>   btrfs_stack_qgroup_info_referenced(info),
>   btrfs_stack_qgroup_info_exclusive(info));
> 

I don't like that change. Seeing negative numbers is what you should expect in
the current situation.

Once anyone come across negative numbers with a volume holding more data than
what can be tracked with 63 bit, I may come to agree to your change. For now, it
will confuse more than help.
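
To show what the format change does to an underflowed counter, here is a
small standalone sketch.  format_qgroup() is a hypothetical helper, not the
btrfs-progs function; the 53248 and 102400 byte values are the ones from the
report that prompted the patch.  The cast to long long assumes the usual
two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one "qgroup show" line.  The qgroup id is
 * split as in print_qgroup_info(): level in the top 16 bits, id in the low
 * 48 bits.  as_signed selects the old (%lld) or the patched (%llu) format.
 */
static int format_qgroup(char *buf, size_t n, uint64_t objectid,
			 uint64_t rfer, uint64_t excl, int as_signed)
{
	unsigned long long level = objectid >> 48;
	unsigned long long id = objectid & ((1ULL << 48) - 1);

	assert(buf != NULL);
	if (as_signed)
		return snprintf(buf, n, "%llu/%llu %lld %lld", level, id,
				(long long)rfer, (long long)excl);
	return snprintf(buf, n, "%llu/%llu %llu %llu", level, id,
			(unsigned long long)rfer, (unsigned long long)excl);
}
```

For qgroup 1/0 with a counter of 53248 - 102400 computed in u64, the signed
format gives "1/0 -49152 -49152" and the unsigned one gives
"1/0 18446744073709502464 18446744073709502464".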

-Jan


Re: Adding a non-empty subvol to a qgroup

2013-03-22 Thread Arne Jansen
On 22.03.2013 13:03, Wang Shilong wrote:
> Hello Arne,
> 
> Since "quota rescan" has not been implemented yet,
> 
> overflow can happen, so until now, we can have a check when
> doing accounting in the kernel, if the referenced/exclusive is not
> enough to delete, we just make it to be 0 and give a warning.
> 
> Otherwise, user may get a strange integer(because of type u64).
> How do you think ? or we just wait for the implement of rescan.

I think we already print it negatively. Please just leave it as
it is.

Thanks,
Arne

> 
> Thanks,
> Wang
> 
>> All,
>>
>> When adding a subvolume to a qgroup, pre-existing files in that subvolume are
>> not counted in the referenced/exclusive space of the qgroup. Is this intended
>> behavior ?
>>
>> I create a subvol with one file:
>>
>>  # mkfs.btrfs /dev/sdg
>>  # mount /dev/sdg /mnt/fulldisk
>>  # cd /mnt/fulldisk
>>  # btrfs quota enable ./
>>  # btrfs sub create sub1
>>  # dd if=/dev/zero of=sub1/file1 bs=10 count=1
>>  # sync
>>  # btrfs qgroup show ./
>>  0/257 106496 106496
>>
>> Now I create a new qgroup on level 1 and add the qgroup of sub1 to it :
>>
>>  # btrfs qgroup create 1/0 ./
>>  # btrfs qgroup assign 0/257 1/0 ./
>>  # sync
>>  # btrfs fi sync ./
>>  # btrfs quota rescan ./
>>  # btrfs quota rescan ./sub1
>>  # btrfs qgroup show ./
>>  0/257 106496 106496
>>  1/0 0 0
>>
>> The pre-existing file does not contribute to the space numbers.
>>
>> Let's create a new file:
>>
>>  # dd if=/dev/zero of=sub1/file2 bs=5 count=1
>>  # sync
>>  # btrfs qgroup show ./
>>  0/257 159744 159744
>>  1/0 53248 53248
>>
>> We see that only the new file is included in the space numbers.
>>
>> Now I remove the first file:
>>
>>  # rm -f sub1/file1
>>  # sync
>>  # btrfs qgroup show ./
>>  0/257 57344 57344
>>  1/0 -49152 -49152
>>
>> The space numbers go below zero. Even if the behavior above is intended, the 
>> removal of the pre-existing file should not result in negative space numbers.
>>
>> Koen.
> 



Re: No space left on device (28)

2013-03-22 Thread Stefan Priebe - Profihost AG
Hi Chris,

> Which kernel are you running?
>
> -chris

 vanilla 3.8.3.
>>>
>>> Ok, with the 3.9 merge window Josef changed how we do the reservations.
>>> Are you able to try a slightly more experimental kernel?

any ideas what i can check? 3.9-rc3 gives me same results.

Greets,
Stefan


Re: corruption of active mmapped files in btrfs snapshots

2013-03-22 Thread Chris Mason
Quoting Alexandre Oliva (2013-03-22 01:27:42)
> On Mar 21, 2013, Chris Mason  wrote:
> 
> > Quoting Chris Mason (2013-03-21 14:06:14)
> >> With mmap the kernel can pick any given time to start writing out dirty
> >> pages.  The idea is that if the application makes more changes the page
> >> becomes dirty again and the kernel writes it again.
> 
> That's the theory.  But what if there's some race between the time the
> page is frozen for compressing and the time it's marked as clean, or
> it's marked as clean after it's further modified, or a subsequent write
> to the same page ends up overridden by the background compression of the
> old contents of the page?  These are all possibilities that come to mind
> without knowing much about btrfs inner workings.

Definitely, there is a lot of room for racing.  Are you using
compression in btrfs or just in leveldb?

> 
> >> So the question is, can you trigger this without snapshots being done
> >> at all?
> 
> I haven't tried, but I now have a program that hit the error condition
> while taking snapshots in background with small time perturbations to
> increase the likelihood of hitting a race condition at the exact time.
> It uses leveldb's infrastructure for the mmapping, but it shouldn't be
> too hard to adapt it so that it doesn't.
> 
> > So my test program creates an 8GB file in chunks of 1MB each.
> 
> That's probably too large a chunk to write at a time.  The bug is
> exercised with writes slightly smaller than a single page (although
> straddling across two consecutive pages).
> 
> This half-baked test program (hereby provided under the terms of the GNU
> GPLv3+) creates a btrfs subvolume and two files in it: one in which I/O
> will be performed with write()s, another that will get the same data
> appended with leveldb's mmap-based output interface.  Random block
> sizes, as well as milli and microsecond timing perturbations, are read
> from /dev/urandom, and the rest of the output buffer is filled with
> (char)1.
> 
> The test that actually failed (on the first try!, after some other
> variations that didn't fail) didn't have any of the #ifdef options
> enabled (i.e., no -D* flags during compilation), but it triggered the
> exact failure observed with ceph: zeros at the end of a page where there
> should have been nonzero data, followed by nonzero data on the following
> page!  That was within snapshots, not in the main subvol, but hopefully
> it's the same problem, just a bit harder to trigger.

I'd like to take snapshots out of the picture for a minute.  We need
some way to synchronize the leveldb with snapshotting because the
snapshot is basically the same thing as a crash from a db point of view.

Corrupting the main database file is a much different (and bigger)
problem.
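
As a rough illustration of what "participating" could mean for an mmap
writer: flush the mapping with msync(MS_SYNC) before the snapshot is taken.
This is only a sketch under my own assumptions; flush_mapping() and
demo_flush_before_snapshot() are hypothetical names, the temporary file path
is arbitrary, and nothing here is claimed to avoid the corruption being
discussed.  The read-back at the end only checks the data path, since
durability cannot be verified from inside the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Block until dirty pages of the mapping have been written back. */
static int flush_mapping(void *map, size_t len)
{
	assert(map != NULL);
	return msync(map, len, MS_SYNC);
}

/*
 * Hypothetical demo: write a record through a shared mapping, flush it,
 * then read it back through the file descriptor.  A snapshot would be
 * taken only after flush_mapping() has returned 0.
 */
static int demo_flush_before_snapshot(void)
{
	char path[] = "/tmp/mmap-flush-XXXXXX";
	const char msg[] = "record-1";
	char back[sizeof(msg)];
	size_t len = 4096;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* file goes away when fd is closed */
	if (ftruncate(fd, (off_t)len) == 0) {
		void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		if (map != MAP_FAILED) {
			memcpy(map, msg, sizeof(msg));
			if (flush_mapping(map, len) == 0 &&
			    pread(fd, back, sizeof(back), 0) ==
				    (ssize_t)sizeof(back) &&
			    memcmp(back, msg, sizeof(msg)) == 0)
				ret = 0;
			munmap(map, len);
		}
	}
	close(fd);
	return ret;
}
```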

-chris



Re: [PATCH v2] Btrfs: Include the device in most error printk()s

2013-03-22 Thread David Sterba
On Thu, Mar 21, 2013 at 01:58:19PM -0400, Chris Mason wrote:
> I really like this patch, but we're going to have to sit on it until the
> next merge window.  It's fairly large and we need to stick with things
> that are more clearly bug fixes for now.
> 
> But, I don't have a problem with queuing it into linux-next.  After the
> next rc we can get Josef's tree into the full linux-next for more
> testing.

I agree, this is not for current dev cycle, we may want to tweak the
messages and transform more printks to the enhanced format.

david


Re: Adding a non-empty subvol to a qgroup

2013-03-22 Thread Wang Shilong
Hello Arne,

Since "quota rescan" has not been implemented yet, overflow can happen.
So for now, we could add a check when doing accounting in the kernel:
if the referenced/exclusive count is too small to subtract from,
we just set it to 0 and give a warning.

Otherwise, the user may see a strange integer (because of the u64 type).
What do you think? Or should we just wait for rescan to be implemented?

Thanks,
Wang

> All,
> 
> When adding a subvolume to a qgroup, pre-existing files in that subvolume are 
> not counted in the referenced/exclusive space of the qgroup. Is this intended 
> behavior ?
> 
> I create a subvol with one file:
> 
>  # mkfs.btrfs /dev/sdg
>  # mount /dev/sdg /mnt/fulldisk
>  # cd /mnt/fulldisk
>  # btrfs quota enable ./
>  # btrfs sub create sub1
>  # dd if=/dev/zero of=sub1/file1 bs=10 count=1
>  # sync
>  # btrfs qgroup show ./
>  0/257 106496 106496
> 
> Now I create a new qgroup on level 1 and add the qgroup of sub1 to it :
> 
>  # btrfs qgroup create 1/0 ./
>  # btrfs qgroup assign 0/257 1/0 ./
>  # sync
>  # btrfs fi sync ./
>  # btrfs quota rescan ./
>  # btrfs quota rescan ./sub1
>  # btrfs qgroup show ./
>  0/257 106496 106496
>  1/0 0 0
> 
> The pre-existing file does not contribute to the space numbers.
> 
> Let's create a new file:
> 
>  # dd if=/dev/zero of=sub1/file2 bs=5 count=1
>  # sync
>  # btrfs qgroup show ./
>  0/257 159744 159744
>  1/0 53248 53248
> 
> We see that only the new file is included in the space numbers.
> 
> Now I remove the first file:
> 
>  # rm -f sub1/file1
>  # sync
>  # btrfs qgroup show ./
>  0/257 57344 57344
>  1/0 -49152 -49152
> 
> The space numbers go below zero. Even if the behavior above is intended, the 
> removal of the pre-existing file should not result in negative space numbers.
> 
> Koen.



[PATCH] Btrfs-progs: fix overflow when printing qgroup info

2013-03-22 Thread Wang Shilong
From: Wang Shilong 

Since btrfs quota rescan has not been implemented yet, a user
reported that "btrfs qgroup show" lists negative qgroup
referenced/exclusive values. This should not happen even if
overflow occurs, because the type of qgroup referenced/exclusive
is u64. Fix it by printing the values as unsigned.

Signed-off-by: Wang Shilong 
Reported-by: Koen De Wit 
---
 cmds-qgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-qgroup.c b/cmds-qgroup.c
index 60ca33d..fc4cb13 100644
--- a/cmds-qgroup.c
+++ b/cmds-qgroup.c
@@ -105,7 +105,7 @@ static int qgroup_create(int create, int argc, char **argv)
 
 void print_qgroup_info(u64 objectid, struct btrfs_qgroup_info_item *info)
 {
-   printf("%llu/%llu %lld %lld\n", objectid >> 48,
+   printf("%llu/%llu %llu %llu\n", objectid >> 48,
        objectid & ((1ll << 48) - 1),
        btrfs_stack_qgroup_info_referenced(info),
        btrfs_stack_qgroup_info_exclusive(info));
-- 
1.7.11.7
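
For readers following the patch, here is a small standalone sketch of what the changed format string does. The helper names (`qgroup_level`, `qgroup_id`, `format_qgroup_line`) are hypothetical and not part of btrfs-progs; only the 48-bit split and the `%llu` conversions mirror the diff above. It uses `1ULL` rather than `1ll` for the mask, which is an illustrative choice, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Upper 16 bits of the objectid hold the qgroup level (as in the patch). */
static uint64_t qgroup_level(uint64_t objectid)
{
	return objectid >> 48;
}

/* Lower 48 bits of the objectid hold the qgroup id (as in the patch). */
static uint64_t qgroup_id(uint64_t objectid)
{
	return objectid & ((1ULL << 48) - 1);
}

/*
 * Hypothetical helper: format one "level/id referenced exclusive" line
 * using unsigned conversions only. Returns the snprintf() result.
 */
static int format_qgroup_line(char *buf, size_t len, uint64_t objectid,
			      uint64_t referenced, uint64_t exclusive)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%llu/%llu %llu %llu",
			(unsigned long long)qgroup_level(objectid),
			(unsigned long long)qgroup_id(objectid),
			(unsigned long long)referenced,
			(unsigned long long)exclusive);
}
```

Note that printing as unsigned does not make a wrapped counter correct: a value that used to be shown as -49152 would now be shown as a very large positive number.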



Re: Adding a non-empty subvol to a qgroup

2013-03-22 Thread Wang Shilong
Hello,

> All,
> 
> When adding a subvolume to a qgroup, pre-existing files in that subvolume are 
> not counted in the referenced/exclusive space of the qgroup. Is this intended 
> behavior ?

Btrfs quota rescan has not been implemented yet, so for now pre-existing 
files are not counted.

> 
> I create a subvol with one file:
> 
>  # mkfs.btrfs /dev/sdg
>  # mount /dev/sdg /mnt/fulldisk
>  # cd /mnt/fulldisk
>  # btrfs quota enable ./
>  # btrfs sub create sub1
>  # dd if=/dev/zero of=sub1/file1 bs=10 count=1
>  # sync
>  # btrfs qgroup show ./
>  0/257 106496 106496
> 
> Now I create a new qgroup on level 1 and add the qgroup of sub1 to it :
> 
>  # btrfs qgroup create 1/0 ./
>  # btrfs qgroup assign 0/257 1/0 ./
>  # sync
>  # btrfs fi sync ./
>  # btrfs quota rescan ./
>  # btrfs quota rescan ./sub1
>  # btrfs qgroup show ./
>  0/257 106496 106496
>  1/0 0 0
> 
> The pre-existing file does not contribute to the space numbers.
> 
> Let's create a new file:
> 
>  # dd if=/dev/zero of=sub1/file2 bs=5 count=1
>  # sync
>  # btrfs qgroup show ./
>  0/257 159744 159744
>  1/0 53248 53248
> 
> We see that only the new file is included in the space numbers.
> 
> Now I remove the first file:
> 
>  # rm -f sub1/file1
>  # sync
>  # btrfs qgroup show ./
>  0/257 57344 57344
>  1/0 -49152 -49152
> 
> The space numbers go below zero. Even if the behavior above is intended, the 
> removal of the pre-existing file should not result in negative space numbers.


Since rescan has not been implemented, the above case cannot be avoided. But 
the referenced/exclusive value should not be negative, because 
its type is u64.

If the overflow happens, we may get a positive integer, which is also wrong.
The negative output comes from btrfs-progs; I will correct it. Thanks.

Maybe we should have a check in the kernel and give a warning if overflow 
happens.
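
To make the u64 point concrete, the sketch below reproduces the arithmetic from the report in userspace C. The function names are hypothetical and the code is not taken from the kernel or btrfs-progs. It shows that subtracting 102400 from 53248 in a u64 wraps to a large positive value, and that reading the same bits as a signed 64-bit integer gives the -49152 that was printed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Unsigned subtraction wraps modulo 2^64 instead of going negative. */
static uint64_t u64_sub_wrap(uint64_t a, uint64_t b)
{
	return a - b;
}

/* Read the bits of a u64 as a signed 64-bit value. */
static int64_t as_signed(uint64_t v)
{
	int64_t s;

	memcpy(&s, &v, sizeof(s));
	return s;
}

/* Print one stored value both ways; returns the fprintf() result. */
static int show_both(FILE *out, uint64_t v)
{
	assert(out != NULL);
	return fprintf(out, "unsigned: %" PRIu64 "  signed: %" PRId64 "\n",
		       v, as_signed(v));
}
```

Calling `show_both(stdout, u64_sub_wrap(53248, 102400))` would print the same stored value once as 18446744073709502464 and once as -49152, which is why changing only the format string changes the sign but not the underlying accounting error.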

Thanks,
Wang

 
> 
> Koen.



Re: Integration branch of btrfs-progs 2013-03-20

2013-03-22 Thread David Sterba
On Wed, Mar 20, 2013 at 09:08:49PM +0100, Martin Steigerwald wrote:
> The fragmentation visualization tool works nicely. Tried it on about 200 GiB
> /home, almost no fragmentation, some 3-4%, but mostly 0.something%,
> and on about 20 GiB root, where some chunks were fragmented up to 30-40%.
> 
> What kind of fragmentation does the tool count? From what I understand
> by looking at the source, extent fragmentation?

It's a high-level view of the 1GB chunk fragmentation, reflecting the
contiguous allocated vs free space.

A partition with frequent rewrites is likely to contain more fragmented
chunks, as you observe for your /.

david


Adding a non-empty subvol to a qgroup

2013-03-22 Thread Koen De Wit

All,

When adding a subvolume to a qgroup, pre-existing files in that 
subvolume are not counted in the referenced/exclusive space of the 
qgroup. Is this intended behavior ?


I create a subvol with one file:

  # mkfs.btrfs /dev/sdg
  # mount /dev/sdg /mnt/fulldisk
  # cd /mnt/fulldisk
  # btrfs quota enable ./
  # btrfs sub create sub1
  # dd if=/dev/zero of=sub1/file1 bs=10 count=1
  # sync
  # btrfs qgroup show ./
  0/257 106496 106496

Now I create a new qgroup on level 1 and add the qgroup of sub1 to it :

  # btrfs qgroup create 1/0 ./
  # btrfs qgroup assign 0/257 1/0 ./
  # sync
  # btrfs fi sync ./
  # btrfs quota rescan ./
  # btrfs quota rescan ./sub1
  # btrfs qgroup show ./
  0/257 106496 106496
  1/0 0 0

The pre-existing file does not contribute to the space numbers.

Let's create a new file:

  # dd if=/dev/zero of=sub1/file2 bs=5 count=1
  # sync
  # btrfs qgroup show ./
  0/257 159744 159744
  1/0 53248 53248

We see that only the new file is included in the space numbers.

Now I remove the first file:

  # rm -f sub1/file1
  # sync
  # btrfs qgroup show ./
  0/257 57344 57344
  1/0 -49152 -49152

The space numbers go below zero. Even if the behavior above is intended, 
the removal of the pre-existing file should not result in negative space 
numbers.


Koen.


Re: No space left on device (28)

2013-03-22 Thread Stefan Priebe - Profihost AG
Hi,

On 22.03.2013 07:41, cwillu wrote:
> On Fri, Mar 22, 2013 at 12:39 AM, Stefan Priebe - Profihost AG
> wrote:
>> Already tried with value 5; it did not help ;-( It also happens with
>> a plain cp of a 15 GB file, which aborts at about 80%.
>
> You tried -musage=5?  Your original email said -dusage=5.

Sorry, I missed the difference, but it does not work either:

~# btrfs fi balance start -musage=5 /mnt/
ERROR: error during balancing '/mnt/' - No space left on device
There may be more info in syslog - try dmesg | tail

~# dmesg|tail
[ 1183.019367] device fsid fc23c4a8-a351-4c91-96a2-ee6abeffe59a devid 1 transid 6224 /dev/mapper/raid54tb1
[ 1183.019860] btrfs: disk space caching is enabled
[42719.915781] btrfs: relocating block group 4194304 flags 4
[42727.916141] btrfs: 5 enospc errors during balance

Stefan