Re: [RFC] improve space utilization on off-sized raid devices

2011-11-16 Thread Arne Jansen
On 17.11.2011 01:27, Thomas Schmidt wrote:
> I wrote a small patch to improve allocation on differently sized raid devices.
> 
> With 2.6.38 I frequently ran into a "no space left" error that I attribute to
> this, but I'm not entirely sure. The fs was an 8 device -d raid0 -m raid10.
> The used space was the same across all devices. 5 were full and 3 bigger ones
> still had plenty of space.
> I was unable to use the remaining space and a balance did not fix it for
> long.
> 

Did you also test with 3.0? In 3.0, the allocation strategy changed vastly.
In your setup, it should stripe to all 8 devices until the 5 smaller ones
are full, and from then on stripe to the 3 remaining devices.
See commit

commit 73c5de0051533cbdf2bb656586c3eb21a475aa7d
Author: Arne Jansen 
Date:   Tue Apr 12 12:07:57 2011 +0200

btrfs: quasi-round-robin for chunk allocation

Also, using raid1 instead of raid10 will yield better space utilization.

-Arne
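
The 3.0 behaviour described above can be illustrated with a small userspace simulation. This is only a sketch under simplifying assumptions: every chunk takes one equal-sized stripe from each device that still has free space, and the names `simulate_striping` and `STRIPE` are invented for the example rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stripe size in arbitrary units; not a kernel constant. */
#define STRIPE 1

/*
 * Simulate striped chunk allocation: each round places one stripe on
 * every device that still has at least STRIPE free. Allocation stops
 * when fewer than devs_min devices have room. Returns the total space
 * allocated and updates avail[] in place.
 */
static uint64_t simulate_striping(uint64_t *avail, size_t ndevs,
                                  size_t devs_min)
{
    uint64_t allocated = 0;

    assert(avail != NULL && devs_min > 0);

    for (;;) {
        size_t usable = 0;

        /* Count the devices that can still take a stripe. */
        for (size_t i = 0; i < ndevs; i++)
            if (avail[i] >= STRIPE)
                usable++;

        if (usable < devs_min)
            break;

        /* Place one stripe on each usable device. */
        for (size_t i = 0; i < ndevs; i++) {
            if (avail[i] >= STRIPE) {
                avail[i] -= STRIPE;
                allocated += STRIPE;
            }
        }
    }
    return allocated;
}
```

With five devices holding 2 free units and three holding 5, a profile that can stripe over a single device (devs_min of 1) ends up using all 25 units. A profile that needs at least four devices per chunk stops at 16 units, because only the three larger devices have space left.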

> Now I tried to avoid getting there again.
> 
> The basic idea is to not allocate space on the devices with the least free
> space. The number of devices to leave out is calculated on each allocation
> to adjust to changing circumstances. It leaves out the minimum number of
> devices that still allows full space usage.
>
> Additionally I thought leaving at least one out might be of use in device
> removal.
> 
> Please take extra care with this. I'm new to btrfs, kernel and C in general.
> It was written and tested with 3.0.0.
> 
> 
> --- volumes.c.orig  2011-10-07 16:50:04.0 +0200
> +++ volumes.c   2011-11-16 23:49:08.097085568 +0100
> @@ -2329,6 +2329,8 @@ static int __btrfs_alloc_chunk(struct bt
> u64 stripe_size;
> u64 num_bytes;
> int ndevs;
> +   u64 fs_total_avail;
> +   int opt_ndevs;
> int i;
> int j;
>  
> @@ -2404,6 +2406,7 @@ static int __btrfs_alloc_chunk(struct bt
>  * about the available holes on each device.
>  */
> ndevs = 0;
> +   fs_total_avail = 0;
> while (cur != &fs_devices->alloc_list) {
> struct btrfs_device *device;
> u64 max_avail;
> @@ -2448,6 +2451,7 @@ static int __btrfs_alloc_chunk(struct bt
> devices_info[ndevs].total_avail = total_avail;
> devices_info[ndevs].dev = device;
> ++ndevs;
> +   fs_total_avail += total_avail;
> }
>  
> /*
> @@ -2456,6 +2460,20 @@ static int __btrfs_alloc_chunk(struct bt
> sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
>  btrfs_cmp_device_info, NULL);
>  
> +   /*
> +* do not allocate space on all devices
> +* instead balance free space to maximise space utilization
> +* (this needs tweaking if parity raid gets implemented
> +* for n parity ignore the n first (after sort) devs in the sum and division)
> +*/
> +   opt_ndevs = fs_total_avail / devices_info[0].total_avail;
> +   if (opt_ndevs >= ndevs)
> +   opt_ndevs = ndevs - 1; //optional, might be used for faster dev remove?
> +   if (opt_ndevs < devs_min)
> +   opt_ndevs = devs_min;
> +   if (ndevs > opt_ndevs)
> +   ndevs = opt_ndevs;
> +
> /* round down to number of usable stripes */
> ndevs -= ndevs % devs_increment;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: rewrite btrfs_trim_block_group()

2011-11-16 Thread Li Zefan
There are various bugs in block group trimming:

- It may trim from an offset smaller than the user-specified offset.
- It may trim beyond the user-specified range.
- It may leak free space for extents smaller than the specified minlen.
- It may truncate the last trimmed extent and thus leak free space.
- With mixed extents+bitmaps, some extents may not be trimmed.
- With mixed extents+bitmaps, some bitmaps may not be trimmed (possibly
none at all). Even for those that are trimmed, not all the free space
in the bitmaps will be trimmed.

I rewrote btrfs_trim_block_group() and broke it into two functions.
One is to trim extents only, and the other is to trim bitmaps only.

Signed-off-by: Li Zefan 
---
 fs/btrfs/free-space-cache.c |  235 ++-
 1 files changed, 164 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 8c32434..89cc54e 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2575,17 +2575,57 @@ void btrfs_init_free_cluster(struct btrfs_free_cluster *cluster)
cluster->block_group = NULL;
 }
 
-int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
-  u64 *trimmed, u64 start, u64 end, u64 minlen)
+static int do_trimming(struct btrfs_block_group_cache *block_group,
+  u64 *total_trimmed, u64 start, u64 bytes,
+  u64 reserved_start, u64 reserved_bytes)
 {
-   struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
-   struct btrfs_free_space *entry = NULL;
+   struct btrfs_space_info *space_info = block_group->space_info;
struct btrfs_fs_info *fs_info = block_group->fs_info;
-   u64 bytes = 0;
-   u64 actually_trimmed;
-   int ret = 0;
+   int ret;
+   int update = 0;
+   u64 trimmed = 0;
 
-   *trimmed = 0;
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+   if (!block_group->ro) {
+   block_group->reserved += reserved_bytes;
+   space_info->bytes_reserved += reserved_bytes;
+   update = 1;
+   }
+   spin_unlock(&block_group->lock);
+   spin_unlock(&space_info->lock);
+
+   ret = btrfs_error_discard_extent(fs_info->extent_root,
+start, bytes, &trimmed);
+   if (!ret)
+   *total_trimmed += trimmed;
+
+   btrfs_add_free_space(block_group, reserved_start, reserved_bytes);
+
+   if (update) {
+   spin_lock(&space_info->lock);
+   spin_lock(&block_group->lock);
+   if (block_group->ro)
+   space_info->bytes_readonly += reserved_bytes;
+   block_group->reserved -= reserved_bytes;
+   space_info->bytes_reserved -= reserved_bytes;
+   spin_unlock(&space_info->lock);
+   spin_unlock(&block_group->lock);
+   }
+
+   return ret;
+}
+
+static int trim_no_bitmap(struct btrfs_block_group_cache *block_group,
+ u64 *total_trimmed, u64 start, u64 end, u64 minlen)
+{
+   struct btrfs_free_space_ctl *ctl = block_group->free_space_ctl;
+   struct btrfs_free_space *entry;
+   struct rb_node *node;
+   int ret;
+   u64 extent_start;
+   u64 extent_bytes;
+   u64 bytes;
 
while (start < end) {
spin_lock(&ctl->tree_lock);
@@ -2596,81 +2636,118 @@ int btrfs_trim_block_group(struct btrfs_block_group_cache *block_group,
}
 
entry = tree_search_offset(ctl, start, 0, 1);
-   if (!entry)
-   entry = tree_search_offset(ctl,
-  offset_to_bitmap(ctl, start),
-  1, 1);
-
-   if (!entry || entry->offset >= end) {
+   if (!entry) {
spin_unlock(&ctl->tree_lock);
break;
}
 
-   if (entry->bitmap) {
-   ret = search_bitmap(ctl, entry, &start, &bytes);
-   if (!ret) {
-   if (start >= end) {
-   spin_unlock(&ctl->tree_lock);
-   break;
-   }
-   bytes = min(bytes, end - start);
-   bitmap_clear_bits(ctl, entry, start, bytes);
-   if (entry->bytes == 0)
-   free_bitmap(ctl, entry);
-   } else {
-   start = entry->offset + BITS_PER_BITMAP *
-   block_group->sectorsize;
+   /* skip bitmaps */
+   while (entry->bitmap) {
+   node = rb_next(&entry->offset_index);
+   if (!node) {
 

[PATCH v2 1/2] Btrfs: fix to search one more bitmap for cluster setup

2011-11-16 Thread Li Zefan
Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to set up a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is that the list is constructed in setup_cluster_no_bitmap(),
where only bitmaps with bitmap_entry->offset >= offset are added
to the list, whereas the bitmap that covers offset has
bitmap_entry->offset <= offset.

Signed-off-by: Li Zefan 
---

v2: fix a NULL pointer deref.

---
 fs/btrfs/free-space-cache.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 181760f..8f792f4 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2453,11 +2453,23 @@ setup_cluster_bitmap(struct btrfs_block_group_cache *block_group,
struct btrfs_free_space *entry;
struct rb_node *node;
int ret = -ENOSPC;
+   u64 bitmap_offset = offset_to_bitmap(ctl, offset);
 
if (ctl->total_bitmaps == 0)
return -ENOSPC;
 
/*
+* The bitmap that covers offset won't be in the list unless offset
+* is just its start offset.
+*/
+   entry = list_first_entry(bitmaps, struct btrfs_free_space, list);
+   if (entry->offset != bitmap_offset) {
+   entry = tree_search_offset(ctl, bitmap_offset, 1, 0);
+   if (entry && list_empty(&entry->list))
+   list_add(&entry->list, bitmaps);
+   }
+
+   /*
 * First check our cached list of bitmaps and see if there is an entry
 * here that will work.
 */
-- 
1.7.3.1
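
The example from the commit message can be checked with a few lines of arithmetic. The sketch below assumes each bitmap covers a fixed span of 256 units, as in the example; `bitmap_start` is a simplified stand-in for offset_to_bitmap(), and `added_to_list` models the `bitmap_entry->offset >= offset` condition from the description. Neither is the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for offset_to_bitmap(): start of the covering bitmap. */
static uint64_t bitmap_start(uint64_t offset, uint64_t span)
{
    assert(span > 0);
    return offset - (offset % span);
}

/* Models the rule under which a bitmap is added to the temporary list. */
static bool added_to_list(uint64_t bitmap_offset, uint64_t offset)
{
    return bitmap_offset >= offset;
}

/* True when the bitmap covering offset would be missing from the list. */
static bool covering_bitmap_missed(uint64_t offset, uint64_t span)
{
    return !added_to_list(bitmap_start(offset, span), offset);
}
```

For offset=100 the covering bitmap starts at 0 and is skipped, while the bitmap at 256 is added. The covering bitmap only makes it into the list when offset is exactly its start, which matches the comment the patch adds.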


Re: [PATCH 00/21] [RFC] Btrfs: restriper

2011-11-16 Thread Phillip Susi
On 08/23/2011 04:01 PM, Ilya Dryomov wrote:
> Hello,
> 
> This patch series adds an initial implementation of restriper (it's
> a clever name for a relocation framework that allows selective
> profile changing and selective balancing, with some goodies like
> pausing/resuming and reporting progress to the user).
>
> Profile changing is global (per-FS) so far; per-subvolume profiles
> require some discussion and can be implemented in the future.  This is
> an RFC so some features/problems are not yet implemented/resolved.
> The current TODO list is as follows:

I managed to use these patches to convert the raid1 system and
metadata chunks back to single and drop the second disk from a two
disk array.  In doing so I noticed that the restriper required a force
switch to downgrade raid1 to single.  This seems completely
unnecessary to me.  A force switch to btrfs device delete might make
sense since delete may or may not force a downgrade, but with
restripe, the request to convert from raid1 to single is already quite
explicit with no room for ambiguity, so there should be no need for an
additional confirmation switch.



[PATCH] fs/btrfs/locking.c: Removed some unneeded return statements

2011-11-16 Thread Marcos Paulo de Souza
Signed-off-by: Marcos Paulo de Souza 
---
 fs/btrfs/locking.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index d77b67c..8abb870 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -48,7 +48,6 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
atomic_dec(&eb->spinning_readers);
read_unlock(&eb->lock);
}
-   return;
 }
 
 /*
@@ -71,7 +70,6 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
if (atomic_dec_and_test(&eb->blocking_readers))
wake_up(&eb->read_lock_wq);
}
-   return;
 }
 
 /*
-- 
1.7.4.4



[RFC] improve space utilization on off-sized raid devices

2011-11-16 Thread Thomas Schmidt
I wrote a small patch to improve allocation on differently sized raid devices.

With 2.6.38 I frequently ran into a "no space left" error that I attribute to
this, but I'm not entirely sure. The fs was an 8 device -d raid0 -m raid10.
The used space was the same across all devices. 5 were full and 3 bigger ones
still had plenty of space.
I was unable to use the remaining space and a balance did not fix it for
long.

Now I tried to avoid getting there again.

The basic idea is to not allocate space on the devices with the least free
space. The number of devices to leave out is calculated on each allocation
to adjust to changing circumstances. It leaves out the minimum number of
devices that still allows full space usage.

Additionally I thought leaving at least one out might be of use in device
removal.

Please take extra care with this. I'm new to btrfs, kernel and C in general.
It was written and tested with 3.0.0.


--- volumes.c.orig  2011-10-07 16:50:04.0 +0200
+++ volumes.c   2011-11-16 23:49:08.097085568 +0100
@@ -2329,6 +2329,8 @@ static int __btrfs_alloc_chunk(struct bt
u64 stripe_size;
u64 num_bytes;
int ndevs;
+   u64 fs_total_avail;
+   int opt_ndevs;
int i;
int j;
 
@@ -2404,6 +2406,7 @@ static int __btrfs_alloc_chunk(struct bt
 * about the available holes on each device.
 */
ndevs = 0;
+   fs_total_avail = 0;
while (cur != &fs_devices->alloc_list) {
struct btrfs_device *device;
u64 max_avail;
@@ -2448,6 +2451,7 @@ static int __btrfs_alloc_chunk(struct bt
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
++ndevs;
+   fs_total_avail += total_avail;
}
 
/*
@@ -2456,6 +2460,20 @@ static int __btrfs_alloc_chunk(struct bt
sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 btrfs_cmp_device_info, NULL);
 
+   /*
+* do not allocate space on all devices
+* instead balance free space to maximise space utilization
+* (this needs tweaking if parity raid gets implemented
+* for n parity ignore the n first (after sort) devs in the sum and division)
+*/
+   opt_ndevs = fs_total_avail / devices_info[0].total_avail;
+   if (opt_ndevs >= ndevs)
+   opt_ndevs = ndevs - 1; //optional, might be used for faster dev remove?
+   if (opt_ndevs < devs_min)
+   opt_ndevs = devs_min;
+   if (ndevs > opt_ndevs)
+   ndevs = opt_ndevs;
+
/* round down to number of usable stripes */
ndevs -= ndevs % devs_increment;



Re: Segmentation Faults

2011-11-16 Thread David Sterba
Hi,

On Wed, Nov 16, 2011 at 04:39:21PM +, Tim Crone wrote:
> root@berna:~# uname -a
> Linux berna 2.6.38-bpo.2-amd64 #1 SMP Mon Jun 6 15:24:02 UTC 2011
> x86_64 GNU/Linux

2.6.38 is quite old from a btrfs perspective and it's quite possible that
the bug you've hit is already fixed. Try a newer kernel (like 3.1) and if
you still hit some sort of crash, please send the report.

> root@berna:/home/tjc# rm -rf .cache/chromium
> Segmentation fault
> root@berna:/home/tjc#
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877568] [ cut here ]
> 
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877572] invalid opcode:  [#1] SMP
> 
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877573] last sysfs file: /sys/devices/virtual/block/md0/uevent
> 
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877633] Stack:
> 
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877640] Call Trace:
> 
> Message from syslogd@localhost at Nov 16 11:19:35 ...
>  kernel:[   66.877703] Code: 24 24 48 8b 74 24 28 48 8d 54 24 50 41 b9 01 00 
> 00 00 48 89 d9 4c 89 e7 e8 80 6e ff ff 83 f8 00 41 89 c5 0f 8c c5 02 00 00 74
> 04 <0f> 0b eb fe 4c 8b 2b 8b 73 40 4c 89 ef e8 a3 e5 ff ff 41 89 c6

Btw, such output is printed to every console when the crash occurs, but it
lacks important information like the stack trace and names of functions.
Without that it's very hard to get a clue what happened.

> (I was going to include some log entries, but the lines are longer than
> 80 characters and your system will not accept them.)

As for the stack trace and line length, I personally want to see the lines
not wrapped at 80 (except e.g. the byte dump of instructions); it helps
readability.


david


Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Mark Fasheh
On Wed, Nov 16, 2011 at 11:35:40AM -0800, Mark Fasheh wrote:
> > We should do it per FS though, I'll patch up btrfs.
> 
> I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
> journal transaction as synchronous.

Joel, here's an (untested) patch to fix this in Ocfs2.
--Mark

--
Mark Fasheh

From: Mark Fasheh 

ocfs2: honor O_(D)SYNC flag in fallocate

We need to sync the transaction which updates i_size if the file is marked
as needing sync semantics.

Signed-off-by: Mark Fasheh 
---
 fs/ocfs2/file.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index de4ea1a..cac00b4 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1950,6 +1950,9 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
if (ret < 0)
mlog_errno(ret);
 
+   if (file->f_flags & O_SYNC)
+   handle->h_sync = 1;
+
ocfs2_commit_trans(osb, handle);
 
 out_inode_unlock:
-- 
1.7.6
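
From userspace, the behaviour under discussion looks roughly like the following. This is a sketch, not part of the patch: `preallocate_sync` is a hypothetical helper, posix_fallocate() is used here in place of the raw fallocate call, and the explicit fsync() is the workaround an application would need on a filesystem that does not honor O_SYNC in its fallocate path.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path with O_SYNC, preallocate len bytes, and force the result
 * to disk. Returns 0 on success, -1 on failure.
 */
static int preallocate_sync(const char *path, off_t len)
{
    int fd;
    int ret = -1;

    assert(path != NULL && len > 0);

    fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0600);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns 0 on success, an errno value on failure. */
    if (posix_fallocate(fd, 0, len) == 0) {
        /*
         * With the fix discussed in this thread the size update would
         * already be durable here; the fsync() covers filesystems
         * that lack it.
         */
        if (fsync(fd) == 0)
            ret = 0;
    }

    close(fd);
    return ret;
}
```

The point made in the thread is that the fsync() call should not be needed when the descriptor was opened with O_SYNC.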



Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Mark Fasheh
On Wed, Nov 16, 2011 at 11:18:06AM -0500, Chris Mason wrote:
> On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
> > On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> > > On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > > > This would work fine with XFS and be equivalent to what it does for
> > > > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > > > operations.
> > > >   OK, I don't really have a strong opinion here. Are you afraid that
> > > > just calling fsync() need not be enough to push all updates
> > > > fallocate did to disk?
> > > 
> > > No, the point is that you should not have to call fsync when doing
> > > O_SYNC I/O.  That's the whole point of it.
> >   I agree with you that userspace shouldn't have to call fsync. What I
> > meant is that sys_fallocate() or do_fallocate() can call
> > generic_write_sync(file, pos, len), and that would be completely
> > transparent to userspace.
> 
> We should do it per FS though, I'll patch up btrfs.

I agree about doing it per FS. Ocfs2 just needs a one-liner to mark the
journal transaction as synchronous.
--Mark

--
Mark Fasheh


Re: compressed btrfs "No space left on device"

2011-11-16 Thread Arnd Hannemann
On 14.11.2011 19:24, Arnd Hannemann wrote:
> On 14.11.2011 15:57, Arnd Hannemann wrote:
> 
>> I'm using btrfs for my /usr/share/ partition and keep getting the following
>> error while installing a debian package which should take no more than 228 MB:
>>
>> Unpacking texlive-fonts-extra (from .../texlive-fonts-extra_2009-10ubuntu1_all.deb) ...
>>  dpkg: error processing /var/cache/apt/archives/texlive-fonts-extra_2009-10ubuntu1_all.deb (--unpack):
>>  unable to install new version of `/usr/share/texmf-texlive/fonts/type1/public/allrunes/frutlt.pfb': No space left on device
>>
>>
>> However df reports plenty of available space:
>>
>> /dev/mapper/vg0-usr_share
>>   5.0G  1.5G  2.5G  37% /usr/share
>>
>>
>> I already extended /dev/mapper/vg0-usr_share from 4G to 5G and ran defrag
>> and balance on it with no luck.
>> I'm using ubuntu 11.10 on amd64 with the ubuntu 3.0.0 kernel.
> 
> FYI: The problem is the same with mainline kernel v3.1.1.

JFYI: the problem went away in 3.2-rc2, so someone must
have fixed something.

Thanks!

Best regards
Arnd


[PATCH] Btrfs: Prefix resize related printks with btrfs:

2011-11-16 Thread Arnd Hannemann
For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in the kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann 
---
 fs/btrfs/ioctl.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 4a34c47..b6d2a1a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1216,12 +1216,12 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
*devstr = '\0';
devstr = vol_args->name;
devid = simple_strtoull(devstr, &end, 10);
-   printk(KERN_INFO "resizing devid %llu\n",
+   printk(KERN_INFO "btrfs: resizing devid %llu\n",
   (unsigned long long)devid);
}
device = btrfs_find_device(root, devid, NULL, NULL);
if (!device) {
-   printk(KERN_INFO "resizer unable to find device %llu\n",
+   printk(KERN_INFO "btrfs: resizer unable to find device %llu\n",
   (unsigned long long)devid);
ret = -EINVAL;
goto out_unlock;
@@ -1267,7 +1267,7 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
do_div(new_size, root->sectorsize);
new_size *= root->sectorsize;
 
-   printk(KERN_INFO "new size for %s is %llu\n",
+   printk(KERN_INFO "btrfs: new size for %s is %llu\n",
device->name, (unsigned long long)new_size);
 
if (new_size > old_size) {
-- 
1.7.5.4



Re: [patch] btrfs scrub: handle -ENOMEM from init_ipath()

2011-11-16 Thread Jan Schmidt
On 16.11.2011 09:28, Dan Carpenter wrote:
> init_ipath() can return an ERR_PTR(-ENOMEM).
> 
> Signed-off-by: Dan Carpenter 
Signed-off-by: Jan Schmidt 

Thanks,
-Jan

> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index ed11d38..b72ee47 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -256,6 +256,11 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
>   btrfs_release_path(swarn->path);
>  
>   ipath = init_ipath(4096, local_root, swarn->path);
> + if (IS_ERR(ipath)) {
> + ret = PTR_ERR(ipath);
> + ipath = NULL;
> + goto err;
> + }
>   ret = paths_from_inode(inum, ipath);
>  
>   if (ret < 0)
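
The error-pointer convention that the patch checks for can be sketched in userspace as follows. The helpers `err_ptr`, `is_err` and `ptr_err` are simplified analogues written for this example, not the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() themselves, and `alloc_or_err` is a hypothetical allocator standing in for init_ipath().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest errno value encoded in a pointer; an assumption of this sketch. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True when ptr falls in the top SKETCH_MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-SKETCH_MAX_ERRNO);
}

/* Recover the negative errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical allocator that reports failure via an error pointer. */
static void *alloc_or_err(size_t size, bool simulate_failure)
{
    void *p;

    assert(size > 0);

    if (simulate_failure)
        return err_ptr(-ENOMEM);

    p = malloc(size);
    return p ? p : err_ptr(-ENOMEM);
}
```

A plain NULL check would not catch such a failure, since the returned value is non-NULL. That is why the patch tests IS_ERR() and then resets ipath to NULL before jumping to the error label, presumably so the cleanup code does not try to free an error pointer.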


Segmentation Faults

2011-11-16 Thread Tim Crone
I have some kind of corruption in my btrfs file system that causes kernel
segmentation faults when I try to delete files in my browser cache. Let me
know if you are interested in having any additional information besides what
is below. Thanks.

root@berna:~# uname -a
Linux berna 2.6.38-bpo.2-amd64 #1 SMP Mon Jun 6 15:24:02 UTC 2011
x86_64 GNU/Linux


root@berna:/usr/local/src/btrfs-progs# ./btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
failed to read /dev/sr0: No medium found
ERROR: unable to scan the device '/dev/sdd' - Device or resource busy
failed to read /dev/sr0: No medium found
ERROR: unable to scan the device '/dev/sdd' - Device or resource busy
leaf parent key incorrect 1782596403200
bad block 1782596403200
incorrect offsets 3942 17510
bad block 1782603522048
warning, start mismatch 1778178670592 1778178764800
Aborted


root@berna:/home/tjc# rm -rf .cache/chromium
Segmentation fault
root@berna:/home/tjc#
Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877568] [ cut here ]

Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877572] invalid opcode:  [#1] SMP

Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877573] last sysfs file: /sys/devices/virtual/block/md0/uevent

Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877633] Stack:

Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877640] Call Trace:

Message from syslogd@localhost at Nov 16 11:19:35 ...
 kernel:[   66.877703] Code: 24 24 48 8b 74 24 28 48 8d 54 24 50 41 b9 01 00 
00 00 48 89 d9 4c 89 e7 e8 80 6e ff ff 83 f8 00 41 89 c5 0f 8c c5 02 00 00 74
04 <0f> 0b eb fe 4c 8b 2b 8b 73 40 4c 89 ef e8 a3 e5 ff ff 41 89 c6

(I was going to include some log entries, but the lines are longer than
80 characters and your system will not accept them.)







[PATCH] Fix URL of btrfs-progs git repository in docs

2011-11-16 Thread Arnd Hannemann
The location of the btrfs-progs repository has been changed.
This patch updates the documentation accordingly.

Signed-off-by: Arnd Hannemann 
---
 Documentation/filesystems/btrfs.txt |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index 64087c3..7671352 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -63,8 +63,8 @@ IRC network.
 Userspace tools for creating and manipulating Btrfs file systems are
 available from the git repository at the following location:
 
- http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git
- git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
+ http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
+ git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
 
 These include the following tools:
 
-- 
1.7.5.4



Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Chris Mason
On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
> On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> > On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > > This would work fine with XFS and be equivalent to what it does for
> > > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > > operations.
> > >   OK, I don't really have a strong opinion here. Are you afraid that just
> > > calling fsync() need not be enough to push all updates fallocate did to
> > > disk?
> > 
> > No, the point is that you should not have to call fsync when doing
> > O_SYNC I/O.  That's the whole point of it.
>   I agree with you that userspace shouldn't have to call fsync. What I
> meant is that sys_fallocate() or do_fallocate() can call
> generic_write_sync(file, pos, len), and that would be completely
> transparent to userspace.

We should do it per FS though, I'll patch up btrfs.

-chris


Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Christoph Hellwig
On Wed, Nov 16, 2011 at 04:57:55PM +0100, Jan Kara wrote:
>   I agree with you that userspace shouldn't have to call fsync. What I
> meant is that sys_fallocate() or do_fallocate() can call
> generic_write_sync(file, pos, len), and that would be completely
> transparent to userspace.

That's different from how everything else in the I/O path works.
If filesystems want to use it, that's fine, but I suspect most could
do it more efficiently.



Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Jan Kara
On Wed 16-11-11 08:42:34, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > > This would work fine with XFS and be equivalent to what it does for
> > > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > > and make sure the update actually is on disk when doing O_(D)SYNC
> > > operations.
> >   OK, I don't really have a strong opinion here. Are you afraid that just
> > calling fsync() need not be enough to push all updates fallocate did to
> > disk?
> 
> No, the point is that you should not have to call fsync when doing
> O_SYNC I/O.  That's the whole point of it.
  I agree with you that userspace shouldn't have to call fsync. What I
meant is that sys_fallocate() or do_fallocate() can call
generic_write_sync(file, pos, len), and that would be completely
transparent to userspace.

Honza


[PATCH] btrfs: fix stat blocks accounting

2011-11-16 Thread David Sterba
Round inode bytes and delalloc bytes up to real blocksize before
converting to sector size. Otherwise, e.g., files smaller than 512 bytes
are reported with zero blocks due to incorrect rounding.

Signed-off-by: David Sterba 
---
 fs/btrfs/inode.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e16215f..8ad26b1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6794,11 +6794,13 @@ static int btrfs_getattr(struct vfsmount *mnt,
 struct dentry *dentry, struct kstat *stat)
 {
struct inode *inode = dentry->d_inode;
+   u32 blocksize = inode->i_sb->s_blocksize;
+
generic_fillattr(inode, stat);
stat->dev = BTRFS_I(inode)->root->anon_dev;
stat->blksize = PAGE_CACHE_SIZE;
-   stat->blocks = (inode_get_bytes(inode) +
-   BTRFS_I(inode)->delalloc_bytes) >> 9;
+   stat->blocks = (ALIGN(inode_get_bytes(inode), blocksize) +
+   ALIGN(BTRFS_I(inode)->delalloc_bytes, blocksize)) >> 9;
return 0;
 }
 
-- 
1.7.6
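
The rounding change is easy to verify outside the kernel. The following sketch re-implements the two calculations with an `align_up` helper that plays the role of ALIGN(); the function names and the 4096-byte block size used in the example are assumptions of the sketch, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* Block count before the patch: plain truncation to 512-byte units. */
static uint64_t blocks_old(uint64_t inode_bytes, uint64_t delalloc_bytes)
{
    return (inode_bytes + delalloc_bytes) >> 9;
}

/* Block count with the patch: round each term up to blocksize first. */
static uint64_t blocks_new(uint64_t inode_bytes, uint64_t delalloc_bytes,
                           uint64_t blocksize)
{
    return (align_up(inode_bytes, blocksize) +
            align_up(delalloc_bytes, blocksize)) >> 9;
}
```

A 100-byte file gives 0 blocks with the old formula and 8 with the new one, assuming a 4096-byte block size, since one full block is eight 512-byte sectors.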



Re: BUG at fs/btrfs/inode.c:1587

2011-11-16 Thread Christian Brunner
2011/11/16 Chris Mason :
> On Tue, Nov 15, 2011 at 09:19:53AM +0100, Christian Brunner wrote:
>> Hi,
>>
>> this time I've hit a new bug. This happened while ceph was rebuilding
>> its filestore (heavy io).
>>
>> The btrfs version is from 3.2-rc1, applied to a 3.0 kernel.
>
> This one means some part of the kernel has set a btrfs data page dirty
> without going through the proper setup.  A few of us have hit it, but we
> haven't been able to nail down a solid way to reproduce it.
>
> Have you hit it more than once?


I'm sorry, I've only hit this once and it's not reproducible.

Regards,
Christian


Re: BUG at fs/btrfs/inode.c:1587

2011-11-16 Thread Chris Mason
On Tue, Nov 15, 2011 at 09:19:53AM +0100, Christian Brunner wrote:
> Hi,
> 
> this time I've hit a new bug. This happened while ceph was rebuilding
> his filestore (heavy io).
> 
> The btrfs version is from 3.2-rc1, applied to a 3.0 kernel.

This one means some part of the kernel has set a btrfs data page dirty
without going through the proper setup.  A few of us have hit it, but we
haven't been able to nail down a solid way to reproduce it.

Have you hit it more than once?

-chris


Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Christoph Hellwig
On Wed, Nov 16, 2011 at 02:39:15PM +0100, Jan Kara wrote:
> > This would work fine with XFS and be equivalent to what it does for
> > O_DSYNC now.  But I'd rather see every filesystem do the right thing
> > and make sure the update actually is on disk when doing O_(D)SYNC
> > operations.
>   OK, I don't really have a strong opinion here. Are you afraid that just
> calling fsync() need not be enough to push all updates fallocate did to
> disk?

No, the point is that you should not have to call fsync when doing
O_SYNC I/O.  That's the whole point of it.



Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Jan Kara
On Wed 16-11-11 07:45:50, Christoph Hellwig wrote:
> On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
> >   Yeah, only that nobody calls that fsync() automatically if the fd is
> > O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> > fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> > most filesystems? That would match how we treat O_SYNC for other operations
> > as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> > with this.
> 
> This would work fine with XFS and be equivalent to what it does for
> O_DSYNC now.  But I'd rather see every filesystem do the right thing
> and make sure the update actually is on disk when doing O_(D)SYNC
> operations.
  OK, I don't really have a strong opinion here. Are you afraid that just
calling fsync() need not be enough to push all updates fallocate did to
disk?

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Christoph Hellwig
On Wed, Nov 16, 2011 at 11:54:13AM +0100, Jan Kara wrote:
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.

This would work fine with XFS and be equivalent to what it does for
O_DSYNC now.  But I'd rather see every filesystem do the right thing
and make sure the update actually is on disk when doing O_(D)SYNC
operations.



Re: fallocate vs O_(D)SYNC

2011-11-16 Thread Zheng Liu
On Wed, Nov 16, 2011 at 03:42:56AM -0500, Christoph Hellwig wrote:
> It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> make sure the size update transaction made it to disk.
> 
> Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> operation (it adds new blocks that return zeroes) that seems like a
> fairly nasty surprise for O_SYNC users.

Hi all,

This patch should fix this problem in ext4.

From: Zheng Liu 

Make sure the transaction is committed if the O_(D)SYNC flag is set in
ext4_fallocate().

Signed-off-by: Zheng Liu 
---
 fs/ext4/extents.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 61fa9e1..f47e3ad 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4356,6 +4356,8 @@ retry:
 			ret = PTR_ERR(handle);
 			break;
 		}
+		if (file->f_flags & O_SYNC)
+			ext4_handle_sync(handle);
 		ret = ext4_map_blocks(handle, inode, &map, flags);
 		if (ret <= 0) {
 #ifdef EXT4FS_DEBUG
-- 
1.7.4.1




Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Steven Whitehouse
Hi,

On Wed, 2011-11-16 at 11:54 +0100, Jan Kara wrote:
> Hello,
> 
> On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> > On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > > make sure the size update transaction made it to disk.
> > > 
> > > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > > operation (it adds new blocks that return zeroes) that seems like a
> > > fairly nasty surprise for O_SYNC users.
> > 
> > In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> > allow us to mark blocks as zeroed at alloc time) and also because we are
> > mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> > on our rindex system file in order to ensure that there is always enough
> > space to expand a filesystem.
> > 
> > So there is no danger of having non-zeroed blocks appearing later, as
> > that is done before the metadata change.
> > 
> > Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> > call, so that fsync should pick that up and ensure that the metadata has
> > been written back. So we should thus have both data and metadata stable
> > on disk.
> > 
> > Do you have some evidence that this is not happening?
>   Yeah, only that nobody calls that fsync() automatically if the fd is
> O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
> fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
> most filesystems? That would match how we treat O_SYNC for other operations
> as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
> with this.
> 
>   Honza

Ah, I see now. Sorry, I missed the original point. So that would just be
a VFS addition to check the O_(D)SYNC flag as you suggest. I've no
objections to that, it makes sense to me,

Steve.




Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Jan Kara
  Hello,

On Wed 16-11-11 09:43:08, Steven Whitehouse wrote:
> On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> > It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> > make sure the size update transaction made it to disk.
> > 
> > Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> > operation (it adds new blocks that return zeroes) that seems like a
> > fairly nasty surprise for O_SYNC users.
> 
> In GFS2 we zero out the data blocks as we go (since our metadata doesn't
> allow us to mark blocks as zeroed at alloc time) and also because we are
> mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
> on our rindex system file in order to ensure that there is always enough
> space to expand a filesystem.
> 
> So there is no danger of having non-zeroed blocks appearing later, as
> that is done before the metadata change.
> 
> Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
> call, so that fsync should pick that up and ensure that the metadata has
> been written back. So we should thus have both data and metadata stable
> on disk.
> 
> Do you have some evidence that this is not happening?
  Yeah, only that nobody calls that fsync() automatically if the fd is
O_SYNC if I'm right. But maybe calling fdatasync() on the range which was
fallocated from sys_fallocate() if the fd is O_SYNC would do the trick for
most filesystems? That would match how we treat O_SYNC for other operations
as well. I'm just not sure whether XFS wouldn't take unnecessarily big hit
with this.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [Cluster-devel] fallocate vs O_(D)SYNC

2011-11-16 Thread Steven Whitehouse
Hi,

On Wed, 2011-11-16 at 03:42 -0500, Christoph Hellwig wrote:
> It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
> make sure the size update transaction made it to disk.
> 
> Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
> operation (it adds new blocks that return zeroes) that seems like a
> fairly nasty surprise for O_SYNC users.
> 


In GFS2 we zero out the data blocks as we go (since our metadata doesn't
allow us to mark blocks as zeroed at alloc time) and also because we are
mostly interested in being able to do FALLOC_FL_KEEP_SIZE which we use
on our rindex system file in order to ensure that there is always enough
space to expand a filesystem.

So there is no danger of having non-zeroed blocks appearing later, as
that is done before the metadata change.

Our fallocate_chunk() function calls mark_inode_dirty(inode) on each
call, so that fsync should pick that up and ensure that the metadata has
been written back. So we should thus have both data and metadata stable
on disk.

Do you have some evidence that this is not happening?

Steve.




fallocate vs O_(D)SYNC

2011-11-16 Thread Christoph Hellwig
It seems all filesystems but XFS ignore O_SYNC for fallocate, and never
make sure the size update transaction made it to disk.

Given that a fallocate without FALLOC_FL_KEEP_SIZE very much is a data
operation (it adds new blocks that return zeroes) that seems like a
fairly nasty surprise for O_SYNC users.
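
To make the "data operation" point concrete, here is a small userspace
sketch. It assumes posix_fallocate() behaves like fallocate with mode 0
(no FALLOC_FL_KEEP_SIZE); prealloc_is_visible_as_zeroes() is a
hypothetical helper name. It checks that the file size grows and that
the new range reads back as zeroes, which is the state an O_SYNC user
would expect to be on disk when the call returns.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Preallocate len bytes from offset 0 and check what a reader sees.
 * Returns 1 if the size is at least len and the range reads as zeroes,
 * 0 if not, and -1 on error. Intended for a freshly created, empty file.
 */
static int prealloc_is_visible_as_zeroes(int fd, off_t len)
{
	struct stat st;
	char buf[4096];
	off_t off = 0;

	if (posix_fallocate(fd, 0, len) != 0)
		return -1;
	if (fstat(fd, &st) != 0)
		return -1;
	if (st.st_size < len)
		return 0;

	while (off < len) {
		size_t want = (len - off < (off_t)sizeof(buf)) ?
			      (size_t)(len - off) : sizeof(buf);
		ssize_t n = pread(fd, buf, want, off);
		ssize_t i;

		if (n <= 0)
			return -1;
		for (i = 0; i < n; i++)
			if (buf[i] != 0)
				return 0;
		off += n;
	}
	return 1;
}
```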


[patch] btrfs scrub: handle -ENOMEM from init_ipath()

2011-11-16 Thread Dan Carpenter
init_ipath() can return an ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter 

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ed11d38..b72ee47 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -256,6 +256,11 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 root, void *ctx)
 	btrfs_release_path(swarn->path);
 
 	ipath = init_ipath(4096, local_root, swarn->path);
+	if (IS_ERR(ipath)) {
+		ret = PTR_ERR(ipath);
+		ipath = NULL;
+		goto err;
+	}
 	ret = paths_from_inode(inum, ipath);
 
 	if (ret < 0)
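
For readers unfamiliar with the ERR_PTR convention used above, the
following userspace sketch models it. The three helpers mirror the
kernel's include/linux/err.h in simplified form; alloc_buf() and
use_buf() are hypothetical stand-ins for init_ipath() and its caller.
Resetting the pointer to NULL before the jump is presumably there so
that the cleanup path does not try to free an error value.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure as ERR_PTR(-ENOMEM). */
static char *alloc_buf(size_t n, int simulate_failure)
{
	char *p = simulate_failure ? NULL : malloc(n);

	if (!p)
		return ERR_PTR(-ENOMEM);
	return p;
}

/* Caller following the same shape as the patch. */
static int use_buf(int simulate_failure)
{
	int ret = 0;
	char *buf = alloc_buf(64, simulate_failure);

	if (IS_ERR(buf)) {
		ret = (int)PTR_ERR(buf);
		buf = NULL;	/* keep the cleanup below from freeing an error value */
		goto err;
	}
	buf[0] = '\0';
err:
	free(buf);	/* free(NULL) is a no-op */
	return ret;
}
```

Here use_buf(1) returns -ENOMEM without touching the buffer, and
use_buf(0) returns 0.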