Re: can't access diagrams on wiki

2012-01-26 Thread Anand Jain



It worked. I missed the point that in MediaWiki a
link to a new page will create that page.

Further, I hope the new content created on btrfs.ipv5.de
will be merged with btrfs.wiki.kernel.org when the latter
is ready.

thanks, Anand


On Wednesday 25,January,2012 03:29 PM, Arne Jansen wrote:

On 25.01.2012 03:37, Anand Jain wrote:



The wiki on kernel.org is in read-only mode
[1] http://btrfs.ipv5.de/


  Is the wiki still in read-only mode? I am able to log in,
  but there isn't any link to create a new page.



You mean the wiki mentioned above? You have to confirm
your email address to create and edit pages. This was necessary
to slow down spammers :(

-Arne


thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Will BTRFS repair or restore data if corrupted?

2012-01-26 Thread Hugo Mills
On Thu, Jan 26, 2012 at 12:27:57AM +0100, Waxhead wrote:
 Hi,
 
 From what I have read BTRFS does replace a bad copy of data with a
 known good copy (if it has one).

   Correct.

 Will BTRFS try to repair the corrupt data or will it simply silently
 restore the data without the user knowing that a file has been
 fixed?

   No, it'll just return the good copy and report the failure in the
system logs. If you want to fix the corrupt data, you need to use
scrub, which will check everything and fix blocks with failed
checksums.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- ...  one ping(1) to rule them all, and in the ---  
 darkness bind(2) them.  




Re: Will BTRFS repair or restore data if corrupted?

2012-01-26 Thread Stefan Behrens
On 1/26/2012 9:59 AM, Hugo Mills wrote:
 On Thu, Jan 26, 2012 at 12:27:57AM +0100, Waxhead wrote:
[...]
 Will BTRFS try to repair the corrupt data or will it simply silently
 restore the data without the user knowing that a file has been
 fixed?
 
No, it'll just return the good copy and report the failure in the
 system logs. If you want to fix the corrupt data, you need to use
 scrub, which will check everything and fix blocks with failed
 checksums.

Since 3.2, btrfs rewrites the corrupt disk block (commit 4a54c8c and
f4a8e65 from Jan Schmidt), even without scrub.


Re: Single drive volume + second drive - RAID1?

2012-01-26 Thread Duncan
James posted on Mon, 23 Jan 2012 13:17:53 -0800 as excerpted:

 On Mon, Jan 23, 2012 at 1:25 AM, Hugo Mills h...@carfax.org.uk wrote:
   Why not just create the filesystem as RAID-1 in the first place?

 # mkfs.btrfs -d raid1 -m raid1 /dev/sda1 /dev/sdb1
 
 As I said, I've only got two working drives large enough at present.
 
   Then you can restore from your backups. You do have backups, right?
 (Remember, this is a filesystem still marked as experimental).
 
 Yes, I know. :) I just have remote backups,

First post to the list, as I'm just planning my switch to btrfs... due to 
which I just read much of the wiki and thus have it fresh in mind...

Take a look at the UseCases page on the wiki.  There are several items of 
interest there that should be very useful to you right now.  Here's the 
link, altho the below actually has a bit more detail than the wiki (but 
the wiki obviously has other useful information not apropos to this 
situation):

https://btrfs.wiki.kernel.org/articles/u/s/e/UseCases_8bd8.html


First:

Creating a btrfs-raid-1 in degraded mode, that is, with a missing 
drive, to be added later: Apparently it's not possible to do that 
directly yet, but there's a trick to work around the issue that sounds 
like just what you need ATM.

The idea is to create a small (the example in the wiki uses 4 GB, I 
guess I'm getting old as that doesn't seem all that small to me!!) fake 
device using loopback to serve as the missing device, create the 
filesystem specifying raid-1 for both data and metadata (-m raid1 -d 
raid1) giving mkfs both the real and loopback devices to work with, then 
delete the loopback device and remove the file backing it, so that all 
that's left is the single real device:

dd if=/dev/zero of=/tmp/empty bs=1 count=0 seek=4G
losetup /dev/loop1 /tmp/empty
mkfs.btrfs -m raid1 -d raid1 /dev/sda1 /dev/loop1
losetup -d /dev/loop1
rm /tmp/empty

I immediately thought of the possibility of sticking that temporary 
loopback device file on a USB thumbdrive if necessary...

You should then be able to copy everything over to the new btrfs 
operating in degraded raid-1.  After testing to ensure it's there and 
usable (bootable if that's your intention), you can blank the old drive.  
Adding it in to complete the raid-1 is then done this way:

mount /dev/sda1 /mnt
btrfs dev add /dev/sdb1 /mnt


Second:

This note might explain why you ended up with raid-0 where you thought 
you had raid-1:

(kernel 2.6.37) Simply creating the filesystem with too few devices will 
result in a RAID-0 filesystem. (This is probably a bug).


Third:

To verify that btrfs is using the raid level you expect:

On a 2.6.37 or later kernel, use

btrfs fi df /mountpoint

The required support was broken accidentally in earlier kernels, but has 
now been fixed.


NB:  As I mentioned above I'm only researching btrfs for my systems now, 
so obviously have no clue how the above suggestions work in practice.  
They're simply off the wiki.


(Now to send this via gmane.org since I'm doing the list as a newsgroup 
thru them, and get the email challenge, so I can post further replies and 
questions of my own...)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Trying to mount RAID1 degraded with removed disk - open_ctree failed

2012-01-26 Thread Duncan
Dirk Lutzebaeck posted on Sun, 22 Jan 2012 16:05:14 +0100 as excerpted:

 I have setup a RAID1 using 3 devices (500G each) on separate disks.
 After removing one disk physically the filesystem cannot be mounted in
 degraded nor in recovery mode.

 - latest kernel 3.2.1 and btrfs-tools on xubuntu 11.10

 What is happening? RAID1 should be mountable degraded with one
 missing/removed device.

Note that I'm only researching btrfs for my own systems at this point and 
am not using it yet.  However, because I *AM* researching it and already 
read thru most of the wiki documentation, it's fresh in mind.

Here's what the wiki says, tho of course it could be outdated:

https://btrfs.wiki.kernel.org/

From the multiple devices page:

 By default, metadata will be mirrored across two devices and data will
 be striped across all of the devices present.

Question:  Did you specify -m raid1 -d raid1 when you did the mkfs.btrfs?

While the -m raid1 would be the default given multiple devices, the -d 
raid1 is not.  If you didn't specify -d raid1, you'll have raid0/striped 
data with only the metadata being raid1/mirrored, thus explaining the 
problem.


At least with all devices present, the following should show the raid 
level actually used (from the use cases page):

 On a 2.6.37 or later kernel, use
 
 btrfs fi df /mountpoint
 
 The required support was broken accidentally in earlier kernels,
 but has now been fixed.

Also note since you're running a 3-device btrfs-raid-1, tho it shouldn't 
affect a single device dropout, from the sysadmin guide page (near the 
bottom of the raid and data replication section):

 With RAID-1 and RAID-10, only two copies of each byte of data are
 written, regardless of how many block devices are actually in use
 on the filesystem.

IOW, unlike standard or kernel/md raid-1, that 3-device btrfs-raid-1 will 
**NOT** protect you if two of the three devices go bad before you've had 
a chance to bring in and balance to a replacement for the first bad 
device.

As I said I'm just now researching my own btrfs upgrade, and don't know 
for sure whether that's true or not, but if it is, it's a HUGE negative 
for me, as I'm currently running 4-way kernel/md RAID-1 on an aging set 
of drives, and was hoping to upgrade to btrfs raid-1 for the checksummed 
integrity.  But given the age of the drives I really don't want to drop 
below dual redundancy (3 copies), and this two-copies-only (single 
redundancy) raid-1(-ish) no matter the number of devices, is 
disappointing indeed!

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0

2012-01-26 Thread Jan Schmidt
I was looking at the clone range ioctl and have some remarks:

On 27.01.2011 09:46, Li Zefan wrote:
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index f87552a..1b61dab 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
  
 	memcpy(&new_key, &key, sizeof(new_key));
 	new_key.objectid = inode->i_ino;
 -	new_key.offset = key.offset + destoff - off;
 +	if (off <= key.offset)
 +		new_key.offset = key.offset + destoff - off;
 +	else
 +		new_key.offset = destoff;
  ^^^
1) This looks spurious to me. What if destoff isn't aligned? That's what
the key.offset - off code above is for. Before the patch, the code
didn't work at all, I agree. But this fix can only work for aligned
requests.

2) The error in new_key also has propagated to the extent item's backref
and wasn't fixed there. I did a range clone and ended up with an extent
item like that:
	item 30 key (1318842368 EXTENT_ITEM 131072) itemoff 1047 itemsize 169
		extent refs 8 gen 11103 flags 1
		[...]
		extent data backref root 257 objectid 272 offset 18446744073709494272 count 1

The last offset (equal to -14 * 4k) is obviously wrong. I didn't figure
out how the variables are computed, but it looks like there's something
wrong with the datao u64 to me.

3) Then, there's this code comment:

2180   /*
2181* TODO:
2186* - allow ranges within the same file to be cloned (provided
2187*   they don't overlap)?
2188*/

This should be safe to do. Even with the current interface, we only
check for inode equality. If they differ, cloning is permitted. Make a
full-file clone, and you'll end up with two inodes referring to the same
extent.

Detection of overlapping areas seems to be missing, though, and should be
added. Until then, the inode check stands as a (very weak) protection
against overlapping clone requests.

-Jan


btrfs-raid questions I couldn't find an answer to on the wiki

2012-01-26 Thread Duncan
I'm currently researching an upgrade to (raid1-ed) btrfs from mostly 
reiserfs (which I've found quite reliable (even thru a period of bad ram 
and resulting system crashes) since data=ordered went in with 2.6.16 or 
whatever it was.  (Thanks, Chris! =:^)) on multiple md/raid-1s.  I have 
some questions that don't appear to be addressed well on the wiki, yet, 
or where the wiki info might be dated.

Device hardware is 4 now aging 300-gig disks with identical gpt-
partitioning on all four disks, using multiple 4-way md/raid-1s for most 
of the system.  I'm running gentoo/~amd64 with the linus mainline kernel 
from git, kernel generally updated 1-2X/wk except during the merge 
window, so I stay reasonably current.  I have btrfs-progs-, aka the 
live-git build, kernel.org mason tree, installed.

The current layout has a total of 16 physical disk partitions on each of 
the four drives, most of which are 4-disk md/raid1, but with a couple of 
md/raid1s for local cache of redownloadables, etc., thrown in.  Some of 
the mds are further partitioned (mdp), some not.  A couple are only 2-
disk md/raid1 instead of the usual 4-disk.  Most mds have a working and 
backup copy of exactly the same partitioned size, thus explaining the 
multitude of partitions, since most of them come in pairs.  No lvm as I'm 
not running an initrd which meant it couldn't handle root, and I wasn't 
confident in my ability to recover the system in an emergency with lvm 
either, so I was best off without it.

Note that my current plan is to keep the backup sets as reiserfs on md/
raid1 for the time being, probably until btrfs comes out of experimental/
testing or at least until it further stabilizes, so I'm not too worried 
about btrfs as long as it's not going to go scribbling outside the 
partitions established for it.  For the worst-case I have boot-tested 
external-drive backup.

Three questions:

1) My /boot partition and its backup (which I do want to keep separate 
from root) are only 128 MB each.  The wiki recommends 1 gig sizes 
minimum, but there's some indication that's dated info due to mixed data/
metadata mode in recent kernels.

Is a 128 MB btrfs reasonable?  What's the mixed-mode minimum recommended, 
and what is overhead going to look like?

2)  The wiki indicates that btrfs-raid1 and raid-10 only mirror data 2-
way, regardless of the number of devices.  On my now aging disks, I 
really do NOT like the idea of only 2-copy redundancy.  I'm far happier 
with the 4-way redundancy, twice for the important stuff since it's in 
both working and backup mds altho they're on the same 4-disk set (tho I 
do have an external drive backup as well, but it's not kept as current).

If true that's a real disappointment, as I was looking forward to btrfs-
raid1 with checksummed integrity management.

Is there really NO way to do more than 2-way btrfs-raid1?  If not, 
presumably layering it on md/raid1 is possible, but is two-way-btrfs-
raid1-on-2-way-md-raid1 or btrfs-on-single-4-way-md-raid1 (presumably 
still-duped btrfs metadata) recommended?  Or perhaps the recommendations 
for performance and reliability differ in that scenario?

3) How does btrfs space overhead (and ENOSPC issues) compare to reiserfs 
with its (default) journal and tail-packing?  My existing filesystems are 
128 MB and 4 GB at the low end, and 90 GB and 16 GB at the high end.  At 
the same size, can I expect to fit more or less data on them?  Do the 
compression options change that by much IRL?  Given that I'm using same-
sized partitions for my raid-1s, I guess at least /that/ angle of it's 
covered.

Thanks. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



[PATCH] mkfs: Handle creation of filesystem larger than the first device

2012-01-26 Thread Jan Kara
The make_btrfs() function takes the size of the filesystem as an argument. It
uses this value to set the size of the first device as well, which is wrong for
filesystems larger than this device. It results in 'attempt to access beyond end
of device' messages from the kernel. So add the size of the first device as an
argument to make_btrfs().

CC: David Sterba dste...@suse.cz
Signed-off-by: Jan Kara j...@suse.cz
---
 convert.c |2 +-
 mkfs.c|6 --
 utils.c   |4 ++--
 utils.h   |2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

  As a side note, I'd guess that creating a filesystem larger than all given
devices (especially in the single-device case) is usually not what the sysadmin
wants (we spotted this bug when xfstests was happily creating a 1 GB
filesystem on a 500 MB device and it took us a while to notice the problem).
Thus maybe it should require some --force switch?

diff --git a/convert.c b/convert.c
index 291dc27..7f1932c 100644
--- a/convert.c
+++ b/convert.c
@@ -2374,7 +2374,7 @@ int do_convert(const char *devname, int datacsum, int packing, int noxattr)
goto fail;
}
 	ret = make_btrfs(fd, devname, ext2_fs->super->s_volume_name,
-			 blocks, total_bytes, blocksize, blocksize,
+			 blocks, total_bytes, total_bytes, blocksize, blocksize,
 			 blocksize, blocksize);
if (ret) {
fprintf(stderr, unable to create initial ctree\n);
diff --git a/mkfs.c b/mkfs.c
index e3ced19..f0d29bb 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1294,8 +1294,10 @@ int main(int ac, char **av)
first_file = file;
source_dir_size = size_sourcedir(source_dir, sectorsize,
 num_of_meta_chunks, 
size_of_data);
-	if(block_count < source_dir_size)
+	if (block_count < source_dir_size)
block_count = source_dir_size;
+   dev_block_count = block_count;
+
ret = zero_output_file(fd, block_count, sectorsize);
if (ret) {
fprintf(stderr, unable to zero the output file\n);
@@ -1321,7 +1323,7 @@ int main(int ac, char **av)
leafsize * i;
}
 
-   ret = make_btrfs(fd, file, label, blocks, block_count,
+   ret = make_btrfs(fd, file, label, blocks, block_count, dev_block_count,
 nodesize, leafsize,
 sectorsize, stripesize);
if (ret) {
diff --git a/utils.c b/utils.c
index 178d1b9..f34da51 100644
--- a/utils.c
+++ b/utils.c
@@ -74,7 +74,7 @@ static u64 reference_root_table[] = {
 };
 
 int make_btrfs(int fd, const char *device, const char *label,
-  u64 blocks[7], u64 num_bytes, u32 nodesize,
+  u64 blocks[7], u64 num_bytes, u64 dev_num_bytes, u32 nodesize,
   u32 leafsize, u32 sectorsize, u32 stripesize)
 {
struct btrfs_super_block super;
@@ -276,7 +276,7 @@ int make_btrfs(int fd, const char *device, const char *label,
dev_item = btrfs_item_ptr(buf, nritems, struct btrfs_dev_item);
btrfs_set_device_id(buf, dev_item, 1);
btrfs_set_device_generation(buf, dev_item, 0);
-   btrfs_set_device_total_bytes(buf, dev_item, num_bytes);
+   btrfs_set_device_total_bytes(buf, dev_item, dev_num_bytes);
btrfs_set_device_bytes_used(buf, dev_item,
BTRFS_MKFS_SYSTEM_GROUP_SIZE);
btrfs_set_device_io_align(buf, dev_item, sectorsize);
diff --git a/utils.h b/utils.h
index c5f55e1..bf2d5a4 100644
--- a/utils.h
+++ b/utils.h
@@ -22,7 +22,7 @@
 #define BTRFS_MKFS_SYSTEM_GROUP_SIZE (4 * 1024 * 1024)
 
 int make_btrfs(int fd, const char *device, const char *label,
-  u64 blocks[6], u64 num_bytes, u32 nodesize,
+  u64 blocks[6], u64 num_bytes, u64 dev_num_bytes, u32 nodesize,
   u32 leafsize, u32 sectorsize, u32 stripesize);
 int btrfs_make_root_dir(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 objectid);
-- 
1.7.1



[PATCH] btrfs: Fix busyloops in transaction waiting code

2012-01-26 Thread Jan Kara
wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...

Signed-off-by: Jan Kara j...@suse.cz
---
 fs/btrfs/tree-log.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index cb877e0..966cc74 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1957,7 +1957,8 @@ static int wait_log_commit(struct btrfs_trans_handle *trans,
 
 		finish_wait(&root->log_commit_wait[index], &wait);
 		mutex_lock(&root->log_mutex);
-	} while (root->log_transid < transid + 2 &&
+	} while (root->fs_info->last_trans_log_full_commit !=
+		 trans->transid && root->log_transid < transid + 2 &&
 		 atomic_read(&root->log_commit[index]));
 	return 0;
 }
@@ -1966,7 +1967,8 @@ static int wait_for_writer(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root)
 {
 	DEFINE_WAIT(wait);
-	while (atomic_read(&root->log_writers)) {
+	while (root->fs_info->last_trans_log_full_commit !=
+	       trans->transid && atomic_read(&root->log_writers)) {
 		prepare_to_wait(&root->log_writer_wait,
 				&wait, TASK_UNINTERRUPTIBLE);
 		mutex_unlock(&root->log_mutex);
-- 
1.7.1



Re: [PATCH 12/12] Btrfs: Fix file clone when source offset is not 0

2012-01-26 Thread David Sterba
On Thu, Jan 26, 2012 at 02:52:32PM +0100, Jan Schmidt wrote:
 I was looking at the clone range ioctl and have some remarks:
 
 On 27.01.2011 09:46, Li Zefan wrote:
  diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
  index f87552a..1b61dab 100644
  --- a/fs/btrfs/ioctl.c
  +++ b/fs/btrfs/ioctl.c
  @@ -1788,7 +1788,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
   
  	memcpy(&new_key, &key, sizeof(new_key));
  	new_key.objectid = inode->i_ino;
  -	new_key.offset = key.offset + destoff - off;
  +	if (off <= key.offset)
  +		new_key.offset = key.offset + destoff - off;
  +	else
  +		new_key.offset = destoff;
^^^
 1) This looks spurious to me. What if destoff isn't aligned? That's what
 the key.offset - off code above is for. Before the patch, the code
 didn't work at all, I agree. But this fix can only work for aligned
 requests.

Source range and destination offset are accepted iff they are aligned:

2300 /* verify the end result is block aligned */
2301 if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs) ||
2302 !IS_ALIGNED(destoff, bs))
2303 goto out_unlock;


david


[PATCH] btrfs: mask out gfp flasg in releasepage

2012-01-26 Thread David Sterba
btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

3399 		if ((mask & GFP_NOFS) == GFP_NOFS)
3400 			mask = GFP_NOFS;

will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963

 [0024ae4c] cache_alloc_refill+0x3b4/0x5c8
 [0024c810] kmem_cache_alloc+0x204/0x294
 [001fd3c2] mempool_alloc+0x52/0x170
 [03c000ced0b0] alloc_extent_state+0x40/0xd4 [btrfs]
 [03c000cee5ae] __clear_extent_bit+0x38a/0x4cc [btrfs]
 [03c000cee78c] try_release_extent_state+0x9c/0xd4 [btrfs]
 [03c000cc4c66] btree_releasepage+0x7e/0xd0 [btrfs]
 [00210d84] shrink_page_list+0x6a0/0x724
 [00211394] shrink_inactive_list+0x230/0x578
 [00211bb8] shrink_list+0x6c/0x120
 [00211e4e] shrink_zone+0x1e2/0x228
 [00211f24] shrink_zones+0x90/0x254
 [00213410] do_try_to_free_pages+0xac/0x420
 [00213ae0] try_to_free_pages+0x13c/0x1b0
 [00204e6c] __alloc_pages_nodemask+0x5b4/0x9a8
 [001fb04a] grab_cache_page_write_begin+0x7e/0xe8

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index da4457f..4c86711 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -961,6 +961,13 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
 	map = &BTRFS_I(page->mapping->host)->extent_tree;
 
+	/*
+	 * We need to mask out eg. __GFP_HIGHMEM and __GFP_DMA32 as we're doing
+	 * slab allocation from alloc_extent_state down the callchain where
+	 * it'd hit a BUG_ON as those flags are not allowed.
+	 */
+	gfp_flags &= ~GFP_SLAB_BUG_MASK;
+
ret = try_release_extent_state(map, tree, page, gfp_flags);
if (!ret)
return 0;
-- 
1.7.8



Re: [PATCH] btrfs: Fix busyloops in transaction waiting code

2012-01-26 Thread Chris Mason
On Thu, Jan 26, 2012 at 05:11:36PM +0100, Jan Kara wrote:
 wait_log_commit() and wait_for_writer() were using slightly different
 conditions for deciding whether they should call schedule() and whether they
 should continue in the wait loop. Thus it could happen that we busylooped when
 the first condition was not true while the second one was. That is burning CPU
 cycles needlessly and is deadly on UP machines...

Thanks Jan, I'll pull this in.

-chris


Re: Will BTRFS repair or restore data if corrupted?

2012-01-26 Thread Zoiled

Stefan Behrens wrote:

On 1/26/2012 9:59 AM, Hugo Mills wrote:

On Thu, Jan 26, 2012 at 12:27:57AM +0100, Waxhead wrote:

[...]

Will BTRFS try to repair the corrupt data or will it simply silently
restore the data without the user knowing that a file has been
fixed?

No, it'll just return the good copy and report the failure in the
system logs. If you want to fix the corrupt data, you need to use
scrub, which will check everything and fix blocks with failed
checksums.

Since 3.2, btrfs rewrites the corrupt disk block (commit 4a54c8c and
f4a8e65 from Jan Schmidt), even without scrub.



So if, for example, I edit a text file three times and store it, I can get 
the following.

Version1: I currently like cheese
Version2: I currently like onions
Version3: I currently like apples
As far as I understand, a disk corruption might result in me suddenly 
liking onions (or even cheese) instead of apples, without any warning 
except in syslog?! I really hope I have misunderstood the concept and 
that there is some error correction code somewhere.



Re: Will BTRFS repair or restore data if corrupted?

2012-01-26 Thread cwillu
 So if I for example edit a text file three times and store it I can get the
 following.
 Version1: I currently like cheese
 Version2: I currently like onions
 Version3: I currently like apples
 As far as I understand a disk corruption might result in me suddenly liking
 onions (or even cheese) instead of apples without any warning except in
 syslog.?! I really hope I have misunderstood the concept and that there is
 some error correction codes somewhere.

Yes, you've completely misunderstood the concept :p

There are crc's on each 4k block of data; if one copy fails the
checksum, and a second copy is available, and that copy does match,
then the good data will be returned and btrfs will overwrite the
corrupted copy with the good copy.  If there isn't another copy, then
an io error will be returned instead.


Re: Will BTRFS repair or restore data if corrupted?

2012-01-26 Thread Zoiled

cwillu wrote:

So if I for example edit a text file three times and store it I can get the
following.
Version1: I currently like cheese
Version2: I currently like onions
Version3: I currently like apples
As far as I understand a disk corruption might result in me suddenly liking
onions (or even cheese) instead of apples without any warning except in
syslog.?! I really hope I have misunderstood the concept and that there is
some error correction codes somewhere.

Yes, you've completely misunderstood the concept :p

There are crc's on each 4k block of data; if one copy fails the
checksum, and a second copy is available, and that copy does match,
then the good data will be returned and btrfs will overwrite the
corrupted copy with the good copy.  If there isn't another copy, then
an io error will be returned instead.




Phew... that sounds better :)


[PATCH v1.1] btrfs: mask out gfp flags in releasepage

2012-01-26 Thread David Sterba
[fixed the silly typo in subject]

From: David Sterba dste...@suse.cz

btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.

This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state

3399 		if ((mask & GFP_NOFS) == GFP_NOFS)
3400 			mask = GFP_NOFS;

will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963

 [0024ae4c] cache_alloc_refill+0x3b4/0x5c8
 [0024c810] kmem_cache_alloc+0x204/0x294
 [001fd3c2] mempool_alloc+0x52/0x170
 [03c000ced0b0] alloc_extent_state+0x40/0xd4 [btrfs]
 [03c000cee5ae] __clear_extent_bit+0x38a/0x4cc [btrfs]
 [03c000cee78c] try_release_extent_state+0x9c/0xd4 [btrfs]
 [03c000cc4c66] btree_releasepage+0x7e/0xd0 [btrfs]
 [00210d84] shrink_page_list+0x6a0/0x724
 [00211394] shrink_inactive_list+0x230/0x578
 [00211bb8] shrink_list+0x6c/0x120
 [00211e4e] shrink_zone+0x1e2/0x228
 [00211f24] shrink_zones+0x90/0x254
 [00213410] do_try_to_free_pages+0xac/0x420
 [00213ae0] try_to_free_pages+0x13c/0x1b0
 [00204e6c] __alloc_pages_nodemask+0x5b4/0x9a8
 [001fb04a] grab_cache_page_write_begin+0x7e/0xe8

Signed-off-by: David Sterba dste...@suse.cz
---


 fs/btrfs/disk-io.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index da4457f..4c86711 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -961,6 +961,13 @@ static int btree_releasepage(struct page *page, gfp_t gfp_flags)
 	tree = &BTRFS_I(page->mapping->host)->io_tree;
 	map = &BTRFS_I(page->mapping->host)->extent_tree;
 
+	/*
+	 * We need to mask out eg. __GFP_HIGHMEM and __GFP_DMA32 as we're doing
+	 * slab allocation from alloc_extent_state down the callchain where
+	 * it'd hit a BUG_ON as those flags are not allowed.
+	 */
+	gfp_flags &= ~GFP_SLAB_BUG_MASK;
+
ret = try_release_extent_state(map, tree, page, gfp_flags);
if (!ret)
return 0;
-- 
1.7.8



btrfs filesystem df command oddly named?

2012-01-26 Thread Wes
Just wondering,

Why is this command called 'df' when it reports total space and used
space but not free space?  Wouldn't this be more aptly named 'btrfs
filesystem du'?
It's been my understanding that traditionally 'df' has been used to
display free space remaining (as well as total available and used, but
with the focus being on free space; the 'df' man page indicates this as
well).


Re: btrfs filesystem df command oddly named?

2012-01-26 Thread Chester
On Thu, Jan 26, 2012 at 7:11 PM, Wes anomaly...@gmail.com wrote:
 Just wondering,

 Why is this command called 'df' when it reports total space and used
 space but not free space?  Wouldn't this be more aptly named 'btrfs
 filesystem du'  ?
 It's been my understanding that traditionally 'df' has been to display
 free space remaining (as well as total available and used, but with
 the focus being on free space.  The 'df' man page indicates this as
 well.)
I don't know much of what goes on inside BtrFS, but I'd like to point
out that btrfs fi df doesn't actually report total space on the disk,
only the total space currently allocated.


Re: btrfs filesystem df command oddly named?

2012-01-26 Thread Hugo Mills
On Fri, Jan 27, 2012 at 01:06:38PM +1100, Wes wrote:
  I don't know much of what goes on inside BtrFS, but I like to point
  out that btrfs fi df doesn't actually report total space on the disk,
  only the total space currently allocated.
 
 
 Good point, and this also supports the notion that it's more of a 'du'
 work-alike than a 'df' one

   On the other hand, df reports global figures, whereas du
reports values for a file or set of files. btrfs fi df also reports
(one set of) global figures, and a hypothetical btrfs fi du (which
will probably surface once Arne finishes the qgroups stuff) will
report on subvolumes and/or sets of files.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- UNIX: British manufacturer of modular shelving units. ---  

