Re: SSD format/mount parameters questions

2012-05-18 Thread Sander
Martin wrote (ao):
 Are there any format/mount parameters that should be set for using
 btrfs on SSDs (other than the ssd mount option)?

If possible, format the whole device and do not partition the SSD. This
guarantees proper alignment.

The kernel will detect the SSD and apply the ssd mount option
automatically.
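
For reference, the autodetection in kernels of this vintage lives in
open_ctree() in fs/btrfs/disk-io.c and looks roughly like the sketch
below (paraphrased from memory rather than quoted verbatim; the
rotating flag is derived from the block queue's non-rotational hint
when the devices are scanned):

        /* sketch of the SSD autodetection in open_ctree(), paraphrased */
        if (!btrfs_test_opt(tree_root, SSD) &&
            !btrfs_test_opt(tree_root, NOSSD) &&
            !fs_info->fs_devices->rotating) {
                printk(KERN_INFO "btrfs: detected SSD devices, "
                       "enabling SSD mode\n");
                btrfs_set_opt(fs_info->mount_opt, SSD);
        }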

 I've got a mix of various 120/128GB SSDs to newly set up. I will be
 using ext4 on the critical ones, but also wish to compare with
 btrfs...

I would use btrfs on the critical ones, as btrfs has checksums to detect
data corruption.

 The mix includes some SSDs with the Sandforce controller that implements
 its own data compression and data deduplication. How well does btrfs fit
 with those compared to other non-data-compression controllers?

Since you have them both, you might want to find out yourself, and let
us know ;-)

FWIW (not much, as you already have them), I would not buy anything other
than Intel. I have had about 26 of them for years now (in both servers and
workstations, several series) and never had an issue. Two of my
colleagues have OCZ drives, and both had to RMA them.

Sander


[PATCH 1/2] Btrfs: avoid memory leak of extent state in error handling routine

2012-05-18 Thread Liu Bo
We've forgotten to clear extent states in the pinned tree, which results in
a space counter mismatch and a memory leak:

WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 
[btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, 
readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree 880075f20090 
refs 1
...

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..046a737 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3595,6 +3595,8 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 
 	btrfs_destroy_marked_extents(root, &cur_trans->dirty_pages,
 				     EXTENT_DIRTY);
+	btrfs_destroy_pinned_extent(root,
+				    root->fs_info->pinned_extents);
 
 	/*
 	memset(cur_trans, 0, sizeof(*cur_trans));
-- 
1.6.5.2



[PATCH] Btrfs: destroy the items of the delayed inodes in error handling routine

2012-05-18 Thread Liu Bo
From: Miao Xie mi...@cn.fujitsu.com

The items of the delayed inodes were not being freed; this patch fixes
that.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/delayed-inode.c |   18 ++
 fs/btrfs/delayed-inode.h |3 +++
 fs/btrfs/disk-io.c   |6 ++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 03e3748..858d6c7 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1879,3 +1879,21 @@ void btrfs_kill_all_delayed_nodes(struct btrfs_root 
*root)
}
}
 }
+
+void btrfs_destroy_delayed_inodes(struct btrfs_root *root)
+{
+   struct btrfs_delayed_root *delayed_root;
+   struct btrfs_delayed_node *curr_node, *prev_node;
+
+   delayed_root = btrfs_get_delayed_root(root);
+
+   curr_node = btrfs_first_delayed_node(delayed_root);
+   while (curr_node) {
+   __btrfs_kill_delayed_node(curr_node);
+
+   prev_node = curr_node;
+   curr_node = btrfs_next_delayed_node(curr_node);
+   btrfs_release_delayed_node(prev_node);
+   }
+}
+
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index 7083d08..c1cfa87 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -124,6 +124,9 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev);
 /* Used for drop dead root */
 void btrfs_kill_all_delayed_nodes(struct btrfs_root *root);
 
+/* Used for clean the transaction */
+void btrfs_destroy_delayed_inodes(struct btrfs_root *root);
+
 /* Used for readdir() */
 void btrfs_get_delayed_items(struct inode *inode, struct list_head *ins_list,
 struct list_head *del_list);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 20196f4..a56026f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3583,6 +3583,9 @@ void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 	if (waitqueue_active(&cur_trans->commit_wait))
 		wake_up(&cur_trans->commit_wait);
 
+   btrfs_destroy_delayed_inodes(root);
+   btrfs_assert_delayed_root_empty(root);
+
btrfs_destroy_pending_snapshots(cur_trans);
 
	btrfs_destroy_marked_extents(root, &cur_trans->dirty_pages,
@@ -3635,6 +3638,9 @@ int btrfs_cleanup_transaction(struct btrfs_root *root)
	if (waitqueue_active(&t->commit_wait))
		wake_up(&t->commit_wait);
 
+   btrfs_destroy_delayed_inodes(root);
+   btrfs_assert_delayed_root_empty(root);
+
btrfs_destroy_pending_snapshots(t);
 
btrfs_destroy_delalloc_inodes(root);
-- 
1.7.6.5



[PATCH 2/2] Btrfs: make sure that we've made everything in pinned tree clean

2012-05-18 Thread Liu Bo
Since we have two trees for recording pinned extents, we need to go through
both of them to make sure everything has been cleaned up.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 046a737..144f019 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3548,8 +3548,10 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
u64 start;
u64 end;
int ret;
+   bool loop = true;
 
unpin = pinned_extents;
+again:
while (1) {
		ret = find_first_extent_bit(unpin, 0, &start, &end,
					    EXTENT_DIRTY);
@@ -3567,6 +3569,15 @@ static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
cond_resched();
}
 
+	if (loop) {
+		if (unpin == root->fs_info->freed_extents[0])
+			unpin = root->fs_info->freed_extents[1];
+		else
+			unpin = root->fs_info->freed_extents[0];
+		loop = false;
+		goto again;
+	}
+
return 0;
 }
 
-- 
1.6.5.2



Newbie questions on some of btrfs code...

2012-05-18 Thread Alex Lyakas
Greetings everybody,
I have been studying some of the btrfs code and the developer
documentation on the wiki. My primary interest at this point is to be
able to search within the fs tree of a btrfs subvolume which was created
as a snapshot of another subvolume. For that I have been using the
debug-tree tool plus the References.png diagram on the wiki. I realize
that my knowledge of btrfs is very rudimentary at this point, so
please bear with me.

# How can I navigate from an EXTENT_DATA within the fs tree to the
appropriate CHUNK_ITEM in the chunk tree? I am basically trying to
find where the file data resides on disk. For example, I have an
EXTENT_DATA like this:
item 30 key (265 EXTENT_DATA 0) itemoff 888 itemsize 53
extent data disk byte 12648448 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
I can navigate from here to EXTENT_ITEM within the extent tree, using
btrfs_file_extent_item::disk_bytenr/disk_num_bytes as key for search:
item 3 key (12648448 EXTENT_ITEM 8192) itemoff 3870 itemsize 53
extent refs 1 gen 8 flags 1
extent data backref root 5 objectid 265 offset 0 count 1
But from there how can I reach the relevant CHUNK_ITEM?

# Once I have reached the CHUNK_ITEM, I assume that
btrfs_file_extent_item::offset/num_bytes fields will provide the exact
location of the data on disk. Is that correct? For now I assume that
btrfs was created on a single device and raid0 is used for data, so I
totally ignore mirroring/striping at this point.

# I have been trying to follow btrfs_fiemap(), which seems to do the
job, but it looks like it returns the disk_bytenr/disk_num_bytes
fields without following to CHUNK_ITEMs. Maybe I am wrong.

Some general questions on the ctree code.

# I saw that slot==0 is special. My understanding is that btrfs
maintains the property that the parent of each node/leaf has a key
pointing to that node/leaf, which must be equal to the key in the
slot==0 of this node/leaf. That's what fixup_low_keys() tries to
maintain. Is this correct?

# If my understanding in the previous bullet is correct: Is that the
reason that in btrfs_prev_leaf() it is assumed that if there is a
lesser key, btrfs_search_slot() will never bring us to the slot==0 of
the current leaf?

# btrfs_search_slot(): how can it happen that b becomes NULL, and we
exit the while loop? (and set ret=1)

# btrfs_insert_empty_items(): if nr > 1, then an array of keys is
expected to be passed. But btrfs_search_slot() is searching only for
the first key. What happens if the first key does not exist (as
expected), but the next key in the array exists?

# Do my questions make sense?

Thanks!
Alex.


Re: Newbie questions on some of btrfs code...

2012-05-18 Thread Hugo Mills
On Fri, May 18, 2012 at 02:21:59PM +0300, Alex Lyakas wrote:
 Greetings everybody,
 I have been studying some of the btrfs code and the developer
 documentation on the wiki. My primary interest at this point, is to be
 able to search within fs tree of a btrfs subvolume, which was created
 as a snapshot of another subvolume. For that I have been using the
 debug-tree tool plus the References.png diagram on the wiki. I realize
 that my knowledge of btrfs is very rudimentary at this point, so
 please bear with me.
 
 # How can I navigate from EXTENT_DATA within the fs tree to
 appropriate CHUNK_ITEM in the chunk tree? I am basically trying to
 find where the file data resides on disk. For example, I have an
 EXTENT_DATA like this:
 item 30 key (265 EXTENT_DATA 0) itemoff 888 itemsize 53
 extent data disk byte 12648448 nr 8192
 extent data offset 0 nr 8192 ram 8192
 extent compression 0
 I can navigate from here to EXTENT_ITEM within the extent tree, using
 btrfs_file_extent_item::disk_bytenr/disk_num_bytes as key for search:
 item 3 key (12648448 EXTENT_ITEM 8192) itemoff 3870 itemsize 53
 extent refs 1 gen 8 flags 1
 extent data backref root 5 objectid 265 offset 0 count 1
 But from there how can I reach the relevant CHUNK_ITEM?

   CHUNK_ITEMs are indexed by the start address of the chunk, so for
the extent at $e, you need to search for the chunk item immediately
before the key (FIRST_CHUNK_TREE, CHUNK_ITEM, $e).
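
   As a hedged illustration of that lookup using the in-kernel tree
search helpers (this is a sketch, not a quote of any kernel function;
the function name is made up, 'logical' is the virtual address taken
from the file extent, and the caller is assumed to provide the chunk
root and a btrfs_path; inside the kernel this mapping is normally
answered from the in-memory mapping tree via btrfs_map_block()):

        static int lookup_chunk_for_logical(struct btrfs_root *chunk_root,
                                            struct btrfs_path *path, u64 logical)
        {
                struct btrfs_key key;
                int ret;

                key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
                key.type = BTRFS_CHUNK_ITEM_KEY;
                key.offset = logical;

                ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
                if (ret < 0)
                        return ret;
                if (ret > 0) {
                        /* No exact match: step back to the CHUNK_ITEM that
                         * starts immediately before 'logical'. */
                        ret = btrfs_previous_item(chunk_root, path,
                                                  BTRFS_FIRST_CHUNK_TREE_OBJECTID,
                                                  BTRFS_CHUNK_ITEM_KEY);
                        if (ret)
                                return -ENOENT;
                }
                /* path->nodes[0] / path->slots[0] now point at the CHUNK_ITEM;
                 * its key.offset is the chunk's start address and
                 * btrfs_chunk_length() gives its size. */
                return 0;
        }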

 # Once I have reached the CHUNK_ITEM, I assume that
 btrfs_file_extent_item::offset/num_bytes fields will provide the exact
 location of the data on disk. Is that correct? For now I assume that
 btrfs was created on a single device and raid0 is used for data, so I
 totally ignore mirroring/striping at this point.

   If you want to find the physical position of a given byte in a file
on disk (and repeating some of what you already know):

 - The FS tree holds the directory structure, so you use that to find
   the inode number of the file by name.

 - With the inode number, you can look in the FS tree again to get the
   set of extents which make up the file. These extents are a mapping
   from [byte offset within the file] to [byte offset in virtual
   address space].

 - The extent tree then holds extent info, indexed by virtual address.
   There are two main types of extent: the extents holding file data
   (EXTENT_ITEM), and, overlapping with them, extents representing the
   block groups (BLOCK_GROUP_ITEM), which are the high-level
   allocation units of the FS.

 - For any given file extent (EXTENT_ITEM), you can use the tree
   search API to look in the chunk tree for the chunks holding this
   virtual data extent. (For any non-single RAID level, there will be
   multiple chunks involved). You do this by simply finding CHUNK_ITEM
   items in the tree with a start value immediately less than or
   equal to the virtual-address offset of your file extent.

 - With any replicating RAID (-1 or -10) there will be multiple
   entries in the chunk tree for any given virtual address offset,
   representing the multiple mirrors. For any striped RAID level (-0,
   -10), each chunk record in the tree will have several btrfs_stripe
   records in its array.
   Each btrfs_stripe record that you end up with (duplicate copies
   from the RAID-1/-10, and stripes from the RAID-0/-10) will then
   reference the device tree, which gives you the physical location of
   that btrfs_stripe on a specific disk.

   Note that in the btrfs internal terminology, a stripe is a
contiguous (256MiB or 1GiB) sequence of bytes on a single disk. RAID
stripes (e.g. RAID-0, -10) are actually called sub-stripes in the
btrfs code. There's also no clearly-defined use of the terms chunk
and block group.

   HTH,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- How do you become King?  You stand in the marketplace and ---
  announce you're going to tax everyone. If you get out  
   alive, you're King.   




Re: [PATCH v3 3/3] Btrfs: read device stats on mount, write modified ones during commit

2012-05-18 Thread David Sterba
On Wed, May 16, 2012 at 06:50:47PM +0200, Stefan Behrens wrote:
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -823,6 +823,26 @@ struct btrfs_csum_item {
   u8 csum;
  } __attribute__ ((__packed__));
  
 +struct btrfs_device_stats_item {
 + /*
 +  * grow this item struct at the end for future enhancements and keep
 +  * the existing values unchanged
 +  */
 + __le64 cnt_write_io_errs; /* EIO or EREMOTEIO from lower layers */
 + __le64 cnt_read_io_errs; /* EIO or EREMOTEIO from lower layers */
 + __le64 cnt_flush_io_errs; /* EIO or EREMOTEIO from lower layers */
 +
 + /* stats for indirect indications for I/O failures */
 + __le64 cnt_corruption_errs; /* checksum error, bytenr error or
 +  * contents is illegal: this is an
 +  * indication that the block was damaged
 +  * during read or write, or written to
 +  * wrong location or read from wrong
 +  * location */
 + __le64 cnt_generation_errs; /* an indication that blocks have not
 +  * been written */

A few spare u64s would come in handy in the future. Currently there are 5,
so add, say, 7 or 11 more. We might be interested in collecting more types
of stats, or more fine-grained ones.

I see the comment above about enhancing the structure, but then you need
to version this structure. Let's say kernel 3.5 adds this structure as you
propose it now, and kernel 3.6 adds another item, 'cnt_exploded'.

Accessing a 3.6-created image with 3.5 will be OK (the older kernel will
not touch the new items).

Accessing a 3.5-created image with 3.6 will be problematic, as the
kernel would try to access ->cnt_exploded.

So, either the 3.6 kernel needs to know not to touch the missing item
(i.e. by reading the struct version from somewhere, stored on disk).

Or there are spare items, which are zeroed by versions that do not use
them and used naturally otherwise; when a new kernel reads an old image,
it finds zeros (and is safe).

 +} __attribute__ ((__packed__));
 +
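
To make the second option concrete, a hedged sketch of what the item
could look like with spare fields (the reserved array and its size are
illustrative only, not part of the posted patch):

        struct btrfs_device_stats_item {
                __le64 cnt_write_io_errs;
                __le64 cnt_read_io_errs;
                __le64 cnt_flush_io_errs;
                __le64 cnt_corruption_errs;
                __le64 cnt_generation_errs;
                /*
                 * Reserved for future counters: written as zero by kernels
                 * that do not use them, so a newer kernel reading an older
                 * image simply sees zero for stats that were never collected.
                 */
                __le64 reserved[11];
        } __attribute__ ((__packed__));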


Problem with restore of filesystem

2012-05-18 Thread Lars Bahner
I have sinned!

I had a production filesystem without a replica - which is bonked :(

Running restore (I have tried Debian's btrfs-tools, and the master and
dangerdonteveruse branch versions) on kernels 3.2 and 3.3, I
consistently get this error message, and I would like suggestions as
to how I might proceed.

root@foo:/net/users/home/bahner/src/btrfs-progs# btrfs-restore -si
/dev/sdc /opt/data/restore/
failed to read /dev/sr0
failed to read /dev/sr0
parent transid verify failed on 5083380932608 wanted 332337 found 339991
parent transid verify failed on 5083380932608 wanted 332337 found 339991
parent transid verify failed on 5083380932608 wanted 332337 found 339991
parent transid verify failed on 5083380932608 wanted 332337 found 339991
Ignoring transid failure
Root objectid is 5
Skipping existing file
/opt/data/restore/data/move/treungen-s01/db/BackupSet/1322218096564/2011-11-25-14-15-41.log
If you wish to overwrite use the -o option to overwrite
btrfs-restore: disk-io.c:589: btrfs_read_fs_root: Assertion
`!(location->objectid == -8ULL || location->offset != (u64)-1)'
failed.
Aborted
root@foo:/net/users/home/bahner/src/btrfs-progs#

There is supposed to be a folder data/users
(/opt/data/restore/data/users). I thought I could somehow bypass the
root access by giving a -m 'data/users/*' parameter, for instance, but
thinking about it I realize I was clutching at straws. Any pointers in
the right direction are appreciated.

I have run find-root and tried with the latest blockid for the -t parameter.

Kind regards,
Lars Bahner


Re: [PATCH 4/5] Btrfs: cancel the scrub when remounting a fs to ro

2012-05-18 Thread David Sterba
On Thu, May 17, 2012 at 07:58:21PM +0800, Miao Xie wrote:
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -1151,6 +1151,8 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 	/* pause restriper - we want to resume on remount to r/w */
 	btrfs_pause_balance(root->fs_info);
  
 + btrfs_scrub_cancel(root);

Can we possibly switch scrub to readonly instead? I'm not sure what the
'least surprise' is here, whether to cancel everything on the
filesystem upon ro-remount or just the minimal set of operations (and
leave the rest running if possible).

Looking at the scrub code, if dev->readonly is set, no repairs are done,
so the only concern is to wait for any outstanding IOs and then switch
to RO.


david


Re: [PATCH 1/2] Btrfs: do not resize a seeding device

2012-05-18 Thread David Sterba
On Thu, May 17, 2012 at 08:08:08PM +0800, Liu Bo wrote:
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -1303,6 +1303,13 @@ static noinline int btrfs_ioctl_resize(struct btrfs_root *root,
 		ret = -EINVAL;
 		goto out_free;
 	}
 +	if (device->fs_devices && device->fs_devices->seeding) {
 +		printk(KERN_INFO "btrfs: resizer unable to apply on "
 +		       "seeding device %s\n", device->name);
 +		ret = -EACCES;

I think EINVAL would be more appropriate. EACCES is about permissions,
which do not make much sense in the context of resizing devices; besides,
CAP_SYS_ADMIN is required anyway (and checked a few lines above).
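
In other words, a hedged sketch of the suggested variant (the message
wording here is illustrative, not a proposed final patch):

	if (device->fs_devices && device->fs_devices->seeding) {
		printk(KERN_INFO "btrfs: resizer unable to apply on "
		       "seeding device %s\n", device->name);
		ret = -EINVAL;
		goto out_free;
	}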


david


Re: Newbie questions on some of btrfs code...

2012-05-18 Thread Alex Lyakas
Thank you, Hugo, for the detailed explanation. I am now able to find
the CHUNK_ITEMs and to successfully locate the file data on disk.
Can you maybe address several follow-up questions I have?

# When looking for CHUNK_ITEMs, should I check that their
btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA (and not SYSTEM/METADATA
etc)? Or should a file extent always be mapped to a
BTRFS_BLOCK_GROUP_DATA chunk?

# It looks like I don't even need to bother with the extent tree at
this point, because from EXTENT_DATA in fs tree I can navigate
directly to CHUNK_ITEM in chunk tree, correct?

# For replicating RAID levels, you said there will be multiple
CHUNK_ITEMs. How do I find them then? Should I know in advance how
many there should be, and look for them, considering only
btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA? (I don't worry about
replication at this point, though.)

# If I find in the fs tree an EXTENT_DATA of type
BTRFS_FILE_EXTENT_PREALLOC, how should I treat it? What does it mean?
(BTRFS_FILE_EXTENT_INLINE are easy to treat).

# One of my files has two EXTENT_DATAs, like this:
item 14 key (270 EXTENT_DATA 0) itemoff 1812 itemsize 53
extent data disk byte 432508928 nr 1474560
extent data offset 0 nr 1470464 ram 1474560
extent compression 0
item 15 key (270 EXTENT_DATA 1470464) itemoff 1759 itemsize 53
extent data disk byte 432082944 nr 126976
extent data offset 0 nr 126976 ram 126976
extent compression 0
Summing btrfs_file_extent_item::num_bytes gives
1470464+126976=1597440. (I know that I should not be summing
btrfs_file_extent_item::disk_num_bytes, but num_bytes).
However, its INODE_ITEM gives a size of 1593360, which is less:
item 11 key (270 INODE_ITEM 0) itemoff 1970 itemsize 160
inode generation 26 size 1593360 block group 0 mode 100700 
links 1

Is this a valid situation, or should I always consider the size in the
INODE_ITEM as the correct one?

Thanks again,
Alex.


Re: Newbie questions on some of btrfs code...

2012-05-18 Thread Hugo Mills
On Fri, May 18, 2012 at 04:32:09PM +0300, Alex Lyakas wrote:
 Thank you, Hugo, for the detailed explanation. I am now able to find
 the CHUNK_ITEMs and to successfully locate the file data on disk.
 Can you maybe address several follow-up questions I have?
 
 # When looking for CHUNK_ITEMs, should I check that their
 btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA (and not SYSTEM/METADATA
 etc)? Or file extent should always be mapped to BTRFS_BLOCK_GROUP_DATA
 chunk?

   File extents will either be mapped to a data chunk, _or_ the file
data will live inline in the metadata area, following the
btrfs_extent_item. This is probably the trickiest piece of the on-disk
data format to figure out, and I fear that I didn't document it well
enough. Basically, it's non-obvious where inline extents are
calculated, because there's all sorts of awkward-looking type casting
to get to the data.

 # It looks like I don't even need to bother with the extent tree at
 this point, because from EXTENT_DATA in fs tree I can navigate
 directly to CHUNK_ITEM in chunk tree, correct?

   Mmm... possibly. Again, I'm not sure how this interacts with inline
extents.

 # For replicating RAID levels, you said there will be multiple
 CHUNK_ITEMs. How do I find them then? Should I know in advance how
 much there should be, and look for them, considering only
 btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA? (I don't bother for
 replication at this point, though).

   Actually, thinking about it, there's a single CHUNK_ITEM, and the
stripe[] array holds all of the per-disk allocations that correspond
to that block group. So, for RAID-1, you'll have precisely two
elements in the stripe[] array. Sorry for getting it wrong earlier.
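
   A hedged sketch of walking that stripe[] array with the on-disk
accessor helpers (not taken verbatim from the kernel; 'leaf' and 'slot'
are assumed to come from a chunk-tree search like the one described
earlier in the thread):

        struct btrfs_chunk *chunk;
        int num_stripes, i;

        chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
        num_stripes = btrfs_chunk_num_stripes(leaf, chunk);
        for (i = 0; i < num_stripes; i++) {
                u64 devid = btrfs_stripe_devid_nr(leaf, chunk, i);
                u64 physical = btrfs_stripe_offset_nr(leaf, chunk, i);

                /* For RAID-1 each element is a full mirror; for RAID-0 each
                 * is one component of the stripe set.  'physical' is the
                 * byte offset of this btrfs_stripe on device 'devid'. */
                printk(KERN_INFO "stripe %d: devid %llu physical %llu\n", i,
                       (unsigned long long)devid,
                       (unsigned long long)physical);
        }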

 # If I find in the fs tree an EXTENT_DATA of type
 BTRFS_FILE_EXTENT_PREALLOC, how should I treat it? What does it mean?
 (BTRFS_FILE_EXTENT_INLINE are easy to treat).

   I don't know, sorry.

 # One of my files has two EXTENT_DATAs, like this:
   item 14 key (270 EXTENT_DATA 0) itemoff 1812 itemsize 53
   extent data disk byte 432508928 nr 1474560
   extent data offset 0 nr 1470464 ram 1474560
   extent compression 0
   item 15 key (270 EXTENT_DATA 1470464) itemoff 1759 itemsize 53
   extent data disk byte 432082944 nr 126976
   extent data offset 0 nr 126976 ram 126976
   extent compression 0
 Summing btrfs_file_extent_item::num_bytes gives
 1470464+126976=1597440. (I know that I should not be summing
 btrfs_file_extent_item::disk_num_bytes, but num_bytes).
 However, it's INODE_ITEM gives size of 1593360, which is less:
   item 11 key (270 INODE_ITEM 0) itemoff 1970 itemsize 160
   inode generation 26 size 1593360 block group 0 mode 100700 
 links 1
 
 Is this a valid situation, or I should always consider size in
 INODE_ITEM as the correct one?

   Again, I don't know off the top of my head. It's been some time
since I dug into these kinds of details, sorry.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- In my day, we didn't have fancy high numbers.  We had ---  
   nothing, one, twain and multitudes.   




Re: Problem with restore of filesystem

2012-05-18 Thread cwillu
On Fri, May 18, 2012 at 6:04 AM, Lars Bahner
bah...@onlinebackupcompany.com wrote:
 I have sinned!

 I had a production filesystem without a replica - which is bonked :(

Grr...

 Running restore ( i have tried Debian's btrfs-tools; master and
 dangerdonteveruse branches version ) on kernels 3.2 and 3.3 I
 consistently get this error message, and I would like suggestions as
 to how i might proceed.

It's worth trying to mount with the latest 3.4rc; it might be able to
hobble along.


Re: SSD format/mount parameters questions

2012-05-18 Thread Clemens Eisserer
 I would not buy anything else
 than intel. I have about 26 of them for years now (both in servers and
 workstations, several series), and never had an issue. Two of my
 colleagues have OCZ, and both had to RMA them.

I guess it boils down to whether you want Intel to also rule the SSD
market in the long term, as they do with PC processors...

Comparing Intel SSDs with OCZ is not that fair, as OCZ has always been
low-priced, bleeding-edge stuff.
Usually, ratings on Amazon are a good indicator of how reliable the
product in question is.


Re: SSD format/mount parameters questions

2012-05-18 Thread Tomasz Torcz
On Fri, May 18, 2012 at 05:08:33PM +0200, Clemens Eisserer wrote:
  I would not buy anything else
  than intel. I have about 26 of them for years now (both in servers and
  workstations, several series), and never had an issue. Two of my
  colleagues have OCZ, and both had to RMA them.
 
 I guess it boils down wether you want intel also to rule the SSD
 market in the long term, as they do with PC processors...
 
 Comparing intel SSDs with OCZ is not that fair, as OCZ has always been
 low-priced bleeding edge stuff.

  Looking at the controllers...
  First there were a bunch of different ones; Intel had its own design with
the SSD 320.
  Then came SandForce; it got broadly used, despite sucking when used
with FDE. Even Intel started to use SandForce - the SSD 520. How does
Intel's reliability differ?
  The latest fad is the Marvell controller; again, Intel joins the pack with the SSD 510.

  So, Intel is not that different anymore.

-- 
Tomasz Torcz God, root, what's the difference?
xmpp: zdzich...@chrome.pl God is more forgiving.



Re: SSD format/mount parameters questions

2012-05-18 Thread Calvin Walton
On Fri, 2012-05-18 at 17:32 +0200, Tomasz Torcz wrote:
 On Fri, May 18, 2012 at 05:08:33PM +0200, Clemens Eisserer wrote:
   I would not buy anything else
   than intel. I have about 26 of them for years now (both in servers and
   workstations, several series), and never had an issue. Two of my
   colleagues have OCZ, and both had to RMA them.
  
  I guess it boils down wether you want intel also to rule the SSD
  market in the long term, as they do with PC processors...
  
  Comparing intel SSDs with OCZ is not that fair, as OCZ has always been
  low-priced bleeding edge stuff.
 
   Looking into the controllers...
   first there were bunch of different ones; Intel had it own design with
 SSD 320.
   Then come Sandforce; it got broadly used, despite sucking when used
 with FDE. Even Intel started to used Sandforce - SSD 520. How's
 reliabilty of Intel differs?
   Latest fad is Marvell controller; again Intel joins the pack with SSD510.
 
   So, Intel is not that different anymore.

The controllers themselves really aren't that interesting any more - an
SSD controller is really just an ARM or MIPS core with some flash
interfaces, a SATA interface, and some RAM - running proprietary
firmware.

Several of the Marvell devices actually have completely different
firmware (e.g. Intel's firmware for Marvell devices was reportedly
developed in-house), and Intel's SandForce firmware has some
customizations for improved reliability, at the expense of some speed.

-- 
Calvin Walton calvin.wal...@kepstin.ca



Re: btrfs RAID with RAID cards (thread renamed)

2012-05-18 Thread Daniel Pocock

 - if a non-RAID SAS card is used, does it matter which card is chosen?
 Does btrfs work equally well with all of them?
 
 If you're using btrfs RAID, you need a HBA, not a RAID card. If the RAID 
 card can work as a HBA (usually labelled as JBOD mode) then you're good to 
 go.
 
 For example, HP CCISS controllers can't work in JBOD mode.

Would you know if they implement their own checksumming, similar to what
btrfs does?  Or, if someone uses SmartArray (CCISS) RAID1, do they
simply not get the full benefit of checksumming under any possible
configuration?

I've had a quick look at what is on the market, here are some observations:

- in many cases, IOPS (critical for SSDs) vary wildly: e.g.
  - SATA-3 SSDs advertise up to 85k IOPS, so RAID1 needs 170k IOPS
  - HP's standard HBAs don't support high IOPS
  - HP Gen8 SmartArray (e.g. P420) claims up to 200k IOPS
  - previous HP arrays (e.g. P212) support only 60k IOPS
  - many vendors don't advertise the IOPS prominently - I had to Google
the HP site to find those figures quoted in some PDFs; they don't quote
them in the QuickSpecs or product summary tables

- Adaptec now offers an SSD caching function in hardware; supposedly you
drop it in the machine and all disks respond faster
  - how would this interact with btrfs checksumming?  E.g. I'm guessing
it would be necessary to ensure that data from both spindles is not
cached on the same SSD?
  - I started thinking about the possibility that data is degraded on
the mechanical disk but btrfs gets a good checksum read from the SSD and
remains blissfully unaware that the real disk is failing; then the other
disk goes completely offline one day, for whatever reason the data is
not in the SSD cache, and the sector can't be read reliably from the
remaining physical disk - should such caching just be avoided, or can it
be managed from btrfs itself in a manner that is foolproof?

How about the combination of btrfs root/boot filesystems and GRUB?  Can
they all play nicely together?  This seems to be one compelling factor
with hardware RAID: the cards have a BIOS that can boot from either
drive even if the other is offline.






Re: Subdirectory creation on snapshot

2012-05-18 Thread David Sterba
On Mon, May 14, 2012 at 10:11:01AM -0700, Brendan Smithyman wrote:
 The disks that *are* still showing the subdirectory creation issue
 were both converted from ext4 (using old tools).  So perhaps that's a
 direction to explore.

Yeah, thanks for the hint.

david


Re: Ceph on btrfs 3.4rc

2012-05-18 Thread Martin Mailand

Hi Josef,
there was one line before the bug.

[  995.725105] couldn't find orphan item for 524


On 18.05.2012 16:48, Josef Bacik wrote:

Ok hopefully this will print something out that makes sense.  Thanks,


-martin

[  241.754693] Btrfs loaded
[  241.755148] device fsid 43c4ebd9-3824-4b07-a710-3ec39b012759 devid 1 
transid 4 /dev/sdc

[  241.755750] btrfs: setting nodatacow
[  241.755753] btrfs: enabling auto defrag
[  241.755754] btrfs: disk space caching is enabled
[  241.755755] btrfs flagging fs with big metadata feature
[  241.768683] device fsid e7e7f2df-6a4e-45b1-85cc-860cda849953 devid 1 
transid 4 /dev/sdd

[  241.769028] btrfs: setting nodatacow
[  241.769030] btrfs: enabling auto defrag
[  241.769031] btrfs: disk space caching is enabled
[  241.769032] btrfs flagging fs with big metadata feature
[  241.781360] device fsid 203fdd4c-baac-49f8-bfdb-08486c937989 devid 1 
transid 4 /dev/sde

[  241.781854] btrfs: setting nodatacow
[  241.781859] btrfs: enabling auto defrag
[  241.781861] btrfs: disk space caching is enabled
[  241.781864] btrfs flagging fs with big metadata feature
[  242.713741] device fsid 95c36e12-0098-48d7-a08d-9d54a299206b devid 1 
transid 4 /dev/sdf

[  242.714110] btrfs: setting nodatacow
[  242.714118] btrfs: enabling auto defrag
[  242.714121] btrfs: disk space caching is enabled
[  242.714125] btrfs flagging fs with big metadata feature
[  995.725105] couldn't find orphan item for 524
[  995.725126] [ cut here ]
[  995.725134] kernel BUG at fs/btrfs/inode.c:2227!
[  995.725143] invalid opcode:  [#1] SMP
[  995.725158] CPU 0
[  995.725162] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
coretemp ghash_clmulni_intel aesni_intel bonding cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ses ioatdma 
enclosure mac_hid lp parport ixgbe usbhid hid isci libsas megaraid_sas 
scsi_transport_sas igb dca mdio

[  995.725285]
[  995.725290] Pid: 2972, comm: ceph-osd Tainted: G C 
3.4.0-rc7.2012051800+ #14 Supermicro X9SRi/X9SRi
[  995.725324] RIP: 0010:[a028535f]  [a028535f] 
btrfs_orphan_del+0x14f/0x160 [btrfs]

[  995.725354] RSP: 0018:881016ed9d18  EFLAGS: 00010292
[  995.725364] RAX: 0037 RBX: 88101485fdb0 RCX: 

[  995.725378] RDX:  RSI: 0082 RDI: 
0246
[  995.725392] RBP: 881016ed9d58 R08:  R09: 

[  995.725405] R10:  R11: 00b6 R12: 
88101efe9f90
[  995.725419] R13: 88101efe9c00 R14: 0001 R15: 
0001
[  995.725433] FS:  7f58e5dbc700() GS:88107fc0() 
knlGS:

[  995.725466] CS:  0010 DS:  ES:  CR0: 80050033
[  995.725492] CR2: 03f28000 CR3: 00101acac000 CR4: 
000407f0
[  995.725522] DR0:  DR1:  DR2: 

[  995.725551] DR3:  DR6: 0ff0 DR7: 
0400
[  995.725581] Process ceph-osd (pid: 2972, threadinfo 881016ed8000, 
task 88101618)

[  995.725626] Stack:
[  995.725646]  0c02 88101deaf550 881016ed9d38 
88101deaf550
[  995.725700]   88101efe9c00 88101485fdb0 
880be890c1e0
[  995.725757]  881016ed9e08 a02897a8 88101485fdb0 


[  995.725807] Call Trace:
[  995.725835]  [a02897a8] btrfs_truncate+0x5e8/0x6d0 [btrfs]
[  995.725869]  [a028b121] btrfs_setattr+0xc1/0x1b0 [btrfs]
[  995.725898]  [811955c3] notify_change+0x183/0x320
[  995.725925]  [8117889e] do_truncate+0x5e/0xa0
[  995.725951]  [81178a24] sys_truncate+0x144/0x1b0
[  995.725979]  [8165fd29] system_call_fastpath+0x16/0x1b
[  995.726006] Code: 45 31 ff e9 3c ff ff ff 48 8b b3 58 fe ff ff 48 85 
f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 08 48 2e a0 31 c0 e8 09 7c 
3c e1 0f 0b 48 8b 73 40 eb ea 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
[  995.726221] RIP  [a028535f] btrfs_orphan_del+0x14f/0x160 
[btrfs]

[  995.726258]  RSP 881016ed9d18
[  995.726574] ---[ end trace 4bde8f513a6d106d ]---



[PATCH] Btrfs: merge contigous regions when loading free space cache

2012-05-18 Thread Josef Bacik
When we write out the free space cache we will write out everything that is
in our in-memory tree, and then we will just walk the pinned extents tree
and write anything we see there.  The problem with this is that during
normal operations the pinned extents will be merged back into the free space
tree, and then we can allocate space from the merged areas and
commit them to the tree log.  If we crash and replay the tree log we will
crash again because the tree log will try to free up space from what looks
like 2 separate but contiguous entries, since one entry is from the original
free space cache and the other was a pinned extent that was merged back.  To
fix this we just need to walk the free space tree after we load it and merge
contiguous entries back together.  This will keep the tree log stuff from
breaking and it will make the allocator behave more nicely.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/free-space-cache.c |   41 +
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index cecf8df..19a0d85 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -33,6 +33,8 @@
 
 static int link_free_space(struct btrfs_free_space_ctl *ctl,
   struct btrfs_free_space *info);
+static void unlink_free_space(struct btrfs_free_space_ctl *ctl,
+ struct btrfs_free_space *info);
 
 static struct inode *__lookup_free_space_inode(struct btrfs_root *root,
   struct btrfs_path *path,
@@ -584,6 +586,44 @@ static int io_ctl_read_bitmap(struct io_ctl *io_ctl,
return 0;
 }
 
+/*
+ * Since we attach pinned extents after the fact we can have contiguous sections
+ * of free space that are split up in entries.  This poses a problem with the
+ * tree logging stuff since it could have allocated across what appears to be 2
+ * entries since we would have merged the entries when adding the pinned extents
+ * back to the free space cache.  So run through the space cache that we just
+ * loaded and merge contiguous entries.  This will make the log replay stuff not
+ * blow up and it will make for nicer allocator behavior.
+ */
+static void merge_space_tree(struct btrfs_free_space_ctl *ctl)
+{
+   struct btrfs_free_space *e, *prev = NULL;
+   struct rb_node *n;
+
+again:
+	spin_lock(&ctl->tree_lock);
+	for (n = rb_first(&ctl->free_space_offset); n; n = rb_next(n)) {
+		e = rb_entry(n, struct btrfs_free_space, offset_index);
+		if (!prev)
+			goto next;
+		if (e->bitmap || prev->bitmap)
+			goto next;
+		if (prev->offset + prev->bytes == e->offset) {
+			unlink_free_space(ctl, prev);
+			unlink_free_space(ctl, e);
+			prev->bytes += e->bytes;
+			kmem_cache_free(btrfs_free_space_cachep, e);
+			link_free_space(ctl, prev);
+			prev = NULL;
+			spin_unlock(&ctl->tree_lock);
+			goto again;
+		}
+next:
+		prev = e;
+	}
+	spin_unlock(&ctl->tree_lock);
+}
+
 int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
struct btrfs_free_space_ctl *ctl,
struct btrfs_path *path, u64 offset)
@@ -726,6 +766,7 @@ int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
}
 
 	io_ctl_drop_pages(&io_ctl);
+	merge_space_tree(ctl);
 	ret = 1;
 out:
 	io_ctl_free(&io_ctl);
-- 
1.7.7.6



Re: trim malfunction in linux 3.3.6

2012-05-18 Thread Sergey Kolesnikov
Hello.

2012/5/18 Liu Bo liubo2...@cn.fujitsu.com

 On 05/18/2012 12:10 AM, Sergey E. Kolesnikov wrote:
 Could you please show some logs about the corruption?

Ugh. Sorry, but logs got corrupted too :-(
Today I've tested the 3.4-rc7 kernel and everything seems to be fine.
Maybe the fix mentioned by Tomasz should be ported back to 3.3.x, since it
is the current stable release and this bug is really dangerous.


Thanks,
Sergey.


Re: Ceph on btrfs 3.4rc

2012-05-18 Thread Martin Mailand

Hi Josef,
now I get
[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, 
root being deleted no


-martin

On 18.05.2012 21:01, Josef Bacik wrote:

*sigh*  ok try this, hopefully it will point me in the right direction.  Thanks,



[  126.389847] Btrfs loaded
[  126.390284] device fsid 0c9d8c6d-2982-4604-b32a-fc443c4e2c50 devid 1 
transid 4 /dev/sdc

[  126.391246] btrfs: setting nodatacow
[  126.391252] btrfs: enabling auto defrag
[  126.391254] btrfs: disk space caching is enabled
[  126.391257] btrfs flagging fs with big metadata feature
[  126.405700] device fsid e8a0dc27-8714-49bd-a14f-ac37525febb1 devid 1 
transid 4 /dev/sdd

[  126.406162] btrfs: setting nodatacow
[  126.406167] btrfs: enabling auto defrag
[  126.406170] btrfs: disk space caching is enabled
[  126.406172] btrfs flagging fs with big metadata feature
[  126.419819] device fsid f67cd977-ebf4-41f2-9821-f2989e985954 devid 1 
transid 4 /dev/sde

[  126.420198] btrfs: setting nodatacow
[  126.420206] btrfs: enabling auto defrag
[  126.420210] btrfs: disk space caching is enabled
[  126.420214] btrfs flagging fs with big metadata feature
[  127.274555] device fsid 3001355e-c2e2-46c7-9eba-dfecb441d6a6 devid 1 
transid 4 /dev/sdf

[  127.274980] btrfs: setting nodatacow
[  127.274986] btrfs: enabling auto defrag
[  127.274989] btrfs: disk space caching is enabled
[  127.274992] btrfs flagging fs with big metadata feature
[ 2081.142669] couldn't find orphan item for 2039, nlink 1, root 269, 
root being deleted no

[ 2081.142735] [ cut here ]
[ 2081.142750] kernel BUG at fs/btrfs/inode.c:2228!
[ 2081.142766] invalid opcode:  [#1] SMP
[ 2081.142786] CPU 10
[ 2081.142794] Modules linked in: btrfs zlib_deflate libcrc32c ext2 
bonding coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 
microcode psmouse serio_raw sb_edac edac_core joydev mei(C) ioatdma ses 
enclosure mac_hid lp parport usbhid hid megaraid_sas isci libsas 
scsi_transport_sas igb ixgbe dca mdio

[ 2081.142974]
[ 2081.142985] Pid: 2966, comm: ceph-osd Tainted: G C 
3.4.0-rc7.2012051802+ #16 Supermicro X9SRi/X9SRi
[ 2081.143020] RIP: 0010:[a0269383]  [a0269383] 
btrfs_orphan_del+0x173/0x180 [btrfs]

[ 2081.143080] RSP: 0018:881016d83d18  EFLAGS: 00010292
[ 2081.143096] RAX: 0062 RBX: 881017ad4770 RCX: 

[ 2081.143115] RDX:  RSI: 0082 RDI: 
0246
[ 2081.143134] RBP: 881016d83d58 R08:  R09: 

[ 2081.143154] R10:  R11: 0116 R12: 
88101e7baf90
[ 2081.143173] R13: 88101e7bac00 R14: 0001 R15: 
0001
[ 2081.143193] FS:  7fcc1e736700() GS:88107fd4() 
knlGS:

[ 2081.143243] CS:  0010 DS:  ES:  CR0: 80050033
[ 2081.143274] CR2: 09269000 CR3: 00101ba87000 CR4: 
000407e0
[ 2081.143308] DR0:  DR1:  DR2: 

[ 2081.143341] DR3:  DR6: 0ff0 DR7: 
0400
[ 2081.143376] Process ceph-osd (pid: 2966, threadinfo 881016d82000, 
task 881023c744a0)

[ 2081.143424] Stack:
[ 2081.143447]  0c07 88101e1dac30 881016d83d38 
88101e1dac30
[ 2081.143510]   88101e7bac00 881017ad4770 
88101f0f7d60
[ 2081.143572]  881016d83e08 a026d7c8 881017ad4770 


[ 2081.143634] Call Trace:
[ 2081.143684]  [a026d7c8] btrfs_truncate+0x5e8/0x6d0 [btrfs]
[ 2081.143737]  [a026f141] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 2081.143773]  [811955c3] notify_change+0x183/0x320
[ 2081.143807]  [8117889e] do_truncate+0x5e/0xa0
[ 2081.143839]  [81178a24] sys_truncate+0x144/0x1b0
[ 2081.143873]  [8165fd29] system_call_fastpath+0x16/0x1b
[ 2081.143903] Code: a0 49 8b 8d f0 02 00 00 8b 53 48 4c 0f 44 c0 48 85 
f6 74 19 80 bb 60 fe ff ff 84 74 10 48 c7 c7 10 88 2c a0 31 c0 e8 e5 3b 
3e e1 0f 0b 48 8b 73 40 eb ea 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10
[ 2081.144199] RIP  [a0269383] btrfs_orphan_del+0x173/0x180 
[btrfs]

[ 2081.144258]  RSP 881016d83d18
[ 2081.144614] ---[ end trace 8d0829d100639242 ]---



btrfs: Probably the larger filesystem I will see for a long time

2012-05-18 Thread Christian Robert

Probably the largest filesystem I will ever see. I tried 8 exabytes but it failed.

[root@CentOS6-A:/root] # df
Filesystem                    1K-blocks      Used         Available  Use%  Mounted
/dev/mapper/vg01-root          17915884  11533392           5513572   68%  /
/dev/sda1                        508745    140314            342831   30%  /boot
/dev/mapper/data_0             66993872   1644372          61994060    3%  /mnt/data_0
/dev/mapper/data_1     7881299347898368    508360  7881248224091896    1%  /mnt/data_1

[root@CentOS6-A:/root] # df -h
Filesystem Size  Used  Avail  Use%  Mounted
/dev/mapper/vg01-root   18G   11G   5.3G   68%  /
/dev/sda1  497M  138M   335M   30%  /boot
/dev/mapper/data_0      64G  1.6G    60G    3%  /mnt/data_0
/dev/mapper/data_1     7.0E  497M   7.0E    1%  /mnt/data_1

[root@CentOS6-A:/root] # df -Th
Filesystem  Type  Size  Used  Avail  Use%
/dev/mapper/vg01-root   ext4   18G   11G   5.3G  68%
/dev/sda1   ext4  497M  138M   335M  30%
/dev/mapper/data_0  ext4   64G  1.6G60G  3%
/dev/mapper/data_1 btrfs  7.0E  499M   7.0E  1%
[root@CentOS6-A:/root] #


[root@CentOS6-A:/root] # uname -rv
3.4.0-rc7+ #23 SMP Wed May 16 20:20:47 EDT 2012


made with a dm-thin device sitting on a device pair (256 MB of metadata
and 23 GB of data)

running on my laptop at home.

Yes, this is 7 exabytes, or 7,168 petabytes, or 7,340,032 terabytes, or
7,516,192,768 gigabytes.


Please do not answer; it is just a statement of fact at 3.4-rc7 (it was
not working at 3.4-rc3, if I remember correctly).


Xtian.


Re: [RFC PATCH 0/7] bcache: md conversion

2012-05-18 Thread Alex Elsayed
Dan Williams wrote:

 The consensus from LSF was that bcache need not invent a new interface
 when md and dm can both do the job.  As mentioned in patch 7 this series
 aims to be a minimal conversion.  Other refactoring items like
 deprecating register_lock for mddev-reconfig_mutex are deferred.
 
 This supports assembly of an already established cache array:
 
 mdadm -A /dev/md/bcache /dev/sd[ab]
 
 ...will create the /dev/md/bcache container and a subarray representing
 the cache volume.  Flash-only, or backing-device only volumes were not
 tested.  Create support and hot-add/hot-remove come later.
 
 Note:
 * When attempting to test with small loopback devices (100MB), assembly
   soft locks in bcache_journal_read().  That hang went away with larger
   devices, so there seems to be minimum component device size that needs
   to be considered in the tooling.

Is there any plan to separate the on-disk layout (per-device headers, etc) 
from the logic for the purpose of reuse? I can think of at least one case 
where this would be extremely useful: integration in BtrFS.

BtrFS already has its own methods for making sure a group of devices are all 
present when the filesystem is mounted, so it doesn't really need the 
backing-device formatting that bcache does to prevent the backing device from 
being mounted solo. Putting bcache under BtrFS would be silly in the same way 
as putting it under a raid array, but bcache can't be put on top of BtrFS.

Logically, in looking at BtrFS' architecture, a cache would likely fit best 
at the 'block group' level, which IIUC would be roughly equivalent to the 
recommended 'over raid, under lvm' method of using bcache.
