Re: Minimum device size of 256 MiB?

2012-05-11 Thread Berke Durak
Unfortunately nothing guarantees that the footprint of the filesystem
will stay bounded.

In other words, even if the number of blocks occupied by the
filesystem at any time is less than the capacity of the physical
device backing the sparse device, the total number of blocks ever
occupied by the filesystem since its creation may exceed the capacity
of the physical device.
-- 
Berke Durak
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Minimum device size of 256 MiB?

2012-05-11 Thread Berke Durak
On Fri, May 11, 2012 at 4:50 PM, Aaron Toponce wrote:
> I've noticed the same. I'm interested in researching the patterns
> the filesystem puts down on an encrypted container, but would like
> to use 1MB files as the block device for the filesystem. Looking for
> patterns in 256MB files is too expensive.

I think I may have found a workaround to my problem: use dm-zero to
create a sparse 256 MB device backed by a smaller partition.

I don't know if that would be useful for your crypto signature
analysis.

Of course it would be better if btrfs simply allowed smaller
partitions.

(Sorry for the previous mail; it did not come out as intended.)
-- 
Berke Durak


Re: Minimum device size of 256 MiB?

2012-05-11 Thread Berke Durak
I think I may have found a workaround.

> One very interesting use of dm-zero is for creating "sparse" devices in
> conjunction with dm-snapshot. A sparse device reports a device-size
> larger than the amount of actual storage space available for that
> device. A user can write data anywhere within the sparse device and read
> it back like a normal device. Reads to previously unwritten areas will
> return a zero'd buffer. When enough data has been written to fill up the
> actual storage space, the sparse device is deactivated. This can be very
> useful for testing device and filesystem limitations.

I don't know if that would be useful for your crypto signature analysis.

On Fri, May 11, 2012 at 4:50 PM, Aaron Toponce  wrote:
> On Fri, May 11, 2012 at 03:57:24PM -0400, Berke Durak wrote:
>> There seems to be a 256 MiB lower limit on device size: mkfs.btrfs
>> refuses to create a filesystem on a device that is smaller than that.
>
> I've noticed the same. I'm interested in researching the patterns the
> filesystem puts down on an encrypted container, but would like to use 1MB
> files as the block device for the filesystem. Looking for patterns in 256MB
> files is too expensive.
>
> --
> . o .   o . o   . . o   o . .   . o .
> . . o   . o o   o . o   . o o   . . o
> o o o   . o .   . o o   o o .   o o o


Re: Minimum device size of 256 MiB?

2012-05-11 Thread Aaron Toponce
On Fri, May 11, 2012 at 03:57:24PM -0400, Berke Durak wrote:
> There seems to be a 256 MiB lower limit on device size: mkfs.btrfs
> refuses to create a filesystem on a device that is smaller than that.

I've noticed the same. I'm interested in researching the patterns the
filesystem puts down on an encrypted container, but would like to use 1MB
files as the block device for the filesystem. Looking for patterns in 256MB
files is too expensive.

--
. o .   o . o   . . o   o . .   . o .
. . o   . o o   o . o   . o o   . . o
o o o   . o .   . o o   o o .   o o o




Minimum device size of 256 MiB?

2012-05-11 Thread Berke Durak
Hello,

There seems to be a 256 MiB lower limit on device size: mkfs.btrfs
refuses to create a filesystem on a device that is smaller than that.

I'm interested in using the btrfs RAID facility to provide redundancy
for an embedded system, but I can't afford two 256 MiB partitions.

What are the structural reasons for this limit? Could I reduce this
limit easily with a couple of small code changes, or would it take a
major effort?

Thanks,
-- 
Berke Durak


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Josef Bacik
On Fri, May 11, 2012 at 08:33:34PM +0200, Martin Mailand wrote:
> Hi Josef,
> 
> On 11.05.2012 15:31, Josef Bacik wrote:
> > That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you
> > are on mainline.  Thanks,
> 
> I tried your patch against mainline, after a few minutes I hit this bug.
> 

Heh duh, sorry, try this one instead.  Thanks,

Josef

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9b9b15f..54af1fa 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
/* used to order data wrt metadata */
struct btrfs_ordered_inode_tree ordered_tree;
 
-   /* for keeping track of orphaned inodes */
-   struct list_head i_orphan;
-
/* list of all the delalloc inodes in the FS.  There are times we need
 * to write all the delalloc pages to disk, and this list is used
 * to walk them all.
@@ -156,6 +153,7 @@ struct btrfs_inode {
unsigned dummy_inode:1;
unsigned in_defrag:1;
unsigned delalloc_meta_reserved:1;
+   unsigned has_orphan_item:1;
 
/*
 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8fd7233..aad2600 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
struct list_head root_list;
 
spinlock_t orphan_lock;
-   struct list_head orphan_list;
+   atomic_t orphan_inodes;
struct btrfs_block_rsv *orphan_block_rsv;
int orphan_item_inserted;
int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..ff3bf4b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1153,7 +1153,6 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
root->orphan_block_rsv = NULL;
 
INIT_LIST_HEAD(&root->dirty_list);
-   INIT_LIST_HEAD(&root->orphan_list);
INIT_LIST_HEAD(&root->root_list);
spin_lock_init(&root->orphan_lock);
spin_lock_init(&root->inode_lock);
@@ -1166,6 +1165,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
atomic_set(&root->log_commit[0], 0);
atomic_set(&root->log_commit[1], 0);
atomic_set(&root->log_writers, 0);
+   atomic_set(&root->orphan_inodes, 0);
root->log_batch = 0;
root->log_transid = 0;
root->last_log_commit = 0;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 61b16c6..5ba68d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2072,12 +2072,12 @@ void btrfs_orphan_commit_root(struct btrfs_trans_handle *trans,
struct btrfs_block_rsv *block_rsv;
int ret;
 
-   if (!list_empty(&root->orphan_list) ||
+   if (atomic_read(&root->orphan_inodes) ||
root->orphan_cleanup_state != ORPHAN_CLEANUP_DONE)
return;
 
spin_lock(&root->orphan_lock);
-   if (!list_empty(&root->orphan_list)) {
+   if (atomic_read(&root->orphan_inodes)) {
spin_unlock(&root->orphan_lock);
return;
}
@@ -2134,8 +2134,8 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
block_rsv = NULL;
}
 
-   if (list_empty(&BTRFS_I(inode)->i_orphan)) {
-   list_add(&BTRFS_I(inode)->i_orphan, &root->orphan_list);
+   if (!BTRFS_I(inode)->has_orphan_item) {
+   BTRFS_I(inode)->has_orphan_item = 1;
 #if 0
/*
 * For proper ENOSPC handling, we should do orphan
@@ -2148,6 +2148,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
insert = 1;
 #endif
insert = 1;
+   atomic_inc(&root->orphan_inodes);
}
 
if (!BTRFS_I(inode)->orphan_meta_reserved) {
@@ -2195,9 +2196,13 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
int release_rsv = 0;
int ret = 0;
 
+   /*
+* evict_inode gets called without holding the i_mutex so we need to
+* take the orphan lock to make sure we are safe in messing with these.
+*/
spin_lock(&root->orphan_lock);
-   if (!list_empty(&BTRFS_I(inode)->i_orphan)) {
-   list_del_init(&BTRFS_I(inode)->i_orphan);
+   if (BTRFS_I(inode)->has_orphan_item) {
+   BTRFS_I(inode)->has_orphan_item = 0;
delete_item = 1;
}
 
@@ -2215,6 +2220,9 @@ int btrfs_orphan_del(struct btrfs_trans_handle *trans, struct inode *inode)
if (release_rsv)
btrfs_orphan_release_metadata(inode);
 
+   if (trans && delete_item)
+   atomic_dec(&root->orphan_inodes);
+
return 0;
 }
 
@@ -2352,9 +2360,8 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 * add this inode to the orphan list so btrfs_orphan_del does
 * the proper thing when we hit it
  

Re: [RFC] [PATCH 2/2] Btrfs: move over to use ->update_time

2012-05-11 Thread Josef Bacik
On Fri, May 11, 2012 at 09:06:31AM +0300, Kasatkin, Dmitry wrote:
> On Thu, Apr 12, 2012 at 2:32 PM, David Sterba  wrote:
> > On Thu, Apr 12, 2012 at 02:09:08PM +0300, Kasatkin, Dmitry wrote:
> >> Where is it? Can you please point out?
> >
> > http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16662
> >
> > the relevant part:
> >
> > -- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -770,7 +770,7 @@ static int btrfs_fill_super(struct super_block *sb,
> >  #ifdef CONFIG_BTRFS_FS_POSIX_ACL
> >        sb->s_flags |= MS_POSIXACL;
> >  #endif
> > -
> > +       sb->s_flags |= MS_I_VERSION;
> >        err = open_ctree(sb, fs_devices, (char *)data);
> >        if (err) {
> >                printk("btrfs: open_ctree failed\n");
> 
> Hello,
> 
> FYI:
> 
> In fact just tried yesterday to use mount option "iversion" on 3.4-rc5
> and it seems to work without
>  +       sb->s_flags |= MS_I_VERSION;

This hasn't gone in yet; it won't go in until 3.5.  Thanks,

Josef


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Martin Mailand

Hi Josef,

On 11.05.2012 15:31, Josef Bacik wrote:

> That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you
> are on mainline.  Thanks,


I tried your patch against mainline, after a few minutes I hit this bug.

[ 1078.523655] [ cut here ]
[ 1078.523667] kernel BUG at fs/btrfs/inode.c:2211!
[ 1078.523676] invalid opcode:  [#1] SMP
[ 1078.523692] CPU 5
[ 1078.523696] Modules linked in: btrfs zlib_deflate libcrc32c mlx4_en bonding ext2 coretemp ghash_clmulni_intel aesni_intel cryptd aes_x86_64 microcode psmouse serio_raw sb_edac edac_core mei(C) joydev ses ioatdma enclosure mac_hid lp parport isci libsas scsi_transport_sas usbhid hid igb megaraid_sas mlx4_core dca
[ 1078.523813]
[ 1078.523818] Pid: 4108, comm: ceph-osd Tainted: G C 3.4.0-rc6+ #5 Supermicro X9SRi/X9SRi
[ 1078.523841] RIP: 0010:[]  [] btrfs_orphan_del+0xb2/0xc0 [btrfs]
[ 1078.523867] RSP: 0018:880ff14a5d38  EFLAGS: 00010282
[ 1078.523877] RAX: fffe RBX: 880ff004d6f0 RCX: 00117400
[ 1078.523891] RDX: 001173ff RSI: 8810279f6ea0 RDI: ea00409e7d80
[ 1078.523905] RBP: 880ff14a5d58 R08: 60ef80001400 R09: a0202c6a
[ 1078.523918] R10:  R11: 00ba R12: 0001
[ 1078.523932] R13: 881017663c00 R14: 0001 R15: 88101776f5a0
[ 1078.523946] FS:  7f1d2c03c700() GS:88107fca() knlGS:
[ 1078.523961] CS:  0010 DS:  ES:  CR0: 80050033
[ 1078.523990] CR2: 050f4000 CR3: 000ff2a57000 CR4: 000407e0
[ 1078.524019] DR0:  DR1:  DR2:
[ 1078.524048] DR3:  DR6: 0ff0 DR7: 0400
[ 1078.524077] Process ceph-osd (pid: 4108, threadinfo 880ff14a4000, task 880ff2aa44a0)
[ 1078.524121] Stack:
[ 1078.524141]  8810279f7460  881017663c00 880ff004d6f0
[ 1078.524190]  880ff14a5e08 a022f5d8 880ff004d6f0
[ 1078.524240]  880ff14a5e18 81188afd 8000 80001000
[ 1078.524289] Call Trace:
[ 1078.524317]  [] btrfs_truncate+0x4d8/0x650 [btrfs]
[ 1078.524348]  [] ? path_lookupat+0x6d/0x750
[ 1078.524380]  [] btrfs_setattr+0xc1/0x1b0 [btrfs]
[ 1078.524408]  [] notify_change+0x183/0x320
[ 1078.524435]  [] do_truncate+0x5e/0xa0
[ 1078.524461]  [] sys_truncate+0x144/0x1b0
[ 1078.524489]  [] system_call_fastpath+0x16/0x1b
[ 1078.524516] Code: 8b 65 e8 4c 8b 6d f0 4c 8b 75 f8 c9 c3 0f 1f 40 00 80 bb 60 fe ff ff 84 75 c1 eb bb 0f 1f 44 00 00 48 89 df e8 a0 73 fe ff eb c1 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec
[ 1078.524710] RIP  [] btrfs_orphan_del+0xb2/0xc0 [btrfs]
[ 1078.524744]  RSP 
[ 1078.525013] ---[ end trace 88c92720204f7aa4 ]---


That's the drive with the broken btrfs.

[  212.843776] device fsid 28492275-01d3-4e89-9f1c-bd86057194bf devid 1 transid 4 /dev/sdc
[  212.844630] btrfs: setting nodatacow
[  212.844637] btrfs: enabling auto defrag
[  212.844640] btrfs: disk space caching is enabled
[  212.844643] btrfs flagging fs with big metadata feature



-martin


Re: btrfs RAID with enterprise SATA or SAS drives

2012-05-11 Thread Martin Steigerwald
On Friday, 11 May 2012, Duncan wrote:
> Daniel Pocock posted on Wed, 09 May 2012 22:01:49 + as excerpted:
> > There is various information about
> > - enterprise-class drives (either SAS or just enterprise SATA)
> > - the SCSI/SAS protocols themselves vs SATA having more advanced
> > features (e.g. for dealing with error conditions)
> > than the average block device
> 
> This isn't a direct answer to that, but expressing a bit of concern
> over the implications of your question, that you're planning on using
> btrfs in an enterprise class installation.
> 
> While various Enterprise Linux distributions do now officially
> "support" btrfs, it's worth checking out exactly what that means in
> practice.
> 
> Meanwhile, in mainline Linux kernel terms, btrfs remains very much an
> experimental filesystem, as expressed by the kernel config option that
> turns btrfs on.  It's still under very intensive development, with an
> error-fixing btrfsck only recently available and still coming with its
> own "may make the problems worse instead of fixing them" warning.
> Testers willing to risk the chance of data loss implied by that
> "experimental filesystem" label should be running the latest stable
> kernel at the oldest, and preferably the rcs by rc5 or so, as new
> kernels continue to fix problems in older btrfs code as well as
> introduce new features, and if you're running an older kernel, that
> means you're running a kernel with known problems that are fixed in
> the latest kernel.
> 
> Experimental also has implications in terms of backups.  A good
> sysadmin always has backups, but normally, the working copy can be
> considered the primary copy, and there are backups of that.  On an
> experimental filesystem under as intense continued development as
> btrfs, by contrast, it's best to consider your btrfs copy an extra
> "throwaway" copy only intended for testing.  You still have your
> primary copy, along with all the usual backups, on something less
> experimental, since you never know when/where/how your btrfs testing
> will screw up its copy.

Duncan, did you actually test BTRFS? Theory can't replace real-life
experience.

Of all my personal BTRFS installations, not one has gone corrupt, and I
have at least four, with more of them in use at my employer. The one
possible exception is a scratch-data BTRFS RAID 0 over lots of SATA disks,
but that might have been fixable with btrfs-zero-log, which I didn't know
about back then. Another one needed a btrfs-zero-log, but that was quite
some time ago.

Some of the installations have been in use for more than a year, AFAIR.

While I would still be reluctant to deploy BTRFS for a customer's
critical data, and I think Oracle's and SUSE's move to support it
officially is a bit daring, I don't think BTRFS is in a "throwaway copy"
state anymore.

As usual regular backups are important…

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: [PATCH] Btrfs: do not do balance in readonly mode

2012-05-11 Thread Josef Bacik
On Fri, May 11, 2012 at 06:11:26PM +0800, Liu Bo wrote:
> In normal cases, we would not be allowed to do balance in RO mode.
> However, when we're using a seeding device and adding another device to
> sprout, things will change:
> 
> $ mkfs.btrfs /dev/sdb7
> $ btrfstune -S 1 /dev/sdb7
> $ mount /dev/sdb7 /mnt/btrfs -o ro
> $ btrfs fi bal /mnt/btrfs   ---> fail.
> $ btrfs dev add /dev/sdb8 /mnt/btrfs
> $ btrfs fi bal /mnt/btrfs   ---> works!
> 
> It should not be designed as an exception, and we'd better add another check
> for mnt flags.
> 

Added to btrfs-next and added my Reviewed-by.  Thanks,

Josef


Re: [PATCH 1/2] Btrfs: fix wrong error returned by adding a device

2012-05-11 Thread Josef Bacik
On Thu, May 10, 2012 at 06:10:38PM +0800, Liu Bo wrote:
> Reproduce:
> $ mkfs.btrfs /dev/sdb7
> $ mount /dev/sdb7 /mnt/btrfs -o ro
> $ btrfs dev add /dev/sdb8 /mnt/btrfs
> ERROR: error adding the device '/dev/sdb8' - Invalid argument
> 
> Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
> a readonly notification is preferred.
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/volumes.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 1411b99..48a06d1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1633,7 +1633,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path)
>   int ret = 0;
>  
>   if ((sb->s_flags & MS_RDONLY) && !root->fs_info->fs_devices->seeding)
> - return -EINVAL;
> + return -EROFS;
>  
>   bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
> root->fs_info->bdev_holder);
> -- 
> 1.6.5.2
> 

I've committed these to btrfs-next and added my Reviewed-by.  Thanks,

Josef


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Christian Brunner
2012/5/10 Josef Bacik :
> On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
>> On 24 April 2012 18:26, Sage Weil wrote:
>> > On Tue, 24 Apr 2012, Josef Bacik wrote:
>> >> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
>> >> > After running ceph on XFS for some time, I decided to try btrfs again.
>> >> > Performance with the current "for-linux-min" branch and big metadata
>> >> > is much better. The only problem (?) I'm still seeing is a warning
>> >> > that seems to occur from time to time:
>> >
>> > Actually, before you do that... we have a new tool,
>> > test_filestore_workloadgen, that generates a ceph-osd-like workload on the
>> > local file system.  It's a subset of what a full OSD might do, but if
>> > we're lucky it will be sufficient to reproduce this issue.  Something like
>> >
>> >  test_filestore_workloadgen --osd-data /foo --osd-journal /bar
>> >
>> > will hopefully do the trick.
>> >
>> > Christian, maybe you can see if that is able to trigger this warning?
>> > You'll need to pull it from the current master branch; it wasn't in the
>> > last release.
>>
>> Trying to reproduce with test_filestore_workloadgen didn't work for
>> me. So here are some instructions on how to reproduce with a minimal
>> ceph setup.
>> [...]
>
> Well I feel like an idiot, I finally get it to reproduce, go look at where I
> want to put my printks and there's the problem staring me right in the face.
> I've looked seriously at this problem 2 or 3 times and have missed this every
> single freaking time.  Here is the patch I'm trying, please try it on yours to
> make sure it fixes the problem.  It takes like 2 hours for it to reproduce for
> me so I won't be able to fully test it until tomorrow, but so far it hasn't
> broken anything so it should be good.  Thanks,

Great! I've put your patch on my testbox and will run a test over the
weekend. I'll report back on Monday.

Thanks,
Christian


Re: Ceph on btrfs 3.4rc

2012-05-11 Thread Josef Bacik
On Thu, May 10, 2012 at 04:35:23PM -0400, Josef Bacik wrote:
> On Fri, Apr 27, 2012 at 01:02:08PM +0200, Christian Brunner wrote:
> > On 24 April 2012 18:26, Sage Weil wrote:
> > > On Tue, 24 Apr 2012, Josef Bacik wrote:
> > >> On Fri, Apr 20, 2012 at 05:09:34PM +0200, Christian Brunner wrote:
> > >> > After running ceph on XFS for some time, I decided to try btrfs again.
> > >> > Performance with the current "for-linux-min" branch and big metadata
> > >> > is much better. The only problem (?) I'm still seeing is a warning
> > >> > that seems to occur from time to time:
> > >
> > > Actually, before you do that... we have a new tool,
> > > test_filestore_workloadgen, that generates a ceph-osd-like workload on the
> > > local file system.  It's a subset of what a full OSD might do, but if
> > > we're lucky it will be sufficient to reproduce this issue.  Something like
> > >
> > >  test_filestore_workloadgen --osd-data /foo --osd-journal /bar
> > >
> > > will hopefully do the trick.
> > >
> > > Christian, maybe you can see if that is able to trigger this warning?
> > > You'll need to pull it from the current master branch; it wasn't in the
> > > last release.
> > 
> > Trying to reproduce with test_filestore_workloadgen didn't work for
> > me. So here are some instructions on how to reproduce with a minimal
> > ceph setup.
> > 
> > You will need a single system with two disks and a bit of memory.
> > 
> > - Compile and install ceph (detailed instructions:
> > http://ceph.newdream.net/docs/master/ops/install/mkcephfs/)
> > 
> > - For the test setup I've used two tmpfs files as journal devices. To
> > create these, do the following:
> > 
> > # mkdir -p /ceph/temp
> > # mount -t tmpfs tmpfs /ceph/temp
> > # dd if=/dev/zero of=/ceph/temp/journal0 count=500 bs=1024k
> > # dd if=/dev/zero of=/ceph/temp/journal1 count=500 bs=1024k
> > 
> > - Now you should create and mount btrfs. Here is what I did:
> > 
> > # mkfs.btrfs -l 64k -n 64k /dev/sda
> > # mkfs.btrfs -l 64k -n 64k /dev/sdb
> > # mkdir /ceph/osd.000
> > # mkdir /ceph/osd.001
> > # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sda /ceph/osd.000
> > # mount -o noatime,space_cache,inode_cache,autodefrag /dev/sdb /ceph/osd.001
> > 
> > - Create /etc/ceph/ceph.conf similar to the attached ceph.conf. You
> > will probably have to change the btrfs devices and the hostname
> > (os39).
> > 
> > - Create the ceph filesystems:
> > 
> > # mkdir /ceph/mon
> > # mkcephfs -a -c /etc/ceph/ceph.conf
> > 
> > - Start ceph (e.g. "service ceph start")
> > 
> > - Now you should be able to use ceph - "ceph -s" will tell you about
> > the state of the ceph cluster.
> > 
> > - "rbd create --size 100 testimg" will create an rbd image on the ceph
> > cluster.
> > 
> > - Compile my test with "gcc -o rbdtest rbdtest.c -lrbd" and run it
> > with "./rbdtest testimg".
> > 
> > I can see the first btrfs_orphan_commit_root warning after an hour or
> > so... I hope that I've described all necessary steps. If there is a
> > problem just send me a note.
> > 
> 
> Well I feel like an idiot, I finally get it to reproduce, go look at where I
> want to put my printks and there's the problem staring me right in the face.
> I've looked seriously at this problem 2 or 3 times and have missed this every
> single freaking time.  Here is the patch I'm trying, please try it on yours to
> make sure it fixes the problem.  It takes like 2 hours for it to reproduce for
> me so I won't be able to fully test it until tomorrow, but so far it hasn't
> broken anything so it should be good.  Thanks,
> 

That previous patch was against btrfs-next, this patch is against 3.4-rc6 if you
are on mainline.  Thanks,

Josef


diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9b9b15f..54af1fa 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -57,9 +57,6 @@ struct btrfs_inode {
/* used to order data wrt metadata */
struct btrfs_ordered_inode_tree ordered_tree;
 
-   /* for keeping track of orphaned inodes */
-   struct list_head i_orphan;
-
/* list of all the delalloc inodes in the FS.  There are times we need
 * to write all the delalloc pages to disk, and this list is used
 * to walk them all.
@@ -156,6 +153,7 @@ struct btrfs_inode {
unsigned dummy_inode:1;
unsigned in_defrag:1;
unsigned delalloc_meta_reserved:1;
+   unsigned has_orphan_item:1;
 
/*
 * always compress this one file
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8fd7233..aad2600 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1375,7 +1375,7 @@ struct btrfs_root {
struct list_head root_list;
 
spinlock_t orphan_lock;
-   struct list_head orphan_list;
+   atomic_t orphan_inodes;
struct btrfs_block_rsv *orphan_block_rsv;
int orphan_item_inserted;
int orphan_cleanup_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7ffc88..ff3bf4b 100644
--- a/

[PATCH] Fix "set-dafault" typo in cmds-subvolume.c

2012-05-11 Thread Chris Samuel
Andrei Popa reported that "default" was misspelled as "dafault" in two
places; this patch fixes both typos.

Signed-off-by: Chris Samuel 
---
 cmds-subvolume.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cmds-subvolume.c b/cmds-subvolume.c
index 950fa8f..23c72b4 100644
--- a/cmds-subvolume.c
+++ b/cmds-subvolume.c
@@ -394,7 +394,7 @@ static int cmd_snapshot(int argc, char **argv)
 }
 
 static const char * const cmd_subvol_get_default_usage[] = {
-   "btrfs subvolume get-dafault ",
+   "btrfs subvolume get-default ",
"Get the default subvolume of a filesystem",
NULL
 };
@@ -432,7 +432,7 @@ static int cmd_subvol_get_default(int argc, char **argv)
 }
 
 static const char * const cmd_subvol_set_default_usage[] = {
-   "btrfs subvolume set-dafault  ",
+   "btrfs subvolume set-default  ",
"Set the default subvolume of a filesystem",
NULL
 };
-- 
1.7.4.1

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP




Re: btrfs tools typo

2012-05-11 Thread Chris Samuel
On Friday 11 May 2012 19:26:16 Andrei Popa wrote:

> In the latest btrfs tools from git it's a typo:

I'll send a patch now.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP




btrfs-progs patches

2012-05-11 Thread Andrei Popa
Hello,

In the btrfs wiki
( https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories ) it
says:
"Hugo Mills maintains an "integration" branch of all the patches for the
userspace tools that have been seen on the mailing list. There are two
important branches in this repository. For "stable" commits that have
been offered upstream, there's the for-chris branch. If you are writing
patches for the userspace tools, you should probably develop against
this branch [To make my life easier -Hugo]. This can be found with:

$ git clone http://git.darksatanic.net/repo/btrfs-progs-unstable.git
$ cd btrfs-progs-unstable
$ git checkout for-chris
"
The repository has not been updated at all this year. Is this
information still valid, or should it be removed from the wiki?

There are some btrfs-progs patches sent to the mailing list that don't
appear in Chris's btrfs-progs git tree.
Does anyone maintain a git tree with them?

Thanks,
Andrei





[PATCH] Btrfs: do not do balance in readonly mode

2012-05-11 Thread Liu Bo
In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   ---> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   ---> works!

It should not be designed as an exception, and we'd better add another check for
mnt flags.

Signed-off-by: Liu Bo 
---
 fs/btrfs/ioctl.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 14f8e1f..f056469 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3212,8 +3212,9 @@ void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
}
 }
 
-static long btrfs_ioctl_balance(struct btrfs_root *root, void __user *arg)
+static long btrfs_ioctl_balance(struct file *file, void __user *arg)
 {
+   struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_ioctl_balance_args *bargs;
struct btrfs_balance_control *bctl;
@@ -3225,6 +3226,10 @@ static long btrfs_ioctl_balance(struct btrfs_root *root, void __user *arg)
if (fs_info->sb->s_flags & MS_RDONLY)
return -EROFS;
 
+   ret = mnt_want_write(file->f_path.mnt);
+   if (ret)
+   return ret;
+
mutex_lock(&fs_info->volume_mutex);
mutex_lock(&fs_info->balance_mutex);
 
@@ -3291,6 +3296,7 @@ out_bargs:
 out:
mutex_unlock(&fs_info->balance_mutex);
mutex_unlock(&fs_info->volume_mutex);
+   mnt_drop_write(file->f_path.mnt);
return ret;
 }
 
@@ -3386,7 +3392,7 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_DEV_INFO:
return btrfs_ioctl_dev_info(root, argp);
case BTRFS_IOC_BALANCE:
-   return btrfs_ioctl_balance(root, NULL);
+   return btrfs_ioctl_balance(file, NULL);
case BTRFS_IOC_CLONE:
return btrfs_ioctl_clone(file, arg, 0, 0, 0);
case BTRFS_IOC_CLONE_RANGE:
@@ -3419,7 +3425,7 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_SCRUB_PROGRESS:
return btrfs_ioctl_scrub_progress(root, argp);
case BTRFS_IOC_BALANCE_V2:
-   return btrfs_ioctl_balance(root, argp);
+   return btrfs_ioctl_balance(file, argp);
case BTRFS_IOC_BALANCE_CTL:
return btrfs_ioctl_balance_ctl(root, arg);
case BTRFS_IOC_BALANCE_PROGRESS:
-- 
1.6.5.2



btrfs tools typo

2012-05-11 Thread Andrei Popa
In the latest btrfs tools from git there's a typo:

ierdnac-hp ~ # btrfs|grep dafault
btrfs subvolume get-dafault 
btrfs subvolume set-dafault  
ierdnac-hp ~ # 

Andrei 
-- 
Andrei Popa
NOC Manager - Nextgen Communications
0760 683 280

