Poor performance unlinking hard-linked files

2010-11-12 Thread Bron Gondwana
I had a spare piece of hardware sitting around, so I thought I'd test btrfs 
performance with the Cyrus IMAPd server by setting up an extra replica target 
on the spare machine.

Some background on Cyrus replication: when copying a folder the replication 
system first "reserves" all messages it's going to need.  It tries to maintain 
"single instance store" as it's called in Cyrus terminology - hard links 
between identical messages on disk.

This is done in the latest version of Cyrus by storing the sha1 of each file in 
an index, and scanning the currently active mailboxes on the replica to see if 
they already have a copy of the file.  If so, a hard link is made in the 
data/sync./$pid/ directory back to the original file in the mailbox directory.

Cyrus stores one file per email, which pushes filesystems pretty hard.  We used 
reiser3 until recently, and are part way through converting to ext4.

If the file is not already available on the replica, a new copy is uploaded 
directly into the sync./$pid directory.

Either way, when the mailbox is then created or updated, the files get 
hardlinked from the sync./$pid directory to their final location.

They get kept around for a little while, until the sync_server decides it's 
time for a reset because it's using too much memory keeping all the tracking 
data.  Then it unlinks all the files in sync./$pid and starts searching for 
necessary files again.

Most of the time, this means single instance store works - the source and 
destination mailboxes always get heated up by adding both of them to the sync 
log, so the duplication will be found.

-

Anyway, that's the background - a daemon that creates a pile of files in one
directory, hard-links them out all over the file system, then unlinks all the
original files later.
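
As a rough illustration of that pattern (not Cyrus code), the staging step
can be sketched in C as below. The function name deliver_via_staging and its
parameters are hypothetical; the sketch only assumes the behaviour described
above: write one staging copy, link(2) it to every target, then unlink the
staging name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: spool `body` into staging_path, hard-link it to each
 * of the ntargets paths, then unlink the staging name.
 * Returns the number of links left on the message, or -1 on error.
 */
long deliver_via_staging(const char *staging_path,
                         const char *const *targets, size_t ntargets,
                         const char *body)
{
    struct stat st;
    size_t len, i;
    int fd;

    assert(staging_path != NULL && body != NULL);
    len = strlen(body);

    /* Create the single on-disk copy in the staging directory. */
    fd = open(staging_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, body, len) != (ssize_t)len) {
        close(fd);
        unlink(staging_path);
        return -1;
    }
    close(fd);

    /* Every target becomes another name for the same inode. */
    for (i = 0; i < ntargets; i++) {
        if (link(staging_path, targets[i]) != 0) {
            unlink(staging_path);
            return -1;
        }
    }

    /* Link count here is ntargets + 1 (targets plus the staging name). */
    if (stat(staging_path, &st) != 0)
        return -1;

    /* Dropping the staging name is the unlink this thread is about. */
    if (unlink(staging_path) != 0)
        return -1;
    return (long)st.st_nlink - 1;
}
```

The final unlink() never frees data here, because the inode still has other
names; it only has to remove one directory entry and one back reference.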

We're finding that as the filesystem grows (currently about 30% full on a 300GB
filesystem) the unlink performance becomes horrible.  Watching iostat, there's
a lot of reading going on as well.  It really looks like the unlinks are 
performing pretty badly in this one case.

Ideally there would be a nice filesystem API Cyrus could call that said "delete 
all the files in this directory"!  Failing that, is there anything we can do to 
improve this use case?  Real-time production use isn't QUITE so bad as an
initial sync, but lmtp delivery uses the same method - spool to staging file,
parse it there, then hard-link to all the delivery targets before unlinking the
original.
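
For reference, without a bulk-delete call an application has to walk the
directory and unlink each entry itself. A minimal sketch, assuming a flat
staging directory of regular files; unlink_all_in_dir is a hypothetical name,
not a Cyrus function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Unlink every regular file directly inside dir_path (no recursion).
 * Returns the number of files removed, or -1 on error.
 */
long unlink_all_in_dir(const char *dir_path)
{
    struct dirent *ent;
    struct stat st;
    long removed = 0;
    DIR *dir;
    int dfd;

    assert(dir_path != NULL);
    dir = opendir(dir_path);
    if (dir == NULL)
        return -1;
    dfd = dirfd(dir);

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        /* Skip anything that is not a plain file (subdirectories etc.). */
        if (fstatat(dfd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0 ||
            !S_ISREG(st.st_mode))
            continue;
        /* One system call, and one metadata update, per file. */
        if (unlinkat(dfd, ent->d_name, 0) != 0) {
            closedir(dir);
            return -1;
        }
        removed++;
    }
    closedir(dir);
    return removed;
}
```

Each iteration is an independent unlink as far as the filesystem is
concerned, which is why the per-unlink cost matters so much for this workload.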

Thanks,

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kmem_cache_alloc doesn't return ERR_PTR so no need to check for it.

2010-11-12 Thread Chris Samuel
According to scripts/coccinelle/null/eno.cocci "The various basic
memory allocation functions don't return ERR_PTR" so there's no
point in calling IS_ERR() on the return value from them, the
existing test is good enough.

Signed-off-by: Chris Samuel 
---
 fs/btrfs/extent_map.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 23cb8da..8797704 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -50,7 +50,7 @@ struct extent_map *alloc_extent_map(gfp_t mask)
 {
 	struct extent_map *em;
 	em = kmem_cache_alloc(extent_map_cache, mask);
-	if (!em || IS_ERR(em))
+	if (!em)
 		return em;
 	em->in_tree = 0;
 	em->flags = 0;
-- 
1.7.0.4

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
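
To see why the extra test is redundant, here is a simplified userspace model
of the kernel's error-pointer convention. It is only a sketch: ERR_PTR and
IS_ERR are re-implemented here for illustration (the real ones live in the
kernel headers), and old_check_fails/new_check_fails are hypothetical names
for the two conditions in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top MAX_ERRNO values of the address space. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* True only for pointers in the reserved error range; NULL is not in it. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The condition before the patch. */
bool old_check_fails(const void *em)
{
    return !em || IS_ERR(em);
}

/* The condition after the patch. */
bool new_check_fails(const void *em)
{
    return !em;
}

/*
 * For anything an allocator of this kind can return (NULL or a real
 * object), IS_ERR() is false and both conditions agree.
 */
void check_equivalence(const void *em)
{
    assert(!IS_ERR(em));
    assert(old_check_fails(em) == new_check_fails(em));
}
```

The two conditions only differ for a value such as ERR_PTR(-12), which
the commit message says the basic allocation functions never return.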


Re: [PATCH] 'unused' calculated with wrong sign.

2010-11-12 Thread Josef Bacik
On Sat, Nov 13, 2010 at 12:17:56AM +0100, Arne Jansen wrote:
> 'unused' calculated with wrong sign in reserve_metadata_bytes().
> This might have led to unwanted over-reservations.
> 
> Signed-off-by: Arne Jansen 
> ---
>  fs/btrfs/extent-tree.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a541bc8..ddaf634 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3413,7 +3413,7 @@ again:
>* our reservation.
>*/
>   if (unused <= space_info->total_bytes) {
> - unused -= space_info->total_bytes;
> + unused = space_info->total_bytes - unused;
>   if (unused >= num_bytes) {
>   if (!reserved)
>   space_info->bytes_reserved += orig_bytes;
> -- 
> 1.7.2.2
> 

Good catch

Reviewed-by: Josef Bacik 

Thanks,

Josef


[PATCH] 'unused' calculated with wrong sign.

2010-11-12 Thread Arne Jansen
'unused' calculated with wrong sign in reserve_metadata_bytes().
This might have led to unwanted over-reservations.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/extent-tree.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a541bc8..ddaf634 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3413,7 +3413,7 @@ again:
 * our reservation.
 */
if (unused <= space_info->total_bytes) {
-   unused -= space_info->total_bytes;
+   unused = space_info->total_bytes - unused;
if (unused >= num_bytes) {
if (!reserved)
space_info->bytes_reserved += orig_bytes;
-- 
1.7.2.2
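
The effect of the sign error can be reproduced in isolation. The sketch
below assumes that 'unused' is an unsigned 64-bit counter (the hunk does not
show its declaration); the two function names are hypothetical and simply
wrap the old and new statements.

```c
#include <assert.h>
#include <stdint.h>

/* Statement before the patch: subtracts the larger value from the smaller. */
uint64_t headroom_before_patch(uint64_t unused, uint64_t total_bytes)
{
    assert(unused <= total_bytes);   /* guaranteed by the enclosing if */
    unused -= total_bytes;           /* wraps unless the two are equal */
    return unused;
}

/* Statement after the patch: the difference in the intended direction. */
uint64_t headroom_after_patch(uint64_t unused, uint64_t total_bytes)
{
    assert(unused <= total_bytes);
    unused = total_bytes - unused;
    return unused;
}
```

With unused = 3 and total_bytes = 10, the patched statement yields 7, while
the old one wraps around to a value close to 2^64, so the following
"unused >= num_bytes" test would pass for almost any request.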



Re: my mail

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 07:33:57PM +, h...@carfax.org.uk wrote:
> From 2de353ddda78ef5cbc84e1d3267606bc44e48faa Mon Sep 17 00:00:00 2001

   Gaah. This worked last night. Sorry. :(

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- "You got very nice eyes, Deedee. Never noticed them ---   
   before. They real?"   


[no subject]

2010-11-12 Thread hugo
>From 2de353ddda78ef5cbc84e1d3267606bc44e48faa Mon Sep 17 00:00:00 2001
Message-Id: 
<2de353ddda78ef5cbc84e1d3267606bc44e48faa.1289589812.git.h...@carfax.org.uk>
From: Hugo Mills 
Date: Sat, 6 Nov 2010 00:18:12 +
Subject: [PATCH] Clean up typography in the man pages.
To: linux-btrfs@vger.kernel.org
Cc: Goffredo Baroncelli 

The man pages are a bit vague about their use of bold and italic, and
don't lay out the meaning of options for each command very well. This
patch tightens up on the type-styles and layout for the main man pages
(btrfs, btrfsck, mkfs.btrfs).

Signed-off-by: Hugo Mills 
---
 man/btrfs.8.in  |  270 +-
 man/btrfsck.8.in|4 +-
 man/mkfs.btrfs.8.in |   39 
 3 files changed, 200 insertions(+), 113 deletions(-)

diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 26ef982..7569a9e 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -1,39 +1,73 @@
+.so an
 .TH BTRFS 8 "" "btrfs" "btrfs"
 .\"
 .\" Man page written by Goffredo Baroncelli  (Feb 2010)
+.\" Typography fixed by Hugo Mills  (Oct 2010)
 .\"
 .SH NAME
 btrfs \- control a btrfs filesystem
 .SH SYNOPSIS
-\fBbtrfs\fP \fBsubvolume snapshot\fP\fI  [/]\fP
+\fBsubvolume snapshot \fI \fR[\fI\fR] \fI\fR
 .PP
-\fBbtrfs\fP \fBsubvolume delete\fP\fI \fP
+
+.BI "btrfs subvolume delete " 
+.PP
+
+.B btrfs subvolume create
+.RI [  ] " "
 .PP
-\fBbtrfs\fP \fBsubvolume create\fP\fI [/]\fP
+
+.BI "btrfs subvolume list " 
+.PP
+
+.BI "btrfs subvolume set-default " " "
 .PP
-\fBbtrfs\fP \fBsubvolume list\fP\fI \fP
+
+.B "btrfs filesystem defragment "
+.RB [ -vcf "] [" -s
+.IR  ]
+.RB [ -l
+.IR  ]
+.RB [ -t
+.IR  "] "  |  " ..."
 .PP
-\fBbtrfs\fP \fBsubvolume set-default\fP\fI  \fP
+
+.BI "btrfs filesystem sync " 
 .PP
-\fBbtrfs\fP \fBfilesystem defrag\fP\fI | [|...]\fP
+
+.BR "btrfs filesystem resize " [ + | \- ] \fI\fP [ gmk ]| max
+.I 
 .PP
-\fBbtrfs\fP \fBfilesystem sync\fP\fI  \fP
+
+.BR "btrfs filesystem df " [ -h | --human-readable | -H | --si ]
++.I 
 .PP
-\fBbtrfs\fP \fBfilesystem resize\fP\fI [+/\-][gkm]|max \fP
+
+.BR "btrfs filesystem show " [ -h | --human-readable | -H | --si ]
+.RI [  |  ]
 .PP
-\fBbtrfs\fP \fBdevice scan\fP\fI [ [..]]\fP
+.B btrfs device scan
+.RI [  ] " ..."
 .PP
-\fBbtrfs\fP \fBdevice show\fP\fI | [|...]\fP
+
+.B btrfs device show
+.IR  |  " ..."
 .PP
-\fBbtrfs\fP \fBdevice balance\fP\fI  \fP
+
+.BI "btrfs device balance " 
 .PP
-\fBbtrfs\fP \fBdevice add\fP\fI  [..]  \fP
+
+.BI "btrfs device add " 
+.RI [  " ... ]" 
 .PP
-\fBbtrfs\fP \fBdevice delete\fP\fI  [..]  \fP]
 
+.B "btrfs device delete"
+.IR  [  " ... ]" 
 .PP
-\fBbtrfs\fP \fBhelp|\-\-help|\-h \fP\fI\fP
+
+.BR "btrfs help" | \-\-help | \-h
 .PP
+
 .SH DESCRIPTION
 .B btrfs
 is used to control the filesystem and the files and directories stored. It is
@@ -42,123 +76,174 @@ filesystem, to defrag a file or a directory, flush the data to the disk,
 to resize the filesystem, to scan the device.
 
 It is possible to abbreviate the commands unless the commands  are ambiguous.
-For example: it is possible to run
-.I btrfs sub snaps
+For example, it is possible to run
+.B btrfs sub snaps
 instead of
-.I btrfs subvolume snapshot.
+.B btrfs subvolume snapshot.
 But
-.I btrfs dev s
+.B btrfs dev s
 is not allowed, because
-.I dev s
+.B dev s
 may be interpreted both as
-.I device show
+.B device show
 and as
-.I device scan.
+.B device scan.
 In this case
-.I btrfs
+.B btrfs
 returns an error.
 
 If a command is terminated by
-.I --help
-, the relevant help is showed. If the passed command matches more commands,
-the help of all the matched commands are showed. For example
-.I btrfs dev --help
+.B --help
+, the relevant help is shown. If the passed command matches more commands,
+the help of all the matched commands is shown. For example
+.B btrfs dev --help
 shows the help of all
-.I device*
-command.
+.B device*
+commands.
 
 .SH COMMANDS
-.TP
+.SS
+subvolume snapshot \fI \fR[\fI\fR] \fI\fR
 
-\fBsubvolume snapshot\fR\fI  [/]\fR
-Create a writable snapshot of the subvolume \fI\fR with the name
-\fI\fR in the \fI\fR directory. If \fI\fR is not a
-subvolume, \fBbtrfs\fR returns an error.
-.TP
+Create a writable snapshot of the subvolume \fI\fP with the
+name \fI\fP in the \fI\fP directory. If \fI\fP is
+not a subvolume, \fBbtrfs\fP returns an error.
 
-\fBsubvolume delete\fR\fI \fR
-Delete the subvolume \fI\fR. If \fI\fR is not a
-subvolume, \fBbtrfs\fR returns an error.
-.TP
+.SS
+subvolume delete \fI\fP
 
-\fBsubvolume create\fR\fI [/]\fR
-Create a subvolume in \fI\fR (or in the current directory if
-\fI\fR is omitted).
-.TP
+Delete the subvolume \fI\fP. If \fI\fP is not a
+subvolume, \fBbtrfs\fP returns an error.
 
-\fBsubvolume list\fR\fI \fR
-List the subvolumes present in the filesystem \fI\fR. For every
-subvolume is showed the subvolume ID (second column), 
-the ID of the \fItop level\fR 
-subvolume (fifth column), and the path (seventh column) relative to the
-\fItop level\fR subvolume.
-These  may 

Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 11:36:55AM +, Hugo Mills wrote:
> On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote:
> > On 12/11/10 12:33, Li Zefan wrote:
> > 
> > > Is there any blocker that prevents us from canceling balance
> > > by just Ctrl+C ?
> > 
> > Given that there's been at least 1 report of it taking 12 hours
> > to balance a non-trivial amount of data I suspect putting this
> > operation into the background by default and having the cancel
> > option might be a better plan.
> 
>Only 12 hours? Last time I tried it, it took 19. :)
> 
>It would certainly be easy enough to fork a copy of the userspace
> tool to run the ioctl in the background. Probably a little more work
> to make the balance a kernel thread. I'd prefer the former, for
> ease of implementation.

   How's this?


This patch makes a balance operation fork and detach from the current
terminal, to run the userspace side of the balance in the background.

Introduce a --wait switch so that a synchronous balance can be done if
the user requires.

Signed-off-by: Hugo Mills 
---
 btrfs.c|8 
 btrfs_cmds.c   |   56 +---
 man/btrfs.8.in |2 +-
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/btrfs.c b/btrfs.c
index 93f7886..7b42658 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -91,12 +91,12 @@ static struct Command commands[] = {
  "filesystem df", "\n"
"Show space usage information for a mount point\n."
},
-   { do_balance, 1,
- "filesystem balance", "\n"
+   { do_balance, -1,
+ "filesystem balance", "[-w|--wait] \n"
"Balance the chunks across the device."
},
-   { do_balance, 1,
- "balance start", "\n"
+   { do_balance, -1,
+ "balance start", "[-w|--wait] \n"
"Synonym for \"btrfs filesystem balance\"."
},
{ do_balance_progress, -1,
diff --git a/btrfs_cmds.c b/btrfs_cmds.c
index d246a8b..13be603 100644
--- a/btrfs_cmds.c
+++ b/btrfs_cmds.c
@@ -754,12 +754,41 @@ int do_add_volume(int nargs, char **args)
 
 }
 
+const struct option balance_options[] = {
+   { "wait", 0, NULL, 'w' },
+   { NULL, 0, NULL, 0 }
+};
+
 int do_balance(int argc, char **argv)
 {
-
int fdmnt, ret=0;
+   int background = 1;
struct btrfs_ioctl_vol_args args;
-   char*path = argv[1];
+   char *path;
+   int ttyfd;
+
+   optind = 1;
+   while(1) {
+   int c = getopt_long(argc, argv, "w", balance_options, NULL);
+   if (c < 0)
+   break;
+   switch(c) {
+   case 'w':
+   background = 0;
+   break;
+   default:
+   fprintf(stderr, "Invalid arguments for balance\n");
+   free(argv);
+   return 1;
+   }
+   }
+
+   if(optind >= argc) {
+   fprintf(stderr, "No filesystem path given for balance\n");
+   return 1;
+   }
+
+   path = argv[optind];
 
fdmnt = open_file_or_dir(path);
if (fdmnt < 0) {
@@ -767,8 +796,29 @@ int do_balance(int argc, char **argv)
return 12;
}
 
+   if (background) {
+   int pid = fork();
+   if (pid == 0) {
+   /* We're in the child, and can run in the background */
+   ttyfd = open("/dev/tty", O_RDWR);
+   if (ttyfd > 0)
+   ioctl(ttyfd, TIOCNOTTY, 0);
+   /* Fall through to the BTRFS_IOC_BALANCE ioctl */
+   } else if (pid > 0) {
+   /* We're in the parent, and the fork succeeded */
+   printf("Background balance started\n");
+   return 0;
+   } else {
+   /* We're in the parent, and the fork failed */
+   fprintf(stderr, "ERROR: can't start background process -- %s\n",
+   strerror(errno));
+   }
+   }
+
memset(&args, 0, sizeof(args));
-   ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, &args);
+   printf("ioctl\n");
+   sleep(60);
+   /* ret = ioctl(fdmnt, BTRFS_IOC_BALANCE, &args); */
close(fdmnt);
if(ret<0){
fprintf(stderr, "ERROR: balancing '%s'\n", path);
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 3f7642e..1410aaa 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -27,7 +27,7 @@ btrfs \- control a btrfs filesystem
 .PP
 \fBbtrfs\fP \fBdevice show\fP\fI | [|...]\fP
 .PP
-\fBbtrfs\fP \fBdevice balance\fP\fI  \fP
+\fBbtrfs\fP \fBdevice balance\fP [\fB-w\fP|\fB--wait\fP] \fI\fP
 .PP
 \fBbtrfs\fP \fBdevice add\fP\fI  [..]  \fP
 .PP

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwke
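
For readers who want the fork-and-detach idea on its own, here is a minimal
standalone sketch. It is not the patch's code: run_in_background and
sample_job are hypothetical names, the job is an arbitrary callback standing
in for the balance ioctl, and the child detaches with setsid() rather than
the TIOCNOTTY ioctl used in the diff above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run job(arg) in a detached child process.
 * Returns the child's pid in the parent, or -1 if fork() failed, so the
 * caller can decide what to do; the job never runs in the parent.
 */
pid_t run_in_background(int (*job)(void *), void *arg)
{
    pid_t pid;

    assert(job != NULL);
    pid = fork();
    if (pid != 0)
        return pid;   /* parent: child's pid, or -1 on failure */

    /* Child: a new session has no controlling terminal. */
    if (setsid() < 0)
        _exit(127);
    _exit(job(arg));
}

/* Stand-in for the long-running operation; returns its argument. */
int sample_job(void *arg)
{
    return *(int *)arg;
}
```

A caller would print something like "Background balance started" when the
returned pid is positive, and report an error without running the job when
it is -1.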

Re: Btrfs-progs: Update man page for mixed data+metadata option.

2010-11-12 Thread Mike Fedyk
On Fri, Nov 12, 2010 at 6:28 AM, Marek Otahal  wrote:
> On Friday 12 of November 2010 18:44:12 you wrote:
>> On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik  wrote:
>> > On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
>> >> On 11/11/10 23:52, Josef Bacik wrote:
>> >> > This feature incurs a performance penalty in larger filesystems, it is
>> >> > recommended for use with filesystems of 1 GiB or smaller.
>> >>
>> >> Maybe slightly stronger, for example:
>> >>
>> >> This feature incurs a performance penalty for larger filesystems and it
>> >> is ONLY recommended for use with filesystems of 1 GiB or smaller.
>> >>
>> >> Is it worth having a check and a warning printed if a user does
>> >> try and make a filesystem larger than 1GiB with this option ?
>> >>
>> >> Just in case they don't RTFM...
>> >
>> > No because depending on your usage it's actually kind of useful for
>> > anything less than 5 GiB, and you're only looking at about a 5-10% perf
>> > degradation when using it on larger filesystems.  Thanks,
>>
>> Then a warning of 10% slowdown if > 10GB would be good.  It's
>> surprising how many will just read some forum post and not concern
>> themselves with the docs at all.
>>
>> And making them type "yes" if > 100GB is probably a good idea too...
> My 2c: I'm against bloating the program just because of people who don't RTFM.
> Just mention it clearly in docs and that's enough, linux does what it's asked
> for, not the "Are you really really sure you want to do this?" known from some
> other OS. Anyway, btrfs-progs would be probably run by a user with root

I was thinking of what ssh does when it sees a changed key...


Re: Btrfs-progs: Update man page for mixed data+metadata option.

2010-11-12 Thread Mitch Harder
On Fri, Nov 12, 2010 at 4:44 AM, Mike Fedyk  wrote:
> On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik  wrote:
>> On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
>>> On 11/11/10 23:52, Josef Bacik wrote:
>>>
>>> > This feature incurs a performance penalty in larger filesystems, it is
>>> > recommended for use with filesystems of 1 GiB or smaller.
>>>
>>> Maybe slightly stronger, for example:
>>>
>>> This feature incurs a performance penalty for larger filesystems and it
>>> is ONLY recommended for use with filesystems of 1 GiB or smaller.
>>>
>>> Is it worth having a check and a warning printed if a user does
>>> try and make a filesystem larger than 1GiB with this option ?
>>>
>>> Just in case they don't RTFM...
>>>
>>
>> No because depending on your usage it's actually kind of useful for anything
>> less than 5 GiB, and you're only looking at about a 5-10% perf degradation
>> when using it on larger filesystems.  Thanks,
>>
>
> Then a warning of 10% slowdown if > 10GB would be good.  It's
> surprising how many will just read some forum post and not concern
> themselves with the docs at all.
>
> And making them type "yes" if > 100GB is probably a good idea too...
>

Just for clarification, you'll probably see a ~5-10% slowdown for any
size partition, not just "if > 10 GB"

But for smaller filesystems (~<5 GiB), you may want to accept the
performance penalty for more efficient disk space utilization.

For really small filesystems (~<1 GiB), using the Btrfs defaults can
really start to hurt space utilization.  So the parent
data+metadata patch currently forces the data+metadata option on <1
GiB.

The Wiki is probably a better place for more extensive discussion of
the merits and trade-offs of this option.

The user still needs to actively "opt-in" to this option (unless their
partition is < 1 GiB), and the man page will indicate a performance
penalty is incurred.
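
Put as code, the policy described in this message looks roughly like the
sketch below. It is an illustration of the rule as stated here, not code
from the patch; use_mixed_block_groups and GIB are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB ((uint64_t)1 << 30)

/*
 * Mixed data+metadata is forced below 1 GiB and is otherwise used only
 * when the user explicitly asked for it.
 */
bool use_mixed_block_groups(uint64_t fs_bytes, bool user_opted_in)
{
    assert(fs_bytes > 0);
    return user_opted_in || fs_bytes < GIB;
}
```

Note that there is no size-dependent warning or confirmation step in this
rule; the only size check is the 1 GiB threshold.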

For something as important as changing your filesystem default
settings, it seems fair to expect the user has done their homework
when changing from the default setting.


Re: Btrfs-progs: Update man page for mixed data+metadata option.

2010-11-12 Thread Marek Otahal
On Friday 12 of November 2010 18:44:12 you wrote:
> On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik  wrote:
> > On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
> >> On 11/11/10 23:52, Josef Bacik wrote:
> >> > This feature incurs a performance penalty in larger filesystems, it is
> >> > recommended for use with filesystems of 1 GiB or smaller.
> >> 
> >> Maybe slightly stronger, for example:
> >> 
> >> This feature incurs a performance penalty for larger filesystems and it
> >> is ONLY recommended for use with filesystems of 1 GiB or smaller.
> >> 
> >> Is it worth having a check and a warning printed if a user does
> >> try and make a filesystem larger than 1GiB with this option ?
> >> 
> >> Just in case they don't RTFM...
> > 
> > No because depending on your usage it's actually kind of useful for
> > anything less than 5 GiB, and you're only looking at about a 5-10% perf
> > degradation when using it on larger filesystems.  Thanks,
> 
> Then a warning of 10% slowdown if > 10GB would be good.  It's
> surprising how many will just read some forum post and not concern
> themselves with the docs at all.
> 
> And making them type "yes" if > 100GB is probably a good idea too...
My 2c: I'm against bloating the program just because of people who don't RTFM.
Just mention it clearly in the docs and that's enough. Linux does what it's
asked to do, without the "Are you really really sure you want to do this?"
prompts known from some other OSes. Anyway, btrfs-progs would probably be run
by a user with root privileges, and such users should be aware of what their
actions do, or read the man page. My opinion.
Cheers, Mark

-- 

Marek Otahal :o)


Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Hugo Mills
On Fri, Nov 12, 2010 at 03:28:08PM +1100, Chris Samuel wrote:
> On 12/11/10 12:33, Li Zefan wrote:
> 
> > Is there any blocker that prevents us from canceling balance
> > by just Ctrl+C ?
> 
> Given that there's been at least 1 report of it taking 12 hours
> to balance a non-trivial amount of data I suspect putting this
> operation into the background by default and having the cancel
> option might be a better plan.

   Only 12 hours? Last time I tried it, it took 19. :)

   It would certainly be easy enough to fork a copy of the userspace
tool to run the ioctl in the background. Probably a little more work
to make the balance a kernel thread. I'd prefer the former, for
ease of implementation.

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Great oxymorons of the world, no.  3: Military Intelligence ---   


Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Sander
Helmut Hullen wrote (ao):
> You wrote on 12.11.10:
> > My humble opinion: I very much like the way mdadm works, with the
> > progress bar in /proc/mdstat if an array is rebuilding for example.
> 
> Hmmm - it blocks the console for a long time.

Actually, mdadm doesn't block the console as it returns the prompt
immediately :-)

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Helmut Hullen
Hello, Sander,

You wrote on 12.11.10:

>> Given that there's been at least 1 report of it taking 12 hours
>> to balance a non-trivial amount of data I suspect putting this
>> operation into the background by default and having the cancel
>> option might be a better plan.
>>
>> Thoughts ?

> My humble opinion: I very much like the way mdadm works, with the
> progress bar in /proc/mdstat if an array is rebuilding for example.

Hmmm - it blocks the console for a long time.

I prefer running those long-running jobs via "at"; all messages go into a
mail to root (or whoever started the job).

"balance" seems to produce a logfile entry every 40 seconds; that may be
about 1500 lines for a 1 TByte job.
A progress bar produces similar messages, but they aren't as readable
in an e-mail ...

I know those long and nearly unreadable mails, e.g. from "squidGuard" ...

Best regards!
Helmut


Linear (JBOD) Array Mode

2010-11-12 Thread Gordan Bobic

Is there an option in btrfs for this mode of RAID? I know it supports
the equivalent of RAID10, but what I am after is JBOD of mirrors. The
reason I want this is for making a really low power home NAS, typically
for home theater/media use. I believe this would yield better power
savings in the average case.

Here is my reasoning. This sort of a setup would be idle most of the
time, and the disks can safely be spun down. When a request comes to
read a file (typically a multi-MB or multi-GB sequentially read file),
in RAID0 all the disks would wake up because each chunk would come off
a different disk. In JBOD, only the disk that has the file on it would
need to be awake to serve that file while the rest can go to sleep.
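
A toy model makes the difference concrete. This is a sketch of generic
striping versus linear concatenation, not of how btrfs actually allocates
chunks; the function names and the parameters (chunk size, disk size) are
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Disks woken by reading len bytes at off from an ndisks-wide stripe. */
unsigned disks_touched_striped(uint64_t off, uint64_t len,
                               uint64_t chunk, unsigned ndisks)
{
    uint64_t first, last, chunks;

    assert(chunk > 0 && ndisks > 0);
    if (len == 0)
        return 0;
    first = off / chunk;
    last = (off + len - 1) / chunk;
    chunks = last - first + 1;
    /* Consecutive chunks land on consecutive disks, wrapping around. */
    return chunks >= ndisks ? ndisks : (unsigned)chunks;
}

/* Disks woken by the same read when the disks are simply concatenated. */
unsigned disks_touched_linear(uint64_t off, uint64_t len, uint64_t disk_size)
{
    assert(disk_size > 0);
    if (len == 0)
        return 0;
    return (unsigned)((off + len - 1) / disk_size - off / disk_size + 1);
}
```

With 64 KiB chunks on four disks, any sequential read of 256 KiB or more
wakes all four disks, while on a linear layout the same read wakes one disk
unless it happens to cross a disk boundary.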

Of course JBOD yields relatively poor performance, but since an average
SATA disk will saturate a Gb ethernet link with a linear read (the sort
that I envisage my storage box doing most of the time), performance
isn't really an issue.

I know I can create two RAID0 stripes and create a BTRFS mirror on top
of that, but if I were to do that, then a disk failure would mean
re-mirroring the entire stripe instead of just one disk, which is not a
sane solution on a big multi-TB array.

Is there a way to do this with BTRFS?

Gordan



Re: Btrfs-progs: Update man page for mixed data+metadata option.

2010-11-12 Thread Mike Fedyk
On Thu, Nov 11, 2010 at 11:41 PM, Josef Bacik  wrote:
> On Fri, Nov 12, 2010 at 05:47:14PM +1100, Chris Samuel wrote:
>> On 11/11/10 23:52, Josef Bacik wrote:
>>
>> > This feature incurs a performance penalty in larger filesystems, it is
>> > recommended for use with filesystems of 1 GiB or smaller.
>>
>> Maybe slightly stronger, for example:
>>
>> This feature incurs a performance penalty for larger filesystems and it
>> is ONLY recommended for use with filesystems of 1 GiB or smaller.
>>
>> Is it worth having a check and a warning printed if a user does
>> try and make a filesystem larger than 1GiB with this option ?
>>
>> Just in case they don't RTFM...
>>
>
> No because depending on your usage it's actually kind of useful for anything
> less than 5 GiB, and you're only looking at about a 5-10% perf degradation
> when using it on larger filesystems.  Thanks,
>

Then a warning of 10% slowdown if > 10GB would be good.  It's
surprising how many will just read some forum post and not concern
themselves with the docs at all.

And making them type "yes" if > 100GB is probably a good idea too...
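
A minimal sketch of the check suggested above, purely for illustration.
It is not how mkfs.btrfs behaves; the function name, the messages and
the reading of "GB" as GiB are all assumptions.

```python
GIB = 1024 ** 3
WARN_ABOVE = 10 * GIB       # assumed reading of "> 10GB"
CONFIRM_ABOVE = 100 * GIB   # assumed reading of "> 100GB"


def allow_mixed(size_bytes, ask=input, warn=print):
    """Return True if creating a mixed data+metadata filesystem may proceed."""
    if size_bytes > WARN_ABOVE:
        warn("warning: mixed data+metadata may be about 10% slower "
             "on a filesystem this large")
    if size_bytes > CONFIRM_ABOVE:
        # Require the literal word "yes" before going ahead.
        return ask('Type "yes" to continue: ').strip() == "yes"
    return True
```

The prompt and warning functions are passed in as parameters so the
check can be exercised without a terminal.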


Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Andreas Philipp

On 12.11.2010 10:07, Sander wrote:
> Chris Samuel wrote (ao):
>> On 12/11/10 12:33, Li Zefan wrote:
>>
>>> Is there any blocker that prevents us from canceling balance by
>>> just Ctrl+C ?
>>
>> Given that there's been at least 1 report of it taking 12 hours
>> to balance a non-trivial amount of data I suspect putting this
>> operation into the background by default and having the cancel
>> option might be a better plan.
>>
>> Thoughts ?
>
> My humble opinion: I very much like the way mdadm works, with the
> progress bar in /proc/mdstat if an array is rebuilding for
> example.
>
> Sander

Just my personal opinion: I would like the balance to happen in the
background by default (with an option for foreground operation as
well) and another command/option to cancel the operation. Of course, a
progress bar (together with some estimate of the remaining time?) like
the one mdadm offers would be great.

Andreas Philipp



Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Sander
Chris Samuel wrote (ao):
> On 12/11/10 12:33, Li Zefan wrote:
> 
> > Is there any blocker that prevents us from canceling balance
> > by just Ctrl+C ?
> 
> Given that there's been at least 1 report of it taking 12 hours
> to balance a non-trivial amount of data I suspect putting this
> operation into the background by default and having the cancel
> option might be a better plan.
> 
> Thoughts ?

My humble opinion: I very much like the way mdadm works, with the
progress bar in /proc/mdstat if an array is rebuilding for example.

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


Re: [PATCH v2 2/2] Cancel filesystem balance.

2010-11-12 Thread Helmut Hullen
Hello, Chris,

You wrote on 12.11.10:

>> Is there any blocker that prevents us from canceling balance
>> by just Ctrl+C ?

> Given that there's been at least 1 report of it taking 12 hours
> to balance a non-trivial amount of data I suspect putting this
> operation into the background by default and having the cancel
> option might be a better plan.

> Thoughts ?

Here: 17 hours for adding a 1.5-TByte partition with 1 TByte of data.
That's not a foreground job. Even monitoring blocks a console for a long
time.

I've seen log entries in "/var/log/messages" - they don't (didn't) show  
how much work was remaining, but they showed at least that btrfs was  
still working hard.

Nov  9 06:52:35 Arktur kernel: btrfs: relocating block group 1546888675328 flags 1
Nov  9 06:53:20 Arktur kernel: btrfs: relocating block group 1545814933504 flags 1
...
Nov 10 00:14:01 Arktur kernel: btrfs: relocating block group 12582912 flags 1
Nov 10 00:14:06 Arktur kernel: btrfs: relocating block group 4194304 flags 4
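
For anyone who wants to follow such entries from a script, here is a
rough sketch that pulls the block group offset and flags out of
messages of the form shown above. The function name is hypothetical,
and the pattern is derived only from these sample lines, so it is an
assumption that other kernels log the same wording.

```python
import re

# Whitespace is matched loosely so a message wrapped across lines still parses.
PATTERN = re.compile(
    r"btrfs:\s+relocating\s+block\s+group\s+(\d+)\s+flags\s+(\d+)"
)


def relocations(log_text):
    """Return (block_group_offset, flags) for each relocation message found."""
    return [(int(bg), int(fl)) for bg, fl in PATTERN.findall(log_text)]


sample = """\
Nov  9 06:52:35 Arktur kernel: btrfs: relocating block group 1546888675328
flags 1
Nov 10 00:14:06 Arktur kernel: btrfs: relocating block group 4194304 flags 4
"""
found = relocations(sample)
```

In the excerpt above the offsets fall as time goes on, so the most
recent offset might serve as a rough hint of progress, but that is only
an observation from this one log.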

Best regards!
Helmut