Re: extremely slow syncing on btrfs with 2.6.39.1

2011-07-12 Thread Jan Stilow
On 07/11/2011 02:18 AM, Kok, Auke-jan H wrote:
 I've been monitoring the lists for a while now but didn't see this
 problem mentioned in particular: I've got a fairly standard desktop
 system at home, 700gb WD drive, nothing special, with 2 btrfs
 filesystems and some snapshots. The system runs for days, and I've
 noticed unusual disk activity the other evening - turns out that it's
 taking forever to sync().
 
 $ uname -r
 2.6.39.1
 $ grep btrfs /proc/mounts
 /dev/root / btrfs rw,relatime 0 0# is /dev/sdb2 #
 /dev/sdb5 /home btrfs rw,relatime 0 0
 $ time sync
 
 real  1m5.552s
 user  0m0.000s
 sys   0m2.102s
 
 $ time sync
 
 real  1m16.830s
 user  0m0.001s
 sys   0m1.490s
 
 $ df -h / /home
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/root47G   33G  7.7G  82% /
 /dev/sdb5   652G  216G  421G  34% /home
 $ btrfs fi df /
 Data: total=35.48GB, used=29.86GB
 System, DUP: total=16.00MB, used=12.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=4.50GB, used=1.67GB
  $ btrfs fi df /home
 Data: total=310.01GB, used=209.53GB
 System, DUP: total=8.00MB, used=48.00KB
 System: total=4.00MB, used=0.00
 Metadata, DUP: total=11.00GB, used=2.98GB
 Metadata: total=8.00MB, used=0.00
 
 I'll switch to 3.0 soon, but, given the fact that we're going to be
 running MeeGo on 2.6.39 probably for a while, I was wondering if
 anyone knows off the top of their heads if this issue is
 known/identified. If not then I'll need to make someone do some
 patching ;).
 
 Auke

You should read the thread Abysmal Performance of these mailing list
from last month. They had a similar problem and downgraded to a 2.6.38
kernel. By the way, that works for me too for the time being.

Best Regards.

Jan Stilow
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: extremely slow syncing on btrfs with 2.6.39.1

2011-07-12 Thread Li Zefan
 I've been monitoring the lists for a while now but didn't see this
 problem mentioned in particular: I've got a fairly standard desktop
 system at home, 700gb WD drive, nothing special, with 2 btrfs
 filesystems and some snapshots. The system runs for days, and I've
 noticed unusual disk activity the other evening - turns out that it's
 taking forever to sync().

 $ uname -r
 2.6.39.1
 $ grep btrfs /proc/mounts
 /dev/root / btrfs rw,relatime 0 0# is /dev/sdb2 # /dev/sdb5 /home
 btrfs rw,relatime 0 0 $ time sync

 real1m5.552s
 user0m0.000s
 sys 0m2.102s

 $ time sync

 real1m16.830s
 user0m0.001s
 sys 0m1.490s

 $ df -h / /home
 Filesystem  Size  Used Avail Use% Mounted on /dev/root47G  
 33G  7.7G  82% / /dev/sdb5   652G  216G  421G  34% /home $ btrfs fi
 df /
 Data: total=35.48GB, used=29.86GB
 System, DUP: total=16.00MB, used=12.00KB System: total=4.00MB,
 used=0.00
 Metadata, DUP: total=4.50GB, used=1.67GB
  $ btrfs fi df /home
 Data: total=310.01GB, used=209.53GB
 System, DUP: total=8.00MB, used=48.00KB System: total=4.00MB, used=0.00
 Metadata, DUP: total=11.00GB, used=2.98GB Metadata: total=8.00MB,
 used=0.00

 I'll switch to 3.0 soon, but, given the fact that we're going to be
 running MeeGo on 2.6.39 probably for a while, I was wondering if anyone
 knows off the top of their heads if this issue is known/identified. If
 not then I'll need to make someone do some patching ;).

 Auke

 You should read the thread Abysmal Performance of these mailing list
 from last month. They had a similar problem and downgraded to a 2.6.38
 kernel. By the way, that works for me too for the time being.

 Best Regards.

 Jan Stilow
 
 I had similar experience with two servers running on 2.6.39 - the 
 performance was terrible, after downgrade to 2.6.38 the speed is OK again.
 

Then you can turn to bisection to find out the culprit.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] scrub userland implementation

2011-07-12 Thread Jan Schmidt
On 11.07.2011 22:45, Hugo Mills wrote:
OK, here's the remainder of my comments for this file. Not much for
 this bit -- just one comment about locking, a reminder, and an
 observation.

Again, I ripped out the bits I simply corrected. My comments below.

 [...]

 +static int scrub_write_progress(pthread_mutex_t *m, const char *fsid,
 +struct scrub_progress* data, int n)
 +{
 +int ret;
 +int fd;
 +int old;
 +
 +ret = pthread_mutex_lock(m);
 +if (ret) {
 +ret = -errno;
 +goto out;
 +}
 +
 +pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, old);
 
Probably not massively important, but you don't check for the
 return value of this call or its counterpart at the end of this
 function.

pthread_* return values where wrong throughout the code. Good that you
pointed at this one. It's all fixed now.

 [...]

 +static void *scrub_one_dev(void *ctx)
 +{
 +struct scrub_progress *sp = ctx;
 +int ret;
 +struct timeval tv;
 +
 +sp-stats.canceled = 0;
 +sp-stats.duration = 0;
 +sp-stats.finished = 0;
 +
 +ret = ioctl(sp-fd, BTRFS_IOC_SCRUB, sp-scrub_args);
 +gettimeofday(tv, NULL);
 +sp-ret = ret;
 +sp-stats.duration = tv.tv_sec - sp-stats.t_start;
 +sp-stats.canceled = !!ret;
 +sp-ioctl_errno = errno;
 +ret = pthread_mutex_lock(sp-progress_mutex);
 +if (ret)
 +return ERR_PTR(-errno);
 +sp-stats.finished = 1;
 +ret = pthread_mutex_unlock(sp-progress_mutex);
 
If you downgrade .finished to a plain int, rather than a u64, then
 is this locking actually be needed? You have one place where the lock
 is held to write a single value (which is atomic for an int, IIRC),
 and you have another place where you hold the lock to read and compare
 it. I don't see any problem with removing the lock for that.

Conclusion first: I want to stick with the mutex. My reasoning:
- this is by no means time critical code
- the mutexes do the memory barriers required for synchronization
- using int instead of u64 would complicate the kvread macros

Thanks,
-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] scrub userland implementation

2011-07-12 Thread Jan Schmidt
On 11.07.2011 22:57, Hugo Mills wrote:
 On Mon, Jul 11, 2011 at 04:29:24PM +0200, Jan Schmidt wrote:
 On 10.07.2011 20:23, Hugo Mills wrote:
Yes, this is over three months after the initial posting, but since
 nobody else has looked at it yet, and the patch is in my integration
 stack...

 ... thanks!

I've not reviewed the whole thing -- just the scrub start code so
 far. I've removed the bits I've not checked from the file below.

 I rebased the old branch I found to your current integration branch and
 fixed up a most of what you mentioned. I'll not send a new version out
 until after your complete review (or your statement that this is it or
 your statement that you would rather going on reviewing the revised
 version).
 
Thanks. The other half has just gone out (with few comments).

I'm through now, but I'll wait another day for you to protest on my
latest comments before sending the new version.

 Things I ripped out are accepted and corrected without resistance.
 Comments follow.
 
Only a couple of rejoinders below.
 
 On Wed, Mar 30, 2011 at 06:53:12PM +0200, Jan Schmidt wrote:
 [...]
 
 +  case 4: /* read dev id */
 +  for (j=0; isdigit(l[i+j])  i+j  avail; ++j)
 +  ;
 +  if (!j || i+j+1 = avail)

j == 0 is clearer than !j here, IMO

 +  _SCRUB_ILLEGAL;
 +  p[curr]-devid = atoll(l[i]);
 +  i += j + 1;

Is there any reason that you couldn't just use strtoull here? We
 know that the string is terminated with a \n (by the guarantee of
 state 1), so strtoull will always finish within the buffer.

 I just found it way easier to use atoll. We already know the first
 character really is a digit, so why bother with a more cumbersome function?
 
Ah, my mistake for not being clearer, I think: I was talking about
 the for loop at the head of the state 4 code as well. That only exists
 in order to find the end of the number read in by atoll, and strtoull
 would do that for you.

Alright, now I see your point. However, with strtoull I would have to
calculate with character pointers, whereas anything else uses direct
character access with i and j here. And I don't need the fancy bits of
strtoull, either. So I'd like to stick with atoll.

 [...]
 
 +  char fsid[37];

Magic number. is there a #define in libuuid for this value?

 At least the man page of uuid_parse clearly states uuids have 36 bytes
 plus a \0 in printf format. uuid/uuid.h doesn't contain such a constant.
 But volumes.c, print-tree.c and others do it with 37, too.
 
OK, if that's conventional (and not defined in uuid.h), then go
 with the magic number.
 
Hugo.
 

Thanks,
-Jan
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/5] scrub userland implementation

2011-07-12 Thread Hugo Mills
On Tue, Jul 12, 2011 at 10:49:59AM +0200, Jan Schmidt wrote:
 On 11.07.2011 22:57, Hugo Mills wrote:
  On Mon, Jul 11, 2011 at 04:29:24PM +0200, Jan Schmidt wrote:
  On 10.07.2011 20:23, Hugo Mills wrote:
 Yes, this is over three months after the initial posting, but since
  nobody else has looked at it yet, and the patch is in my integration
  stack...
 
  ... thanks!
 
 I've not reviewed the whole thing -- just the scrub start code so
  far. I've removed the bits I've not checked from the file below.
 
  I rebased the old branch I found to your current integration branch and
  fixed up a most of what you mentioned. I'll not send a new version out
  until after your complete review (or your statement that this is it or
  your statement that you would rather going on reviewing the revised
  version).
  
 Thanks. The other half has just gone out (with few comments).
 
 I'm through now, but I'll wait another day for you to protest on my
 latest comments before sending the new version.
 
  Things I ripped out are accepted and corrected without resistance.
  Comments follow.
  
 Only a couple of rejoinders below.
  
  On Wed, Mar 30, 2011 at 06:53:12PM +0200, Jan Schmidt wrote:
  [...]
  
  +case 4: /* read dev id */
  +for (j=0; isdigit(l[i+j])  i+j  avail; ++j)
  +;
  +if (!j || i+j+1 = avail)
 
 j == 0 is clearer than !j here, IMO
 
  +_SCRUB_ILLEGAL;
  +p[curr]-devid = atoll(l[i]);
  +i += j + 1;
 
 Is there any reason that you couldn't just use strtoull here? We
  know that the string is terminated with a \n (by the guarantee of
  state 1), so strtoull will always finish within the buffer.
 
  I just found it way easier to use atoll. We already know the first
  character really is a digit, so why bother with a more cumbersome function?
  
 Ah, my mistake for not being clearer, I think: I was talking about
  the for loop at the head of the state 4 code as well. That only exists
  in order to find the end of the number read in by atoll, and strtoull
  would do that for you.
 
 Alright, now I see your point. However, with strtoull I would have to
 calculate with character pointers, whereas anything else uses direct
 character access with i and j here. And I don't need the fancy bits of
 strtoull, either. So I'd like to stick with atoll.

   OK, it's not something I feel massively strongly about. Stick with
atoll, then.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- He's a nutcase, you know. There's no getting away from it -- ---  
 he'll end up with a knighthood 


signature.asc
Description: Digital signature


Re: after mounting with -o degraded: ioctl: LOOP_CLR_FD: Device or resource busy

2011-07-12 Thread Ilya Dryomov
On Tue, Jul 12, 2011 at 02:47:41AM +0200, krz...@gmail.com  wrote:
 dd if=/dev/null of=img5 bs=1 seek=2G
 dd if=/dev/null of=img6 bs=1 seek=2G
 mkfs.btrfs -d raid1 -m raid1 img5 img6
 losetup /dev/loop4 img5
 losetup /dev/loop5 img6
 btrfs device scan
 mount -t btrfs /dev/loop4 dir
 umount dir
 losetup -d /dev/loop5
 mount -t btrfs -o degraded /dev/loop4 dir
 umount dir
 losetup -d /dev/loop4
 ioctl: LOOP_CLR_FD: Device or resource busy
 mkfs.ext3 /dev/loop4
 mke2fs 1.39 (29-May-2006)
 /dev/loop4 is apparently in use by the system; will not make a filesystem 
 here!
 
 this only happens after mouting with -o degraded. loopback device is
 unusable until next reboot

So mkfs.ext3 fails to open device with O_EXCL.

We are missing blk_put() call on error path.  At least once: if
btrfs_open_devices() fails (and it does if you e.g. zero out the
superblock on a raid1 fs while it's unmounted and then immediately
mount) it's caller never releases the other device(s).  If nobody else
steps up I can try to fix this in the next couple of days.

Thanks,

Ilya

 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] add detailed help messages to btrfs command

2011-07-12 Thread Hubert Kario
On Monday 11 of July 2011 17:13:13 Jan Schmidt wrote:
 Hi Hubert,
 
 I have to admit I did not recognize this patch but now Hugo is forcing
 me to use the detailed help messages and I've got an improvement to
 suggest:
 
 On 23.01.2011 13:42, Hubert Kario wrote:
[snip]
  { do_defrag, -1,
  
filesystem defragment, [-vcf] [-s start] [-l len] [-t size]
file|dir [file|dir...]\n
  
  -   Defragment a file or a directory.
  +   Defragment a file or a directory.,
  +  [-vcf] [-s start] [-l len] [-t size] file|dir
  [file|dir...]\n +  Defragment file data or directory
  metadata.\n
  +-v be verbose\n
  +-c compress the file while defragmenting\n
  +-f flush data to disk immediately after
  defragmenting\n +-s start   defragment only from byte
  onward\n +-l len defragment only up to len
  bytes\n
  +-t sizeminimal size of file to be considered for
  defragmenting\n
 
 Lots of too long lines.

you mean the code or the printed messages? 
messages fit a 80 column screen, I remember I double checked it

 
 I don't like to repeat the synopsis passage. How about adding the
 general -help when printing -adv_help as well? This reduces the need
 of duplication.

I think I added it because of differences in formatting.
Also I'd say we don't want to overload the user with information when he 
mistypes a command so the main help command should be as concise as possible 
while the advanced may be much more detailed (looking at the mailing list, `fi 
df` could definitely use some more verbose help message)

 
 To prove my point, looking at the current version in Hugo's integration
 branch, your two synopsis lines already got inconsistent regarding the
 -c option :-)

That's because the patches are submitted with base as Chris tree, not the 
Hugo's so the result is a real patchwork that needs some clean-up

[snip]

  @@ -148,10 +184,10 @@ static void help(char *np)
  
  printf(Usage:\n);
  for( cp = commands; cp-verb; cp++ )
  
  -   print_help(np, cp);
  +   print_help(np, cp, BASIC_HELP);
  
  printf(\n\t%s help|--help|-h\n\t\tShow the help.\n,np);
 
 ^
 You did not change this, but as we are here, ...

I wanted to leave as much code unchanged as possible (this /was/ my first 
patch to btrfs-tools)

 
  -   printf(\n\t%s cmd --help\n\t\tShow detailed help for a command
  or\n\t\t + printf(\n\t%s cmd --help\n\t\tShow detailed help for a
  command or
 
  ^^^
 ... why not extending the general rule so that help messages will be
 printed with --help and -h?

We have to remember that this way we loose -h switch, quite intuitive to show 
base 2 sizes with `btrfs file df`... 

-- 
Hubert
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] add detailed help messages to btrfs command

2011-07-12 Thread Hugo Mills
On Tue, Jul 12, 2011 at 12:40:06PM +0200, Hubert Kario wrote:
 On Monday 11 of July 2011 17:13:13 Jan Schmidt wrote:
  Hi Hubert,
  
  I have to admit I did not recognize this patch but now Hugo is forcing
  me to use the detailed help messages and I've got an improvement to
  suggest:
  
  On 23.01.2011 13:42, Hubert Kario wrote:
 [snip]
 { do_defrag, -1,
 
   filesystem defragment, [-vcf] [-s start] [-l len] [-t size]
   file|dir [file|dir...]\n
   
   - Defragment a file or a directory.
   + Defragment a file or a directory.,
   +  [-vcf] [-s start] [-l len] [-t size] file|dir
   [file|dir...]\n +  Defragment file data or directory
   metadata.\n
   +-v be verbose\n
   +-c compress the file while defragmenting\n
   +-f flush data to disk immediately after
   defragmenting\n +-s start   defragment only from byte
   onward\n +-l len defragment only up to len
   bytes\n
   +-t sizeminimal size of file to be considered for
   defragmenting\n
  
  Lots of too long lines.
 
 you mean the code or the printed messages? 
 messages fit a 80 column screen, I remember I double checked it
 
  
  I don't like to repeat the synopsis passage. How about adding the
  general -help when printing -adv_help as well? This reduces the need
  of duplication.
 
 I think I added it because of differences in formatting.
 Also I'd say we don't want to overload the user with information when he 
 mistypes a command so the main help command should be as concise as possible 
 while the advanced may be much more detailed (looking at the mailing list, 
 `fi 
 df` could definitely use some more verbose help message)
 
  
  To prove my point, looking at the current version in Hugo's integration
  branch, your two synopsis lines already got inconsistent regarding the
  -c option :-)
 
 That's because the patches are submitted with base as Chris tree, not the 
 Hugo's so the result is a real patchwork that needs some clean-up
 
 [snip]
 
   @@ -148,10 +184,10 @@ static void help(char *np)
   
 printf(Usage:\n);
 for( cp = commands; cp-verb; cp++ )
   
   - print_help(np, cp);
   + print_help(np, cp, BASIC_HELP);
   
 printf(\n\t%s help|--help|-h\n\t\tShow the help.\n,np);
  
  ^
  You did not change this, but as we are here, ...
 
 I wanted to leave as much code unchanged as possible (this /was/ my first 
 patch to btrfs-tools)
 
  
   - printf(\n\t%s cmd --help\n\t\tShow detailed help for a command
   or\n\t\t +   printf(\n\t%s cmd --help\n\t\tShow detailed help for 
   a
   command or
  
   ^^^
  ... why not extending the general rule so that help messages will be
  printed with --help and -h?
 
 We have to remember that this way we loose -h switch, quite intuitive to show 
 base 2 sizes with `btrfs file df`... 

   Indeed. To quote from df --help:

  -h, --human-readable  print sizes in human readable format (e.g., 1K 234M 2G)
  -H, --si  likewise, but use powers of 1000 not 1024
[...]
  --help display this help and exit

   I should probably resurrect my patch to implement this for btrfs.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- To an Englishman, 100 miles is a long way;  to an American, ---   
100 years is a long time.


signature.asc
Description: Digital signature


Re: [PATCH 2/2] add detailed help messages to btrfs command

2011-07-12 Thread Hubert Kario
On Tuesday 12 of July 2011 00:22:01 Hugo Mills wrote:
 On Mon, Jul 11, 2011 at 09:11:24PM +0200, Jan Schmidt wrote:
  On 07/11/2011 08:38 PM, Goffredo Baroncelli wrote:
[snip]
   A script extracts from the comment in the source both:
   - the text for the man page
   - the text for the detailed help.
 
Or possibly going the other direction: from the man page (which
 contains all of the information we need to reproduce in the code), it
 should be possible, with appropriate structuring, to retrieve the bits
 that the code needs to know about, and insert them into a table in a
 generated .c file. Just a thought.

I think that the first line in normal help message and advanced help message 
can and sometimes should be different.

The basic as concise as possible, while the advanced verbose and quite 
detailed (for example explaining what a filesystem scrub /is/)
 
Oh, and the current man page needs some major work on its
 typography -- it's inconsistent with both itself, and with most other
 man pages, as far as I can tell. I did have a patch for that, but it
 was a long time ago, and clashed with almost everything.

Yes, until we won't have a single current tree for btrfs-progs there will be 
inconsistencies that will need to be fixed later.

But I guess that with Hugo's tree we're getting there.
 
  Does anybody have such a script around? I suppose we're not the first
  ones writing help texts and man pages.
  
   So we can reach the following goals:
   - the help is linked to the code
   - is less likely to forget to update the message
   - the man page, the helps are always aligned
  
  Only, we still will need like short and long help. E.g. the full text in
  the man page may be inappropriate as a --help message. Also, we do need
  a clever idea to get indentation right in the man pages. I fiddled a lot
  on the man pages for scrub parameter indentation (to get the second line
  describing a command line option indented correctly to start below the
  text of the first line, that was).
 
We actually need three levels of help:
 [snip]

I'd that the biggest problem, putting it all in code and formatting will be a 
real pain...

Hubert
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Memory leak?

2011-07-12 Thread Stephane Chazelas
2011-07-11 10:01:21 +0100, Stephane Chazelas:
[...]
 Same without dmcrypt. So to sum up, BUG() reached in btrfs-fixup
 thread when doing an 
 
 - rsync (though I also got (back when on ubuntu and 2.6.38) at
   least one occurrence using bsdtar | bsdtar)
 - of a large amount of data (with a large number of files),
   though the bug occurs quite early probably after having
   transfered about 50-100GB
 - the source FS being btrfs with compress-force on 3 devices
   (one of which slightly shorter than the others) and a lot of
   subvolumes and snapshots (I'm now copying from read-only
   snapshots but that happened with RW ones as well).
 - to a newly created btrfs fs
 - on one device (/dev/sdd or dmcrypt)
 - mounted with compress or compress-force.
 
 - noatime on either source or dest doesn't make a difference
   (wrt the occurrence of fixup BUG())
 - can't reproduce it when dest is not mounted with compress
 - beside that BUG(),
 - kernel memory is being used up (mostly in
   btrfs_inode_cache) and can't be reclaimed (leading to crash
   with oom killing everybody)
 - the target FS can be unmounted but that does not reclaim
   memory. However the *source* FS (that is not the one we tried
   with and without compress) cannot be unmounted (umount hangs,
   see another email for its stack trace).
 - Only way to get out of there is reboot with sysrq-b
 - happens with 2.6.38, 2.6.39, 3.0.0rc6
 - CONFIG_SLAB_DEBUG, CONFIG_DEBUG_PAGEALLOC,
   CONFIG_DEBUG_SLAB_LEAK, slub_debug don't tell us anything
   useful (there's more info in /proc/slabinfo when
   CONFIG_SLAB_DEBUG is on, see below)
 - happens with CONFIG_SLUB as well.
[...]

I don't know which of CONFIG_SLUB or noatime made it, but in
that setup with both enabled, I do get the BUG(), but the system
memory is not exhausted even after rsync goes over the section
with millions of files where it used to cause the oom crash.

The only issue remaining then is that I can't umount the source
FS (and thus causing reboot issues). We could still have 2 or 3
different issues here for all we know.

The situation is a lot more comfortable for me now though.

-- 
Stephane
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: last_index variable in btrfs_buffered_write function

2011-07-12 Thread Chris Mason
Excerpts from Mitch Harder's message of 2011-07-11 15:38:45 -0400:
 2011/7/11 João Eduardo Luís jecl...@gmail.com:
  Hello.
 
  Am I reading the code the wrong way, or is the 'last_index' variable in 
  '__btrfs_buffered_write()' (and previously used in 
  'btrfs_file_aio_write()') irrelevant?
 
  It appears to just be used in 'prepare_pages()', passed as an argument, but 
  never actually used by this function.
 
  Furthermore, I'm not sure what is intended with this variable, but if the 
  idea is to assign it with the  last page in the range, then I would say 
  that instead of
 
  last_index = (pos + iov_iter_count(i))  PAGE_CACHE_SHIFT;
 
  it should be
 
   last_index = (pos + iov_iter_count(i) - 1)  PAGE_CACHE_SHIFT;
 
  Then again, I may be missing something.
 
  Cheers.
 
 
 I came to the same conclusion a few months ago when looking at a bug
 in the same area of code.
 
 The calculation appears to be wrong, but since it's not used anywhere,
 you can't say for certain.  :)
 
 I just haven't gotten around to testing a patch to confirm the hypothesis.

I'd say it is a victim of a cleanup that didn't completely clean it up.
It is unused ;)

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Lockdep warning in btrfs_clear_lock_blocking

2011-07-12 Thread Morten P.D. Stevens
2011/5/10 David Sterba d...@jikos.cz

 Hi,

 I've hit this lockdep warning, 2.6.39rc6. Single btrfs partition, 30GB,
 filled with 2GB, compress-force=lzo, warning trigered after normal copy+du.
 Happened only once.

 [Might be a false positive.]

Hi,

I have a similar error with 3.0-rc6.

OS: Fedora 15
Kernel: 3.0-0.rc6.git6.1.fc16.x86_64

Jul 12 15:44:13 x86-002 kernel: [ 1294.229850]
Jul 12 15:44:13 x86-002 kernel: [ 1294.229852]
=
Jul 12 15:44:13 x86-002 kernel: [ 1294.230298] [ INFO: possible
recursive locking detected ]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] 3.0-0.rc6.git6.1.fc16.x86_64 #1
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
-
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] mv/1289 is trying to
acquire lock:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
((eb-lock)-rlock){+.+...}, at: [a01b84cc]
btrfs_try_spin_lock+0x27/0x83 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] but task is already holding lock:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
((eb-lock)-rlock){+.+...}, at: [a01b849c]
btrfs_clear_lock_blocking+0x1f/0x28 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] other info that might
help us debug this:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  Possible unsafe
locking scenario:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]CPU0
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]   lock((eb-lock)-rlock);
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]   lock((eb-lock)-rlock);
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  *** DEADLOCK ***
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  May be due to missing
lock nesting notation
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] 1 lock held by mv/1289:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  #0:
((eb-lock)-rlock){+.+...}, at: [a01b849c]
btrfs_clear_lock_blocking+0x1f/0x28 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] stack backtrace:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] Pid: 1289, comm: mv Not
tainted 3.0-0.rc6.git6.1.fc16.x86_64 #1
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443] Call Trace:
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [81088c6d]
__lock_acquire+0x917/0xcf7
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [8100e9ad] ?
paravirt_read_tsc+0x9/0xd
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [8100ee82] ?
sched_clock+0x9/0xd
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [8107a44d] ?
sched_clock_local+0x12/0x75
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b849c] ?
btrfs_clear_lock_blocking+0x1f/0x28 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b84cc] ?
btrfs_try_spin_lock+0x27/0x83 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [810894da]
lock_acquire+0xbf/0x103
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b84cc] ?
btrfs_try_spin_lock+0x27/0x83 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [814f61e8]
_raw_spin_lock+0x36/0x6a
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b84cc] ?
btrfs_try_spin_lock+0x27/0x83 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b849c] ?
btrfs_clear_lock_blocking+0x1f/0x28 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01b84cc]
btrfs_try_spin_lock+0x27/0x83 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a0178d30]
btrfs_search_slot+0x37b/0x499 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a0186a3a]
btrfs_lookup_csum+0x68/0x10a [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a0186c59]
__btrfs_lookup_bio_sums+0x17d/0x2e2 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a0186e0e]
btrfs_lookup_bio_sums+0x16/0x18 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01917b3]
btrfs_submit_bio_hook+0x9b/0x111 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01a9137]
submit_one_bio+0x92/0xca [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a01ac752]
extent_readpages+0xbf/0xd0 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a0192aeb] ?
uncompress_inline+0x11e/0x11e [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [a019260c]
btrfs_readpages+0x1f/0x21 [btrfs]
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [810f2ff7]
__do_page_cache_readahead+0x158/0x1de
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [810f32e6]
ra_submit+0x21/0x25
Jul 12 15:44:13 x86-002 kernel: [ 1294.230443]  [810f34d8]

Re: Delayed inode operations not doing the right thing with enospc

2011-07-12 Thread Christian Brunner
2011/6/7 Josef Bacik jo...@redhat.com:
 On 06/06/2011 09:39 PM, Miao Xie wrote:
 On fri, 03 Jun 2011 14:46:10 -0400, Josef Bacik wrote:
 I got a lot of these when running stress.sh on my test box



 This is because use_block_rsv() is having to do a
 reserve_metadata_bytes(), which shouldn't happen as we should have
 reserved enough space for those operations to complete.  This is
 happening because use_block_rsv() will call get_block_rsv(), which if
 root-ref_cows is set (which is the case on all fs roots) we will use
 trans-block_rsv, which will only have what the current transaction
 starter had reserved.

 What needs to be done instead is we need to have a block reserve that
 any reservation that is done at create time for these inodes is migrated
 to this special reserve, and then when you run the delayed inode items
 stuff you set trans-block_rsv to the special block reserve so the
 accounting is all done properly.

 This is just off the top of my head, there may be a better way to do it,
 I've not actually looked that the delayed inode code at all.

 I would do this myself but I have a ever increasing list of shit to do
 so will somebody pick this up and fix it please?  Thanks,

 Sorry, it's my miss.
 I forgot to set trans-block_rsv to global_block_rsv, since we have migrated
 the space from trans_block_rsv to global_block_rsv.

 I'll fix it soon.


 There is another problem, we're failing xfstest 204.  I tried making
 reserve_metadata_bytes commit the transaction regardless of whether or
 not there were pinned bytes but the test just hung there.  Usually it
 takes 7 seconds to run and I ctrl+c'ed it after a couple of minutes.
 204 just creates a crap ton of files, which is what is killing us.
 There needs to be a way to start flushing delayed inode items so we can
 reclaim the space they are holding onto so we don't get enospc, and it
 needs to be better than just committing the transaction because that is
 dog slow.  Thanks,

 Josef

Is there a solution for this?

I'm running a 2.6.38.8 kernel with all the btrfs patches from 3.0rc7
(except the pluging). When starting a ceph rebuild on the btrfs
volumes I get a lot of warnings from block_rsv_use_bytes in
use_block_rsv:

[ 2157.922054] [ cut here ]
[ 2157.927270] WARNING: at fs/btrfs/extent-tree.c:5683
btrfs_alloc_free_block+0x1f8/0x360 [btrfs]()
[ 2157.937123] Hardware name: ProLiant DL180 G6
[ 2157.942132] Modules linked in: btrfs zlib_deflate libcrc32c bonding
ipv6 pcspkr serio_raw iTCO_wdt iTCO_vendor_support ghes hed
i7core_edac edac_core ixgbe dca mdio iomemory_vsl(P) hpsa squashfs
usb_storage [last unloaded: scsi_wait_scan]
[ 2157.967386] Pid: 10280, comm: btrfs-freespace Tainted: PW
2.6.38.8-1.fits.4.el6.x86_64 #1
[ 2157.977554] Call Trace:
[ 2157.980383]  [8106482f] ? warn_slowpath_common+0x7f/0xc0
[ 2157.987382]  [8106488a] ? warn_slowpath_null+0x1a/0x20
[ 2157.994192]  [a0240b88] ?
btrfs_alloc_free_block+0x1f8/0x360 [btrfs]
[ 2158.002354]  [a026eda8] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[ 2158.010014]  [a0231132] ? split_leaf+0x142/0x8c0 [btrfs]
[ 2158.016990]  [a022a29b] ? generic_bin_search+0x19b/0x210 [btrfs]
[ 2158.024784]  [a022ca1a] ? btrfs_leaf_free_space+0x8a/0xe0 [btrfs]
[ 2158.032627]  [a0231f83] ? btrfs_search_slot+0x6d3/0x7a0 [btrfs]
[ 2158.040325]  [a0244942] ?
btrfs_csum_file_blocks+0x632/0x830 [btrfs]
[ 2158.048477]  [a026ffda] ? clear_extent_bit+0x17a/0x440 [btrfs]
[ 2158.056026]  [a024ffc5] ? add_pending_csums+0x45/0x70 [btrfs]
[ 2158.063530]  [a0252dad] ?
btrfs_finish_ordered_io+0x22d/0x360 [btrfs]
[ 2158.071755]  [a0252f2c] ?
btrfs_writepage_end_io_hook+0x4c/0xa0 [btrfs]
[ 2158.080172]  [a027049b] ?
end_bio_extent_writepage+0x13b/0x180 [btrfs]
[ 2158.088505]  [815406fb] ? schedule_timeout+0x17b/0x2e0
[ 2158.095258]  [8118964d] ? bio_endio+0x1d/0x40
[ 2158.101171]  [a0247764] ? end_workqueue_fn+0xf4/0x130 [btrfs]
[ 2158.108621]  [a027b30e] ? worker_loop+0x13e/0x540 [btrfs]
[ 2158.115703]  [a027b1d0] ? worker_loop+0x0/0x540 [btrfs]
[ 2158.122563]  [a027b1d0] ? worker_loop+0x0/0x540 [btrfs]
[ 2158.129413]  [81086356] ? kthread+0x96/0xa0
[ 2158.135093]  [8100ce44] ? kernel_thread_helper+0x4/0x10
[ 2158.141913]  [810862c0] ? kthread+0x0/0xa0
[ 2158.147467]  [8100ce40] ? kernel_thread_helper+0x0/0x10
[ 2158.154287] ---[ end trace 55e53c726a04ecd7 ]---

Thanks,
Christian
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: last_index variable in btrfs_buffered_write function

2011-07-12 Thread Mitch Harder
2011/7/12 Chris Mason chris.ma...@oracle.com:
 Excerpts from Mitch Harder's message of 2011-07-11 15:38:45 -0400:
 2011/7/11 João Eduardo Luís jecl...@gmail.com:
  Hello.
 
  Am I reading the code the wrong way, or is the 'last_index' variable in 
  '__btrfs_buffered_write()' (and previously used in 
  'btrfs_file_aio_write()') irrelevant?
 
  It appears to just be used in 'prepare_pages()', passed as an argument, 
  but never actually used by this function.
 
  Furthermore, I'm not sure what is intended with this variable, but if the 
  idea is to assign it with the  last page in the range, then I would say 
  that instead of
 
  last_index = (pos + iov_iter_count(i))  PAGE_CACHE_SHIFT;
 
  it should be
 
   last_index = (pos + iov_iter_count(i) - 1)  PAGE_CACHE_SHIFT;
 
  Then again, I may be missing something.
 
  Cheers.
 

 I came to the same conclusion a few months ago when looking at a bug
 in the same area of code.

 The calculation appears to be wrong, but since it's not used anywhere,
 you can't say for certain.  :)

 I just haven't gotten around to testing a patch to confirm the hypothesis.

 I'd say it is a victim of a cleanup that didn't completely clean it up.
 It is unused ;)


I've put together a patch for this clean-up.

I'll send it to the list.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: Remove unused variable 'last_index' in file.c

2011-07-12 Thread Mitch Harder
The variable 'last_index' is calculated in the __btrfs_buffered_write
function and passed as a parameter to the prepare_pages function,
but is not used anywhere in the prepare_pages function.

Remove instances of 'last_index' in these functions.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/file.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index bd6bbb8..b061de0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1059,7 +1059,7 @@ static int prepare_uptodate_page(struct page *page, u64 
pos)
 static noinline int prepare_pages(struct btrfs_root *root, struct file *file,
 struct page **pages, size_t num_pages,
 loff_t pos, unsigned long first_index,
-unsigned long last_index, size_t write_bytes)
+size_t write_bytes)
 {
struct extent_state *cached_state = NULL;
int i;
@@ -1159,7 +1159,6 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
struct btrfs_root *root = BTRFS_I(inode)-root;
struct page **pages = NULL;
unsigned long first_index;
-   unsigned long last_index;
size_t num_written = 0;
int nrptrs;
int ret = 0;
@@ -1172,7 +1171,6 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
return -ENOMEM;
 
first_index = pos  PAGE_CACHE_SHIFT;
-   last_index = (pos + iov_iter_count(i))  PAGE_CACHE_SHIFT;
 
while (iov_iter_count(i)  0) {
size_t offset = pos  (PAGE_CACHE_SIZE - 1);
@@ -1206,8 +1204,7 @@ static noinline ssize_t __btrfs_buffered_write(struct 
file *file,
 * contents of pages from loop to loop
 */
ret = prepare_pages(root, file, pages, num_pages,
-   pos, first_index, last_index,
-   write_bytes);
+   pos, first_index, write_bytes);
if (ret) {
btrfs_delalloc_release_space(inode,
num_pages  PAGE_CACHE_SHIFT);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sign a bug in “set_extent_bit”

2011-07-12 Thread Li Zefan
Qiang Zhu wrote:
 hi 
 
 In the end part of “set_extent_bit”,I found when err occurs ,there is no 
 operate to free prealloc which have been allocated in 
 alloc_extent_state_atomic
 this may lead a menory leak when set_state_bits failed.

No, it won't. 'prealloc' has been inserted into rbtree before set_state_bits()
is called().

 err = set_state_bits(tree, prealloc, bits);
 
 if (err) {
 prealloc = NULL;
 goto out; // this direct will do nothing because prealloc=NULL.
 }
 wish your reply.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html