Re: [PATCH] Btrfs-progs use safe string manipulation functions

2011-02-11 Thread Lars Wirzenius
On Thu, 2011-02-10 at 10:29 -0300, Eduardo Silva wrote:
 [PATCH] Add safe string manipulation functions
 
 Deprecate direct use of strcpy(3)
 The following string manipulation functions have been added:
 
- string_copy() : wrapper of strcpy(3)
- string_ncopy(): wrapper of strncpy(3)
 
 Both functions produce safe NUL-terminated strings.

I'd like to make some comments, which I hope will be acceptable.

Firstly, calling strcpy dangerous is, to me, rather overblown. It is
easy to make mistakes, but it is not at all dangerous the way, for
example, gets(3) is dangerous. strcpy can be used safely, gets cannot.
Also, if you consider strcpy to be dangerous, then strcat should be
dangerous too. However, given the risk of overwriting a buffer with
strcpy, I agree that it's good to see if an alternative can be found.

Secondly, if you're going to make wrappers or helper functions for
string handling in C, you need to decide several things right from the
start:

* do you do static or dynamic allocation?
* how do you handle errors?
* do you want a minimal wrapper or replacement, or a whole new library?

I am not familiar enough with the btrfs-progs code base to give any
strong recommendations, but off the top of my head I would suggest
these, for this patch:

* make use of fairly minimal wrappers/replacements (at least for now)
* handle errors by calling abort or exit
* don't allocate data dynamically (or else it's not a minimal wrapper)

For error handling, there are two kinds of things that can happen:
normal run-time errors (malloc returns NULL, writing to a file fails,
etc), and programming errors (wrong parameters to functions). If we're
doing a minimal wrapper without dynamic memory allocation, the only
thing string functions should need to worry about is programming errors.
For those, abort(3) is the appropriate way to terminate the program,
since it causes a core dump, which can be inspected with a debugger.
Since btrfs-progs are non-interactive command line tools, this should be
OK.

For checking function arguments, the assert macro is appropriate. It
calls abort if the test fails. I am not sure I would check for
parameters being non-NULL, though, since the kernel will trap such usage
and cause a segfault, which, again, can be analyzed with a debugger.

For things like string copying, another problem to consider is what to
do if the target array is not large enough. The three possibilities are
to silently truncate the output string, to return an error code of some
sort, or to abort. The error code is a bit tedious, since it requires
the caller to check for it and do something sensible when the copy
fails. For btrfs-progs, I would suggest aborting.
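
To make the options listed above concrete (truncate, return an error
code, abort), here is a minimal sketch in C. The names copy_truncate,
copy_checked and copy_or_abort are made up for this illustration and are
not part of btrfs-progs or of the patch under review.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: silently truncate; the result is always NUL-terminated. */
void copy_truncate(char *dst, size_t dsize, const char *src)
{
    assert(dsize > 0);
    snprintf(dst, dsize, "%s", src);
}

/* Option 2: return 0 on success, or -1 if src (plus its '\0') does not
 * fit. On failure dst is set to the empty string. */
int copy_checked(char *dst, size_t dsize, const char *src)
{
    size_t len = strlen(src);

    assert(dsize > 0);
    if (len >= dsize) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* copies the terminating '\0' as well */
    return 0;
}

/* Option 3: treat a too-small target as a programming error and abort. */
void copy_or_abort(char *dst, size_t dsize, const char *src)
{
    if (copy_checked(dst, dsize, src) != 0)
        abort();
}
```

With option 2 every caller has to test the return value; with option 3
the check lives in one place, which is the trade-off described above.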

Taking all of these together, my suggestion for a safer strcpy would
be along these lines (outline only, not tested code):

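#include <assert.h>
#include <stdio.h>
#include <string.h>

void safer_strcpy(char *target, size_t tsize, const char *source)
{
    int n;  /* snprintf returns int; a negative value is an output error */

    n = snprintf(target, tsize, "%s", source);
    assert(n >= 0 && (size_t)n < tsize);  /* abort if source did not fit */
}

void safer_strncpy(char *tgt, size_t tsize, const char *src, size_t n)
{
    assert(n < tsize); /* There must be space for the '\0'. */
    memset(tgt, '\0', tsize);  /* zero fill guarantees termination */
    strncpy(tgt, src, n);
}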

Note that for any reasonable error checking to be possible the safety
functions need to know the size of the target memory area. Otherwise no
sensible checks can be done -- you have to rely on the caller to check
that the target array is big enough, and then you're no better off than
with plain strcpy.
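
As an illustration of why the size parameter matters, here is a hedged
sketch of how a caller might pass the real capacity of a fixed array.
The macro COPY_TO_ARRAY, the struct device_entry and the function
device_set_name are hypothetical names, not anything from btrfs-progs;
safer_strcpy repeats the outline above so that the block compiles on its
own.

```c
#include <assert.h>
#include <stdio.h>

/* Same idea as the outline above: abort if source does not fit. */
void safer_strcpy(char *target, size_t tsize, const char *source)
{
    int n = snprintf(target, tsize, "%s", source);

    assert(n >= 0 && (size_t)n < tsize);
}

/* Hypothetical convenience macro. It is only valid for true arrays,
 * where sizeof yields the full capacity; for a pointer it would yield
 * the size of the pointer instead. */
#define COPY_TO_ARRAY(arr, src) safer_strcpy((arr), sizeof(arr), (src))

/* Hypothetical structure, for illustration only. */
struct device_entry {
    char name[16];
};

void device_set_name(struct device_entry *dev, const char *name)
{
    COPY_TO_ARRAY(dev->name, name);
}
```

Note that assert is compiled out when NDEBUG is defined, so a build
that defines NDEBUG would silently truncate instead of aborting.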

(Also note that I did not call the function string_copy, since global
names starting with str are reserved to the C implementation.)

Your function fills in the target array with zero bytes. Is that
necessary? If it is, then the memset call needs to be added to
safer_strcpy.

(I don't find it useful to return the target array as the return value
of the function, so I didn't do that.)

-- 
Blog/wiki/website hosting with ikiwiki (free for free software):
http://www.branchable.com/

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: LOOP_GET_STATUS(64) truncates pathnames to 64 chars (was Re: Bug in mkfs.btrfs?!)

2011-02-11 Thread Felix Blanke
Hi,

are you sure that patch is in the kernel?

I'm using 2.6.37 and don't have those attributes in my /sys.



Felix

On 10. February 2011 - 13:29, Petr Uzel wrote:
 Date: Thu, 10 Feb 2011 13:29:27 +0100
 From: Petr Uzel petr.u...@suse.cz
 To: Chris Samuel ch...@csamuel.org
 Cc: Felix Blanke felixbla...@gmail.com, kreij...@inwind.it, Hugo Mills
  hugo-l...@carfax.org.uk, linux-btrfs@vger.kernel.org, Linux Kernel
  linux-ker...@vger.kernel.org
 Subject: Re: LOOP_GET_STATUS(64) truncates pathnames to 64 chars (was Re:
  Bug in mkfs.btrfs?!)
 Mail-Followup-To: Chris Samuel ch...@csamuel.org, Felix Blanke
  felixbla...@gmail.com, kreij...@inwind.it, Hugo Mills
  hugo-l...@carfax.org.uk, linux-btrfs@vger.kernel.org, Linux Kernel
  linux-ker...@vger.kernel.org
 
 On Tue, Jan 25, 2011 at 11:15:11AM +1100, Chris Samuel wrote:
  /*
   * CC'd to linux-kernel in case they have any feedback on this.
   *
   * Long thread, trying to work out why mkfs.btrfs failed to
   * make a filesystem on an encrypted loopback mount called
   * /dev/loop2. Cause turned out to be mkfs.btrfs calling
   * LOOP_GET_STATUS to find out if the block device was mounted
   * and getting a truncated device name back and so it later
   * fails when lstat() is called on the truncated device path.
   *
   * The long device name for the encrypted loopback mount was
   * because /dev/disk/by-id/$ID was used when Felix created it
   * to cope with devices moving around.
   */
  
  On 25/01/11 00:01, Felix Blanke wrote:
  
   you were talking about the LOOP_GET_STATUS function. I'm not
   quite sure where it comes from. Is it part of the kernel?
   Or does it come from the util-linux package?
  
  It's in the kernel, and there is both LOOP_GET_STATUS (old
  implementation) and LOOP_GET_STATUS64 (new implementation).
  
  They return structures called loop_info and loop_info64
  respectively and both are defined in include/linux/loop.h .
  
  Sadly in both cases the lengths of paths are defined to be
  LO_NAME_SIZE which is currently 64 and hence either
  implementation will cause the problematic:
  
  lstat("/dev/disk/by-id/ata-INTEL_SSDSA2M160G2GC_CVPO939201JX160AGN-par",
  0x7fffa30b3cf0) = -1 ENOENT (No such file or directory)
  
  I've CC'd this to the LKML in case they have any feedback on
  this apparent problem with the API.
  
 Since 2.6.37, you can get the full path to the backing file from sysfs:
 cat /sys/block/loopX/loop/backing_file
 
 See
 http://linux.derkeiler.com/Mailing-Lists/Kernel/2010-07/msg10996.html
 
 
 HTH,
 
 Petr
 
 --
 Petr Uzel
 IRC: ptr_uzl @ freenode


---end quoted text---
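
For illustration, here is a rough sketch in C of the two behaviours
discussed in this thread: a 64-byte name field that keeps at most 63
characters of a path, and reading the full path from the sysfs
attribute instead. NAME_FIELD_SIZE mirrors LO_NAME_SIZE but is redefined
here so the block builds without kernel headers; store_name,
backing_file_attr and read_first_line are hypothetical helpers, not
kernel or util-linux functions. The lstat() path quoted above is 63
characters long, which is consistent with this kind of truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_FIELD_SIZE 64  /* same value as LO_NAME_SIZE */

/* Store path in a fixed-size field: at most NAME_FIELD_SIZE - 1 bytes
 * survive, followed by '\0'. Returns 1 if the path was truncated. */
int store_name(char field[NAME_FIELD_SIZE], const char *path)
{
    size_t len = strlen(path);
    int truncated = len >= NAME_FIELD_SIZE;

    if (truncated)
        len = NAME_FIELD_SIZE - 1;
    memcpy(field, path, len);
    field[len] = '\0';
    return truncated;
}

/* Build "/sys/block/<loopdev>/loop/backing_file".
 * Returns 0 on success, -1 if buf is too small. */
int backing_file_attr(char *buf, size_t bufsize, const char *loopdev)
{
    int n = snprintf(buf, bufsize, "/sys/block/%s/loop/backing_file",
                     loopdev);

    return (n >= 0 && (size_t)n < bufsize) ? 0 : -1;
}

/* Read the first line of a file, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *file, char *out, size_t outsize)
{
    FILE *fp;

    assert(out != NULL && outsize > 0);
    fp = fopen(file, "r");
    if (fp == NULL)
        return -1;
    if (fgets(out, (int)outsize, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

A caller would build the attribute path with backing_file_attr() and
pass it to read_first_line() with a PATH_MAX-sized buffer. As the rest
of the thread shows, the attribute is only present when the running
loop driver provides it.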


ENOSPC Regression

2011-02-11 Thread Mitch Harder
I'm encountering premature ENOSPC issues recently where my Btrfs
testing partition will either prematurely return an ENOSPC, or lock up
the operations trying to access the partition.

I have bisected the problem to this commit:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=914ee295af418e936ec20a08c1663eaabe4cd07a
(Btrfs: pwrite blocked when writing from the mmaped buffer of the same page)

I am encountering the problem on a small-ish 3.5 GB Btrfs partition.
I can replicate the problem with and without compression.  I can also
replicate the problem with and without reformatting the partition.

For most operations I run on this partition, Btrfs is performing
without error.  But when I compile openmotif-2.3.3 on a kernel that is
after the above referenced commit, I'll get either an ENOSPC error or
the partition locks up.

When I encounter a lock-up issue, there are no errors in dmesg, and no
delayed processes are showing (unless I try to run an additional
operation on that partition, such as 'ls', which will subsequently
show up as delayed).  However, the build process for openmotif-2.3.3
appears frozen, and several processes related to the build are shown
as running, and will not even respond to 'kill -s 9 pid'

The partition only has about 500 MB of data when I encounter the
problems, and openmotif-2.3.3 typically only requires about 30-60 MB
to compile.

However, running 'btrfs fi show' indicates that btrfs has attempted to
reserve all the space on the disk for data and metadata.  When running
a kernel prior to the above referenced commit, btrfs will compile
openmotif-2.3.3 without needing to reserve much extra space on the
partition.

Let me know if you would like any additional information or tests.


2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
As I type this, I have an ssh process running that's dumping data into
a fifo at high speed (maybe 500Mbps) and a tar process that's
untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
a fast (i7-2600) CPU, so it's not an issue with the machine struggling
under load.

Every few tens of seconds, my system stalls for several seconds.
These stalls cause keyboard input to be lost, firefox to hang, etc.

Setting tar's ionice priority to best effort / 7 or to idle makes no difference.

ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
no difference.

max_sectors_kb = 64 in addition to the above doesn't help either.

latencytop shows regular instances of 2-7 *second* latency, variously
in sync_page, start_transaction, btrfs_start_ordered_extent, and
do_get_write_access (from jbd2 on my ext4 root partition).

echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
4-5 GB were still free (so it shouldn't be a problem with important
pages being evicted).

In case it matters, all of my partitions are on LVM on dm-crypt, but
this machine has AES-NI so the overhead from that should be minimal.
In fact, overall CPU usage is only about 10%.

What gives?  I thought this stuff was supposed to be better on modern kernels.

--Andy
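
As a rough cross-check of numbers like the ones above, one can time
individual synchronous writes from user space. The following is only a
sketch under that assumption, not a substitute for latencytop;
elapsed_ms and timed_write_ms are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Write one buffer to fd and fsync it. Returns the time taken in
 * milliseconds, or -1.0 if any step fails (including a short write). */
double timed_write_ms(int fd, const void *buf, size_t len)
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return elapsed_ms(&start, &end);
}
```

Calling timed_write_ms() in a loop on a file on the affected filesystem
and logging the largest value would show whether multi-second stalls
are visible to an ordinary process.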


Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Chris Mason
Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
 As I type this, I have an ssh process running that's dumping data into
 a fifo at high speed (maybe 500Mbps) and a tar process that's
 untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
 space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
 a fast (i7-2600) CPU, so it's not an issue with the machine struggling
 under load.
 
 Every few tens of seconds, my system stalls for several seconds.
 These stalls cause keyboard input to be lost, firefox to hang, etc.
 
 Setting tar's ionice priority to best effort / 7 or to idle makes no 
 difference.
 
 ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
 no difference.
 
 max_sectors_kb = 64 in addition to the above doesn't help either.
 
 latencytop shows regular instances of 2-7 *second* latency, variously
 in sync_page, start_transaction, btrfs_start_ordered_extent, and
 do_get_write_access (from jbd2 on my ext4 root partition).
 
 echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
 4-5 GB were still free (so it shouldn't be a problem with important
 pages being evicted).
 
 In case it matters, all of my partitions are on LVM on dm-crypt, but
 this machine has AES-NI so the overhead from that should be minimal.
 In fact, overall CPU usage is only about 10%.
 
 What gives?  I thought this stuff was supposed to be better on modern kernels.

We can tell more if you post the full traces from latencytop.  I have a
patch here for latencytop that adds a -c mode, which dumps the traces
out to a text file.

http://oss.oracle.com/~mason/latencytop.patch

Based on what you have here, I think it's probably a latency problem
between btrfs and the dm-crypt stuff.  How easily can you set up a test
partition without dm-crypt?

-chris


Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Matt
On Fri, Feb 11, 2011 at 3:08 PM, Andrew Lutomirski a...@luto.us wrote:
 As I type this, I have an ssh process running that's dumping data into
 a fifo at high speed (maybe 500Mbps) and a tar process that's
 untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
 space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
 a fast (i7-2600) CPU, so it's not an issue with the machine struggling
 under load.

 Every few tens of seconds, my system stalls for several seconds.
 These stalls cause keyboard input to be lost, firefox to hang, etc.

 Setting tar's ionice priority to best effort / 7 or to idle makes no 
 difference.

 ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
 no difference.

 max_sectors_kb = 64 in addition to the above doesn't help either.

 latencytop shows regular instances of 2-7 *second* latency, variously
 in sync_page, start_transaction, btrfs_start_ordered_extent, and
 do_get_write_access (from jbd2 on my ext4 root partition).

 echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
 4-5 GB were still free (so it shouldn't be a problem with important
 pages being evicted).

 In case it matters, all of my partitions are on LVM on dm-crypt, but
 this machine has AES-NI so the overhead from that should be minimal.
 In fact, overall CPU usage is only about 10%.

 What gives?  I thought this stuff was supposed to be better on modern kernels.

 --Andy


Hi Andrew,

you could try the following patch to speed up dm-crypt:

https://patchwork.kernel.org/patch/365542/

I'm using it on top of a heavily patched 2.6.37 kernel; I'm not sure
whether exactly that version was included in 2.6.38.

There are some additional options to speed up dm, e.g.
CONFIG_CRYPTO_PCRYPT=y.

Regards

Matt


Re: Recovering data from disk with loose cable

2011-02-11 Thread Ben Gamari
On Wed, 9 Feb 2011 21:46:38 -0500, Ben Gamari bgam...@gmail.com wrote:
 We have a disk array behind two external SATA port multipliers (four
 disks on each multiplier) which has been running btrfs (RAID 1 for
 both data and metadata). Unfortunately, earlier today it seems one of
 the SATA cables came loose, resulting in the kernel (2.6.37)
 eventually OOPSing although apparently not before writing quite a bit
 of data. Upon reboot, I was met with the dreaded,
 
 disk-io.c:741: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
 
 Unfortunately any attempt to run any of the btrfs-progs utilities
 (from git) met a similar end. There was recently a patch to try harder
 in recovering from this problem posted to the list[1], although
 unfortunately it is unable to find a root. Considering there are eight
 disks in the array and only four were affected by the loose cable, I
 find it very hard to believe there is no way to recover this volume.
 Any suggestions at all would be greatly appreciated. Recovering this
 data would mean a lot. Thanks,
 
Given there has been no response to this, I suppose I should assume this
data is unrecoverable? It's not the end of the world if so, but again,
it would be nice to get a few files and it seems like a small subset of
the metadata is corrupted.

Cheers,

- Ben


Re: ENOSPC Regression

2011-02-11 Thread Josef Bacik
On Fri, Feb 11, 2011 at 07:21:47AM -0600, Mitch Harder wrote:
 I'm encountering premature ENOSPC issues recently where my Btrfs
 testing partition will either prematurely return an ENOSPC, or lock up
 the operations trying to access the partition.
 
 I have bisected the problem to this commit:
 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=914ee295af418e936ec20a08c1663eaabe4cd07a
 (Btrfs: pwrite blocked when writing from the mmaped buffer of the same page)
 
 I am encountering the problem on a small-ish 3.5 GB Btrfs partition.
 I can replicate the problem with and without compression.  I can also
 replicate the problem with and without reformatting the partition.
 
 For most operations I run on this partition, Btrfs is performing
 without error.  But when I compile openmotif-2.3.3 on a kernel that is
 after the above referenced commit, I'll get either an ENOSPC error or
 the partition locks up.
 
 When I encounter a lock-up issue, there are no errors in dmesg, and no
 delayed processes are showing (unless I try to run an additional
 operation on that partition, such as 'ls', which will subsequently
 show up as delayed).  However, the build process for openmotif-2.3.3
 appears frozen, and several processes related to the build are shown
 as running, and will not even respond to 'kill -s 9 pid'
 
 The partition only has about 500 MB of data when I encounter the
 problems, and openmotif-2.3.3 typically only requires about 30-60 MB
 to compile.
 
 However, running 'btrfs fi show' indicates that btrfs has attempted to
 reserve all the space on the disk for data and metadata.  When running
 a kernel prior to the above referenced commit, btrfs will compile
 openmotif-2.3.3 without needing to reserve much extra space on the
 partition.
 
 Let me know if you would like any additional information or tests.

Can you try my btrfs-work tree and see if you still have the same problem?
Thanks,

Josef


Re: ENOSPC Regression

2011-02-11 Thread Mitch Harder
On Fri, Feb 11, 2011 at 10:22 AM, Josef Bacik jo...@redhat.com wrote:
 On Fri, Feb 11, 2011 at 07:21:47AM -0600, Mitch Harder wrote:
 I'm encountering premature ENOSPC issues recently where my Btrfs
 testing partition will either prematurely return an ENOSPC, or lock up
 the operations trying to access the partition.

 I have bisected the problem to this commit:
 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=914ee295af418e936ec20a08c1663eaabe4cd07a
 (Btrfs: pwrite blocked when writing from the mmaped buffer of the same page)

 I am encountering the problem on a small-ish 3.5 GB Btrfs partition.
 I can replicate the problem with and without compression.  I can also
 replicate the problem with and without reformatting the partition.

 For most operations I run on this partition, Btrfs is performing
 without error.  But when I compile openmotif-2.3.3 on a kernel that is
 after the above referenced commit, I'll get either an ENOSPC error or
 the partition locks up.

 When I encounter a lock-up issue, there are no errors in dmesg, and no
 delayed processes are showing (unless I try to run an additional
 operation on that partition, such as 'ls', which will subsequently
 show up as delayed).  However, the build process for openmotif-2.3.3
 appears frozen, and several processes related to the build are shown
 as running, and will not even respond to 'kill -s 9 pid'

 The partition only has about 500 MB of data when I encounter the
 problems, and openmotif-2.3.3 typically only requires about 30-60 MB
 to compile.

 However, running 'btrfs fi show' indicates that btrfs has attempted to
 reserve all the space on the disk for data and metadata.  When running
 a kernel prior to the above referenced commit, btrfs will compile
 openmotif-2.3.3 without needing to reserve much extra space on the
 partition.

 Let me know if you would like any additional information or tests.

 Can you try my btrfs-work tree and see if you still have the same problem?
 Thanks,

 Josef


I've built and tested the 2.6.38-rc1 kernel from the master branch of
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work.git,
and I am still getting the same issue.

I've just noticed there is another thread going on about this same problem.

I'll just pile on that thread if I come across something new.


Re: LOOP_GET_STATUS(64) truncates pathnames to 64 chars (was Re: Bug in mkfs.btrfs?!)

2011-02-11 Thread Milan Broz
On 02/11/2011 08:23 PM, Felix Blanke wrote:
 What do you mean with configured?
 
 I'm using loop devices with loop aes, and I've looked into /sys for a device 
 which is actually in use.

Ehm. Is it really Loop-AES?

Then ask the author to backport it there; Loop-AES is not mainline code.
He usually replaces the whole upstream loop implementation with an old,
patched version.

Milan


Re: LOOP_GET_STATUS(64) truncates pathnames to 64 chars (was Re: Bug in mkfs.btrfs?!)

2011-02-11 Thread Felix Blanke
Yeah, for me it's loop-aes.
Ah ok, I didn't know that it replaces the whole loop thing :)


Felix

On Feb 11, 2011 8:32 PM, Milan Broz mb...@redhat.com wrote:
 On 02/11/2011 08:23 PM, Felix Blanke wrote:
 What do you mean with configured?

 I'm using loop devices with loop aes, and I've looked into /sys for a device 
 which is actually in use.

 Ehm. Is it really Loop-AES?

 Then ask the author to backport it there; Loop-AES is not mainline code.
 He usually replaces the whole upstream loop implementation with an old,
 patched version.

 Milan


Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
 As I type this, I have an ssh process running that's dumping data into
 a fifo at high speed (maybe 500Mbps) and a tar process that's
 untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
 space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
 a fast (i7-2600) CPU, so it's not an issue with the machine struggling
 under load.

 Every few tens of seconds, my system stalls for several seconds.
 These stalls cause keyboard input to be lost, firefox to hang, etc.

 Setting tar's ionice priority to best effort / 7 or to idle makes no 
 difference.

 ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
 no difference.

 max_sectors_kb = 64 in addition to the above doesn't help either.

 latencytop shows regular instances of 2-7 *second* latency, variously
 in sync_page, start_transaction, btrfs_start_ordered_extent, and
 do_get_write_access (from jbd2 on my ext4 root partition).

 echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
 4-5 GB were still free (so it shouldn't be a problem with important
 pages being evicted).

 In case it matters, all of my partitions are on LVM on dm-crypt, but
 this machine has AES-NI so the overhead from that should be minimal.
 In fact, overall CPU usage is only about 10%.

 What gives?  I thought this stuff was supposed to be better on modern 
 kernels.

 We can tell more if you post the full traces from latencytop.  I have a
 patch here for latencytop that adds a -c mode, which dumps the traces
 out to a text file.

 http://oss.oracle.com/~mason/latencytop.patch

Big dump at end of email from latencytop git + your patch.


 Based on what you have here, I think it's probably a latency problem
 between btrfs and the dm-crypt stuff.  How easily can you set up a test
 partition without dm-crypt?

Not so easily on that disk.  I left some space inside the LVM to play
with but none outside.

I'll try hooking up another disk over eSATA (on a Cougar Point 3Gbps
controller, so it might blow up).


And here's the dump:

=== Fri Feb 11 14:44:07 2011
Globals:
Cause                                    Maximum       Percentage
synchronous write                        4249.1 msec    35.5 %
Writing to a pipe                        4248.5 msec    35.5 %
Writing a page to disk                    105.9 msec     2.1 %
Page fault                                 23.7 msec     0.2 %
Reading from a pipe                         4.7 msec    19.8 %
Waiting for event (select)                  4.6 msec     6.4 %
Waiting for event (poll)                    1.3 msec     0.0 %
Executing raw SCSI command                  1.3 msec     0.2 %
opening cdrom device                        1.3 msec     0.3 %
Process details:
Process ksoftirqd/1 (10) Total:  50.0 msec
    [run_ksoftirqd]                         4.8 msec   100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total:   8.7 msec
    [run_ksoftirqd]                         4.9 msec   100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/3 (19) Total:   2.9 msec
    [run_ksoftirqd]                         2.9 msec   100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/5 (27) Total:  80.6 msec
    [run_ksoftirqd]                         5.0 msec   100.0 %
        run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total:  45.0 msec
    Executing internal ATA command          0.7 msec    62.3 %
        ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense
        ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler
        ahci_error_handler ata_scsi_error scsi_error_handler kthread
        kernel_thread_helper
    SCSI error handler                      0.6 msec    37.7 %
        scsi_error_handler kthread kernel_thread_helper
Process kworker/7:1 (76) Total:   8.7 msec
    .                                       3.9 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/4:1 (139) Total: 124.0 msec
    .                                       4.9 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/6:1 (140) Total:  11.7 msec
    .                                       3.8 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/5:1 (141) Total:  12.5 msec
    .                                       4.9 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/2:1 (142) Total:  26.1 msec
    .                                       4.9 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/1:1 (143) Total:  47.1 msec
    .                                       4.9 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process kworker/3:1 (150) Total:   4.6 msec
    .                                       3.1 msec   100.0 %
        worker_thread kthread kernel_thread_helper
Process jbd2/dm-1-8 (376) Total:  66.7 msec
    Writing buffer to disk (synchronous)   66.7 msec   100.0 %

kernel BUG at /usr/src/btrfs-work/fs/btrfs/extent-tree.c:2195

2011-02-11 Thread Wido den Hollander
Hi,

While testing with my Ceph cluster I saw some btrfs messages:
http://pastebin.com/URN3ShVb

I'm not sure when these messages came up (What state of the OSD).

To keep up with the recent btrfs changes I'm using Josef's btrfs-work
repository ( aba63cd31ab85e3ec7e9805fadc77dad8b7fc945 ) with the 2.6.38
kernel.

One of my OSDs (Object Store Daemons) is still blocking; this is the
OSD which is using /dev/sdc (See the pastebin errors about sdc).

It's in status D, the stack is showing:

root@noisy:~# cat /proc/1974/task/2043/stack 
[a033fc3a] btrfs_commit_transaction_async+0x25a/0x2e0 [btrfs]
[a036e48e] btrfs_mksubvol+0x2ae/0x350 [btrfs]
[a036e62a] btrfs_ioctl_snap_create_transid+0xfa/0x150 [btrfs]
[a036e709] btrfs_ioctl_snap_create_v2+0x89/0x100 [btrfs]
[a0371692] btrfs_ioctl+0x762/0xa90 [btrfs]
[8116de1d] vfs_ioctl+0x1d/0x50
[8116e8b9] do_vfs_ioctl+0x69/0x1d0
[8116eab4] sys_ioctl+0x94/0xa0
[8100c002] system_call_fastpath+0x16/0x1b
[] 0x
root@noisy:~#
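
For reference, the "status D" mentioned above (uninterruptible sleep)
is the third field of /proc/<pid>/stat. Here is a small hedged sketch
of extracting it from one line of that file; stat_line_state is a
hypothetical helper, and the process name "cosd" in the usage note
below is only an example.

```c
#include <assert.h>
#include <string.h>

/* Return the state character from a /proc/<pid>/stat line.
 * The second field (the command name) is wrapped in parentheses and may
 * itself contain spaces or ')', so search for the last ')' in the line.
 * Returns '\0' if the line does not have the expected shape. */
char stat_line_state(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}
```

For a line beginning "1974 (cosd) D 1 ..." this returns 'D'.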

I don't know if it is related to the messages in my dmesg, but I thought
I'd send it anyway.

Is this a known bug?

Thank you,

Wido



null pointer dereference in iov_iter_copy_from_user_atomic while updating rpm packages

2011-02-11 Thread Clemens Eisserer
Hi,

While updating my Fedora Rawhide installation, I got the Oops listed
at the end of this email.
Is this a known bug (I didn't find anything specific), or should I file
a bug report?

Thank you in advance, Clemens


Feb 10 10:59:45 testbox kernel: [  524.495751] BUG: unable to handle
kernel NULL pointer dereference at   (null)
Feb 10 10:59:45 testbox kernel: [  524.496006] IP: [c04267a2]
kmap_atomic_prot+0x1c/0x111
Feb 10 10:59:45 testbox kernel: [  524.496006] *pde = 
Feb 10 10:59:45 testbox kernel: [  524.496006] Oops:  [#1] SMP
Feb 10 10:59:45 testbox kernel: [  524.496006] last sysfs file:
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
Feb 10 10:59:45 testbox kernel: [  524.496006] Modules linked in:
sunrpc cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
snd_hda_codec_si3054 snd_hda_codec_realtek arc4 snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device iwl3945 snd_pcm iwlcore
mac80211 snd_timer ppdev e1000e snd cfg80211 parport_pc soundcore
iTCO_wdt toshiba_bluetooth joydev parport snd_page_alloc toshiba_acpi
microcode iTCO_vendor_support sparse_keymap rfkill uinput ipv6 btrfs
zlib_deflate libcrc32c sdhci_pci sdhci firewire_ohci mmc_core
firewire_core crc_itu_t yenta_socket i915 drm_kms_helper drm
i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Feb 10 10:59:45 testbox kernel: [  524.496006]
Feb 10 10:59:45 testbox kernel: [  524.496006] Pid: 1465, comm:
build-locale-ar Not tainted 2.6.38-0.rc3.git4.1.fc15.i686 #1 Portable
PC/Tecra A8
Feb 10 10:59:45 testbox kernel: [  524.496006] EIP: 0060:[c04267a2]
EFLAGS: 00210202 CPU: 0
Feb 10 10:59:45 testbox kernel: [  524.496006] EIP is at
kmap_atomic_prot+0x1c/0x111
Feb 10 10:59:45 testbox kernel: [  524.496006] EAX: f1d56000 EBX:
f1d57eb8 ECX:  EDX: 0163
Feb 10 10:59:45 testbox kernel: [  524.496006] ESI:  EDI:
0163 EBP: f1d57de8 ESP: f1d57dd4
Feb 10 10:59:45 testbox kernel: [  524.496006]  DS: 007b ES: 007b FS:
00d8 GS: 00e0 SS: 0068
Feb 10 10:59:45 testbox kernel: [  524.496006] Process build-locale-ar
(pid: 1465, ti=f1d56000 task=f1d1f110 task.ti=f1d56000)
Feb 10 10:59:45 testbox kernel: [  524.496006] Stack:
Feb 10 10:59:45 testbox kernel: [  524.496006]   f1d57df0
f1d57eb8 1000  f1d57df0 c04268aa f1d57e08
Feb 10 10:59:45 testbox kernel: [  524.496006]  c04ab3cd 
012c 1000  f1d57e2c f8217b41 012c
Feb 10 10:59:45 testbox kernel: [  524.496006]  1010 0002
1000 f1d57eb8 113c  f1d57edc f8218129
Feb 10 10:59:45 testbox kernel: [  524.496006] Call Trace:
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04268aa]
__kmap_atomic+0x13/0x15
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04ab3cd]
iov_iter_copy_from_user_atomic+0x28/0x6c
Feb 10 10:59:45 testbox kernel: [  524.496006]  [f8217b41]
btrfs_copy_from_user.isra.6+0x5c/0x96 [btrfs]
Feb 10 10:59:45 testbox kernel: [  524.496006]  [f8218129]
btrfs_file_aio_write+0x480/0x79b [btrfs]
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04dd8e4] ?
mem_cgroup_update_page_stat+0x1a/0xd4
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e3e76]
do_sync_write+0x96/0xcf
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e4265] ?
rw_verify_area+0xd0/0xf3
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e44fd] vfs_write+0x8f/0xd7
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e3de0] ?
do_sync_write+0x0/0xcf
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e46bf] sys_write+0x42/0x63
Feb 10 10:59:45 testbox kernel: [  524.496006]  [c07d449c]
syscall_call+0x7/0xb
Feb 10 10:59:45 testbox kernel: [  524.496006] Code: 26 00 8b 15 08 b9
af c0 e8 58 f9 ff ff 5d c3 55 89 e5 57 56 53 83 ec 08 3e 8d 74 26 00
89 c6 89 e0 25 00 e0 ff ff 89 d7 ff 40 14 8b 06 c1 e8 1e 69 c0 80 03
00 00 05 00 07 a3 c0 e8 49 fe ff ff
Feb 10 10:59:45 testbox kernel: [  524.496006] EIP: [c04267a2]
kmap_atomic_prot+0x1c/0x111 SS:ESP 0068:f1d57dd4
Feb 10 10:59:45 testbox kernel: [  524.496006] CR2: 
Feb 10 10:59:45 testbox kernel: [  524.582447] ---[ end trace
e16f2400ae6eb809 ]---
Feb 10 10:59:45 testbox kernel: [  524.584816] note:
build-locale-ar[1465] exited with preempt_count 2
Feb 10 10:59:45 testbox kernel: [  524.584819] BUG: sleeping function
called from invalid context at kernel/rwsem.c:21
Feb 10 10:59:45 testbox kernel: [  524.584822] in_atomic(): 1,
irqs_disabled(): 0, pid: 1465, name: build-locale-ar
Feb 10 10:59:45 testbox kernel: [  524.584828] Pid: 1465, comm:
build-locale-ar Tainted: G  D 2.6.38-0.rc3.git4.1.fc15.i686 #1
Feb 10 10:59:45 testbox kernel: [  524.584830] Call Trace:
Feb 10 10:59:45 testbox kernel: [  524.584835]  [c042e20a] ?
__might_sleep+0xdd/0xe4
Feb 10 10:59:45 testbox kernel: [  524.584839]  [c07d382c] ?
down_read+0x1c/0x30
Feb 10 10:59:45 testbox kernel: [  524.584843]  [c046c69f] ?
acct_collect+0x3e/0x138
Feb 10 10:59:45 testbox kernel: [  524.584847]  [c043da92] ?
do_exit+0x1d0/0x62c
Feb 10 

Re: null pointer dereference in iov_iter_copy_from_user_atomic while updating rpm packages

2011-02-11 Thread Chris Mason
Excerpts from Clemens Eisserer's message of 2011-02-11 18:05:55 -0500:
 Hi,
 
 While updating my Fedora Rawhide installation, I got the oops listed
 at the end of this email.
 Is this a known bug (I didn't find anything specific), or should I file a bug?
 
 Thank you in advance, Clemens

I think we've fixed this in rc4, or you can git pull from the current
btrfs-unstable tree.

-chris

 
 
 Feb 10 10:59:45 testbox kernel: [  524.495751] BUG: unable to handle
 kernel NULL pointer dereference at   (null)
 Feb 10 10:59:45 testbox kernel: [  524.496006] IP: [c04267a2]
 kmap_atomic_prot+0x1c/0x111
 Feb 10 10:59:45 testbox kernel: [  524.496006] *pde = 
 Feb 10 10:59:45 testbox kernel: [  524.496006] Oops:  [#1] SMP
 Feb 10 10:59:45 testbox kernel: [  524.496006] last sysfs file:
 /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
 Feb 10 10:59:45 testbox kernel: [  524.496006] Modules linked in:
 sunrpc cpufreq_ondemand acpi_cpufreq mperf ip6t_REJECT
 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
 snd_hda_codec_si3054 snd_hda_codec_realtek arc4 snd_hda_intel
 snd_hda_codec snd_hwdep snd_seq snd_seq_device iwl3945 snd_pcm iwlcore
 mac80211 snd_timer ppdev e1000e snd cfg80211 parport_pc soundcore
 iTCO_wdt toshiba_bluetooth joydev parport snd_page_alloc toshiba_acpi
 microcode iTCO_vendor_support sparse_keymap rfkill uinput ipv6 btrfs
 zlib_deflate libcrc32c sdhci_pci sdhci firewire_ohci mmc_core
 firewire_core crc_itu_t yenta_socket i915 drm_kms_helper drm
 i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
 Feb 10 10:59:45 testbox kernel: [  524.496006]
 Feb 10 10:59:45 testbox kernel: [  524.496006] Pid: 1465, comm:
 build-locale-ar Not tainted 2.6.38-0.rc3.git4.1.fc15.i686 #1 Portable
 PC/Tecra A8
 Feb 10 10:59:45 testbox kernel: [  524.496006] EIP: 0060:[c04267a2]
 EFLAGS: 00210202 CPU: 0
 Feb 10 10:59:45 testbox kernel: [  524.496006] EIP is at
 kmap_atomic_prot+0x1c/0x111
 Feb 10 10:59:45 testbox kernel: [  524.496006] EAX: f1d56000 EBX:
 f1d57eb8 ECX:  EDX: 0163
 Feb 10 10:59:45 testbox kernel: [  524.496006] ESI:  EDI:
 0163 EBP: f1d57de8 ESP: f1d57dd4
 Feb 10 10:59:45 testbox kernel: [  524.496006]  DS: 007b ES: 007b FS:
 00d8 GS: 00e0 SS: 0068
 Feb 10 10:59:45 testbox kernel: [  524.496006] Process build-locale-ar
 (pid: 1465, ti=f1d56000 task=f1d1f110 task.ti=f1d56000)
 Feb 10 10:59:45 testbox kernel: [  524.496006] Stack:
 Feb 10 10:59:45 testbox kernel: [  524.496006]   f1d57df0
 f1d57eb8 1000  f1d57df0 c04268aa f1d57e08
 Feb 10 10:59:45 testbox kernel: [  524.496006]  c04ab3cd 
 012c 1000  f1d57e2c f8217b41 012c
 Feb 10 10:59:45 testbox kernel: [  524.496006]  1010 0002
 1000 f1d57eb8 113c  f1d57edc f8218129
 Feb 10 10:59:45 testbox kernel: [  524.496006] Call Trace:
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04268aa]
 __kmap_atomic+0x13/0x15
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04ab3cd]
 iov_iter_copy_from_user_atomic+0x28/0x6c
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [f8217b41]
 btrfs_copy_from_user.isra.6+0x5c/0x96 [btrfs]
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [f8218129]
 btrfs_file_aio_write+0x480/0x79b [btrfs]
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04dd8e4] ?
 mem_cgroup_update_page_stat+0x1a/0xd4
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e3e76]
 do_sync_write+0x96/0xcf
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e4265] ?
 rw_verify_area+0xd0/0xf3
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e44fd] 
 vfs_write+0x8f/0xd7
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e3de0] ?
 do_sync_write+0x0/0xcf
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c04e46bf] 
 sys_write+0x42/0x63
 Feb 10 10:59:45 testbox kernel: [  524.496006]  [c07d449c]
 syscall_call+0x7/0xb
 Feb 10 10:59:45 testbox kernel: [  524.496006] Code: 26 00 8b 15 08 b9
 af c0 e8 58 f9 ff ff 5d c3 55 89 e5 57 56 53 83 ec 08 3e 8d 74 26 00
 89 c6 89 e0 25 00 e0 ff ff 89 d7 ff 40 14 8b 06 c1 e8 1e 69 c0 80 03
 00 00 05 00 07 a3 c0 e8 49 fe ff ff
 Feb 10 10:59:45 testbox kernel: [  524.496006] EIP: [c04267a2]
 kmap_atomic_prot+0x1c/0x111 SS:ESP 0068:f1d57dd4
 Feb 10 10:59:45 testbox kernel: [  524.496006] CR2: 
 Feb 10 10:59:45 testbox kernel: [  524.582447] ---[ end trace
 e16f2400ae6eb809 ]---
 Feb 10 10:59:45 testbox kernel: [  524.584816] note:
 build-locale-ar[1465] exited with preempt_count 2
 Feb 10 10:59:45 testbox kernel: [  524.584819] BUG: sleeping function
 called from invalid context at kernel/rwsem.c:21
 Feb 10 10:59:45 testbox kernel: [  524.584822] in_atomic(): 1,
 irqs_disabled(): 0, pid: 1465, name: build-locale-ar
 Feb 10 10:59:45 testbox kernel: [  524.584828] Pid: 1465, comm:
 build-locale-ar Tainted: G  D 2.6.38-0.rc3.git4.1.fc15.i686 #1
 Feb 10 10:59:45 testbox kernel: [  524.584830] Call Trace:
 Feb 10 10:59:45 testbox kernel: [  524.584835]  [c042e20a] ?
 

Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
 As I type this, I have an ssh process running that's dumping data into
 a fifo at high speed (maybe 500Mbps) and a tar process that's
 untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
 space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
 a fast (i7-2600) CPU, so it's not an issue with the machine struggling
 under load.

 Every few tens of seconds, my system stalls for several seconds.
 These stalls cause keyboard input to be lost, firefox to hang, etc.

 Setting tar's ionice priority to best effort / 7 or to idle makes no 
 difference.

 ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
 no difference.

 max_sectors_kb = 64 in addition to the above doesn't help either.

 latencytop shows regular instances of 2-7 *second* latency, variously
 in sync_page, start_transaction, btrfs_start_ordered_extent, and
 do_get_write_access (from jbd2 on my ext4 root partition).

 echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
 4-5 GB were still free (so it shouldn't be a problem with important
 pages being evicted).

 In case it matters, all of my partitions are on LVM on dm-crypt, but
 this machine has AES-NI so the overhead from that should be minimal.
 In fact, overall CPU usage is only about 10%.

 What gives?  I thought this stuff was supposed to be better on modern 
 kernels.

 We can tell more if you post the full traces from latencytop.  I have a
 patch here for latencytop that adds a -c mode, which dumps the traces
 out to a text file.

 http://oss.oracle.com/~mason/latencytop.patch

 Based on what you have here, I think it's probably a latency problem
 between btrfs and the dm-crypt stuff.  How easily can you set up a test
 partition without dm-crypt?

Done, on the same physical disk as before.  The latency is just as
bad.  On this test, I wrote a total of 3.1G, which is under half of my
RAM.  That should rule out lots of VM issues.  latencytop trace below.

The impression I get (from watching the disk activity light) is that
the disk is mostly idle but every now and then writes out a ton of
data.  While it's writing, the system often becomes unusable.
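
For anyone who wants to put a number on these stalls without latencytop,
a minimal sketch of a probe follows.  It is not something I ran for this
report; the function name probe_fsync_latency and its parameters are
made up for illustration.  It writes a small block to a scratch file in
the given directory, fsyncs it, and records how long each round trip
took, on the assumption that a stalled disk shows up as a long fsync.

```python
import os
import tempfile
import time


def probe_fsync_latency(directory, samples=20, payload=b"x" * 4096, interval=0.0):
    """Write and fsync a small block repeatedly.

    Returns a list with one latency per sample, in seconds.
    The name and parameters are hypothetical, chosen for this sketch.
    """
    latencies = []
    # Scratch file lives on the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory, prefix="latency-probe-")
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # block until the data has been pushed to the device
            latencies.append(time.monotonic() - start)
            if interval:
                time.sleep(interval)
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return latencies
```

Calling it on the affected mount point while the untar is running, with
something like samples=50 and interval=0.1, and then looking at
max(...) of the result should show whether small synchronous writes get
caught behind the big bursts.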

P.S.  How bad is this?  I got it on both disks.
btrfs: free space inode generation (0) did not match free space cache
generation (11070) for block group 1103101952




=== Fri Feb 11 19:30:57 2011
Globals: Cause                            Maximum      Percentage
Writing a page to disk                    2009.0 msec  19.7 %
fsync() on a file (type 'F' for details)   612.2 msec   5.0 %
synchronous write                          573.6 msec   1.8 %
Page fault                                  57.3 msec   0.7 %
Writing buffer to disk (synchronous)        45.2 msec   0.1 %
Unlinking file                              12.6 msec   0.0 %
Waiting for event (select)                   5.0 msec  22.3 %
Reading from a pipe                          5.0 msec  29.9 %
Waiting for event (poll)                     5.0 msec  17.8 %
Process details:
Process kthreadd (2) Total: 1.9 msec
kthreadd kernel thread 1.9 msec 100.0 %
kthreadd kernel_thread_helper
Process ksoftirqd/0 (3) Total: 18.5 msec
[run_ksoftirqd] 4.0 msec 100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/1 (10) Total: 19.6 msec
[run_ksoftirqd] 4.9 msec 100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process kworker/0:1 (11) Total: 556.3 msec
. 5.0 msec 100.0 %
worker_thread kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total: 8.1 msec
[run_ksoftirqd] 2.9 msec 100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/4 (23) Total: 11.2 msec
[run_ksoftirqd] 4.3 msec 100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total:  38.8 msec
SCSI error handler0.9 msec 39.9 %
scsi_error_handler kthread kernel_thread_helper
Executing internal ATA command0.7 msec 60.1 %
ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense
ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler
ahci_error_handler ata_scsi_error scsi_error_handler kthread
kernel_thread_helper
Process kworker/u:4 (69) Total: 616.5 msec
Creating block layer request 54.9 msec 77.8 %
get_request_wait __make_request generic_make_request
kcryptd_crypt_write_io_submit kcryptd_crypt process_one_work
worker_thread kthread kernel_thread_helper
. 5.0 msec 22.2 %
worker_thread kthread kernel_thread_helper
Process kworker/u:5 (70) Total: 1712.3 msec
Creating block layer request492.8 msec 94.3 %