Re: space cache generation (...) does not match inode (...)

2011-08-08 Thread Andrew Lutomirski
On Mon, Aug 8, 2011 at 8:14 AM, Josef Bacik jo...@redhat.com wrote:
> On 08/06/2011 10:16 PM, Andrew Lutomirski wrote:
> > I've always gotten space cache generation warnings, but some time
> > after 3.0 they started going nuts.  I get:
> >
> > space cache generation (14667727114112179905) does not match inode (154185)
> >
> > and other similar messages (with a huge number and a smaller number)
> > at rates higher than one message per ms.  They don't happen
> > constantly, but they come in bursts big enough to fill my log buffer.
> >
>
> Yeah sorry that's going to happen when you first switch to 3.0.  We
> switched the space cache stuff over to using the normal checksumming
> code so all old space cache is going to look invalid.  This is nothing
> to worry about, it will just end up discarded and re-generated.  Thanks,

Can you put in a rate limit and make the message less alarming?
There's enough log spam from it that I can't see anything else in my
log.
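
For illustration, the kind of rate limit being asked for can be sketched in userspace C.  This is only a sketch, assuming a fixed-window limiter in the spirit of the kernel's printk_ratelimited(); the names msg_ratelimit and msg_ratelimit_ok and the interval/burst fields are made up for this example and are not the btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical fixed-window limiter: at most `burst` messages per
 * `interval` seconds.  Not a kernel API. */
struct msg_ratelimit {
  long interval;     /* window length in seconds */
  int burst;         /* messages allowed per window */
  long window_start; /* time at which the current window began */
  int printed;       /* messages let through in this window */
  int suppressed;    /* messages dropped in this window */
};

/* Returns true if a message may be logged at time `now` (in seconds). */
bool msg_ratelimit_ok(struct msg_ratelimit *rl, long now)
{
  assert(rl->interval > 0 && rl->burst > 0);

  if (now - rl->window_start >= rl->interval) {
    /* New window: say how much was dropped, then reset the counters. */
    if (rl->suppressed)
      fprintf(stderr, "%d similar messages suppressed\n", rl->suppressed);
    rl->window_start = now;
    rl->printed = 0;
    rl->suppressed = 0;
  }

  if (rl->printed < rl->burst) {
    rl->printed++;
    return true;
  }

  rl->suppressed++;
  return false;
}
```

A caller would wrap the warning in `if (msg_ratelimit_ok(&rl, now))`, so a burst of thousands of mismatches collapses into a few lines plus one "suppressed" summary per window.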

--Andy


> Josef

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


space cache generation (...) does not match inode (...)

2011-08-06 Thread Andrew Lutomirski
I've always gotten space cache generation warnings, but some time
after 3.0 they started going nuts.  I get:

space cache generation (14667727114112179905) does not match inode (154185)

and other similar messages (with a huge number and a smaller number)
at rates higher than one message per ms.  They don't happen
constantly, but they come in bursts big enough to fill my log buffer.

--Andy


Re: kernel BUG at fs/btrfs/inode.c:4676!

2011-06-10 Thread Andrew Lutomirski
On Fri, Jun 10, 2011 at 2:43 PM, Marek Otahal markota...@gmail.com wrote:

> The test-case is quite easy,
> 1. mount the FS, just with compress-force=lzo option // I didn't try without,
> but on my other btrfs partition that doesn't use compression the err never
> happened ...so, can the others who experience the bug confirm compress=lzo
> used?

Yes, I use compress=lzo.

--Andy


How to remove a device on a RAID-1 before replacing it?

2011-03-29 Thread Andrew Lutomirski
I have a disk with a SMART failure.  It still works but I assume it'll
fail sooner or later.

I want to remove it from my btrfs volume, replace it, and add the new
one.  But the obvious command doesn't work:

# btrfs device delete /dev/dm-5 /mnt/foo
ERROR: error removing the device '/dev/dm-5'

dmesg says:
btrfs: unable to go below two devices on raid1

With mdadm, I would fail the device, remove it, run degraded until I
get a new device, and hot-add that device.

With btrfs, I'd like some confirmation from the fs that data is
balanced appropriately so I won't get data loss if I just yank the
drive.  And I don't even know how to tell btrfs to release the drive
so I can safely remove it.

(Mounting with -o degraded doesn't help.  I could umount, remove the
disk, then remount, but that feels like a hack.)

This is 2.6.38.1 running Fedora 14's version of btrfs-progs, but
btrfs-progs-unstable git does the same thing, as does btrfs-vol -r.

--Andy


Re: How to remove a device on a RAID-1 before replacing it?

2011-03-29 Thread Andrew Lutomirski
On Tue, Mar 29, 2011 at 4:21 PM, cwillu cwi...@cwillu.com wrote:
> On Tue, Mar 29, 2011 at 2:09 PM, Andrew Lutomirski l...@mit.edu wrote:
> > I have a disk with a SMART failure.  It still works but I assume it'll
> > fail sooner or later.
> >
> > I want to remove it from my btrfs volume, replace it, and add the new
> > one.  But the obvious command doesn't work:
> >
> > # btrfs device delete /dev/dm-5 /mnt/foo
> > ERROR: error removing the device '/dev/dm-5'
> >
> > dmesg says:
> > btrfs: unable to go below two devices on raid1
> >
> > With mdadm, I would fail the device, remove it, run degraded until I
> > get a new device, and hot-add that device.
> >
> > With btrfs, I'd like some confirmation from the fs that data is
> > balanced appropriately so I won't get data loss if I just yank the
> > drive.  And I don't even know how to tell btrfs to release the drive
> > so I can safely remove it.
> >
> > (Mounting with -o degraded doesn't help.  I could umount, remove the
> > disk, then remount, but that feels like a hack.)
>
> There's no nice way to remove a failing disk in btrfs right now
> (btrfs dev delete is more of an online management thing to politely
> remove a perfectly functional disk you'd like to use for something
> else.)  As I understand things, the only way to do it right now is the
> umount, remove disk, remount w/ degraded, and then btrfs add the new
> device.


Well, the disk *is* perfectly functional.  It just won't be for long.

I guess what I'm saying is that btrfs dev delete isn't really
working -- I want to be able to convert to non-RAID and back, or to
degraded and back, or something else equivalent.

--Andy


Re: 2.6.37: Multi-second I/O latency while untarring

2011-03-28 Thread Andrew Lutomirski
On Mon, Feb 14, 2011 at 10:22 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 19:35:02 -0500:
> > On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
> > >
> > > We can tell more if you post the full traces from latencytop.  I have a
> > > patch here for latencytop that adds a -c mode, which dumps the traces
> > > out to a text file.
> > >
> > > http://oss.oracle.com/~mason/latencytop.patch
> > >
> > > Based on what you have here, I think it's probably a latency problem
> > > between btrfs and the dm-crypt stuff.  How easily can you set up a test
> > > partition without dm-crypt?
> >
> > Done, on the same physical disk as before.  The latency is just as
> > bad.  On this test, I wrote a total of 3.1G, which is under half of my
> > RAM.  That should rule out lots of VM issues.  latencytop trace below.
>
> Just to confirm, when you say on a physical disk, you mean without dm-crypt?

Sorry for the exceedingly slow reply.

This problem is really bad with 2.6.38.1.  To make it a little easier
to demonstrate, I wrote a tool that shows off the problem.

I made a test btrfs partition on a plain disk partition (same disk as
my dm-crypt but an unencrypted partition).  Now clone a kernel tree
there and run make -j8.  Wait until the disk starts to write data out
in earnest (it takes a while to dirty enough pages).  Watch crap like
this happen (with nr_requests = 2048, scheduler = deadline).

io_latency_watch read <1M file on test partition>

read took 0.000 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.022 seconds (worst = 0.963s)
read took 0.000 seconds (worst = 0.963s)
read took 0.028 seconds (worst = 0.963s)
read took 1.430 seconds (worst = 1.430s)
read took 0.270 seconds (worst = 1.430s)
read took 1.237 seconds (worst = 1.430s)
read took 0.282 seconds (worst = 1.430s)
read took 0.131 seconds (worst = 1.430s)
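
For readers following along, the "took" and "worst" figures above come from timing each operation with CLOCK_MONOTONIC and keeping a running maximum.  A minimal sketch of that arithmetic, with helper names (elapsed_ns, track_worst) that are invented here and do not appear in the attached program:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds from `start` to `end`.  Assumes `end` is not earlier than
 * `start`, which holds for two CLOCK_MONOTONIC readings taken in order. */
uint64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
  int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
  int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;

  /* nsec may be negative when the seconds field rolled over; the sum
   * is still non-negative under the assumption above. */
  assert(sec * NSEC_PER_SEC + nsec >= 0);
  return (uint64_t)(sec * NSEC_PER_SEC + nsec);
}

/* Folds one measurement into the running worst case and returns it. */
uint64_t track_worst(uint64_t ns, uint64_t *worst)
{
  if (ns > *worst)
    *worst = ns;
  return *worst;
}
```

Doing the subtraction in signed 64-bit arithmetic avoids depending on unsigned wraparound when tv_nsec of the second reading is smaller than that of the first.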

io_latency_watch read <1M file on other partition on same disk> is
similar, and io_latency_test write <dir on other partition> is even
worse.

The cfq scheduler is similar.

--Andy
/* io_latency_test.c
 * Copyright (c) 2011 Andy Lutomirski
 * Licensed under GPLv2.
 *
 * Compile with gcc -O2 -std=gnu99 -lrt
 */

#define _FILE_OFFSET_BITS 64
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>
#include <time.h>
#include <stdint.h>
#include <string.h>
#include <signal.h>
#include <inttypes.h>
#include <fcntl.h>

volatile const char *file_to_unlink;

/* SIGINT handler: remove the temporary file (if any) and exit. */
void handler(int x)
{
  if (file_to_unlink)
    unlink((char*)file_to_unlink);

  _exit(0);
}

/* Time O_DIRECT 4k reads at random offsets in an existing file, forever. */
void do_read(const char *name)
{
  int fd = open(name, O_RDONLY | O_DIRECT);
  if (fd < 0) {
    perror("open");
    exit(1);
  }

  uint64_t worst = 0;
  off_t size = lseek(fd, 0, SEEK_END);
  if (size == (off_t)-1) {
    perror("lseek");
    abort();
  }

  size -= (size % 4096);

  if (size < 4096) {
    printf("File is smaller than 4k\n");
    exit(1);
  }

  printf("File size is %" PRIu64 " bytes -- bigger is better\n", (uint64_t)size);

  while(true)
    {
      uint64_t pos = 4096 * (random() % (size / 4096));

      struct timespec start;
      clock_gettime(CLOCK_MONOTONIC, &start);

      /* O_DIRECT generally requires an aligned buffer. */
      unsigned char x[4096] __attribute__((aligned(4096)));
      if (pread(fd, x, 4096, pos) != 4096) {
        perror("pread");
        abort();
      }

      struct timespec end;
      clock_gettime(CLOCK_MONOTONIC, &end);

      uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);

      if (ns > worst)
        worst = ns;

      printf("read took %.3f seconds (worst = %.3fs)\n",
             1e-9 * ns, 1e-9 * worst);

      /* Drop cached pages so the next read has to go to disk. */
      if (posix_fadvise(fd, 0, size, POSIX_FADV_DONTNEED) != 0)
        perror("posix_fadvise");

      usleep(100);
    }
}

/* Time a 1-byte pwrite + fdatasync on a temporary file in dir, forever. */
void do_write(const char *dir)
{
  char *name;
  if (asprintf(&name, "%s/tmp.XXXXXX", dir) == -1)
    abort();

  int fd = mkstemp(name);
  if (fd == -1) {
    perror("mkstemp");
    abort();
  }

  file_to_unlink = name;

  uint64_t worst = 0;

  unsigned char x = 0;
  while(true)
    {
      x++;
      struct timespec start;
      clock_gettime(CLOCK_MONOTONIC, &start);

      if (pwrite(fd, &x, 1, 0) != 1) {
        perror("pwrite");
        abort();
      }

      if (fdatasync(fd) != 0) {
        perror("fdatasync");
        abort();
      }

      struct timespec end;
      clock_gettime(CLOCK_MONOTONIC, &end);

      uint64_t ns = (end.tv_nsec - start.tv_nsec) + 1000000000ULL * (end.tv_sec - start.tv_sec);

      if (ns > worst)
        worst = ns;

      printf("write + fsync took %.3f seconds (worst = %.3fs)\n",
             1e-9 * ns, 1e-9 * worst);

      usleep(100);
    }
}

int main(int argc, char **argv)
{
  if (argc != 3) {
    printf("Usage: %s write <dir> or %s read <file>\n", argv[0], argv[0]);
    return 1;
  }

  bool write;
  if (!strcmp(argv[1], "write")) {
    write = true;
  } else if (!strcmp(argv[1], "read")) {
    write = false;
  } else {
    printf("Bad mode\n");
    return 1;
  }

  struct sigaction sa;
  sa.sa_handler = handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction(SIGINT, &sa, 0) != 0) {
    perror("sigaction");
    exit(1);
  }

  if (write)
    do_write(argv[2]);
  else
    do_read(argv[2]);

  return 0;
}

2.6.37: bash is looping unkillably in btrfs

2011-02-12 Thread Andrew Lutomirski
I have two processes that are unkillable and taking about 50% of a CPU
each.  There is no actual I/O happening (disk light is off and the
disk even spun down after a while).  This may or may not be related to
unmounting a filesystem.  (I'm not sure -- I have two btrfs
filesystems and I unmounted one before I noticed the problem.)

bash is in:

 [81423344] schedule_timeout+0x36/0xe3
 [810348df] ? ttwu_post_activation+0x60/0xf9
 [81030ee2] ? need_resched+0x23/0x2d
 [8142312c] wait_for_common+0xa8/0xf8
 [8103af01] ? default_wake_function+0x0/0x14
 [81047c5f] ? local_bh_enable_ip+0xe/0x10
 [81423234] wait_for_completion+0x1d/0x1f
 [81114fa5] writeback_inodes_sb_nr+0x76/0x7d
 [81115378] writeback_inodes_sb_nr_if_idle+0x41/0x56
 [a01562b8] shrink_delalloc.clone.43+0xa4/0x13c [btrfs]
 [814245c6] ? _raw_spin_lock+0xe/0x10
 [a0159aae] btrfs_delalloc_reserve_metadata+0x12e/0x140 [btrfs]
 [a0159b94] btrfs_delalloc_reserve_space+0x2a/0x47 [btrfs]
 [810b5db0] ? unlock_page+0x2a/0x2f
 [a0172885] btrfs_file_aio_write+0x503/0x8b3 [btrfs]
 [811c302e] ? security_dentry_open+0x2f/0x33
 [8110dcef] ? mnt_want_write+0x2e/0x4a
 [810f78fe] do_sync_write+0xcb/0x108
 [81030efa] ? should_resched+0xe/0x2e
 [811ca3a2] ? selinux_file_permission+0x5a/0xb9
 [811c2f0a] ? security_file_permission+0x2e/0x33
 [810f7fed] vfs_write+0xac/0xff
 [810f81f4] sys_write+0x4a/0x6e
 [81002beb] system_call_fastpath+0x16/0x1b

flush-btrfs-6 is in:

 [81115f50] bdi_writeback_thread+0x151/0x20b
 [81115dff] ? bdi_writeback_thread+0x0/0x20b
 [81115dff] ? bdi_writeback_thread+0x0/0x20b
 [8105c550] kthread+0x82/0x8a
 [81003994] kernel_thread_helper+0x4/0x10
 [8105c4ce] ? kthread+0x0/0x8a
 [81003990] ? kernel_thread_helper+0x0/0x10

perf top says:

  870.00 23.2% clockevents_program_event     /lib/modules/2.6.37+/build/vmlinux
  541.00 14.4% update_ts_time_stats          /lib/modules/2.6.37+/build/vmlinux
  465.00 12.4% sched_clock                   /lib/modules/2.6.37+/build/vmlinux
  133.00  3.5% get_next_timer_interrupt      /lib/modules/2.6.37+/build/vmlinux
  116.00  3.1% schedule                      /lib/modules/2.6.37+/build/vmlinux
   57.00  1.5% pick_next_task_fair           /lib/modules/2.6.37+/build/vmlinux
   55.00  1.5% select_task_rq_fair           /lib/modules/2.6.37+/build/vmlinux
   50.00  1.3% __switch_to                   /lib/modules/2.6.37+/build/vmlinux
   46.00  1.2% enqueue_hrtimer               /lib/modules/2.6.37+/build/vmlinux
   45.00  1.2% ktime_get                     /lib/modules/2.6.37+/build/vmlinux
   44.00  1.2% read_hpet                     /lib/modules/2.6.37+/build/vmlinux
   37.00  1.0% try_to_wake_up                /lib/modules/2.6.37+/build/vmlinux
   36.00  1.0% task_rq_lock                  /lib/modules/2.6.37+/build/vmlinux
   36.00  1.0% update_rq_clock               /lib/modules/2.6.37+/build/vmlinux
   34.00  0.9% sched_clock_cpu               /lib/modules/2.6.37+/build/vmlinux
   33.00  0.9% tick_nohz_restart_sched_tick  /lib/modules/2.6.37+/build/vmlinux
   32.00  0.9% resched_task                  /lib/modules/2.6.37+/build/vmlinux
   31.00  0.8% enqueue_entity                /lib/modules/2.6.37+/build/vmlinux
   31.00  0.8% _raw_spin_unlock_irqrestore   /lib/modules/2.6.37+/build/vmlinux
   31.00  0.8% reschedule_interrupt          /lib/modules/2.6.37+/build/vmlinux
   30.00  0.8% c1e_idle                      /lib/modules/2.6.37+/build/vmlinux
   30.00  0.8% _raw_spin_lock                /lib/modules/2.6.37+/build/vmlinux
   29.00  0.8% tick_nohz_stop_sched_tick     /lib/modules/2.6.37+/build/vmlinux
   27.00  0.7% place_entity                  /lib/modules/2.6.37+/build/vmlinux
   25.00  0.7% wb_do_writeback               /lib/modules/2.6.37+/build/vmlinux
   25.00  0.7% sched_clock_local             /lib/modules/2.6.37+/build/vmlinux
   24.00  0.6% do_raw_spin_lock              /lib/modules/2.6.37+/build/vmlinux
   22.00  0.6% local_bh_disable              /lib/modules/2.6.37+/build/vmlinux
   22.00  0.6% rb_insert_color               /lib/modules/2.6.37+/build/vmlinux
   22.00  0.6% _raw_spin_lock_bh             /lib/modules/2.6.37+/build/vmlinux
   21.00  0.6% __hrtimer_start_range_ns      /lib/modules/2.6.37+/build/vmlinux
   21.00  0.6% rb_erase                      /lib/modules/2.6.37+/build/vmlinux
   20.00  0.5% hrtick_update                 /lib/modules/2.6.37+/build/vmlinux
   19.00  0.5% dec128                        /lib/modules/2.6.37+/kernel/arch/x86/crypto/aes-x86_64.ko
   19.00  0.5% select_nohz_load_balancer     /lib/modules/2.6.37+/build/vmlinux
   18.00  0.5% select_idle_sibling           /lib/modules/2.6.37+/build/vmlinux
   17.00  0.5% update_curr                   /lib/modules/2.6.37+/build/vmlinux

powertop shows 47k IPIs per second (rescheduling interrupts).

perf record -p 14726 

2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
As I type this, I have an ssh process running that's dumping data into
a fifo at high speed (maybe 500Mbps) and a tar process that's
untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
a fast (i7-2600) CPU, so it's not an issue with the machine struggling
under load.

Every few tens of seconds, my system stalls for several seconds.
These stalls cause keyboard input to be lost, firefox to hang, etc.

Setting tar's ionice priority to best effort / 7 or to idle makes no difference.

ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
no difference.

max_sectors_kb = 64 in addition to the above doesn't help either.

latencytop shows regular instances of 2-7 *second* latency, variously
in sync_page, start_transaction, btrfs_start_ordered_extent, and
do_get_write_access (from jbd2 on my ext4 root partition).

echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
4-5 GB were still free (so it shouldn't be a problem with important
pages being evicted).

In case it matters, all of my partitions are on LVM on dm-crypt, but
this machine has AES-NI so the overhead from that should be minimal.
In fact, overall CPU usage is only about 10%.

What gives?  I thought this stuff was supposed to be better on modern kernels.

--Andy


Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
> > As I type this, I have an ssh process running that's dumping data into
> > a fifo at high speed (maybe 500Mbps) and a tar process that's
> > untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
> > space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
> > a fast (i7-2600) CPU, so it's not an issue with the machine struggling
> > under load.
> >
> > Every few tens of seconds, my system stalls for several seconds.
> > These stalls cause keyboard input to be lost, firefox to hang, etc.
> >
> > Setting tar's ionice priority to best effort / 7 or to idle makes no
> > difference.
> >
> > ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
> > no difference.
> >
> > max_sectors_kb = 64 in addition to the above doesn't help either.
> >
> > latencytop shows regular instances of 2-7 *second* latency, variously
> > in sync_page, start_transaction, btrfs_start_ordered_extent, and
> > do_get_write_access (from jbd2 on my ext4 root partition).
> >
> > echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
> > 4-5 GB were still free (so it shouldn't be a problem with important
> > pages being evicted).
> >
> > In case it matters, all of my partitions are on LVM on dm-crypt, but
> > this machine has AES-NI so the overhead from that should be minimal.
> > In fact, overall CPU usage is only about 10%.
> >
> > What gives?  I thought this stuff was supposed to be better on modern
> > kernels.
>
> We can tell more if you post the full traces from latencytop.  I have a
> patch here for latencytop that adds a -c mode, which dumps the traces
> out to a text file.
>
> http://oss.oracle.com/~mason/latencytop.patch

Big dump at end of email from latencytop git + your patch.

>
> Based on what you have here, I think it's probably a latency problem
> between btrfs and the dm-crypt stuff.  How easily can you set up a test
> partition without dm-crypt?

Not so easily on that disk.  I left some space inside the LVM to play
with but none outside.

I'll try hooking up another disk over eSATA (on a Cougar Point 3Gbps
controller, so it might blow up).


And here's the dump:

=== Fri Feb 11 14:44:07 2011
Globals: Cause                                 Maximum  Percentage
synchronous write                          4249.1 msec      35.5 %
Writing to a pipe                          4248.5 msec      35.5 %
Writing a page to disk                      105.9 msec       2.1 %
Page fault                                   23.7 msec       0.2 %
Reading from a pipe                           4.7 msec      19.8 %
Waiting for event (select)                    4.6 msec       6.4 %
Waiting for event (poll)                      1.3 msec       0.0 %
Executing raw SCSI command                    1.3 msec       0.2 %
opening cdrom device                          1.3 msec       0.3 %
Process details:
Process ksoftirqd/1 (10) Total:  50.0 msec
[run_ksoftirqd]                               4.8 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total:   8.7 msec
[run_ksoftirqd]                               4.9 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/3 (19) Total:   2.9 msec
[run_ksoftirqd]                               2.9 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/5 (27) Total:  80.6 msec
[run_ksoftirqd]                               5.0 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total:  45.0 msec
Executing internal ATA command                0.7 msec      62.3 %
ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense
ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler
ahci_error_handler ata_scsi_error scsi_error_handler kthread
kernel_thread_helper
SCSI error handler                            0.6 msec      37.7 %
scsi_error_handler kthread kernel_thread_helper
Process kworker/7:1 (76) Total:   8.7 msec
.                                             3.9 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/4:1 (139) Total: 124.0 msec
.                                             4.9 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/6:1 (140) Total:  11.7 msec
.                                             3.8 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/5:1 (141) Total:  12.5 msec
.                                             4.9 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/2:1 (142) Total:  26.1 msec
.                                             4.9 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/1:1 (143) Total:  47.1 msec
.                                             4.9 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process kworker/3:1 (150) Total:   4.6 msec
.                                             3.1 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process jbd2/dm-1-8 (376) Total:  66.7 msec
Writing buffer to disk (synchronous)         66.7 msec     100.0 %

Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Andrew Lutomirski
On Fri, Feb 11, 2011 at 10:44 AM, Chris Mason chris.ma...@oracle.com wrote:
> Excerpts from Andrew Lutomirski's message of 2011-02-11 10:08:52 -0500:
> > As I type this, I have an ssh process running that's dumping data into
> > a fifo at high speed (maybe 500Mbps) and a tar process that's
> > untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
> > space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
> > a fast (i7-2600) CPU, so it's not an issue with the machine struggling
> > under load.
> >
> > Every few tens of seconds, my system stalls for several seconds.
> > These stalls cause keyboard input to be lost, firefox to hang, etc.
> >
> > Setting tar's ionice priority to best effort / 7 or to idle makes no
> > difference.
> >
> > ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
> > no difference.
> >
> > max_sectors_kb = 64 in addition to the above doesn't help either.
> >
> > latencytop shows regular instances of 2-7 *second* latency, variously
> > in sync_page, start_transaction, btrfs_start_ordered_extent, and
> > do_get_write_access (from jbd2 on my ext4 root partition).
> >
> > echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
> > 4-5 GB were still free (so it shouldn't be a problem with important
> > pages being evicted).
> >
> > In case it matters, all of my partitions are on LVM on dm-crypt, but
> > this machine has AES-NI so the overhead from that should be minimal.
> > In fact, overall CPU usage is only about 10%.
> >
> > What gives?  I thought this stuff was supposed to be better on modern
> > kernels.
>
> We can tell more if you post the full traces from latencytop.  I have a
> patch here for latencytop that adds a -c mode, which dumps the traces
> out to a text file.
>
> http://oss.oracle.com/~mason/latencytop.patch
>
> Based on what you have here, I think it's probably a latency problem
> between btrfs and the dm-crypt stuff.  How easily can you set up a test
> partition without dm-crypt?

Done, on the same physical disk as before.  The latency is just as
bad.  On this test, I wrote a total of 3.1G, which is under half of my
RAM.  That should rule out lots of VM issues.  latencytop trace below.

The impression I get (from watching the disk activity light) is that
the disk is mostly idle but every now and then writes out a ton of
data.  While it's writing, the system often becomes unusable.

P.S.  How bad is this?  I got it on both disks.
btrfs: free space inode generation (0) did not match free space cache
generation (11070) for block group 1103101952




=== Fri Feb 11 19:30:57 2011
Globals: Cause                                 Maximum  Percentage
Writing a page to disk                     2009.0 msec      19.7 %
fsync() on a file (type 'F' for details)    612.2 msec       5.0 %
synchronous write                           573.6 msec       1.8 %
Page fault                                   57.3 msec       0.7 %
Writing buffer to disk (synchronous)         45.2 msec       0.1 %
Unlinking file                               12.6 msec       0.0 %
Waiting for event (select)                    5.0 msec      22.3 %
Reading from a pipe                           5.0 msec      29.9 %
Waiting for event (poll)                      5.0 msec      17.8 %
Process details:
Process kthreadd (2) Total:   1.9 msec
kthreadd kernel thread                        1.9 msec     100.0 %
kthreadd kernel_thread_helper
Process ksoftirqd/0 (3) Total:  18.5 msec
[run_ksoftirqd]                               4.0 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/1 (10) Total:  19.6 msec
[run_ksoftirqd]                               4.9 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process kworker/0:1 (11) Total: 556.3 msec
.                                             5.0 msec     100.0 %
worker_thread kthread kernel_thread_helper
Process ksoftirqd/2 (15) Total:   8.1 msec
[run_ksoftirqd]                               2.9 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process ksoftirqd/4 (23) Total:  11.2 msec
[run_ksoftirqd]                               4.3 msec     100.0 %
run_ksoftirqd kthread kernel_thread_helper
Process scsi_eh_1 (62) Total:  38.8 msec
SCSI error handler                            0.9 msec      39.9 %
scsi_error_handler kthread kernel_thread_helper
Executing internal ATA command                0.7 msec      60.1 %
ata_exec_internal_sg ata_exec_internal atapi_eh_request_sense
ata_eh_link_autopsy ata_eh_autopsy sata_pmp_error_handler
ahci_error_handler ata_scsi_error scsi_error_handler kthread
kernel_thread_helper
Process kworker/u:4 (69) Total: 616.5 msec
Creating block layer request                 54.9 msec      77.8 %
get_request_wait __make_request generic_make_request
kcryptd_crypt_write_io_submit kcryptd_crypt process_one_work
worker_thread kthread kernel_thread_helper
.                                             5.0 msec      22.2 %
worker_thread kthread kernel_thread_helper
Process kworker/u:5 (70) Total: 1712.3 msec
Creating block layer request                492.8 msec      94.3 %

[2.6.33 regression] btrfs mount causes memory corruption

2010-02-25 Thread Andrew Lutomirski
Mounting btrfs corrupts memory and causes nasty crashes within a few
seconds.  This seems to happen even if the mount fails (note the
unrecognized mount option).  This is a regression from 2.6.32, and
I've attached an example.

--Andy

Btrfs loaded
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 2 transid 68136 /dev/dm-2
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/dm-1
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 2 transid 68136 /dev/mapper/big_2
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/mapper/big_1
device fsid cf4a8e080605f191-af91bbbf445c98b8 devid 1 transid 68136 /dev/mapper/big_1
btrfs: unrecognized mount option 'acl'
btrfs: open_ctree failed
------------[ cut here ]------------
kernel BUG at mm/slub.c:2969!
invalid opcode:  [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 6
Pid: 2692, comm: bash Tainted: GW  2.6.33 #2 P6T WS PRO/System Product Name
RIP: 0010:[810fbbde]  [810fbbde] kfree+0x62/0xd5
RSP: 0018:88019db87c68  EFLAGS: 00010246
RAX: 0048 RBX: 88019db87d18 RCX: 8801b175de20
RDX: ea00 RSI: ea000380 RDI: 8801
RBP: 88019db87c88 R08: 81a57aa0 R09: 8801b551c240
R10: 0002412fde13 R11:  R12: 8801
R13: 811d9532 R14: 0010 R15: 88019db87ce8
FS:  7fde0bce7700() GS:8800282c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f041b1b4600 CR3: 0001b776a000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process bash (pid: 2692, threadinfo 88019db86000, task 88019d928000)
Stack:
 8801b551c240 88019db87d18  88019b65f164
0 88019db87ca8 811d9532 88019db87ce8 8801b4b8f548
0 88019db87cc8 811de035 8801b4b8f548 8801b644bba8
Call Trace:
 [811d9532] ebitmap_destroy+0x21/0x3c
 [811de035] context_destroy+0x58/0x6c
 [811e0787] security_compute_sid+0x26d/0x282
 [811e0815] security_transition_sid+0x1f/0x21
 [811d45d9] selinux_bprm_set_creds+0xd1/0x25f
 [810e3510] ? vma_link+0x88/0xb1
 [811d4a29] ? selinux_vm_enough_memory+0x40/0x45
 [8120cc58] ? spin_unlock_irqrestore+0x9/0xb
 [8120cce0] ? __up_write+0x42/0x47
 [811c909d] security_bprm_set_creds+0x13/0x15
 [8110cc3b] prepare_binprm+0xc3/0xf0
 [8110d55e] do_execve+0x150/0x2d2
 [81010eaf] sys_execve+0x43/0x5a
 [8100a0ca] stub_execve+0x6a/0xc0
Code: 83 c3 08 48 83 3b 00 eb ec 49 83 fc 10 0f 86 82 00 00 00 4c 89
e7 e8 c5 e2 ff ff 48 89 c6 48 8b 00 84 c0 78 14 66 a9 00 c0 75 04 0f
0b eb fe 48 89 f7 e8 ea 36 fd ff eb 5c 48 8b 4d 08 48 8b 7e
RIP  [810fbbde] kfree+0x62/0xd5
 RSP 88019db87c68
---[ end trace 57f7151f6a5def07 ]---


Re: [2.6.33 regression] btrfs mount causes memory corruption

2010-02-25 Thread Andrew Lutomirski
On Thu, Feb 25, 2010 at 3:23 PM, Josef Bacik jo...@redhat.com wrote:
> On Thu, Feb 25, 2010 at 03:01:08PM -0500, Andrew Lutomirski wrote:
> > Mounting btrfs corrupts memory and causes nasty crashes within a few
> > seconds.  This seems to happen even if the mount fails (note the
> > unrecognized mount option).  This is a regression from 2.6.32, and
> > I've attached an example.
> >
>
> And it only happens when you mount a btrfs fs?  Can you show me a trace of when
> you mount a btrfs fs with valid mount options?  I'd like to see if we're not
> cleaning up something properly or what.  Thanks,

Seems OK.  Or maybe I just got lucky, but it's crashed every time I
tried to mount with 'acl' before.

I even went through a couple iterations of trying to mount with
'xattr' and 'user_xattr', both of which failed.

--Andy


Re: [2.6.33 regression] btrfs mount causes memory corruption

2010-02-25 Thread Andrew Lutomirski
On Thu, Feb 25, 2010 at 3:38 PM, Josef Bacik jo...@redhat.com wrote:

> Ok it looks like we have a problem kfree'ing the wrong stuff.  We kstrdup the
> options string, but then strsep screws with the pointer, so when we kfree() it,
> we're not giving it the right pointer.  Please try this patch, and mount with -o
> acl and other such garbage to make sure it actually worked (acl isn't a valid
> mount option btw).  Let me know if it works.  Thanks,
>
> Josef
>
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 8a1ea6e..f8b4521 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -128,7 +128,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  {
>         struct btrfs_fs_info *info = root->fs_info;
>         substring_t args[MAX_OPT_ARGS];
> -       char *p, *num;
> +       char *p, *num, *orig;
>         int intarg;
>         int ret = 0;
>
> @@ -143,6 +143,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>         if (!options)
>                 return -ENOMEM;
>
> +       orig = options;
>
>         while ((p = strsep(&options, ",")) != NULL) {
>                 int token;
> @@ -280,7 +281,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>                 }
>         }
>  out:
> -       kfree(options);
> +       kfree(orig);
>         return ret;
>  }




Thanks for the instant patch.  I hammered on it a bit and it hasn't
crashed yet.  I'll let you know if it crashes later.  (The earlier
trial with xattr crashed after a couple minutes.)

In the meantime,

Tested-by: Andy Lutomirski l...@mit.edu

--Andy


Where is current btrfs and btrfs-progs development?

2009-12-15 Thread Andrew Lutomirski
It looks like the git trees at:

http://git.kernel.org/smart/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

and

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=summary

are several weeks out of date.  For example, the patch here:

http://news.gmane.org/gmane.comp.file-systems.btrfs

looks like it's based on a revision that isn't in the
btrfs-progs-unstable tree.  Is there an up to date tree or patch queue
somewhere?

--Andy


Snapshot mysteries (and an oops)

2009-12-11 Thread Andrew Lutomirski
Hi all-

I'm a bit mystified by snapshots.  I think that there are some bugs in
btrfsctl at least (or maybe its documentation).  There's definitely at
least one bug in the kernel.

Here are some commands I just tried (vanilla 2.6.32, btrfs-progs from
git today; test is a brand-new empty btrfs filesystem, mounted with
default options).  Questions and comments are inline:

[test]# btrfsctl -S subvol1 .
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# touch subvol1/file1
[test]# btrfsctl -s snap1 subvol1
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# ls snap1
file1

OK, so it looks like I can make a snapshot of a subvolume, and
everything works as expected.

[test]# mkdir dir2
[test]# touch dir2/file2
[test]# btrfsctl -s snap2 dir2
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# ls snap2
dir2  snap1  subvol1
[test]# ls snap2/snap1
[test]#

WTF?  It looks like btrfsctl just snapshotted the subvolume containing
dir2 instead of snapshotting the directory.  I would have expected it
to either snapshot just the directory or, if that's impossible, to
fail.

[test]# rm -rf snap1
rm: cannot remove directory `snap1': Directory not empty
[test]# ls snap1
[test]#

OK, so rmdir can't remove snapshots.  (Is there any good reason for that?)

[test]# btrfsctl -D snap1
ioctl:: No such file or directory
[test]# btrfsctl -D snap1 .
operation complete
Btrfs v0.19-4-gab8fb4c

I can't make any sense of that.  What's the second parameter to -D
supposed to do?

[test]# btrfsctl -D subvol1 .
operation complete
Btrfs v0.19-4-gab8fb4c

Phew.  That worked :)

[test]# rm -rf *

OK, now I'm back to where I started.

[test]# btrfsctl -S subvol2 .
operation complete
Btrfs v0.19-4-gab8fb4c
[test]# touch subvol2/file
[test]# ln subvol2/file file
Segmentation fault

Crap.  I guess I wasn't supposed to try that.  dmesg attached:

Process ln (pid: 3153, threadinfo 88019694, task 8801a4149780)
Stack:
  88017a741e00 88018af585d0 000e
0 880196941e28 88018af585d0 88017a741e00 88017a7905d0
0 88017e4b3680 88017a790688 880196941e78 81105988
Call Trace:
 [81105988] vfs_link+0xd5/0x14a
 [811057e9] ? lookup_hash+0x3b/0x3f
 [81107eb1] sys_linkat+0xc4/0x121
 [8106af52] ? up_read+0xe/0x10
 [8141d2a9] ? do_page_fault+0x269/0x299
 [81095e6c] ? audit_syscall_entry+0x11e/0x14a
 [81107f2c] sys_link+0x1e/0x22
 [81011cf2] system_call_fastpath+0x16/0x1b
Code: ff 85 c0 41 89 c6 ba 01 00 00 00 75 39 49 8b 44 24 20 48 89 da
4c 89 fe 4c 89 e7 49 89 45 e0 e8 8f dc ff ff 85 c0 41 89 c6 74 04 0f
0b eb fe 48 8b 45 b8 31 d2 48 89 de 4c 89 e7 48 8b 48 28 e8
RIP  [a0bc4305] btrfs_link+0xcf/0x144 [btrfs]
 RSP 880196941dd8
---[ end trace 95f0a8585b4e506f ]---

Thanks,
Andy