Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Sulla
Dear Chris!

Certainly: I have 3 HDDs, all of them WD20EARS. Originally I wanted to
let btrfs handle all 3 devices directly without making partitions, but
this was impossible, as at least /boot needed to be ext4, at least back
then when I set up the server. Back then btrfs also had no RAID5-like
functionality, so I decided to put good old partitions, md RAIDs and
LVM on the disks and use btrfs just as a plain filesystem on the logical
volumes provided by LVM.

On the WD disks I thus created 2 partitions each: the first, sdX1, is
~500 MiB; the rest, about 1.9995 TB, forms the second partition, sdX2.

I built a RAID1 on the 3 small sdX1 partitions with ext4 for /boot; each
disk is bootable, with GRUB installed into the MBR.

I combined the 3 large partitions into a RAID5 of size 3.64 TiB:

/proc/mdstat reads:
md0 : active raid1 sda1[5] sdb1[4] sdc1[3]
  498676 blocks super 1.2 [3/3] [UUU]
md1 : active raid5 sda2[5] sdb2[4] sdc2[3]
  3904907520 blocks super 1.2 level 5, 8k chunk, algorithm 2 [3/3] [UUU]

the information you requested:
# sudo mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Thu Jul 14 18:49:25 2011
     Raid Level : raid5
     Array Size : 3904907520 (3724.01 GiB 3998.63 GB)
  Used Dev Size : 1952453760 (1862.01 GiB 1999.31 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Sun Jan  5 22:07:22 2014
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 8K

           Name : freedom:1  (local to host freedom)
           UUID : 44b72520:a78af6f7:dba13fb3:2203127d
         Events : 576884

    Number   Major   Minor   RaidDevice State
       4       8       18        0      active sync   /dev/sdb2
       5       8        2        1      active sync   /dev/sda2
       3       8       34        2      active sync   /dev/sdc2



I use the Raid5 md1 as physical volume for LVM: pvdisplay gives:
  --- Physical volume ---
  PV Name   /dev/md1
  VG Name   MAIN
  PV Size   3.64 TiB / not usable 2.06 MiB
  Allocatable   yes
  PE Size   4.00 MiB
  Total PE  953346
  Free PE   6274
  Allocated PE  947072
  PV UUID   WcuEx8-ehJL-xHdf-ElwF-b9s3-dlmM-KZlDNG

I keep a reserve of 6274 free extents of 4 MiB each (about 24.5 GiB) in
case one of the logical volumes runs out of space...
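
As a cross-check of the sizes quoted above, here is a minimal Python sketch of the arithmetic. It assumes that mdadm reports sizes in KiB blocks and it uses the figures from the mdadm and pvdisplay output; the function names are made up for illustration.

```python
KIB_PER_GIB = 1024 ** 2
MIB_PER_GIB = 1024


def raid5_usable_kib(n_devices: int, used_dev_size_kib: int) -> int:
    """RAID5 spends one device's worth of space on parity, so n-1 devices hold data."""
    return (n_devices - 1) * used_dev_size_kib


def extents_to_gib(extents: int, pe_size_mib: float = 4.0) -> float:
    """Convert a number of LVM physical extents to GiB."""
    return extents * pe_size_mib / MIB_PER_GIB


# "Used Dev Size" from mdadm -D; the result should equal "Array Size".
array_kib = raid5_usable_kib(3, 1952453760)
print(array_kib, "KiB =", round(array_kib / KIB_PER_GIB, 2), "GiB")

# "Free PE" from pvdisplay with a PE size of 4 MiB.
print(round(extents_to_gib(6274), 2), "GiB kept in reserve")
```

The first line should reproduce the Array Size reported by mdadm, which is a quick way to confirm that the array really spans all three partitions.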

I created the following logical volumes, named after their intended
mountpoints:
  --- Logical volume ---
  LV Path                /dev/MAIN/ROOT
  LV Name                ROOT
  VG Name                MAIN
  LV UUID                kURJks-xHox-73B5-n02x-eZfS-agDD-n1dtAm
  LV Write Access        read/write
  LV Creation host, time ,
  LV Status              available
  # open                 1
  LV Size                19.31 GiB
  Current LE             4944
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

and similarly:
  --- Logical volume ---
  LV Path                /dev/MAIN/SWAP     1.8 GB
  LV Path                /dev/MAIN/HOME    18.6 GB
  LV Path                /dev/MAIN/TMP      9.3 GB
  LV Path                /dev/MAIN/DATA1    2.6 TB
  LV Path                /dev/MAIN/DATA2    0.9 TB


As the filesystem I chose btrfs during the install from an Ubuntu Server
image (I don't recall which release; it might have been 11.10 or 12.04) for
all logical volumes except swap, of course.

Is there any other information I can supply?
Regards, Sulla

-- 
Cogito cogito ergo cogito sum.
   Ambrose Bierce

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-05 Thread Sulla
Thanks Chris!

Thanks for your support.

> echo 120 > /sys/block/sdX/device/timeout
The timeout is 30 seconds for my HDDs. I'm well aware that the WD Green HDDs
are not the ideal ones for servers, but they were cheaper - and quieter -
than the Black ones for servers. I'll get the Red ones next, though. ;-)

> You also need to schedule regular scrubs at the md level as well.

Ubuntu does that once a month.

> cat /sys/block/mdX/mismatch_cnt
On my machine this resides in
/sys/devices/virtual/block/md1/md/mismatch_cnt. The count is zero.
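
The two values checked above can also be read programmatically. The following is only a sketch in Python: the helper names are hypothetical, and it assumes the usual sysfs layout in which /sys/block/<disk>/device/timeout holds the command timeout in seconds and /sys/block/<md>/md/mismatch_cnt holds the mismatch count of the last check.

```python
from pathlib import Path


def read_int(path: Path) -> int:
    """Read a single integer from a sysfs-style attribute file."""
    return int(path.read_text().strip())


def disk_timeouts(sys_block: Path = Path("/sys/block")) -> dict:
    """Map every block device that exposes device/timeout to its timeout in seconds."""
    return {
        entry.name: read_int(entry / "device" / "timeout")
        for entry in sorted(sys_block.iterdir())
        if (entry / "device" / "timeout").is_file()
    }


def md_mismatch_counts(sys_block: Path = Path("/sys/block")) -> dict:
    """Map every md array to the mismatch_cnt left behind by its last check."""
    return {
        entry.name: read_int(entry / "md" / "mismatch_cnt")
        for entry in sorted(sys_block.iterdir())
        if (entry / "md" / "mismatch_cnt").is_file()
    }
```

Calling disk_timeouts() and md_mismatch_counts() without arguments reads the live system; passing another directory makes it possible to try the functions against a copy of the sysfs tree.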

> The workload is presumably small file sizes, like a mail server?
Yes. It serves as a mail server (Maildir format), but also as a Samba file
server with quite big files...

btrfs ran fine for more than a year, so I'm not sure how reproducible the
problem is...

I don't really wish to install or compile custom kernels, to be honest.
I'm not sure how problematic they might be during the next
do-release-upgrade...

Sulla


-- 
Russian Roulette is not the same without a gun
and baby when it's love, if it's not rough, it isn't fun, fun.
   Lady GaGa, Pokerface


Re: btrfs-transaction blocked for more than 120 seconds

2014-01-04 Thread Sulla
Oh gosh, I don't know what went wrong with my btrfs root filesystem, and I
probably never will:

The sudo btrfs balance start / was running fine for about 4 or 5 hours at a
system load of ~3, when btrfs balance status / told me the balancing was
under way and had completed 19 out of 23 chunks.

At this point the system load started to climb and kept climbing, and when
it reached 147 (!!) (while top was showing me NOTHING was going on) I reset
the computer. TTY1 showed some kernel panics and btrfs bug messages, but
those messages were lost because they never made it to disk.

Fortunately my RAID5 stayed in sync and everything was fine. The system also
booted, but with the same 120+ second hangs as before. It was unusable, as
e.g. all IMAP logins timed out.

So
* I booted into a live CD
* mounted a backup disk
* copied all files of the root fs to the backup disk (it could read them
flawlessly)
* formatted the root partition as ext4 (yes, I feel sad about it)
* copied all root files from the backup disk to the ext4 root filesystem
* removed the subvol=@ boot argument from /boot/grub/grub.cfg
* and rebooted my server.
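
The grub.cfg step is the one that is easiest to get wrong by hand. As an illustration only, here is a Python sketch of stripping the subvol=@ option from a kernel command line. It assumes the option sits inside a rootflags= argument; the function name and the sample line are made up and the real grub.cfg entry will differ.

```python
def strip_subvol(cmdline: str, subvol: str = "@") -> str:
    """Remove subvol=<subvol> from any rootflags= argument of a kernel command line.

    Note: tokens are re-joined with single spaces, so indentation is not preserved.
    """
    kept = []
    for token in cmdline.split():
        if token.startswith("rootflags="):
            flags = [flag for flag in token[len("rootflags="):].split(",")
                     if flag != f"subvol={subvol}"]
            if not flags:
                continue  # nothing else was set, so drop the whole argument
            token = "rootflags=" + ",".join(flags)
        kept.append(token)
    return " ".join(kept)


# Hypothetical example line, not copied from the actual grub.cfg.
line = "linux /vmlinuz root=/dev/mapper/MAIN-ROOT ro rootflags=subvol=@ quiet"
print(strip_subvol(line))
```

Other rootflags entries, if any, are kept, so only the btrfs subvolume selection disappears.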

How I love linux! Wouldn't be possible with M$!!

Now it's running fine again, and the system is as responsive as it should
be. No clue 'bout what went wrong, though.

I still have /home and the huge data partitions on btrfs and plan to leave
it that way. While it would not be difficult to put /home on ext4, it would
be a major effort to cp the ~3TB of data off and back onto the disks...

Thanx for your support,
Sulla



Re: btrfs-transaction blocked for more than 120 seconds

2014-01-01 Thread Sulla
Dear Duncan!

Thanks very much for your exhaustive answer.

Hm, I also thought of fragmentation, although I don't think it is really
very likely, as my server doesn't serve things that typically cause
fragmentation. It is a mail server (but only Maildir format), a file server
for Windows clients (huge files that hardly ever get rewritten), a server
for TV recordings (but it only copies recordings from a sat receiver after
they have been recorded, so no heavy rewriting here), a tiny web server and
all kinds of such things, but not a storage for huge databases, virtual
machines or a target for filesharing clients.
It does, however, serve as a target for a hardlink-based backup program run
on Windows PCs, but only once per month or so, so that shouldn't be too much.

The problem must lie somewhere on the root partition itself, because the
system is already slow before mounting the fat data partitions.

I'll give the defragmentation a try. But
# sudo btrfs filesystem defrag -r
doesn't work, because -r is an unknown option (I'm running 
Btrfs v0.20-rc1 on an Ubuntu 3.11.0-14-generic kernel).

I'm doing a
# sudo btrfs filesystem defrag / 
on the root directory at the moment.
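
Since this btrfs-progs version has no -r option, one possible workaround is to walk the tree and hand the files to the defragment command one by one. The Python sketch below shows the idea under some assumptions: it stays on one filesystem by comparing st_dev (so other mounts, and as far as I know nested subvolumes, are skipped), it only prints the commands unless told otherwise, and the helper names are hypothetical.

```python
import os
import stat
import subprocess


def files_on_same_fs(root: str):
    """Yield regular files below root without descending into other filesystems."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that live on a different device, e.g. other mounts.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_dev == root_dev and stat.S_ISREG(st.st_mode):
                yield path


def defrag_commands(root: str):
    """Build one 'btrfs filesystem defragment <file>' command per regular file."""
    return [["btrfs", "filesystem", "defragment", path]
            for path in files_on_same_fs(root)]


def run_defrag(root: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in defrag_commands(root):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

Running run_defrag("/") first with the default dry_run=True shows what would be touched before anything is actually rewritten.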

Question: will this defragment everything or just the root-fs and will I
need to run a defragment on /home as well, as /home is a separate btrfs
filesystem?

I've also added the autodefrag mount option and will do a mount -a after the
defragmentation.

I've considered a
# sudo btrfs balance start
as well, would this do any good? How close should I let the data fill the
partition? The large data partitions are 85% used, root is 70% used. Is this
safe or should I add space?

Thanx, Wolfgang



btrfs-transaction blocked for more than 120 seconds

2013-12-31 Thread Sulla
Dear all!

On my Ubuntu Server 13.10 I use a RAID5 block device consisting of 3 WD20EARS
drives. On top of this I built an LVM, and in this LVM I use quite normal
volumes for /, /home and swap (/boot resides on a RAID1), and also a custom
/data volume. Everything (except /boot and swap) is on btrfs.

Sometimes my system hangs for quite some time (top shows a high wait
percentage), then runs on normally. I get kernel messages in
/var/log/syslog, see below. I am unable to make any sense of the kernel
messages; there is no reference to the filesystem or drive affected (at
least I cannot find one).
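
As a first step towards making sense of such messages, the names of the blocked tasks can be pulled out of the log. This is a minimal sketch in Python, assuming the standard hung-task wording that appears in the messages; the function name is made up, and the sample consists of two wrapped lines taken from the log.

```python
import re

HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(log_text: str):
    """Return (task name, pid, seconds) for every hung-task report in the text."""
    # Mail clients wrap long log lines, so collapse all whitespace before matching.
    flat = " ".join(log_text.split())
    return [(m["name"], int(m["pid"]), int(m["secs"]))
            for m in HUNG_TASK.finditer(flat)]


sample = """
Dec 31 12:27:49 freedom kernel: [ 4681.264112] INFO: task
btrfs-transacti:529 blocked for more than 120 seconds.
Dec 31 12:27:49 freedom kernel: [ 4681.264610] INFO: task kworker/u4:0:9975
blocked for more than 120 seconds.
"""
print(blocked_tasks(sample))
```

This only tells which kernel threads were stuck, not which device or filesystem they were waiting for; that has to be read from the call traces.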

Question: What is happening here?
* Is an HDD failing? (SMART looks good, however.)
* Is something wrong with one of my btrfs filesystems? If so, with which one?
* How can I find the cause?

thanks, Wolfgang


Dec 31 12:27:49 freedom kernel: [ 4681.264112] INFO: task btrfs-transacti:529 blocked for more than 120 seconds.
Dec 31 12:27:49 freedom kernel: [ 4681.264239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 31 12:27:49 freedom kernel: [ 4681.264367] btrfs-transacti D 88013fc14580 0   529  2 0x
Dec 31 12:27:49 freedom kernel: [ 4681.264377]  880138345e10 0046 880138345fd8 00014580
Dec 31 12:27:49 freedom kernel: [ 4681.264386]  880138345fd8 00014580 880135615dc0 880132fb6a00
Dec 31 12:27:49 freedom kernel: [ 4681.264393]  880133f45800 880138345e30 880137ee2000 880137ee2070
Dec 31 12:27:49 freedom kernel: [ 4681.264402] Call Trace:
Dec 31 12:27:49 freedom kernel: [ 4681.264418]  [816eaa79] schedule+0x29/0x70
Dec 31 12:27:49 freedom kernel: [ 4681.264477]  [a032a57d] btrfs_commit_transaction+0x34d/0x980 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.264487]  [81085580] ? wake_up_atomic_t+0x30/0x30
Dec 31 12:27:49 freedom kernel: [ 4681.264517]  [a0321be5] transaction_kthread+0x1a5/0x240 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.264548]  [a0321a40] ? verify_parent_transid+0x150/0x150 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.264557]  [810847b0] kthread+0xc0/0xd0
Dec 31 12:27:49 freedom kernel: [ 4681.264565]  [810846f0] ? kthread_create_on_node+0x120/0x120
Dec 31 12:27:49 freedom kernel: [ 4681.264573]  [816f566c] ret_from_fork+0x7c/0xb0
Dec 31 12:27:49 freedom kernel: [ 4681.264580]  [810846f0] ? kthread_create_on_node+0x120/0x120
Dec 31 12:27:49 freedom kernel: [ 4681.264610] INFO: task kworker/u4:0:9975 blocked for more than 120 seconds.
Dec 31 12:27:49 freedom kernel: [ 4681.264722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 31 12:27:49 freedom kernel: [ 4681.264847] kworker/u4:0    D 88013fd14580 0  9975  2 0x
Dec 31 12:27:49 freedom kernel: [ 4681.264861] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-4)
Dec 31 12:27:49 freedom kernel: [ 4681.264865]  8800a8739538 0046 8800a8739fd8 00014580
Dec 31 12:27:49 freedom kernel: [ 4681.264873]  8800a8739fd8 00014580 8801351e5dc0 8801351e5dc0
Dec 31 12:27:49 freedom kernel: [ 4681.264880]  880134c5e6a8 880134c5e6b0  880134c5e6b8
Dec 31 12:27:49 freedom kernel: [ 4681.264887] Call Trace:
Dec 31 12:27:49 freedom kernel: [ 4681.264895]  [816eaa79] schedule+0x29/0x70
Dec 31 12:27:49 freedom kernel: [ 4681.264902]  [816ec465] rwsem_down_write_failed+0x105/0x1e0
Dec 31 12:27:49 freedom kernel: [ 4681.264911]  [8136257d] ? __rwsem_do_wake+0xdd/0x160
Dec 31 12:27:49 freedom kernel: [ 4681.264918]  [81369763] call_rwsem_down_write_failed+0x13/0x20
Dec 31 12:27:49 freedom kernel: [ 4681.264927]  [816e9e7d] ? down_write+0x2d/0x30
Dec 31 12:27:49 freedom kernel: [ 4681.264956]  [a030fbe0] cache_block_group+0x290/0x3b0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.264963]  [81085580] ? wake_up_atomic_t+0x30/0x30
Dec 31 12:27:49 freedom kernel: [ 4681.264991]  [a0317d48] find_free_extent+0xa38/0xac0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265022]  [a0317ef2] btrfs_reserve_extent+0xa2/0x1c0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265056]  [a033103d] __cow_file_range+0x15d/0x4a0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265090]  [a0331efa] cow_file_range+0x8a/0xd0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265122]  [a0332290] run_delalloc_range+0x350/0x390 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265158]  [a0346bf1] ? find_lock_delalloc_range.constprop.42+0x1d1/0x1f0 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265194]  [a0348764] __extent_writepage+0x304/0x750 [btrfs]
Dec 31 12:27:49 freedom kernel: [ 4681.265202]  [8109a1d5] ? set_next_entity+0x95/0xb0
Dec 31 12:27:49 freedom kernel: [ 4681.265212]  [810115c6] ? __switch_to+0x126/0x4b0
Dec 31 12:27:49 freedom kernel: [