BUG during send, cannot delete subvolume

2018-04-12 Thread Matt McKinnon

Hi All,

I had a ctree.c error during a send/receive backup:

kernel BUG at fs/btrfs/ctree.c:1862

Nothing seemed to go wrong otherwise on the file system.  After 
restarting the send, it completed, but I'm left with a subvolume I can't 
delete:


BTRFS warning (device sdb1): Attempt to delete subvolume 176188 during send

I don't see any zombie btrfs send processes lying around.  Is there 
any way to delete this subvolume, or do I just need a reboot?
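
(For the archives, the checks I ran, with an illustrative mount point of 
/export - the subvolume path came from mapping the id with `btrfs 
subvolume list`:

# ps aux | grep '[b]trfs send'
# btrfs subvolume list /export | grep 176188
# btrfs subvolume delete /export/path/to/that/subvolume

The delete is what produces the warning above.)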


-Matt


Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


On 01/12/17 16:47, Duncan wrote:

Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as
excerpted:


On 12/01/2017 05:31 PM, Matt McKinnon wrote:

Sorry, I missed your in-line reply:



2) How big is this filesystem? What does your `btrfs fi df
/mountpoint` say?



# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB


Multi-TiB filesystem, check. total/used ratio looks healthy.


Not so healthy, from here.  Data/metadata are healthy, yes,
but...

Any usage at all of global reserve is a red flag indicating that
something in the filesystem thinks, or thought when it resorted
to global reserve, that space is running out.

Global reserve usage doesn't really hint what the problem is,
but it's definitely a red flag that there /is/ a problem, and
it's easily overlooked, as it apparently was here.

It's likely indication of a bug, possibly one of the ones fixed
right around 4.12/4.13.  I'll let the devs and better experts take
it from there, but I'd certainly be worried until global reserve
drops to zero usage.
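
A low-tech way to keep an eye on it (mount point illustrative):

# watch -n 60 'btrfs fi df /export | grep GlobalReserve'

If used stays pinned above zero across transaction commits, something
is still dipping into the reserve.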




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon
Right.  The file system is 48T, with 17T available, so we're not quite 
pushing it yet.


So far so good on the space_cache=v2 mount.  I'm surprised this isn't on 
the Gotchas page in the wiki; it may end up making a world of difference 
to the users here.
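
(For anyone finding this later: the switch is a one-time remount; the 
device and mount point here are illustrative:

# umount /export
# mount -o space_cache=v2 /dev/sdb1 /export

Once the free space tree has been built, the filesystem keeps using v2 
on subsequent mounts with no extra options.)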


Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:

On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:

On 12/01/17 18:34, Matt McKinnon wrote:

Thanks, I'll give space_cache=v2 a shot.


Yes, very much recommended.


My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/


Turn autodefrag off and use noatime instead of relatime.

Your filesystem also seems very full,


We don't know. btrfs fi df only displays allocated space. And that being
full is good; it means there isn't too much free space fragmented everywhere.


that's bad with every filesystem but
*especially* with btrfs because the allocator has to work really hard to find
free space for COWing. Really consider deleting stuff or adding more space.
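
(Concretely, the option string quoted above would then look something 
like:

rw,noatime,space_cache=v2,subvolid=5,subvol=/

with autodefrag simply dropped.)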





Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Thanks, I'll give space_cache=v2 a shot.

My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/


Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Sorry, I missed your in-line reply:



1) The one right above, btrfs_write_out_cache, is the write-out of the
free space cache v1. Do you see this for multiple seconds going on, and
does it match the time when it's writing X MB/s to disk?



It seems to only last until the next watch update.

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]

[] btrfs_commit_transaction+0x665/0x900 [btrfs]
[] transaction_kthread+0x18a/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

The last three lines will stick around for a while.  Is switching to 
space cache v2 something that everyone should be doing?  Or at least 
something that would be worth testing?




2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?



# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB



3) What kind of workload are you running? E.g. how can you describe it
within a range from "big files which just sit there" to "small writes
and deletes all over the place all the time"?



It's a pretty light workload most of the time.  It's a file system that 
exports two NFS shares to a small lab group.  I believe it is more small 
reads all over large files (MRI imaging) than small writes.



4) What kernel version is this? `uname -a` output?



# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

These seem to come up most often:

[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30




Re: btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Thanks for this.  Here's what I get:
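
(These are samples of the transaction thread's kernel stack, taken 
roughly like this - the pid lookup is illustrative:

# watch -n 1 'cat /proc/$(pgrep btrfs-transacti)/stack')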


[] transaction_kthread+0x133/0x1c0 [btrfs]
[] kthread+0x109/0x140
[] ret_from_fork+0x25/0x30

...

[] io_schedule+0x16/0x40
[] get_request+0x23e/0x720
[] blk_queue_bio+0xc1/0x3a0
[] generic_make_request+0xf8/0x2a0
[] submit_bio+0x75/0x150
[] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[] submit_one_bio+0x63/0xa0 [btrfs]
[] flush_epd_write_bio+0x3b/0x50 [btrfs]
[] flush_write_bio+0xe/0x10 [btrfs]
[] btree_write_cache_pages+0x379/0x450 [btrfs]
[] btree_writepages+0x5d/0x70 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]

[] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[] io_schedule+0x16/0x40
[] wait_on_page_bit+0xe8/0x120
[] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[] read_tree_block+0x32/0x50 [btrfs]
[] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[] btrfs_next_leaf+0x10/0x20 [btrfs]
[] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[] run_delalloc_range+0x68/0x340 [btrfs]
[] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[] __extent_writepage+0xc7/0x290 [btrfs]
[] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]

[] extent_writepages+0x4d/0x70 [btrfs]
[] btrfs_writepages+0x28/0x30 [btrfs]
[] do_writepages+0x1c/0x70
[] __filemap_fdatawrite_range+0xaa/0xe0
[] filemap_fdatawrite_range+0x13/0x20
[] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[] btrfs_write_out_cache+0x86/0x100 [btrfs]
[] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[] btrfs_commit_transaction+0x434/0x900 [btrfs]

...

[] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]



btrfs-transacti hammering the system

2017-12-01 Thread Matt McKinnon

Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging 
on?  I have a few file systems that seem to get wedged for days on end 
with this process pegged around 100%.  I've stopped all snapshots, made 
sure no quotas were enabled, turned on autodefrag in the mount options, 
and tried manual defragging and kernel upgrades, yet this still brings 
my system to a crawl.


Network I/O to the system seems very tiny.  The only I/O I see to the 
disk is btrfs-transacti writing a couple M/s.


# time touch foo

real    2m54.303s
user    0m0.000s
sys     0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt





kernel BUG at fs/btrfs/ctree.c:3182

2017-10-16 Thread Matt McKinnon

Hi All,

Been having issues on one machine and I was wondering if I could get 
some help tracking the issue down.


# uname -a
Linux riperton 4.13.5-custom #1 SMP Sat Oct 7 18:28:16 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux


# btrfs --version
btrfs-progs v4.13.3

# btrfs fi show
Label: none  uuid: 8133a362-8e41-4da4-b607-a27832861157
Total devices 1 FS bytes used 41.64TiB
devid    1 size 50.93TiB used 41.88TiB path /dev/sda1

# btrfs fi df /export/
Data, single: total=41.70TiB, used=41.57TiB
System, DUP: total=64.00MiB, used=4.56MiB
Metadata, DUP: total=90.00GiB, used=72.30GiB
Metadata, single: total=1.53GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


[617994.948036] [ cut here ]
[617994.948040] kernel BUG at fs/btrfs/ctree.c:3182!
[617994.952786] invalid opcode:  [#1] SMP
[617994.956896] Modules linked in: ipmi_devintf xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd dm_multipath joydev lpc_ich mei_me mei nfsd ioatdma auth_rpcgss nfs_acl ipmi_si wmi nfs ipmi_msghandler lockd grace sunrpc fscache shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov hid_generic async_memcpy async_pq usbhid async_xor hid async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 ahci dca raid0 libahci ptp megaraid_sas multipath pps_core linear dm_mirror dm_region_hash dm_log
[617995.025316] CPU: 1 PID: 3191 Comm: nfsd Tainted: GW 4.13.5-custom #1
[617995.032965] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014

[617995.042092] task: 996bac7d5a00 task.stack: bb7984b74000
[617995.048134] RIP: 0010:btrfs_set_item_key_safe+0x14e/0x160 [btrfs]
[617995.054310] RSP: 0018:bb7984b77658 EFLAGS: 00010246
[617995.059622] RAX:  RBX: 0037 RCX: 00018000
[617995.066834] RDX:  RSI: bb7984b7776e RDI: bb7984b77677
[617995.074051] RBP: bb7984b776b0 R08: bb7984b77677 R09: 
[617995.081263] R10:  R11: 0003 R12: bb7984b77666
[617995.088483] R13: 99679cc00460 R14: bb7984b7776e R15: 9966184867a8
[617995.095705] FS:  () GS:9967afc8() knlGS:
[617995.103876] CS:  0010 DS:  ES:  CR0: 80050033
[617995.109707] CR2: 7fdbaad6 CR3: 00071fe09000 CR4: 001406e0

[617995.116921] Call Trace:
[617995.119493]  __btrfs_drop_extents+0x50c/0xdd0 [btrfs]
[617995.124663]  ? btrfs_encode_fh+0xd0/0xd0 [btrfs]
[617995.129390]  btrfs_log_changed_extents+0x31b/0x640 [btrfs]
[617995.134990]  ? free_extent_buffer+0x4b/0x90 [btrfs]
[617995.139976]  btrfs_log_inode+0x8de/0xb90 [btrfs]
[617995.144686]  ? dput+0xf1/0x1d0
[617995.147847]  btrfs_log_inode_parent+0x21a/0x960 [btrfs]
[617995.153164]  ? kmem_cache_alloc+0x194/0x1a0
[617995.157459]  ? start_transaction+0x120/0x440 [btrfs]
[617995.162528]  btrfs_log_dentry_safe+0x69/0x90 [btrfs]
[617995.167599]  btrfs_sync_file+0x2ab/0x3e0 [btrfs]
[617995.172309]  vfs_fsync_range+0x3d/0xb0
[617995.176168]  btrfs_file_write_iter+0x45b/0x560 [btrfs]
[617995.181396]  do_iter_readv_writev+0xe2/0x130
[617995.185753]  do_iter_write+0x7f/0x190
[617995.189506]  vfs_iter_write+0x19/0x30
[617995.193271]  nfsd_vfs_write+0xb1/0x310 [nfsd]
[617995.197719]  nfsd_write+0x134/0x1e0 [nfsd]
[617995.201908]  nfsd3_proc_write+0x92/0x110 [nfsd]
[617995.206533]  nfsd_dispatch+0xb9/0x250 [nfsd]
[617995.210915]  svc_process_common+0x36e/0x6f0 [sunrpc]
[617995.215979]  svc_process+0xfc/0x1c0 [sunrpc]
[617995.220339]  nfsd+0xe9/0x160 [nfsd]
[617995.223918]  kthread+0x109/0x140
[617995.227238]  ? nfsd_destroy+0x60/0x60 [nfsd]
[617995.231591]  ? kthread_park+0x60/0x60
[617995.235348]  ret_from_fork+0x25/0x30
[617995.239010] Code: 48 8b 45 bf 48 8d 7d c7 4c 89 f6 48 89 45 d0 0f b6 45 be 88 45 cf 48 8b 45 b6 48 89 45 c7 e8 aa f3 ff ff 85 c0 0f 8f 55 ff ff ff <0f> 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[617995.257983] RIP: btrfs_set_item_key_safe+0x14e/0x160 [btrfs] RSP: bb7984b77658

[617995.265696] ---[ end trace 41d8bb716a419cdd ]---



And after a reboot we come up with this warning:



[  112.712899] [ cut here ]
[  112.712943] WARNING: CPU: 5 PID: 505 at fs/btrfs/file.c:547 btrfs_drop_extent_cache+0x3c5/0x3d0 [btrfs]
[  112.712944] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass xt_conntrack crct10dif_pclmul nf_conntrack crc32_pclmul ghash_clmulni_intel pcbc iptable_filter ip_tables aesni_intel x_tables aes_x86_64 crypto_simd glue_helper cryptd dm_multipath

Re: Struggling with file system slowness

2017-05-09 Thread Matt McKinnon
Those snapshots were created using Marc Merlin's script (thanks, Marc). 
They don't do anything except sit around on the file system for a week 
or so and then are removed.


I'm now doing quarter-hourly snaps instead of nightly, since I have 
nightly backups of the filesystem going off-site.  So far the 
btrfs-transaction and memory spikes have not returned.
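
(The snaps themselves are plain read-only snapshots, along the lines of 
the following - path and naming are illustrative, the real logic is in 
Marc's script:

# btrfs subvolume snapshot -r /export /export/snap-$(date +%F-%H%M))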


-Matt





On 05/09/2017 03:14 PM, Liu Bo wrote:

On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:

Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What
kind of workload and file size/distribution profile?


Only write IO during the load spikes.  No compression, no deduplication.  12
volumes (including snapshots).  Spinning disks.  Medium workload; file sizes
are all over the map since this holds about 30 user home directories.

Interestingly enough, the problems which had persisted for many weeks went
away when all snapshots were removed.  btrfs-transaction spikes disappeared.
Memory usage went from 30G to under 2G.



Were those snapshots served as backup?

Could you please elaborate how you create snapshots?  We could
probably hammer out a testcase to improve the situation.

Thanks,

-liubo




Struggling with file system slowness

2017-05-04 Thread Matt McKinnon

Hi All,

Trying to pin down why I have one server that has btrfs-transacti pegged 
at 100% CPU for most of the time.


I thought this might have to do with fragmentation as mentioned in the 
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as 
mentioned in the wiki), but after running a full defrag of the file 
system, and also enabling the 'autodefrag' mount option, the problem 
still persists.


What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2


-Matt


Hard crash on 4.9.5, part 2

2017-01-30 Thread Matt McKinnon
I have an error on this file system that I've seen in the distant past, 
where the mount would fail with a "file exists" error.  Running a btrfs 
check gives the following over and over again:


Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing


Are these reported once per subvolume snapshot I have, and will they eventually end?

Here is the crash after the mount (with recovery/usebackuproot):
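
(That is, roughly the following, with the mount point illustrative:

# mount -o usebackuproot /dev/sda1 /export)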

[  627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, use 'usebackuproot' instead
[  627.233216] BTRFS info (device sda1): trying to use backup root at mount time
[  627.233218] BTRFS info (device sda1): disk space caching is enabled
[  627.233220] BTRFS info (device sda1): has skinny extents
[  709.234688] [ cut here ]
[  709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel btrfs nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_helper ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hid ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic megaraid_sas raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit raid0 dca ptp multipath pps_core linear dm_mirror dm_region_hash dm_log
[  709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1
[  709.234813] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  709.234816]  bd3784bb7568 8e3c8e7c  
[  709.234820]  bd3784bb75a8 8e07d3d1 02220070 9e5f0ae4d150
[  709.234823]  0002d000 9e5f0bc91f78 9e5f0bc91da8 0002c000

[  709.234827] Call Trace:
[  709.234837]  [] dump_stack+0x63/0x87
[  709.234846]  [] __warn+0xd1/0xf0
[  709.234850]  [] warn_slowpath_null+0x1d/0x20
[  709.234874]  [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234895]  [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs]
[  709.234914]  [] ? generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs]
[  709.234931]  [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs]
[  709.234942]  [] ? kmem_cache_alloc+0x194/0x1a0
[  709.234958]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[  709.234977]  [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[  709.235002]  [] replay_one_extent+0x414/0x7b0 [btrfs]
[  709.235007]  [] ? autoremove_wake_function+0x40/0x40
[  709.235030]  [] replay_one_buffer+0x4cc/0x7c0 [btrfs]
[  709.235053]  [] ? mark_extent_buffer_accessed+0x4f/0x70 [btrfs]
[  709.235074]  [] walk_down_log_tree+0x1ba/0x3b0 [btrfs]
[  709.235094]  [] walk_log_tree+0xb4/0x1a0 [btrfs]
[  709.235114]  [] btrfs_recover_log_trees+0x20e/0x460 [btrfs]
[  709.235133]  [] ? replay_one_extent+0x7b0/0x7b0 [btrfs]
[  709.235154]  [] open_ctree+0x2640/0x27f0 [btrfs]
[  709.235171]  [] btrfs_mount+0xca4/0xec0 [btrfs]
[  709.235176]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235180]  [] ? pcpu_next_unpop+0x3e/0x50
[  709.235184]  [] ? find_next_bit+0x19/0x20
[  709.235190]  [] mount_fs+0x39/0x160
[  709.235193]  [] ? __alloc_percpu+0x15/0x20
[  709.235196]  [] vfs_kern_mount+0x67/0x110
[  709.235213]  [] btrfs_mount+0x18b/0xec0 [btrfs]
[  709.235216]  [] ? find_next_zero_bit+0x1e/0x20
[  709.235220]  [] mount_fs+0x39/0x160
[  709.235223]  [] ? __alloc_percpu+0x15/0x20
[  709.235225]  [] vfs_kern_mount+0x67/0x110
[  709.235228]  [] do_mount+0x1bb/0xc80
[  709.235232]  [] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  709.235235]  [] SyS_mount+0x83/0xd0
[  709.235240]  [] entry_SYSCALL_64_fastpath+0x1e/0xad
[  709.235243] ---[ end trace d4e5dcddb432b7d3 ]---
[  709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: errno=-17 Object already exists (Failed to recover log tree)
[  709.355570] BTRFS error (device sda1): cleaner transaction attach returned -30

[  709.548919] BTRFS error (device sda1): open_ctree failed


-Matt

Re: Hard crash on 4.9.5

2017-01-28 Thread Matt McKinnon
This same file system (which crashed again with the same errors) is also 
giving this output during a metadata or data balance:


Jan 27 19:42:47 my_machine kernel: [  335.018123] BTRFS info (device sda1): no csum found for inode 28472371 start 2191360
Jan 27 19:42:47 my_machine kernel: [  335.018128] BTRFS info (device sda1): no csum found for inode 28472371 start 2195456
Jan 27 19:42:47 my_machine kernel: [  335.018491] BTRFS info (device sda1): no csum found for inode 28472371 start 4018176
Jan 27 19:42:47 my_machine kernel: [  335.018496] BTRFS info (device sda1): no csum found for inode 28472371 start 4022272
Jan 27 19:42:47 my_machine kernel: [  335.018499] BTRFS info (device sda1): no csum found for inode 28472371 start 4026368
Jan 27 19:42:47 my_machine kernel: [  335.018502] BTRFS info (device sda1): no csum found for inode 28472371 start 4030464
Jan 27 19:42:47 my_machine kernel: [  335.019443] BTRFS info (device sda1): no csum found for inode 28472371 start 6156288
Jan 27 19:42:47 my_machine kernel: [  335.019688] BTRFS info (device sda1): no csum found for inode 28472371 start 7933952
Jan 27 19:42:47 my_machine kernel: [  335.019693] BTRFS info (device sda1): no csum found for inode 28472371 start 7938048
Jan 27 19:42:47 my_machine kernel: [  335.019754] BTRFS info (device sda1): no csum found for inode 28472371 start 8077312
Jan 27 19:42:47 my_machine kernel: [  335.025485] BTRFS warning (device sda1): csum failed ino 28472371 off 2191360 csum 4031061501 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025490] BTRFS warning (device sda1): csum failed ino 28472371 off 2195456 csum 2371784003 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025526] BTRFS warning (device sda1): csum failed ino 28472371 off 4018176 csum 3812080098 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025531] BTRFS warning (device sda1): csum failed ino 28472371 off 4022272 csum 2776681411 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025534] BTRFS warning (device sda1): csum failed ino 28472371 off 4026368 csum 1179241675 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.025540] BTRFS warning (device sda1): csum failed ino 28472371 off 4030464 csum 1256914217 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026142] BTRFS warning (device sda1): csum failed ino 28472371 off 7933952 csum 2695958066 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026147] BTRFS warning (device sda1): csum failed ino 28472371 off 7938048 csum 3260800596 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.026934] BTRFS warning (device sda1): csum failed ino 28472371 off 6156288 csum 4293116449 expected csum 0
Jan 27 19:42:47 my_machine kernel: [  335.033249] BTRFS warning (device sda1): csum failed ino 28472371 off 8077312 csum 4031878292 expected csum 0


Can these be ignored?
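
(In case it's useful to anyone else hitting this: a scrub pass is one 
way to re-check all the csums; mount point illustrative:

# btrfs scrub start -B /export
# btrfs scrub status /export)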


On 01/25/2017 04:06 PM, Liu Bo wrote:

On Mon, Jan 23, 2017 at 03:03:55PM -0500, Matt McKinnon wrote:

Wondering what to do about this error which says 'reboot needed'.  It has
happened three times in the past week:



Well, I don't think btrfs's logic here is wrong; the following stack
shows that an NFS client has sent a second unlink against the same inode
while somehow the inode was not fully deleted by the first unlink.

So it'd be good that you could add some debugging information to get us
further.

Thanks,

-liubo


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  [#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17

Hard crash on 4.9.5

2017-01-23 Thread Matt McKinnon
Wondering what to do about this error which says 'reboot needed'.  It has 
happened three times in the past week:


Jan 23 14:16:17 my_machine kernel: [ 2568.595648] BTRFS error (device sda1): err add delayed dir index item(index: 23810) into the deletion tree of the delayed node(root id: 257, inode id: 2661433, errno: -17)
Jan 23 14:16:17 my_machine kernel: [ 2568.611010] [ cut here ]
Jan 23 14:16:17 my_machine kernel: [ 2568.615628] kernel BUG at fs/btrfs/delayed-inode.c:1557!
Jan 23 14:16:17 my_machine kernel: [ 2568.620942] invalid opcode:  [#1] SMP
Jan 23 14:16:17 my_machine kernel: [ 2568.624960] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev mei_me mei lpc_ich ioatdma wmi ipmi_si ipmi_msghandler btrfs shpchp mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c igb hid_generic i2c_algo_bit raid1 dca usbhid ahci raid0 ptp megaraid_sas multipath
Jan 23 14:16:17 my_machine kernel: [ 2568.697150]  hid libahci pps_core linear dm_mirror dm_region_hash dm_log
Jan 23 14:16:17 my_machine kernel: [ 2568.702689] CPU: 0 PID: 2440 Comm: nfsd Tainted: GW   4.9.5-custom #1
Jan 23 14:16:17 my_machine kernel: [ 2568.710166] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
Jan 23 14:16:17 my_machine kernel: [ 2568.719207] task: 95a42addab80 task.stack: b9da8533
Jan 23 14:16:17 my_machine kernel: [ 2568.725124] RIP: 0010:[]  [] btrfs_delete_delayed_dir_index+0x286/0x290 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.735604] RSP: 0018:b9da85333be0  EFLAGS: 00010286
Jan 23 14:16:17 my_machine kernel: [ 2568.740917] RAX:  RBX: 95a3b104b690 RCX: 
Jan 23 14:16:17 my_machine kernel: [ 2568.748048] RDX: 0001 RSI: 95a42fc0dcc8 RDI: 95a42fc0dcc8
Jan 23 14:16:17 my_machine kernel: [ 2568.755171] RBP: b9da85333c48 R08: 0491 R09: 
Jan 23 14:16:17 my_machine kernel: [ 2568.762297] R10: 0005 R11: 0006 R12: 95a3b104b6d8
Jan 23 14:16:17 my_machine kernel: [ 2568.769429] R13: 5d02 R14: 95a82953d800 R15: ffef
Jan 23 14:16:17 my_machine kernel: [ 2568.776555] FS: () GS:95a42fc0() knlGS:
Jan 23 14:16:17 my_machine kernel: [ 2568.784639] CS:  0010 DS:  ES:  CR0: 80050033
Jan 23 14:16:17 my_machine kernel: [ 2568.790377] CR2: 7f12ea376000 CR3: 0003e1e07000 CR4: 001406f0

Jan 23 14:16:17 my_machine kernel: [ 2568.797503] Stack:
Jan 23 14:16:17 my_machine kernel: [ 2568.799524]  9b7fe5f2 95a3b104b560 0004 95a3f96b3e80
Jan 23 14:16:17 my_machine kernel: [ 2568.806983]  95a3f96b3e80 39ff95a814eeeb68 6000289c 5d02
Jan 23 14:16:17 my_machine kernel: [ 2568.814436]  95a3f7457c40 95a3bcb74138 95a814eeeb68 00289c39

Jan 23 14:16:17 my_machine kernel: [ 2568.821891] Call Trace:
Jan 23 14:16:17 my_machine kernel: [ 2568.824343]  [] ? mutex_lock+0x12/0x2f
Jan 23 14:16:17 my_machine kernel: [ 2568.829671]  [] __btrfs_unlink_inode+0x198/0x4c0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.836555]  [] btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.843086]  [] btrfs_unlink+0x6b/0xb0 [btrfs]
Jan 23 14:16:17 my_machine kernel: [ 2568.849091]  [] vfs_unlink+0xda/0x190
Jan 23 14:16:17 my_machine kernel: [ 2568.854315]  [] ? lookup_one_len+0xd3/0x130
Jan 23 14:16:17 my_machine kernel: [ 2568.860075]  [] nfsd_unlink+0x16e/0x210 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.866084]  [] nfsd3_proc_remove+0x7c/0x110 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.872529]  [] nfsd_dispatch+0xb8/0x1f0 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.878641]  [] svc_process_common+0x43f/0x700 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.885432]  [] svc_process+0xfc/0x1c0 [sunrpc]
Jan 23 14:16:17 my_machine kernel: [ 2568.891528]  [] nfsd+0xf0/0x160 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.896838]  [] ? nfsd_destroy+0x60/0x60 [nfsd]
Jan 23 14:16:17 my_machine kernel: [ 2568.902931]  [] kthread+0xca/0xe0
Jan 23 14:16:17 my_machine kernel: [ 2568.907807]  [] ? kthread_park+0x60/0x60
Jan 23 14:16:17 my_machine kernel: [ 2568.913296]  [] ret_from_fork+0x25/0x30
Jan 23 14:16:17 my_machine kernel: [ 2568.918693] Code: ff ff 48 8b 43 10 49 8b

kernel crash after upgrading to 4.9

2017-01-04 Thread Matt McKinnon

Hi All,

I seem to have a similar issue to a subject in December:

Subject: page allocation stall in kernel 4.9 when copying files from one 
btrfs hdd to another


In my case, this is caused when rsync'ing large amounts of data over NFS 
to the server with the BTRFS file system.  This was not apparent in the 
previous kernel (4.7).


The poster mentioned some suggestions from Duncan here:

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html

But those are not visible in the thread.  What suggestions were given to 
help alleviate this pain?


-Matt


Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-10 Thread Matt McKinnon
[ cut here ]
[   79.922000] WARNING: CPU: 6 PID: 2632 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[   79.922002] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath nfsd auth_rpcgss joydev nfs_acl mei_me nfs lpc_ich mei lockd wmi grace ipmi_si sunrpc ipmi_msghandler fscache shpchp ioatdma mac_hid lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic igb raid6_pq i2c_algo_bit libcrc32c dca usbhid raid1 ahci raid0 ptp megaraid_sas multipath hid libahci pps_core linear dm_mirror dm_region_hash dm_log

[   79.922063] CPU: 6 PID: 2632 Comm: mount Not tainted 4.7.0-custom #1
[   79.922065] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[   79.922067]   88046ca1f538 813b816c 
[   79.922071]   88046ca1f578 8107a321 02226ca1f5e0
[   79.922074]  880841d19460 e000 880841e21290 880841e210c0
[   79.922077] Call Trace:
[   79.922089]  [] dump_stack+0x63/0x87
[   79.922096]  [] __warn+0xd1/0xf0
[   79.922099]  [] warn_slowpath_null+0x1d/0x20
[   79.922117]  [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[   79.922133]  [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs]
[   79.922147]  [] ? generic_bin_search.constprop.36+0x85/0x190 [btrfs]
[   79.922160]  [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs]
[   79.922173]  [] ? btrfs_search_slot+0x438/0x970 [btrfs]
[   79.922178]  [] ? kmem_cache_alloc+0x1d6/0x1f0
[   79.922190]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[   79.922205]  [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[   79.94]  [] replay_one_extent+0x419/0x750 [btrfs]
[   79.922241]  [] replay_one_buffer+0x4db/0x7d0 [btrfs]
[   79.922258]  [] ? mark_extent_buffer_accessed+0x4f/0x70 [btrfs]
[   79.922274]  [] walk_down_log_tree+0x1cc/0x3d0 [btrfs]
[   79.922289]  [] walk_log_tree+0xba/0x1a0 [btrfs]
[   79.922304]  [] btrfs_recover_log_trees+0x213/0x470 [btrfs]
[   79.922318]  [] ? replay_one_extent+0x750/0x750 [btrfs]
[   79.922335]  [] open_ctree+0x264d/0x2760 [btrfs]
[   79.922348]  [] btrfs_mount+0xc94/0xeb0 [btrfs]
[   79.922353]  [] ? find_next_zero_bit+0x1e/0x20
[   79.922358]  [] ? pcpu_next_unpop+0x3e/0x50
[   79.922362]  [] ? find_next_bit+0x19/0x20
[   79.922368]  [] mount_fs+0x39/0x160
[   79.922371]  [] ? __alloc_percpu+0x15/0x20
[   79.922375]  [] vfs_kern_mount+0x67/0x110
[   79.922387]  [] btrfs_mount+0x18b/0xeb0 [btrfs]
[   79.922390]  [] ? find_next_zero_bit+0x1e/0x20
[   79.922394]  [] mount_fs+0x39/0x160
[   79.922397]  [] ? __alloc_percpu+0x15/0x20
[   79.922399]  [] vfs_kern_mount+0x67/0x110
[   79.922402]  [] do_mount+0x22a/0xd90
[   79.922406]  [] ? __kmalloc_track_caller+0x1af/0x250
[   79.922408]  [] ? strndup_user+0x41/0x80
[   79.922411]  [] ? memdup_user+0x42/0x70
[   79.922413]  [] SyS_mount+0x83/0xd0
[   79.922418]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   79.922436] ---[ end trace 0db3466cdad31dcf ]---




On 08/09/2016 10:25 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:29 PM, Matt McKinnon <m...@techsquare.com> wrote:

Spoke too soon.  Do I need to continue to run with that mount option in
place?


It shouldn't be necessary. Something's still wrong for some reason,
even with DUP metadata being CoW'd, so someone else is going to have to
speak up about what the problem is. And the fact that btrfs check not
only doesn't come up clean but crashes suggests some confluence of
things in kernel 4.3 and your hardware conspired to make the file system
inconsistent in a way that isn't immediately recoverable the usual way.
That is, usebackuproot working suggests that there's a bug elsewhere in
the storage stack, because normally that shouldn't be necessary -
something's happened out of order.

devid    1 size 50.93TiB used 22.67TiB path /dev/sda1

What is the exact nature of this block device?

If getting this back up and running is urgent I suggest inquiring on
IRC what the next steps are.

In the meantime I'd get a btrfs-image (which is probably going to be
quite large given metadata is 60GiB), if that pukes then see if 'btrfs
inspect-internal dump-tree /dev/sda1 > dumptree.log' which may also
fail but before it fails might contain something useful. Obviously
btrfs check shouldn't crash so that's a bug already. What do you get
for free -m? It's known that btrfs check needs a lot of memory and
pretty much all the metadata needs to be read in, so... if you have an
SSD available it might make sense to setup a huge pile of swap on that
SSD and rerun btrfs check.
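
For the image step, something along these lines should do; the
compression level, thread count, and paths are illustrative:

btrfs-image -c9 -t4 /dev/sda1 /mnt/scratch/sda1.img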





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-10 Thread Matt McKinnon

I performed a quick balance which gave me:

[39020.030638] BTRFS info (device sda1): relocating block group 25428383236096 flags 1
[39020.206097] BTRFS warning (device sda1): block group 23113395863552 has wrong amount of free space
[39020.206101] BTRFS warning (device sda1): failed to load free space cache for block group 23113395863552, rebuilding it now


then a crash dump.

Remounted with -o clear_cache,nospace_cache and the balance completed. 
Running a larger balance now.


Will umount, and remount with default options to see if that works.

-Matt

On 08/10/2016 03:09 AM, g6094...@freenet.de wrote:

Hi,

From what I see you have an unfinished balance ongoing, since you have
both DUP and single chunks for system and metadata on disk.

So you should (re)run a balance for this data.
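
e.g. something along these lines to fold the stray single chunks back
into DUP (mount point illustrative; -f is needed when converting the
system chunk profile):

btrfs balance start -mconvert=dup,soft -sconvert=dup,soft -f /export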


sash


Am 10.08.2016 um 02:17 schrieb Matt McKinnon:

-o usebackuproot worked well.

after the file system settled, performing a sync and a clean umount, a
normal mount works now as well.

Anything I should be doing going forward?

Thanks,
Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com>
wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our
BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.










Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

# btrfs check /dev/sda1
Checking filesystem on /dev/sda1
UUID: 33f9089e-acc7-4a39-8b83-b18bb182faaf
checking extents
ref mismatch on [958277767168 5894144] extent item 0, found 1
Backref 958277767168 root 257 owner 15799573 offset 750342144 num_refs 0 not found in extent tree
Incorrect local backref count on 958277767168 root 257 owner 15799573 offset 750342144 found 1 wanted 0 back 0x15d380f90
backpointer mismatch on [958277767168 5894144]
ref mismatch on [958298935296 9666560] extent item 0, found 2
Backref 958298935296 root 257 owner 15799573 offset 559185920 num_refs 0 not found in extent tree
Incorrect local backref count on 958298935296 root 257 owner 15799573 offset 559185920 found 2 wanted 0 back 0x15d3809a0

backpointer mismatch on [958298935296 9666560]


about 859 of those ...

Then:

owner ref check failed [25737445867520 16384]
checking free space cache
There is no free space entry for 109105479680-109105496064
There is no free space entry for 109105479680-109551026176
cache appears valid but isn't 109014155264
There is no free space entry for 139709693952-139709710336
There is no free space entry for 139709693952-140152668160
cache appears valid but isn't 139615797248
Wanted offset 171291525120, found 171291426816
Wanted offset 171291525120, found 171291426816
cache appears valid but isn't 171291181056
Wanted offset 220146597888, found 220146532352
Wanted offset 220146597888, found 220146532352
cache appears valid but isn't 220146434048
btrfs: unable to add free space :-17
free-space-cache.c:824: btrfs_add_free_space: Assertion `ret == -EEXIST` failed.

btrfs[0x464af9]
btrfs(btrfs_add_free_space+0x154)[0x46531f]
btrfs(load_free_space_cache+0xab7)[0x465e36]
btrfs(cmd_check+0x22c7)[0x42db0e]
btrfs(main+0x155)[0x40a4fd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7faad34cdf45]
btrfs[0x40a0f9]


and we crashed out of the check there.

-Matt

On 08/09/2016 08:06 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.


This could also be a regression somewhere...
https://bugzilla.kernel.org/show_bug.cgi?id=60522





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon
Spoke too soon.  Do I need to continue to run with that mount option in 
place?



[   83.775984] BTRFS warning (device sda1): block group 25741009879040 has wrong amount of free space
[   83.775989] BTRFS warning (device sda1): failed to load free space cache for block group 25741009879040, rebuilding it now
[   85.231748] BTRFS warning (device sda1): block group 25737721544704 has wrong amount of free space
[   85.231752] BTRFS warning (device sda1): failed to load free space cache for block group 25737721544704, rebuilding it now

[   98.913796] BTRFS info (device sda1): disk space caching is enabled
[   98.913803] BTRFS info (device sda1): has skinny extents
[  179.564408] BTRFS warning (device sda1): block group 78412513280 has wrong amount of free space
[  179.564414] BTRFS warning (device sda1): failed to load free space cache for block group 78412513280, rebuilding it now

[  667.106718] [ cut here ]
[  667.106772] WARNING: CPU: 0 PID: 2726 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]

[  667.106775] BTRFS: Transaction aborted (error -17)
[  667.106777] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel btrfs kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei wmi ipmi_si ipmi_msghandler nfsd auth_rpcgss nfs_acl nfs lockd grace ioatdma sunrpc shpchp mac_hid fscache lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c hid_generic igb raid1 usbhid i2c_algo_bit ahci raid0 dca multipath ptp hid megaraid_sas libahci linear pps_core dm_mirror dm_region_hash dm_log
[  667.106859] CPU: 0 PID: 2726 Comm: btrfs-transacti Not tainted 4.7.0-custom #1
[  667.106861] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  667.106864]   880464e73c08 813b816c 880464e73c58
[  667.106869]   880464e73c48 8107a321 0b936c3cc170
[  667.106873]  880443191130 88046c3cc170 88046b43f000 
[  667.106878] Call Trace:
[  667.106889]  [] dump_stack+0x63/0x87
[  667.106896]  [] __warn+0xd1/0xf0
[  667.106901]  [] warn_slowpath_fmt+0x4f/0x60
[  667.106925]  [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
[  667.106947]  [] btrfs_write_dirty_block_groups+0x178/0x3b0 [btrfs]
[  667.106974]  [] commit_cowonly_roots+0x23c/0x2e0 [btrfs]
[  667.106999]  [] btrfs_commit_transaction+0x4fb/0xa80 [btrfs]
[  667.107021]  [] transaction_kthread+0x1d2/0x200 [btrfs]
[  667.107042]  [] ? btrfs_cleanup_transaction+0x580/0x580 [btrfs]
[  667.107047]  [] kthread+0xc9/0xe0
[  667.107053]  [] ret_from_fork+0x1f/0x40
[  667.107056]  [] ? kthread_park+0x60/0x60
[  667.107060] ---[ end trace 336c80ba4db66e78 ]---
[  667.107065] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  667.116389] BTRFS info (device sda1): forced readonly
[  667.117081] BTRFS warning (device sda1): Skipping commit of aborted transaction.
[  667.117086] BTRFS: error (device sda1) in cleanup_transaction:1853: errno=-17 Object already exists



On 08/09/2016 08:06 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 6:01 PM, Chris Murphy <li...@colorremedies.com> wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.


This could also be a regression somewhere...
https://bugzilla.kernel.org/show_bug.cgi?id=60522





Re: BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

-o usebackuproot worked well.

after the file system settled, performing a sync and a clean umount, a 
normal mount works now as well.


Anything I should be doing going forward?

Thanks,
Matt

On 08/09/2016 08:01 PM, Chris Murphy wrote:

On Tue, Aug 9, 2016 at 5:15 PM, Matt McKinnon <m...@techsquare.com> wrote:

Hello,

Our server recently crashed and was rebooted.  When it returned our BTRFS
volume is mounting read-only:


What happens when you try mounting with -o usebackuproot ?

If that fails, what output do you get for 'btrfs check' (without
--repair)? If you only get some "errors 400, nbytes wrong" then
--repair should fix the problem.






BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2016-08-09 Thread Matt McKinnon

Hello,

Our server recently crashed and was rebooted.  When it returned, our 
BTRFS volume was mounting read-only:


[  142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  142.404418] BTRFS info (device sda1): forced readonly

I tried upgrading the kernel from 4.3 to 4.7.  Upgraded btrfs-progs to 
v4.7 as well.


# uname -a
Linux hostname 4.7.0-custom #1 SMP Tue Aug 9 11:16:28 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux


# btrfs --version
btrfs-progs v4.7

# btrfs fi show
Label: none  uuid: 33f9089e-acc7-4a39-8b83-b18bb182faaf
Total devices 1 FS bytes used 14.95TiB
devid    1 size 50.93TiB used 22.67TiB path /dev/sda1

# btrfs fi df /export/
Data, single: total=22.53TiB, used=14.89TiB
System, DUP: total=40.00MiB, used=2.39MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=70.50GiB, used=60.21GiB
Metadata, single: total=1.51GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

# dmesg
[  142.394841] [ cut here ]
[  142.394874] WARNING: CPU: 6 PID: 269 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]

[  142.394876] BTRFS: Transaction aborted (error -17)
[  142.394878] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev lpc_ich mei_me mei ioatdma wmi ipmi_si ipmi_msghandler shpchp mac_hid btrfs lp parport ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor igb raid6_pq libcrc32c i2c_algo_bit raid1 hid_generic dca usbhid raid0 ptp hid ahci megaraid_sas multipath libahci pps_core linear dm_mirror dm_region_hash dm_log
[  142.394942] CPU: 6 PID: 269 Comm: kworker/u18:5 Not tainted 4.7.0-custom #1
[  142.394944] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014

[  142.394966] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  142.394969]   88086a057ca0 813b816c 88086a057cf0
[  142.394972]   88086a057ce0 8107a321 0b9325288170
[  142.394975]  8808519eb000 880825288170 88086b2c1000 0020

[  142.394978] Call Trace:
[  142.394987]  [] dump_stack+0x63/0x87
[  142.394993]  [] __warn+0xd1/0xf0
[  142.394996]  [] warn_slowpath_fmt+0x4f/0x60
[  142.395012]  [] btrfs_run_delayed_refs+0x292/0x2d0 [btrfs]
[  142.395025]  [] delayed_ref_async_start+0x94/0xb0 [btrfs]

[  142.395044]  [] normal_work_helper+0xc0/0x2d0 [btrfs]
[  142.395050]  [] ? pwq_activate_delayed_work+0x42/0xb0
[  142.395066]  [] btrfs_extent_refs_helper+0x12/0x20 [btrfs]

[  142.395070]  [] process_one_work+0x153/0x3f0
[  142.395073]  [] worker_thread+0x12b/0x4b0
[  142.395076]  [] ? rescuer_thread+0x340/0x340
[  142.395079]  [] kthread+0xc9/0xe0
[  142.395085]  [] ret_from_fork+0x1f/0x40
[  142.395088]  [] ? kthread_park+0x60/0x60
[  142.395090] ---[ end trace e2b0b8dc37502011 ]---
[  142.395093] BTRFS: error (device sda1) in btrfs_run_delayed_refs:2963: errno=-17 Object already exists

[  142.404418] BTRFS info (device sda1): forced readonly


Re: Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Matt

> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote:
> 
> On 2016-07-15 05:51, Matt wrote:
>> Hello
>> 
>> I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
>> file system (see below).  One of the 6 disk failed. What is the best way to 
>> recover from this?
>> 
> The tool you want is `btrfs restore`.  You'll need somewhere to put the files 
> from this too of course.  That said, given that you had data in raid0 mode, 
> you're not likely to get much other than very small files back out of this, 
> and given other factors, you're not likely to get what you would consider 
> reasonable performance out of this either.

Thanks so much for pointing me towards btrfs-restore. I surely will give it a 
try.  Note that the FS is not RAID0 but a linear ("JBOD") configuration. This 
is why it somehow did not occur to me to try btrfs-restore.  The good news is 
that in this configuration the files are *not* distributed across disks. We 
can read most of the files just fine.  The failed disk was actually smaller 
than the other five, so we should be able to recover more than 5/6 of the 
data, shouldn't we?  My trouble is that the IO errors due to the missing disk 
cripple the transfer speed of both rsync and dd_rescue.
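
(A dry run along these lines should list what is recoverable before we 
commit to the full copy - the device and target directory are 
illustrative:

# btrfs restore -D -v /dev/sdc /mnt/recovery)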

> Your best bet to get a working filesystem again would be to just recreate it 
> from scratch, there's not much else that can be done when you've got a raid0 
> profile and have lost a disk.

This is what I plan to do if btrfs-restore turns out to be too slow 
and nobody on this list has any better idea.  It will, however, require 
transferring >15TB across the Atlantic (this is where the backups reside).  
This can be tedious, which is why I would love to avoid it.

Matt



Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Matt
Hello

I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
file system (see below).  One of the 6 disks failed. What is the best way to 
recover from this?

Thanks to RAID1 of the metadata I can still access the data residing on the 
remaining 5 disks after mounting ro,force.  What I would like to do now is to 

1) Find out the names of all the files with missing data
2) Make the file system fully functional (rw) again.

To achieve 2 I wanted to move the data off the disk. This, however, turns out 
to be rather difficult:
 - rsync does not provide an immediate time-out option in case of an IO error
 - Even when I set the time-out for dd_rescue to a minimum, the transfer speed 
is still way too low to move the data (> 15TB) off the file system.
Both methods are too slow to move off the data within a reasonable time frame.

Does anybody have a suggestion how to best recover from this? (Our backup is 
incomplete).
I am looking for either a tool to move off the data - something which gives up 
immediately in case of an IO error and logs the affected files - or, 
alternatively, a btrfs command like "btrfs device delete missing" for a 
non-RAID multi-disk btrfs filesystem.
Would some variant of "btrfs balance" do something helpful?

Any help is appreciated!

Regards,
Matt

# btrfs fi show
Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
Total devices 6 FS bytes used 16.83TiB
devid    1 size 3.64TiB used 3.47TiB path /dev/sdc
devid    2 size 3.64TiB used 3.47TiB path /dev/sdd
devid    3 size 3.64TiB used 3.47TiB path /dev/sde
devid    4 size 3.64TiB used 3.47TiB path /dev/sdf
devid    5 size 1.82TiB used 1.82TiB path /dev/sdb
*** Some devices missing


# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B



Re: Btrfs tragedy: lack of space for metadata leads to loss of fs.

2015-08-25 Thread Matt Ruffalo
On 2015-08-25 09:44, Miguel Negrão wrote:
 Hi list,

 This weekend had my first btrfs horror story. 

 system: 3.13.0-49-lowlatency, btrfs-progs v4.1.2

 A disclaimer: I know 3.13 is very out of date, but the requirement of
 keeping the kernel up to date clashes with my requirement of keeping a stable
 system. At the moment I can't disturb my system as I'm doing important work,
 upgrading kernel requires upgrading ubuntu, which will upgrade a lot of
 packages and might lead to problems which I don't have time to fix. One
 might argue that in the end I lost time anyway dealing with these btrfs
 issues. When I'm done with this current work I will update the whole system
 which will update the kernel in the process.

Hi-

I have no useful advice about filesystem recovery, but would like to
point out that newer kernels are backported to Ubuntu LTS versions and
can be installed without any significant disruption of the system.

The normal kernel backports are named 'linux-generic-lts-<version
codename>', and the low latency versions are
'linux-lowlatency-lts-<version codename>', so you could install kernel
3.16 (from 14.10 utopic) by installing linux-lowlatency-lts-utopic,
and kernel 3.19 (from 15.04 vivid) by installing
linux-lowlatency-lts-vivid. Kernel 4.1 will be available as
linux-{generic,lowlatency}-lts-wily a bit after 15.10 is released.

MMR...





[PATCH RESEND] btrfs: Align EOF length to block in extent_same

2015-04-26 Thread Matt Robinson
It is not currently possible to deduplicate the last block of files
whose size is not a multiple of the block size, as the btrfs_extent_same
ioctl returns -EINVAL if offset + size is greater than the file size or
is not aligned to the fs block size.

For example, with the default block size of 16K and two identical
1,000,000 byte files, calling the extent_same ioctl with offset 0 and
length set to 1,000,000 the call fails with -EINVAL.  The same call
with a length of 999,424 will succeed, but the final 576 bytes can then
not be shared.  This seems to have a larger impact on the amount of
space actually freed by the ioctl than would be expected - in my
testing the amount of space freed was generally reduced by 50-100% for
files sized from a few megabytes downwards which has a significant
negative impact on the usefulness of the extent_same ioctl in some
circumstances.

To resolve this, this patch allows unaligned offset + length values to
be passed to btrfs_ioctl_file_extent_same if offset + length is equal
to the file size of both src and dest.  This is implemented in the same
way as in btrfs_ioctl_clone.

To return to the earlier example 1,000,000 byte file - this patch would
allow a length of 1,000,000 bytes to be passed as it is equal to the
file lengths and would be internally extended to the end of the block
(1,015,808), allowing one set of extents to be shared completely between
the full length of both files.
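
To illustrate with the btrfs-extent-same helper from duperemove (the
same tool used in the reproducer posted during review of the first
version of this patch), a call covering the full 1,000,000 bytes is
rejected with -EINVAL without this patch and succeeds with it:

# arguments: length file1 offset1 file2 offset2
btrfs-extent-same 1000000 file-a 0 file-b 0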

Signed-off-by: Matt Robinson g...@nerdoftheherd.com
---
 fs/btrfs/ioctl.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ca5d968..0588076 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2878,14 +2878,16 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, 
struct inode *dst,
return ret;
 }
 
-static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len)
+static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len,
+u64 len_aligned)
 {
u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize;
 
if (off + len > inode->i_size || off + len < off)
return -EINVAL;
+
/* Check that we are block aligned - btrfs_clone() requires this */
-   if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs))
+   if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs))
return -EINVAL;
 
return 0;
@@ -2895,6 +2897,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 len,
 struct inode *dst, u64 dst_loff)
 {
int ret;
+   u64 len_aligned = len;
+   u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 
/*
 * btrfs_clone() can't handle extents in the same file
@@ -2909,11 +2913,15 @@ static int btrfs_extent_same(struct inode *src, u64 
loff, u64 len,
 
btrfs_double_lock(src, loff, dst, dst_loff, len);
 
-   ret = extent_same_check_offsets(src, loff, len);
+   /* if we extend to both eofs, continue to block boundaries */
+   if (loff + len == src->i_size && dst_loff + len == dst->i_size)
+   len_aligned = ALIGN(src-i_size, bs) - loff;
+
+   ret = extent_same_check_offsets(src, loff, len, len_aligned);
if (ret)
goto out_unlock;
 
-   ret = extent_same_check_offsets(dst, dst_loff, len);
+   ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned);
if (ret)
goto out_unlock;
 
@@ -2926,7 +2934,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 len,
 
ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
if (ret == 0)
-   ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+   ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff);
 
 out_unlock:
btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -3172,8 +3180,7 @@ static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *   identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
-- 
2.1.4



Btrfs-cleaner FS DoS issues

2015-04-15 Thread Matt Grant
Hi!

Whenever I delete a large snapshot, it stalls all the processes on my system 
for 30 minutes plus (kernel v3.19.2).  Btrfs-cleaner takes 100% CPU while 
everything is completely stalled.  Every few minutes it drops to, say, 99.8%, 
the stall abates, and processes/IO happen.

I am about to lodge a kernel bug about this as it is a serious issue.

Could someone look at making the clean-up process more sensitive to when the 
system is idle?  MD RAID is very good at this, and it should be possible to 
set this up.
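
In the meantime, has anyone had luck simply deprioritising the
cleaner's IO?  Something like the below is my guess at a stop-gap (it
only helps as far as the IO goes through the CFQ scheduler, and much of
the delete work happens inside transaction commits, so it may well do
nothing):

# put the btrfs-cleaner kernel thread in the idle IO class (CFQ only)
ionice -c3 -p "$(pgrep -x btrfs-cleaner)"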

Best Regards,

Matt Grant

Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same

2015-04-11 Thread Matt Robinson
Hi All,

As David hasn't got back to me I'm guessing that he is too busy with
other things at present. If anyone else is able to spare the time to
review my patch and give me feedback that would be very much
appreciated.

Many Thanks,

Matt

On 3 March 2015 at 00:27, Zygo Blaxell ce3g8...@umail.furryterror.org wrote:

 I second this.  I've seen the same behavior.

 Clone seems to have evolved a little further than extent-same knows about.
 e.g. there is code in the extent-same ioctl that tries to avoid doing
 a clone from within one inode to elsewhere in the same inode; however,
 the clone ioctl (which extent-same calls) has no such restriction.

 As Matt mentioned, clone_range seems quite happy to accept a partial block
 at EOF.  cp --reflink would be much harder to use if it did not.

 On Mon, Mar 02, 2015 at 08:59:11PM +, Matt Robinson wrote:
  Hi David,
 
  Have you had a chance to look at this?  Am very happy to answer
  further questions, adjust my implementation, provide a different kind
  of test case, etc.
 
  Many Thanks,
 
  Matt


Python pybtrfs df wrapper script to report btrfs metadata, block, space in df compatible output

2015-03-31 Thread Matt Grant
Hi!

I use this at work.  I am releasing it to the list to prompt design of output 
that is 100% df-format compatible, for automated reporting and graphing.

This is so that BTRFS space statistics can be reported back to the existing 
graphing tool chains that many of you would have in place.  Quite useful
for munin, hint, hint :-)

URL for github is:

https://github.com/grantma/pybtrfs.git

There is also a shell script there that can be called from cron.
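
For example, a crontab entry along these lines (the wrapper name here
is illustrative - check the repo for the actual script name):

# m h dom mon dow user  command
*/5 * * * * root /usr/local/sbin/pybtrfs-df.sh /srv >> /var/log/btrfs-df.log 2>&1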

Please get back to me if you have any questions.

The standard no-warranty disclaimers apply to the code.  It's GPLv3
licensed.

Best Regards,

Matt Grant



Re: [PATCH 1/1] btrfs: Align EOF length to block in extent_same

2015-03-02 Thread Matt Robinson
Hi David,

Have you had a chance to look at this?  Am very happy to answer
further questions, adjust my implementation, provide a different kind
of test case, etc.

Many Thanks,

Matt

On 28 January 2015 at 19:46, Matt Robinson g...@nerdoftheherd.com wrote:
 On 28 January 2015 at 12:55, David Sterba dste...@suse.cz wrote:
 On Mon, Jan 26, 2015 at 06:05:51PM +, Matt Robinson wrote:
 It is not currently possible to deduplicate the last block of files
 whose size is not a multiple of the block size, as the btrfs_extent_same
 ioctl returns -EINVAL if offset + size is greater than the file size or
 is not aligned to the fs block size.

 Do you have a reproducer for that?

 I've been using the (quick and dirty) bash script at the end of this
 mail which uses btrfs-extent-same from
 https://github.com/markfasheh/duperemove/ to call the ioctl.  To
 summarize: it creates a new filesystem, creates a file with a size
 which is not a multiple of the block size, copies it, and then calls
 the ioctl to ask firstly for all of the complete blocks (for
 comparison) and then the entire files to be deduplicated.

 Running the script under a kernel compiled from
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gives
 a status of -22 from the second btrfs-extent-same call and the final
 btrfs filesystem df shows:
 Data, single: total=8.00MiB, used=1.91MiB

 However, running under the same kernel plus my patch shows this final
 data usage:
 Data, single: total=8.00MiB, used=980.00KiB

 The alignment is required to let btrfs_clone and the extent dropping
 functions to work. [...]

 Which is why it is currently not possible to deduplicate a final
 incomplete block of a file:
 * Passing len + offset = the actual end of the file: Rejected as it is
 not aligned
 * Passing len + offset = the end of the block: Rejected as it exceeds
 the actual end of the file.

 Please let me know if you need any further information, if my
 implementation should be different or there is a better way I could
 demonstrate the issue?

 Many Thanks,

 Matt

 ---

 #!/bin/bash -e

 if [[ $EUID -ne 0 ]]; then
    echo "This script must be run as root"
    exit 1
 fi

 loopback=$(losetup -f)

 echo "## Create new btrfs filesystem on a loopback device"
 dd if=/dev/zero of=testfs bs=1048576 count=1500
 losetup $loopback testfs
 mkfs.btrfs $loopback
 mkdir testfsmnt
 mount $loopback testfsmnt

 echo -e "\n## Create 1000000 byte random file"
 dd if=/dev/urandom of=testfsmnt/test1 bs=1000000 count=1
 echo
 btrfs filesystem sync testfsmnt
 btrfs filesystem df testfsmnt

 echo -e "\n## Copy file"
 cp testfsmnt/test1 testfsmnt/test2
 echo
 btrfs filesystem sync testfsmnt
 btrfs filesystem df testfsmnt

 echo -e "\n## Dedupe to end of last full block"
 btrfs-extent-same 999424 testfsmnt/test1 0 testfsmnt/test2 0
 echo
 btrfs filesystem sync testfsmnt
 btrfs filesystem df testfsmnt

 echo -e "\n## Dedupe to end of file"
 btrfs-extent-same 1000000 testfsmnt/test1 0 testfsmnt/test2 0
 echo
 btrfs filesystem sync testfsmnt
 btrfs filesystem df testfsmnt

 echo -e "\nClean up"
 umount testfsmnt
 rmdir testfsmnt
 losetup -d $loopback
 rm testfs


Btrfs fixes, changes don't appear on git repo

2015-02-26 Thread Matt
Hi linux-btrfs list,

Hi Chris, Hi Josef,


it seemingly happened in the past and now it seems to happen again:

after patches have been posted to the linux-btrfs mailing list and
pulled by Linus,

changes occurred and additional pull requests followed - the old
commits don't appear to be accessible anywhere besides Linus' tree


example:

http://marc.info/?l=linux-btrfs&m=142203898505309&w=2

[GIT PULL] Btrfs fixes
from January 23rd


I picked a specific patch out:
http://marc.info/?l=linux-btrfs&m=142141473603234&w=2
Btrfs: fix race deleting block group from space_info->ro_bgs list

since it might lead to lockups

I've seen some lockups in the past few days which involved Btrfs while
playing and adding some bleeding edge patches to my custom kernel
(3.19 based) - or in general to have Btrfs' latest and greatest code

thus I want to make sure that I have the latest
stability-/bugfix-related patches


When searching at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/
for "Btrfs: fix race deleting block group from space_info->ro_bgs list"

it winds up in Linus' repo with a merge/pull from January 24th
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list


But when searching in the integration or current for-linus branch
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=integration
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=for-linus

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=for-linus&qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/log/?h=integration&qt=grep&q=Btrfs%3A+fix+race+deleting+block+group+from+space_info-%3Ero_bgs+list

There is no such commit - which I can't seem to wrap my head around

The same result with

git log --grep foo ...

after having fetched the latest state of Chris' repo
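
i.e. something along these lines (remote and branch names here are my
own choice, not anything official):

git remote add mason \
    git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git
git fetch mason
git log mason/for-linus --oneline --grep='fix race deleting block group'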


Am I missing something ?

It would be really nice to have a repo where all of the latest Btrfs
patches are stored and accessible - and a clear picture of why this
weirdness happens


Sorry if this was already asked in the past, since I'm not aware of
such a report


Many thanks in advance for your answers

Kind Regards

Matt


Re: Btrfs fixes, changes don't appear on git repo

2015-02-26 Thread Matt
On Thu, Feb 26, 2015 at 11:04 PM, Chris Mason c...@fb.com wrote:


 On Thu, Feb 26, 2015 at 4:49 PM, Matt jackdac...@gmail.com wrote:

 Hi linux-btrfs list,

 Hi Chris, Hi Josef,


 it seemingly happened in the past and now it seems to happen again:

 after patches have been posted to the linux-btrfs mailing list and
 pulled by Linus,

 changes occured and additional pull-requests followed - the old
 commits don't appear to be anywhere accessible besides Linus' tree


 example:

 http://marc.info/?l=linux-btrfsm=142203898505309w=2

 [GIT PULL] Btrfs fixes
 from January 23rd


 Sorry for the confusion.  What happens is that I send Linus pulls for the
 things he's missing, and we have slightly parallel development branches.

 Before 3.19-rc1, I forked 3.18-rc5 and rebased my 3.19 merge window on top
 of that.  All of my commits for 3.19 went on top of this branch.

 I forked our tree for the 4.0 merge window at 3.19-rc5.  This is where all
 the 4.0 commits went.  But, 3.19 kept rolling and we had additional fixes in
 before 3.19-final.

 I use the same branch for every pull to Linus (for-linus), so during
 3.19-rc6 I sent him code on top of for-linus, which at the time was based on
 3.18-rc5 and had all my 3.19 code in it.

 Then the 4.0 merge window started and I switched to my 3.19-rc5 based merge
 window tree, which was actually missing the commit you mentioned because
 Linus took it after rc5.

 It all works for Linus because git merges things easily, and he actually
 prefers that you don't merge in later releases unless you need some fix to
 keep things stable.  In other words, if my for-linus for the 4.0 merge
 window has a merge with 3.19-final, he may push back.

Thanks for the swift and elaborate explanation !

Yes, that's what confused me now and in the past - I'm sure I'm
not the only one ;)


 In general, you can take my for-linus on top of the last released Linus
 kernel and have all the current commits that are considered stable.

That's the plan :)
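
(For anyone else following along, I read that as something like the
below - the URL is Chris' repo from the cgit links above, the tag
whatever the then-current release is:)

git checkout -b btrfs-latest v3.19
git pull \
    git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git \
    for-linus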


 In the future, I'll keep a for-linus-xxyyzz for the last release to make
 this less confusing.

Wow, this would make things a lot clearer and make getting an overview much faster!

Your repo would then probably resemble Paul E. McKenney's (
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git/ ),
but I like it that way - everything accessible and comprehensible.

Surely a win-win for the devs & community

Thanks again


 -chris




Kind Regards

Matt


[PATCH 1/1] btrfs: Align EOF length to block in extent_same

2015-01-26 Thread Matt Robinson
It is not currently possible to deduplicate the last block of files
whose size is not a multiple of the block size, as the btrfs_extent_same
ioctl returns -EINVAL if offset + size is greater than the file size or
is not aligned to the fs block size. This prevents btrfs from freeing up
the last extent in the file, causing gains from deduplication to be
smaller than expected.

To resolve this, allow unaligned offset + length values to be passed to
btrfs_ioctl_file_extent_same if offset + length = file size for both src
and dest.  This is implemented in the same way as in btrfs_ioctl_clone.

Signed-off-by: Matt Robinson g...@nerdoftheherd.com
---
 fs/btrfs/ioctl.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index d49fe8a..a407d8a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2871,14 +2871,16 @@ static int btrfs_cmp_data(struct inode *src, u64 loff, 
struct inode *dst,
return ret;
 }
 
-static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len)
+static int extent_same_check_offsets(struct inode *inode, u64 off, u64 len,
+u64 len_aligned)
 {
u64 bs = BTRFS_I(inode)->root->fs_info->sb->s_blocksize;
 
if (off + len > inode->i_size || off + len < off)
return -EINVAL;
+
/* Check that we are block aligned - btrfs_clone() requires this */
-   if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len, bs))
+   if (!IS_ALIGNED(off, bs) || !IS_ALIGNED(off + len_aligned, bs))
return -EINVAL;
 
return 0;
@@ -2888,6 +2890,8 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 len,
 struct inode *dst, u64 dst_loff)
 {
int ret;
+   u64 len_aligned = len;
+   u64 bs = BTRFS_I(src)->root->fs_info->sb->s_blocksize;
 
/*
 * btrfs_clone() can't handle extents in the same file
@@ -2899,11 +2903,15 @@ static int btrfs_extent_same(struct inode *src, u64 
loff, u64 len,
 
btrfs_double_lock(src, loff, dst, dst_loff, len);
 
-   ret = extent_same_check_offsets(src, loff, len);
+   /* if we extend to both eofs, continue to block boundaries */
+   if (loff + len == src->i_size && dst_loff + len == dst->i_size)
+   len_aligned = ALIGN(src-i_size, bs) - loff;
+
+   ret = extent_same_check_offsets(src, loff, len, len_aligned);
if (ret)
goto out_unlock;
 
-   ret = extent_same_check_offsets(dst, dst_loff, len);
+   ret = extent_same_check_offsets(dst, dst_loff, len, len_aligned);
if (ret)
goto out_unlock;
 
@@ -2916,7 +2924,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 len,
 
ret = btrfs_cmp_data(src, loff, dst, dst_loff, len);
if (ret == 0)
-   ret = btrfs_clone(src, dst, loff, len, len, dst_loff);
+   ret = btrfs_clone(src, dst, loff, len, len_aligned, dst_loff);
 
 out_unlock:
btrfs_double_unlock(src, loff, dst, dst_loff, len);
@@ -3162,8 +3170,7 @@ static void clone_update_extent_map(struct inode *inode,
  * @inode: Inode to clone to
  * @off: Offset within source to start clone from
  * @olen: Original length, passed by user, of range to clone
- * @olen_aligned: Block-aligned value of olen, extent_same uses
- *   identical values here
+ * @olen_aligned: Block-aligned value of olen
  * @destoff: Offset within @inode to start clone
  */
 static int btrfs_clone(struct inode *src, struct inode *inode,
-- 
2.1.0



corruption, bad block, input/output errors - do i run --repair?

2014-11-07 Thread Matt McKinnon
] ? __do_page_fault+0x28c/0x550
[501087.544187]  [8112528c] ? acct_account_cputime+0x1c/0x20
[501087.544189]  [811f1106] do_vfs_ioctl+0x86/0x4f0
[501087.544192]  [810244a5] ? syscall_trace_enter+0x165/0x280
[501087.544193]  [811f1601] SyS_ioctl+0x91/0xb0
[501087.544198]  [8176fc7f] tracesys+0xe1/0xe6
[501087.544199] ---[ end trace e2a77238816656f5 ]---
[501087.579519] parent transid verify failed on 20809493159936 wanted 
4486137218058286914 found 390978



I have been sending incremental snapshot dumps over to an identical file 
server as backups.  Everything checks out OK there.  Do I try to run 
check with --repair first, and fall back to my backup if that fails?
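
For reference, the read-only form I would run first, from a rescue
shell with the filesystem unmounted (without --repair, check only
reports problems and writes nothing):

btrfs check /dev/sdX    # device name is a placeholder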


-Matt


Re: Blocked tasks on 3.15.1

2014-07-20 Thread Matt
+0x69/0x150
[16388.319532] [81422fa8] ? __btrfs_prealloc_file_range+0xe8/0x380
[16388.319534] [8140b6f2] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16388.319535] [819cb00c] ? commit_cowonly_roots+0x173/0x221
[16388.319537] [8141ad19] ? btrfs_commit_transaction+0x509/0xa30
[16388.319538] [8141b2cb] ? start_transaction+0x8b/0x5b0
[16388.319539] [81416d65] ? transaction_kthread+0x1d5/0x240
[16388.319540] [81416b90] ? btrfs_cleanup_transaction+0x560/0x560
[16388.319541] [810e579a] ? kthread+0xca/0xe0
[16388.319543] [810e56d0] ? kthread_create_on_node+0x180/0x180
[16388.319544] [819d3c7c] ? ret_from_fork+0x7c/0xb0
[16388.319545] [810e56d0] ? kthread_create_on_node+0x180/0x180


but the previous error message I saw seemed related to

http://www.spinics.net/lists/linux-btrfs/msg35145.html

and

http://www.spinics.net/lists/linux-btrfs/msg33628.html


Be aware that this kernel is a highly patched up 3.14.13 with latest
Btrfs integration/for-linus branch - up to

commit abdd2e80a57e5f7278f47913315065f0a3d78d20
Author: Filipe Manana fdman...@gmail.com
Date:   Tue Jun 24 17:46:58 2014 +0100

Btrfs: fix crash when starting transaction

except

Btrfs: fix broken free space cache after the system crashed

(commit e570fd27f2c5d7eac3876bccf99e9838d7f911a3)

which doesn't seem to apply cleanly for me.

So it's not really representative when looking for other kernel
internals but should show almost 100% similar behavior like a 3.15+
kernel with latest integration/for-linus branch.

Currently I have no reason and plans to migrate to 3.15 since I'm
planning to wait for it to mature a little bit more.


Root is on Btrfs with lzo compression on an Intel SSD.

Last time this happened I had the partition formatted with zlib/gzip
compression. This time it's with lzo and also happening.

The problem is that rsync can't be killed off - so the load will
increase over time, only option being to reboot via Magic SYSRQ Key:

ps aux | grep rsync
root 12233  0.1  0.0  33880  4776 pts/0D+   22:20   0:03 rsync
-aiP --delete --inplace --stats /home/matt/news/ /bak/matt/news/
root 12234  0.0  0.0  0 0 pts/0Z+   22:20   0:00
[rsync] defunct
root 12579  0.0  0.0  30380  1376 pts/0S+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root 12580  0.0  0.0  30352   940 pts/0D+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root 12581  0.0  0.0  30352   280 pts/0S+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root 12583  0.0  0.0  18916  1000 pts/1S+   23:21   0:00 grep
--color=auto rsync

/bak is a newly created partition which a few days ago just got
finished getting written to (around 1.5 TB of data).


Any ideas or other patches I could try ?

If I understood correctly

"Btrfs: fix abnormal long waiting in fsync" doesn't apply to the 3.14
kernel base since it's rather new (June 5th, according to
http://code.metager.de/source/history/linux/stable/mm/ )

and "btrfs: test for valid bdev before kobj removal in
btrfs_rm_device" is not related

Keep up the great work !

Btrfs is significantly more resilient than in the past (surviving
hardlocks, etc.) - but this high-load-related rsync blocked-task
behavior (I have also hit this in the past, several kernel versions
back) creates headaches and still prevents it from being used on a
regular, efficient basis.  I'd *really* like to have an alternative
filesystem with checksums besides ZFS.


Kind Regards

Matt


Some impossible benchmark results with LUKS - what am I missing?

2014-03-26 Thread Matt
Hey folks,

I have been experimenting with btrfs on a home NAS box, and have some benchmark
results that I don't believe.  I'm hoping someone here has some insight on what
I've missed that would cause such a result.

The short version: I'm seeing roughly a 36% (on write) to 55% (on read)
performance *improvement* when using btrfs on LUKS containers over btrfs on
raw disks.  This should not be the case!

The test setup:
My test file is a large file that I previously generated from /dev/urandom.  I
saw similar results using /dev/zero as input, as well as a repeated copy of the
whole Ubuntu 14.04 ISO (i.e., real-ish data).

My calculated MB/s numbers are based on the 'real' time in the output.
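
(Concretely, e.g. for the first raw read below - bytes over real
seconds, in MiB:)

awk 'BEGIN { print 20971520000 / 149.849 / 1048576, "MB/s" }'
# -> 133.468 MB/s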

Live-booted Ubuntu 14.04 (nightly from 3/11/14, kernel 3.13.5)
4x 4TB WD Red drives in standard (no RAID) configuration
i3-4130 CPU (has AES-NI for accelerated encryption)
Default BTRFS options from the disk gui, always raid10.

Tested configurations:
Raw: btrfs raid10 on 4x raw drives
Encrypted: btrfs raid10 on 4 separate LUKS containers on 4x raw drives
(default LUKS options)


Read command: $ time sh -c "dd if=test.out of=/dev/null bs=4k"
Raw:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 149.841 s, 140 MB/s

real 2m29.849s
user 0m2.764s
sys 0m7.064s

= 133.467690809 MB/s

Encrypted:
$ time sh -c "dd if=test2.out of=/dev/null bs=4k"
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 96.6127 s, 217 MB/s

real 1m36.627s
user 0m3.331s
sys 0m9.518s

= 206.981485506 MB/s


Read+Write: $ time sh -c "dd if=test2.out of=test20grand.out bs=4k && sync"
Raw:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 227.069 s, 92.4 MB/s

real 3m49.701s
user 0m2.854s
sys 0m15.936s

= 87.069712365 MB/s

Encrypted:
5120000+0 records in
5120000+0 records out
20971520000 bytes (21 GB) copied, 167.823 s, 125 MB/s

real 2m48.784s
user 0m2.955s
sys 0m17.956s

= 118.494644042 MB/s


Any ideas what could explain this result?

One coworker suggested that perhaps the LUKS container was returning from
'sync' early, before actually finishing the write to disk.  This would seem to
violate the assumptions of the 'sync' primitive, so I have my doubts.

I'm also interested in learning how I can reliably benchmark the real cost of
running full-disk encryption under btrfs on my system.

Thanks!


Evan Powell | Technical Lead
epow...@zenoss.com


Hi Evan,

just to be sure:

did you do a

echo 3 > /proc/sys/vm/drop_caches

before *each* test ?

also try reversing the order of tests like so:

Encrypted
RAW

whether that makes a difference


It would also be interesting to see the output of

cryptsetup luksDump

and

Version:   *
Cipher name:*
Cipher mode:*
Hash spec:*
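
you could also cross-check with fio using direct IO, which bypasses the
page cache and sidesteps the sync-timing question entirely (a sketch,
adjust directory and size):

fio --name=seqread --directory=/mnt/test --size=20g --rw=read \
    --bs=1m --direct=1 --ioengine=libaio --iodepth=16
fio --name=seqwrite --directory=/mnt/test --size=20g --rw=write \
    --bs=1m --direct=1 --ioengine=libaio --iodepth=16 --end_fsync=1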




Interesting find indeed ! Thanks for sharing.

I'm currently using Btrfs on an encrypted system partition (without
AES-NI-capable hardware) and things already feel, and are, faster than
with ext4

we need to find out what this magic is =)


Kind Regards


Thanks

Matt


Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-05-02 Thread Matt Pursley
Hey Josef,

Were you able to try this multi-thread test on any more drives?


I did a test with 12, 6, 3, and 1 drive.  And it looks like the
multi-thread slowdown grows as the number of drives in the raid goes
up.

Like this:
- 50% speed reduction with 2 threads on 12 drives
- 25% speed reduction with 2 threads on 6 drives
- 10% speed reduction with 2 threads on 3 drives
- 5% speed reduction with 2 threads on 1 drive



I only have 12 slots on my HBA card, but I wonder if 24 drives would
reduce the speed to 25% with 2 threads?

Matt










make btrfs fs...
___

12 drives...
mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
/dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl

6 drives...
mkfs.btrfs -f -d raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

3 drives...
mkfs.btrfs -f -d raid5 /dev/sda /dev/sdb /dev/sdc

1 drive...
mkfs.btrfs -f /dev/sda

mount /dev/sda /tmp/btrfs_test/

___


make zero files...
___
kura1 ~ # for j in {1..2} ; do dd if=/dev/zero
of=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M count=10000
conv=fdatasync & done
___


===

btrfs raid6 on 12 drives with 2 threads = ~650MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
vm.drop_caches = 1
10485760000 bytes (10 GB) copied, 31.0431 s, 338 MB/s
10485760000 bytes (10 GB) copied, 31.2235 s, 336 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 29.869 s, 351 MB/s
10485760000 bytes (10 GB) copied, 30.5561 s, 343 MB/s

___


btrfs raid6 on 12 drives with 1 thread = ~1100MB/s
___
kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 9.69881 s, 1.1 GB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 9.56475 s, 1.1 GB/s
___


==

btrfs raid6 on 6 drives with 2 thread =  ~500MB/s
___
 kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 41.3899 s, 253 MB/s
10485760000 bytes (10 GB) copied, 41.6916 s, 252 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 40.3178 s, 260 MB/s
10485760000 bytes (10 GB) copied, 41.4087 s, 253 MB/s

___



btrfs raid6 on 6 drives with 1 thread =  ~600MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 17.5686 s, 597 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 17.5396 s, 598 MB/s
___


==

btrfs raid5 on 3 drives with 2 thread = ~300MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 67.636 s, 155 MB/s
10485760000 bytes (10 GB) copied, 70.1783 s, 149 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 69.4945 s, 151 MB/s
10485760000 bytes (10 GB) copied, 70.8279 s, 148 MB/s

___



btrfs raid5 on 3 drives with 1 thread = ~319MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 32.8559 s, 319 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 32.8483 s, 319 MB/s

___


==


btrfs (no raid) on 1 drive with 2 thread =  ~155MB/s
___
 kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 134.982 s, 77.7 MB/s
10485760000 bytes (10 GB) copied, 135.237 s, 77.5 MB/s

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd
of=/dev/null if=/tmp/btrfs_test/testfile_bs1m_size10GB_${j} bs=1M &
done
10485760000 bytes (10 GB) copied, 134.549 s, 77.9 MB/s
10485760000 bytes (10 GB) copied, 135.293 s, 77.5 MB/s


___


btrfs (no raid) on 1 drive with 1 thread =  ~162MB/s
___

kura1 btrfs_test # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/tmp/btrfs_test

Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-04-26 Thread Matt Pursley
Hey Josef,

Thanks for looking into this further!  That is about the same
results I was seeing, though I didn't test it with just one
drive, only with all 12 drives in my JBOD.  I will do a test with
just one disk, and see if I also get the same results.

Let me know if you also see the same results with multiple drives in
your raid...


Thanks,
Matt





On Thu, Apr 25, 2013 at 2:10 PM, Josef Bacik jba...@fusionio.com wrote:
 On Thu, Apr 25, 2013 at 03:01:18PM -0600, Matt Pursley wrote:
 Ok, awesome, let me know how it goes..  I don't have the raid
 formatted to btrfs right now, but I could probably do that in about 30
 minutes or so.


 Huh so I'm getting the full bandwidth, 120 mb/s with one thread and 60 mb/s 
 with
 two threads.  These are just cheap sata drives tho, I'll try and dig up a box
 with 3 fusion cards for something a little closer to the speeds you are seeing
 and see if that makes a difference.  Thanks,

 Josef


Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-04-25 Thread Matt Pursley
Hey Josef,

Were you able to look into this any further?
It's still pretty reproducible on my machine...


Thanks,
Matt





On Thu, Apr 18, 2013 at 2:58 PM, Josef Bacik jba...@fusionio.com wrote:

 This is strange, and I can't see any reason why this would happen.  I'll try 
 and
 reproduce next week when I'm back from LSF.  Thanks,

 Josef


Re: One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-04-17 Thread Matt Pursley
On Tue, Apr 16, 2013 at 11:55 PM, Sander san...@humilis.net wrote:
 Matt Pursley wrote (ao):
 I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives
 attached.  When it's formated with mdraid6+ext4 I get about 1200MB/s
 for multiple streaming random reads with iozone.  With btrfs in
 3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a
 time.

 Just curious, is that btrfs on top of mdraid6, or is this experimental
 btrfs raid6 without md?



This is the experimental btrfs raid6 without md.

But, I did do an mdraid6-with-btrfs test last night... and with that
setup I only get the ~750MB/s result, even with just one
thread/stream...

I will flip the system back to btrfs-raid6+btrfs today to verify that
I still get the full 1200MB/s with one stream/thread and ~750MB/s with
two or more streams/threads with that setup...


Thanks,
Matt



___
mdraid6+btrfs_64GBRam_80files # sysctl vm.drop_caches=1 ; dd
of=/dev/null if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 18.2109 s, 720 MB/s
___


One random read streaming is fast (~1200MB/s), but two or more are slower (~750MB/s)?

2013-04-16 Thread Matt Pursley
Hi All,

I have an LSI HBA card (LSI SAS 9207-8i) with 12 7200rpm SAS drives
attached.  When it's formated with mdraid6+ext4 I get about 1200MB/s
for multiple streaming random reads with iozone.  With btrfs in
3.9.0-rc4 I can also get about 1200MB/s, but only with one stream at a
time.

As soon as I add a second (or more), the speed will drop to about
750MB/s.  If I add more streams (10, 20, etc), the total throughput
stays at around 750MB/s.  I only see the full 1200MB/s in btrfs when
I'm running a single read at a time (e.g. sequential reads with dd,
random reads with iozone, etc).

This feels like a bug or misconfiguration on my system.  It is as if it
can read at the full speed, but only with one stream running at a time.
The options I have tried varying are -l 64k with mkfs.btrfs, and -o
thread_pool=16 when mounting.  But neither of those options seems to
change the behaviour.
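
One more knob I plan to rule out (my own guess, nobody has pointed me
at it): per-device readahead, since streaming reads lean on it heavily
and the defaults are small.

# values are in 512-byte sectors; 8192 = 4 MiB
for d in /dev/sd[a-l]; do blockdev --getra "$d"; done
for d in /dev/sd[a-l]; do blockdev --setra 8192 "$d"; done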



Anyone know any reasons why I would see the speed drop when going from
one to more than one stream at a time with btrfs raid6?  We would like
to use btrfs (mostly for snapshots), but we do need to get the full
1200MB/s streaming speeds too.





Thanks,
Matt



___
Here's some example output..



Single thread = ~1.1GB/s
_
kura1 persist # sysctl vm.drop_caches=1 ; dd if=/dev/zero
of=/var/data/persist/testfile bs=640k count=20000
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 7.14139 s, 1.8 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null
if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.2666 s, 1.2 GB/s

kura1 persist # sysctl vm.drop_caches=1 ; dd of=/dev/null
if=/var/data/persist/testfile bs=640k
vm.drop_caches = 1
20000+0 records in
20000+0 records out
13107200000 bytes (13 GB) copied, 11.5005 s, 1.1 GB/s





1 thread = ~1000MB/s ...
___
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 6.52018 s, 1.0 GB/s
kura1 scripts # sysctl vm.drop_caches=1 ; for j in {1..1} ; do dd
of=/dev/null if=/var/data/persist/testfile_$j bs=640k ; done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 6.55731 s, 999 MB/s
___


2 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..2} ; do dd of=/dev/null
if=/var/data/persist/testfile_$j bs=640k & done
vm.drop_caches = 1
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 17.5068 s, 374 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 17.7599 s, 369 MB/s
___



20 threads = ~750MB/s combined...
___
# sysctl vm.drop_caches=1 ; for j in {1..20} ; do dd of=/dev/null
if=/var/data/persist/testfile_$j bs=640k & done
vm.drop_caches = 1
kura1 scripts # 10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 168.223 s, 39.0 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 168.275 s, 38.9 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 169.466 s, 38.7 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 169.606 s, 38.6 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.503 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.629 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.633 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.744 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.844 s, 38.4 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 170.896 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.027 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.135 s, 38.3 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.389 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.414 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.674 s, 38.2 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.897 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.956 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 171.995 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 172.044 s, 38.1 MB/s
10000+0 records in
10000+0 records out
6553600000 bytes (6.6 GB) copied, 172.08 s, 38.1 MB/s




### Similar results with random reads in iozone...

1 thread = ~1000MB/s
_
kura1 scripts # for j in {1..1} ; do sysctl vm.drop_caches=1 ; iozone
-f /var/data/10GBfolders

Btrfs and more compression algorithms

2012-05-24 Thread Matt
Hi Chris, Hi Josef,

Hi Btrfs-List and all other Btrfs-devs that I've forgotten,


is there a chance we'll see xz file-compression support in Btrfs
anytime soon?

I'm sure folks have been waiting for additional compression support
besides gzip and lzo (bzip2 seems out of the question due to its
slowness; there's pbzip2, but that's not included in the kernel).

This would be a really nice bonus, with processors getting faster
and SSD usage more and more widespread - add an efficient
implementation and we would have a fast, extremely efficient and
feature-rich filesystem.

My current situation is that several of my hard drives are almost
completely full - even with forced gzip compression - so I thought I'd
ask whether there is any change ahead in the near future.
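
(For context, what I am running now - as far as I know these two are
the complete set of in-kernel choices at the moment:)

mount -o compress-force=zlib /dev/sdX /mnt/backup   # what I use today
mount -o compress-force=lzo  /dev/sdX /mnt/backup   # faster, lower ratio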
There's fusecompress, but that probably wouldn't end up being as stable
as btrfs with xz/lzma support.


Thanks for your consideration and your work on Btrfs !

It got significantly more stable compared to the past :)

(I use it mainly for some small backup hdds;

a troublesome usage, however, is still suspending-to-ram/to-disk
regularly: with that, the partition [I have a dedicated partition
for the portage tarball of Gentoo Linux] seems to take some damage,
to the point where it can't be written to anymore via rsync (or other
programs).  The bash session hangs (and nothing gets written to the
partition).
Running scrub revealed no issues.  I haven't had a chance to test it
yet with the new btrfs-progs - I haven't suspended meanwhile.)


Kind Regards

Matt


Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression

2011-02-15 Thread Matt
On Mon, Feb 14, 2011 at 8:59 PM, Matt jackdac...@gmail.com wrote:
 On Mon, Feb 14, 2011 at 1:29 AM, Matt jackdac...@gmail.com wrote:
 On Mon, Feb 14, 2011 at 1:24 AM, Matt jackdac...@gmail.com wrote:
 On Mon, Feb 14, 2011 at 12:08 AM, Matt jackdac...@gmail.com wrote:
 On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer
 dan.magenhei...@oracle.com wrote:
 [snip]

 If I've missed anything important, please let me know!

 Thanks again!
 Dan


 Hi Dan,

 thank you so much for answering my email in such detail !

 I shall pick up on that mail in my next email sending to the mailing list 
 :)


 currently I've got a problem with btrfs which seems to get triggered
 by cleancache get-operations:


 Feb 14 00:37:19 lupus kernel: [ 2831.297377] device fsid
 354120c992a00761-5fa07d400126a895 devid 1 transid 7
 /dev/mapper/portage
 Feb 14 00:37:19 lupus kernel: [ 2831.297698] btrfs: enabling disk space 
 caching
 Feb 14 00:37:19 lupus kernel: [ 2831.297700] btrfs: force lzo compression
 Feb 14 00:37:19 lupus kernel: [ 2831.315844] zcache: created ephemeral
 tmem pool, id=3
 Feb 14 00:39:20 lupus kernel: [ 2951.853188] BUG: unable to handle
 kernel paging request at 01400050
 Feb 14 00:39:20 lupus kernel: [ 2951.853219] IP: [8133ef1b]
 btrfs_encode_fh+0x2b/0x120
 Feb 14 00:39:20 lupus kernel: [ 2951.853242] PGD 0
 Feb 14 00:39:20 lupus kernel: [ 2951.853251] Oops:  [#1] PREEMPT SMP
 Feb 14 00:39:20 lupus kernel: [ 2951.853275] last sysfs file:
 /sys/devices/platform/coretemp.3/temp1_input
 Feb 14 00:39:20 lupus kernel: [ 2951.853295] CPU 4
 Feb 14 00:39:20 lupus kernel: [ 2951.853303] Modules linked in: radeon
 ttm drm_kms_helper cfbcopyarea cfbimgblt cfbfillrect ipt_REJECT
 ipt_LOG xt_limit xt_tcpudp xt_state nf_nat_irc nf_conntrack_irc
 nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp
 iptable_filter ipt_addrtype xt_DSCP xt_dscp xt_iprange ip_tables
 ip6table_filter xt_NFQUEUE xt_owner xt_hashlimit xt_conntrack xt_mark
 xt_multiport xt_connmark nf_conntrack xt_string ip6_tables x_tables
 it87 hwmon_vid coretemp snd_seq_dummy snd_seq_oss snd_seq_midi_event
 snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_hda_codec_hdmi
 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
 snd_timer snd soundcore i2c_i801 wmi e1000e shpchp snd_page_alloc
 libphy e1000 scsi_wait_scan sl811_hcd ohci_hcd ssb usb_storage
 ehci_hcd [last unloaded: tg3]
 Feb 14 00:39:20 lupus kernel: [ 2951.853682]
 Feb 14 00:39:20 lupus kernel: [ 2951.853690] Pid: 11394, comm:
 btrfs-transacti Not tainted 2.6.37-plus_v16_zcache #4 FMP55/ipower
 G3710
 Feb 14 00:39:20 lupus kernel: [ 2951.853725] RIP:
 0010:[8133ef1b]  [8133ef1b]
 btrfs_encode_fh+0x2b/0x120
 Feb 14 00:39:20 lupus kernel: [ 2951.853751] RSP:
 0018:880129a11b00  EFLAGS: 00010246
 Feb 14 00:39:20 lupus kernel: [ 2951.853767] RAX: 00ff
 RBX: 88014a1ce628 RCX: 
 Feb 14 00:39:20 lupus kernel: [ 2951.853788] RDX: 880129a11b3c
 RSI: 880129a11b70 RDI: 0006
 Feb 14 00:39:20 lupus kernel: [ 2951.853808] RBP: 0140
 R08: 8133eef0 R09: 880129a11c68
 Feb 14 00:39:20 lupus kernel: [ 2951.853829] R10: 0001
 R11: 0001 R12: 88014a1ce780
 Feb 14 00:39:20 lupus kernel: [ 2951.853849] R13: 88021fefc000
 R14: 88021fef9000 R15: 
 Feb 14 00:39:20 lupus kernel: [ 2951.853870] FS:
 () GS:8800bf50()
 knlGS:
 Feb 14 00:39:20 lupus kernel: [ 2951.853894] CS:  0010 DS:  ES:
  CR0: 8005003b
 Feb 14 00:39:20 lupus kernel: [ 2951.853911] CR2: 01400050
 CR3: 01c27000 CR4: 06e0
 Feb 14 00:39:20 lupus kernel: [ 2951.853932] DR0: 
 DR1:  DR2: 
 Feb 14 00:39:20 lupus kernel: [ 2951.853952] DR3: 
 DR6: 0ff0 DR7: 0400
 Feb 14 00:39:20 lupus kernel: [ 2951.853973] Process btrfs-transacti
 (pid: 11394, threadinfo 880129a1, task 880202e4ac40)
 Feb 14 00:39:20 lupus kernel: [ 2951.853999] Stack:
 Feb 14 00:39:20 lupus kernel: [ 2951.854006]  880129a11b50
 8803 88003c60a098 0003
 Feb 14 00:39:20 lupus kernel: [ 2951.854035]  
 810e6aaa  000602e4ac40
 Feb 14 00:39:20 lupus kernel: [ 2951.854063]  8133e3f0
 810e6cee 1000 
 Feb 14 00:39:20 lupus kernel: [ 2951.854092] Call Trace:
 Feb 14 00:39:20 lupus kernel: [ 2951.854103]  [810e6aaa] ?
 cleancache_get_key+0x4a/0x60
 Feb 14 00:39:20 lupus kernel: [ 2951.854122]  [8133e3f0] ?
 btrfs_wake_function+0x0/0x20
 Feb 14 00:39:20 lupus kernel: [ 2951.854140]  [810e6cee] ?
 __cleancache_flush_inode+0x3e/0x70
 Feb 14 00:39:20 lupus kernel: [ 2951.854161]  [810b34d2] ?
 truncate_inode_pages_range+0x42/0x440
 Feb 14 00:39:20 lupus kernel: [ 2951.854182

Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression

2011-02-15 Thread Matt
On Mon, Feb 14, 2011 at 4:35 AM, Minchan Kim minchan@gmail.com wrote:
 On Mon, Feb 14, 2011 at 10:29 AM, Matt jackdac...@gmail.com wrote:
 On Mon, Feb 14, 2011 at 1:24 AM, Matt jackdac...@gmail.com wrote:
 On Mon, Feb 14, 2011 at 12:08 AM, Matt jackdac...@gmail.com wrote:
 On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer
 dan.magenhei...@oracle.com wrote:
 [snip]

 If I've missed anything important, please let me know!

 Thanks again!
 Dan


 Hi Dan,

 thank you so much for answering my email in such detail !

 I shall pick up on that mail in my next email sending to the mailing list 
 :)


 currently I've got a problem with btrfs which seems to get triggered
 by cleancache get-operations:


 [snip]

Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page cache/swap compression

2011-02-15 Thread Matt
On Wed, Feb 16, 2011 at 1:27 AM, Dan Magenheimer
dan.magenhei...@oracle.com wrote:
 -Original Message-
 From: Matt [mailto:jackdac...@gmail.com]
 Sent: Tuesday, February 15, 2011 5:12 PM
 To: Minchan Kim
 Cc: Dan Magenheimer; gre...@suse.de; Chris Mason; linux-
 ker...@vger.kernel.org; linux...@kvack.org; ngu...@vflare.org; linux-
 bt...@vger.kernel.org; Josef Bacik; Dan Rosenberg; Yan Zheng;
 mi...@cn.fujitsu.com; Li Zefan
 Subject: Re: [PATCH V2 0/3] drivers/staging: zcache: dynamic page
 cache/swap compression

 On Mon, Feb 14, 2011 at 4:35 AM, Minchan Kim minchan@gmail.com
 wrote:
  On Mon, Feb 14, 2011 at 10:29 AM, Matt jackdac...@gmail.com wrote:
  On Mon, Feb 14, 2011 at 1:24 AM, Matt jackdac...@gmail.com wrote:
  On Mon, Feb 14, 2011 at 12:08 AM, Matt jackdac...@gmail.com
 wrote:
  On Wed, Feb 9, 2011 at 1:03 AM, Dan Magenheimer
  dan.magenhei...@oracle.com wrote:
  [snip]
 
  If I've missed anything important, please let me know!
 
  Thanks again!
  Dan
 
 
  Hi Dan,
 
  thank you so much for answering my email in such detail !
 
  I shall pick up on that mail in my next email sending to the
 mailing list :)
 
 
  currently I've got a problem with btrfs which seems to get
 triggered
  by cleancache get-operations:
 
 
  [snip]
Re: 2.6.37: Multi-second I/O latency while untarring

2011-02-11 Thread Matt
On Fri, Feb 11, 2011 at 3:08 PM, Andrew Lutomirski a...@luto.us wrote:
 As I type this, I have an ssh process running that's dumping data into
 a fifo at high speed (maybe 500Mbps) and a tar process that's
 untarring from the same fifo onto btrfs.  The btrfs fs is mounted -o
 space_cache,compress.  This machine has 8GB ram, 8 logical cores, and
 a fast (i7-2600) CPU, so it's not an issue with the machine struggling
 under load.
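
 (the setup is roughly the following; host and file names here are only
 illustrative, not the exact ones used:

   mkfifo /tmp/f
   ssh host 'cat big.tar' > /tmp/f &
   tar -C /mnt/btrfs -xf /tmp/f )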

 Every few tens of seconds, my system stalls for several seconds.
 These stalls cause keyboard input to be lost, firefox to hang, etc.

 Setting tar's ionice priority to best effort / 7 or to idle makes no 
 difference.

 ionice idle and queue_depth = 1 on the disk (a slow 2TB WD) also makes
 no difference.

 max_sectors_kb = 64 in addition to the above doesn't help either.

 latencytop shows regular instances of 2-7 *second* latency, variously
 in sync_page, start_transaction, btrfs_start_ordered_extent, and
 do_get_write_access (from jbd2 on my ext4 root partition).

 echo 3 > drop_caches gave me 7 GB free RAM.  I still had stalls when
 4-5 GB were still free (so it shouldn't be a problem with important
 pages being evicted).
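
 (for reference, the knobs mentioned above map to roughly these
 commands; the device name sdb and the pid lookup are illustrative only:

   ionice -c 3 -p $(pidof tar)
   echo 1  > /sys/block/sdb/device/queue_depth
   echo 64 > /sys/block/sdb/queue/max_sectors_kb
   echo 3  > /proc/sys/vm/drop_caches )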

 In case it matters, all of my partitions are on LVM on dm-crypt, but
 this machine has AES-NI so the overhead from that should be minimal.
 In fact, overall CPU usage is only about 10%.

 What gives?  I thought this stuff was supposed to be better on modern kernels.

 --Andy
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hi Andrew,

you could try the following patch to speed up dm-crypt:

https://patchwork.kernel.org/patch/365542/

I'm using it on top of a highly-patched 2.6.37 kernel.

I'm not sure if exactly that version was included in 2.6.38.


there are some additional knobs to speed up dm-crypt:

e.g. pcrypt: CONFIG_CRYPTO_PCRYPT=y
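
(a quick sanity check, assuming your kernel exposes its config via
/proc/config.gz -- otherwise grep the build's .config:

  zgrep CRYPTO_PCRYPT /proc/config.gz )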

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2011-01-07 Thread Matt
On Thu, Jan 6, 2011 at 4:56 PM, Heinz Diehl h...@fancy-poultry.org wrote:
 On 05.12.2010, Milan Broz wrote:

 It still seems like dmcrypt with its parallel processing is just a
 trigger for another bug in 37-rc.

 To come back to this: my 3 systems (XFS filesystem) running the latest
 dm-crypt-scale-to-multiple-cpus patch from Andi Kleen/Milan Broz have
 not shown a single problem since 2.6.37-rc6 and above. No corruption any
 longer, no freezes, nothing. The patch applies cleanly to 2.6.37, too,
 and runs just fine.

 I blindly guess that my data corruption problem was related to something else 
 in the
 2.6.37-rc series up to -rc4/5.

 Since this patch is a significant improvement: any chance that it finally gets
 merged into mainline/stable?



Hi Heinz,

I've been using this patch since 2.6.37-rc6+ with ext4 and xfs
filesystems and haven't seen any corruption since then
(ext4 got fixed since 2.6.37-rc6, xfs showed no problems from the start)

http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1449032be17abb69116dbc393f67ceb8bd034f92
(is the actual temporary fix for ext4)
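
(to check whether a given tree already contains that fix, something
like this works -- same commit id as in the URL above:

  git describe --contains 1449032be17abb69116dbc393f67ceb8bd034f92 )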

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Mon, Dec 13, 2010 at 7:56 PM, Jon Nelson jnel...@jamponi.net wrote:
 On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o ty...@mit.edu wrote:
 On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote:
 I'm glad you've been able to reproduce the problem! If you should need
 any further assistance, please do not hesitate to ask.

 This patch seems to fix the problem for me.  (Unless the partition is
 mounted with mblk_io_submit.)

 Could you confirm that it fixes it for you as well?

 I believe I have applied the (relevant) inode.c changes to
 bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc, rebuilt and begun testing.
 Now at 28 passes without error, I think I can say that the patch
 appears to resolve the issue.

 --
 Jon


Confirmed !

I've been running my box for 5+ hours now with your patch applied, Ted,
in addition to Andi's/Milan's patch
(http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-crypt-scale-to-multiple-CPUs.patch),
and can't see any indication of corruption so far (while doing an
emerge -e system and everyday stuff).
My /home partition (with ext4) is also still intact [and of course
has a backup], so it seems to fix it for me, too.

so the corruption I was seeing was similar to what Jon saw

You can add a
Tested-by: Matthias Bayer jackdac...@gmail.com

Thanks a lot to everyone for your support ! :)


I have a question though: would the deactivation of multiple page-io
submission support most likely only affect bigger systems, or also
desktop systems (like mine)?

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen a...@firstfloor.org wrote:
 I have a question though: the deactivation of multiple page-io
 submission support most likely only would affect bigger systems or
 also desktop systems (like mine) ?

 I think this is not a final fix, just a workaround.
 The problem with the other path still really needs to be tracked down.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only.


ok,

thanks for the clarification

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Wed, Dec 15, 2010 at 8:25 PM, Matt jackdac...@gmail.com wrote:
 On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen a...@firstfloor.org wrote:
 I have a question though: the deactivation of multiple page-io
 submission support most likely only would affect bigger systems or
 also desktop systems (like mine) ?

 I think this is not a final fix, just a workaround.
 The problem with the other path still really needs to be tracked down.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only.


 ok,

 thanks for the clarification

 Regards

 Matt


Sorry to spam the mailing lists again

make that a
Reported-and-Tested-by: Matthias Bayer jackdac...@gmail.com

(hope that's the correct way to write it)

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Matt
On Fri, Dec 10, 2010 at 2:38 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500:
  512MB.
 
  'free' reports 75MB, 419MB free.
 
  I originally noticed the problem on really real hardware (thinkpad
  T61p), however.

 If you can easily reproduce it could you try a git bisect?

 Do we have a known good kernel?  I looked back through the thread and
 didn't see any reports where the postgres test on ext4 passed in this
 config.

 -chris


Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179.

From the tests I've done, that one showed the least (or no) corruption,
if you count the empty /etc/env.d/03opengl as an artefact

(I tested 3 commits in total)


1) 5a87b7a5da250c9be6d757758425dfeaf8ed3179

2) 1de3e3df917459422cb2aecac440febc8879d410

3) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

1 - 3 (earlier - later)
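
(for reference, a bisect over that range would look roughly like this;
which endpoint counts as good is only an assumption based on the
results above:

  git bisect start
  git bisect bad  bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
  git bisect good 5a87b7a5da250c9be6d757758425dfeaf8ed3179
  # build, boot, test; then mark each step good or bad until done )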

Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Matt
On Sun, Dec 5, 2010 at 2:24 PM, Theodore Tso ty...@mit.edu wrote:

 On Dec 5, 2010, at 5:21 AM, Milan Broz wrote:

 Which kernel? 2.6.37-rc?

 Anyone seen this with 2.6.36 and the same dmcrypt patch?
 (All info I had is that is is stable with here.)

 It still seems like dmcrypt with its parallel processing is just a
 trigger for another bug in 37-rc.

 I've been using a kernel which is between 2.6.37-rc2 and -rc3 with a LUKS / 
 dm-crypt / LVM / ext4 setup for my primary file systems, and I haven't 
 observed any corruption for the last two weeks or so.   It's on my todo list 
 to upgrade to top of Linus's tree, but perhaps this is a useful data point.

 As another thought, what version of GCC are people using who are having 
 difficulty?   Could this perhaps be a compiler-related issue?

 -- Ted



Hi Ted,

to quote its output:


gcc -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.1/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-4.5.1-r1/work/gcc-4.5.1/configure
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.5.1
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.1/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.5.1/include/g++-v4
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu
--disable-altivec --disable-fixed-point --with-ppl --with-cloog
--enable-lto --enable-nls --without-included-gettext
--with-system-zlib --disable-werror --enable-secureplt
--enable-multilib --enable-libmudflap --disable-libssp --enable-esp
--enable-libgomp --enable-cld
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.5.1/python
--enable-checking=release --enable-java-awt=gtk --enable-objc-gc
--enable-languages=c,c++,java,objc,obj-c++,fortran --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo
Hardened 4.5.1-r1 p1.4, pie-0.4.5'
Thread model: posix
gcc version 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5)


output of emerge -p gcc:

These are the packages that would be merged, in order:

Calculating dependencies  ... done!
[ebuild   R   ] sys-devel/gcc-4.5.1-r1  USE=fortran gcj graphite gtk
hardened lto mudflap (multilib) multislot nls nptl objc objc++ objc-gc
openmp (-altivec) -bootstrap -build -doc (-fixed-point) (-libffi)
(-n32) (-n64) -nocxx -nopie -nossp -test -vanilla 0 kB



and to be precise it's gcc 4.5.1 with some gentoo-specific fixes and
fixes from upstream (4.5.2) [take a look at patchset 1.4],
in my case it also has the --enable-esp functionality [hardened]
which should include something like -D_FORTIFY_SOURCE=2, -fstack-protector-all
and for linking/ldd: -Wl,-z,now -Wl,-z,relro

(I don't know if the part with the linker and the fstack-protector is accurate)

I'm adding below the kernel output from mounting the system partition
of the system I was running the kernel on - the one where the [more
observable] corruption was observed (checkout
bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc).
This output was generated while I mounted it from my working (no
corruption observed) system with a 2.6.36 kernel - I don't know if it's
useful, just in case you might need it
[forgot to post this in my last email]

Thanks & Regards

Matt



[  607.849644] EXT4-fs (dm-7): INFO: recovery required on readonly filesystem
[  607.849651] EXT4-fs (dm-7): write access will be enabled during recovery
[  609.559363] EXT4-fs (dm-7): orphan cleanup on readonly fs
[  609.559375] EXT4-fs (dm-7): ext4_orphan_cleanup: truncating inode 2238873 to 0 bytes
[  609.559493] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231865
[  609.559531] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231870
[  609.559553] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396001
[  609.559588] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396036
[  609.559610] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2395699
[  609.559675] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231859
[  609.559695] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231868
[  609.559715] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396696
[  609.559736] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396697
[  609.559755] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2396699
[  609.559775] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2395948
[  609.559809] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231856
[  609.559830] EXT4-fs (dm-7): ext4_orphan_cleanup: deleting unreferenced inode 2231866
[  609.559850] EXT4-fs (dm

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer snit...@redhat.com wrote:
 On Wed, Dec 01 2010 at  3:45pm -0500,
 Milan Broz mb...@redhat.com wrote:


 On 12/01/2010 08:34 PM, Jon Nelson wrote:
  Perhaps this is useful: for myself, I found that when I started using
  2.6.37rc3 that postgresql starting having a *lot* of problems with
  corruption. Specifically, I noted zeroed pages, corruption in headers,
  all sorts of stuff on /newly created/ tables, especially during index
  creation. I had a fairly high hit rate of failure. I backed off to
  2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had
  never had a corruption issue with postgresql). I ran on 2.6.36 for a
  few weeks as well, without issue.
 
  I am using kcrypt with lvm on top of that, and ext4 on top of that.

 With unpatched dmcrypt (IOW with Linus' git)? Then it must be ext4 or
 dm-core problem because there were no patches for dm-crypt...

 Matt and Jon,

 If you'd be up to it: could you try testing your dm-crypt+ext4
 corruption reproducers against the following two 2.6.37-rc commits:

 1) 1de3e3df917459422cb2aecac440febc8879d410
 then
 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

 Then, depending on results of no corruption for those commits, bonus
 points for testing the same commits but with Andi and Milan's latest
 dm-crypt cpu scalability patch applied too:
 https://patchwork.kernel.org/patch/365542/

 Thanks!
 Mike


Hi Mike,

it seems like there isn't even much testing to do:

I tested all 3 commits / checkouts by re-compiling gcc (which was/is
the 2nd easy way to trigger this corruption), compiling google's
chromium (v9), and looking at the output/existence of gcc, g++ and
eselect opengl list

so far everything went fine

After that I used the new patch (v6 or pre-v6), before that I had to

replace WQ_MEM_RECLAIM with WQ_RESCUER

and re-compiled the kernels
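
(the rename itself is mechanical; the patch file name here is made up:

  sed -i 's/WQ_MEM_RECLAIM/WQ_RESCUER/g' dm-crypt-scale-v6.patch )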

shortly after I had booted up the system with the first kernel
(http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
the output of 'eselect opengl list' showed no opengl backend
selected

so it seems to manifest itself even earlier (ext4: call
mpage_da_submit_io() from mpage_da_map_blocks()), even if only subtly
and over time -
I'm still currently running that kernel, posting from it & having tests run

I'm not sure if it's even a problem with ext4 - I haven't had the time
to test with XFS yet - maybe it's also happening with that, so it would
more likely be dm-core, like Milan suspected
(http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(

@Jon,

have you had time to do some tests meanwhile? What did you find out?

even though most of the time it's compiling and I don't need to do much -
I need the box for work, so if my time allows the next tests would be
next weekend, and then I'm back to my other partition

I really do hope that this bugger can be nailed down ASAP - I like the
improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
it's only half the fun ;)

Thanks & Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer snit...@redhat.com wrote:
 On Sat, Dec 04 2010 at  2:18pm -0500,
 Matt jackdac...@gmail.com wrote:

 On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer snit...@redhat.com wrote:
  Matt and Jon,
 
  If you'd be up to it: could you try testing your dm-crypt+ext4
  corruption reproducers against the following two 2.6.37-rc commits:
 
  1) 1de3e3df917459422cb2aecac440febc8879d410
  then
  2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
 
  Then, depending on results of no corruption for those commits, bonus
  points for testing the same commits but with Andi and Milan's latest
  dm-crypt cpu scalability patch applied too:
  https://patchwork.kernel.org/patch/365542/
 
  Thanks!
  Mike
 

 Hi Mike,

 it seems like there isn't even much testing to do:

 I tested all 3 commits / checkouts by re-compiling gcc which was/is
 the 2nd easy way to trigger this corruption, compiling google's
 chromium (v9) and looking at the output/existance of gcc, g++ and
 eselect opengl list

 Can you be a bit more precise about what you're doing to reproduce?
 What sequence?  What (if any) builds are going in parallel?  Etc.

 so far everything went fine

 After that I used the new patch (v6 or pre-v6), before that I had to

 replace WQ_MEM_RECLAIM with WQ_RESCUER

 and, re-compiled the kernels

 shortly after I had booted up the system with the first kernel
 (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
 the output of 'eselect opengl list' did show no opengl backend
 selected

 so it seems to manifest itself even earlier (ext4: call
 mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
 and over time -
 I'm still currently running that kernel and posting from it  having tests 
 run

 OK.

 I'm not sure if it's even a problem with ext4 - I haven't had the time
 to test with XFS yet - maybe it's also happening with that so it more
 likely would be dm-core, like Milan suspected
 (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(

 It'd be interesting to try to reproduce with that same kernel but using
 XFS.  I'll check with Milan on what he thinks would be the best next
 steps.  Ideally we'll be able to reproduce your results to aid in
 pinpointing the issue.  I think Milan will be trying to do so shortly
 (if he hasn't started already -- using gentoo emerge, etc).

 even though most of the time it's compiling I don't need to do much -
 I need the box for work so if my time allows next tests would be next
 weekend and I'm back to my other partition

 I really do hope that this bugger can be nailed down ASAP - I like the
 improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
 it's only half the fun ;)

 Sure, we'll need to get to the bottom of this before we can have
 confidence sending the dm-crypt cpu scalability patch upstream.

 Thanks for your testing,
 Mike


I should have made it clear that the results I get are observed when
using the kernels/checkouts *with* the dm-crypt multi-cpu patch;
without the patch I didn't see those kinds of problems (hardlocks,
files missing, etc.)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer snit...@redhat.com wrote:
 On Sat, Dec 04 2010 at  2:18pm -0500,
 Matt jackdac...@gmail.com wrote:

 On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer snit...@redhat.com wrote:
  Matt and Jon,
 
  If you'd be up to it: could you try testing your dm-crypt+ext4
  corruption reproducers against the following two 2.6.37-rc commits:
 
  1) 1de3e3df917459422cb2aecac440febc8879d410
  then
  2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
 
  Then, depending on results of no corruption for those commits, bonus
  points for testing the same commits but with Andi and Milan's latest
  dm-crypt cpu scalability patch applied too:
  https://patchwork.kernel.org/patch/365542/
 
  Thanks!
  Mike
 

 Hi Mike,

 it seems like there isn't even much testing to do:

 I tested all 3 commits / checkouts by re-compiling gcc which was/is
 the 2nd easy way to trigger this corruption, compiling google's
 chromium (v9) and looking at the output/existance of gcc, g++ and
 eselect opengl list

 Can you be a bit more precise about what you're doing to reproduce?
 What sequence?  What (if any) builds are going in parallel?  Etc.

 so far everything went fine

 After that I used the new patch (v6 or pre-v6), before that I had to

 replace WQ_MEM_RECLAIM with WQ_RESCUER

 and, re-compiled the kernels

 shortly after I had booted up the system with the first kernel
 (http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5a87b7a5da250c9be6d757758425dfeaf8ed3179)
 the output of 'eselect opengl list' did show no opengl backend
 selected

 so it seems to manifest itself even earlier (ext4: call
 mpage_da_submit_io() from mpage_da_map_blocks()) even if only subtly
 and over time -
 I'm still currently running that kernel and posting from it  having tests 
 run

 OK.

 I'm not sure if it's even a problem with ext4 - I haven't had the time
 to test with XFS yet - maybe it's also happening with that so it more
 likely would be dm-core, like Milan suspected
 (http://marc.info/?l=linux-kernel&m=129123636223477&w=2) :(

 It'd be interesting to try to reproduce with that same kernel but using
 XFS.  I'll check with Milan on what he thinks would be the best next
 steps.  Ideally we'll be able to reproduce your results to aid in
 pinpointing the issue.  I think Milan will be trying to do so shortly
 (if he hasn't started already -- using gentoo emerge, etc).

 even though most of the time it's compiling I don't need to do much -
 I need the box for work so if my time allows next tests would be next
 weekend and I'm back to my other partition

 I really do hope that this bugger can be nailed down ASAP - I like the
 improvements made in 2.6.37 but without the dm-crypt multi-cpu patch
 it's only half the fun ;)

 Sure, we'll need to get to the bottom of this before we can have
 confidence sending the dm-crypt cpu scalability patch upstream.

 Thanks for your testing,
 Mike


OK, before bed time I found some kind of corruption:

running kernel is from commit: bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

the messages might be overlooked - they're easy to miss:

steps:
1) bootup
2) (might need to re-install graphics driver due to driver switch, in
this case magic properties [or what's its name] didn't change so the
kernel module still worked)
3) firing up 2 xterms, xload, xclock, gksu - terminal - firefox,
nautilus --no-desktop, gnome-mplayer (playing mp3)
4) emerge -1 sys-devel/gcc (from one of the xterms)

after emerge -1 sys-devel/gcc
finished it displayed:

>>> Auto-cleaning packages...
portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to
value of 0
portage: COUNTER for sys-devel/patch-2.6.1 was corrupted; resetting to
value of 0

(the COUNTER file normally should have a value, e.g.:
cat /var/db/pkg/sys-devel/gcc-4.5.1-r1/COUNTER
20560)

in this case it's empty:
cat /var/db/pkg/sys-devel/patch-2.6.1/COUNTER

(shows nothing)
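
(a quick way to spot every affected package, assuming the corruption
always leaves the file empty:

  find /var/db/pkg -name COUNTER -empty )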

reference thread: http://forums.gentoo.org/viewtopic-t-836605-start-0.html

it's solvable by re-installing, but in the case of non-recoverable
files (e.g. personal files) it would be critical
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-02 Thread Matt
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer snit...@redhat.com wrote:
 On Wed, Dec 01 2010 at  3:45pm -0500,
 Milan Broz mb...@redhat.com wrote:


 On 12/01/2010 08:34 PM, Jon Nelson wrote:
  Perhaps this is useful: for myself, I found that when I started using
  2.6.37rc3 that postgresql starting having a *lot* of problems with
  corruption. Specifically, I noted zeroed pages, corruption in headers,
  all sorts of stuff on /newly created/ tables, especially during index
  creation. I had a fairly high hit rate of failure. I backed off to
  2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had
  never had a corruption issue with postgresql). I ran on 2.6.36 for a
  few weeks as well, without issue.
 
  I am using kcrypt with lvm on top of that, and ext4 on top of that.

 With unpatched dmcrypt (IOW with Linus' git)? Then it must be ext4 or
 dm-core problem because there were no patches for dm-crypt...

 Matt and Jon,

 If you'd be up to it: could you try testing your dm-crypt+ext4
 corruption reproducers against the following two 2.6.37-rc commits:

 1) 1de3e3df917459422cb2aecac440febc8879d410
 then
 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc

 Then, depending on results of no corruption for those commits, bonus
 points for testing the same commits but with Andi and Milan's latest
 dm-crypt cpu scalability patch applied too:
 https://patchwork.kernel.org/patch/365542/

 Thanks!
 Mike


Yeah sure,

I'll have to set up another testing system (on a separate partition /
volume group) of its own, so that will take some time;
the first tests will probably be run over the weekend,

thanks for those pointers !

I took a look at git-web - do you think
5a87b7a5da250c9be6d757758425dfeaf8ed3179 might be relevant, too?

the others seem rather minor compared to those you posted

Afaik the last time I ran vanilla 2.6.37-rc* (which was probably around
rc1) I saw no corruption at all, but I'll give it a test-run without
the dm-crypt patch anyway

Thanks & Regards

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dm-crypt barrier support is effective

2010-12-01 Thread Matt
On Mon, Nov 15, 2010 at 12:24 AM, Matt jackdac...@gmail.com wrote:
 On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz mb...@redhat.com wrote:
 On 11/14/2010 10:49 PM, Matt wrote:
 only with the dm-crypt scaling patch I could observe the data-corruption

 even with v5 I sent on Friday?

 Are you sure that it is not related to some fs problem in 2.6.37-rc1?

 If it works on 2.6.36 without problems, it is probably a problem somewhere
 else (flush/fua conversion was trivial here - DM is still doing full flush
 and there are no other changes in code IMHO.)

 Milan


 Hi Milan,

 I'm aware of your new v5 patch (which should include several
 improvements (or potential fixes in my case) over the v3 patch)

 as I already wrote my schedule unfortunately currently doesn't allow
 me to test it

 * in the case of no corruption it would be nice to have 2.6.37-rc* running :)

 * in the case of data corruption that would mean restoring my system -
 since it's my production box and right now I don't have a fallback at
 reach
 at earliest I could give it a shot at the beginning of December. Then
 I could also test reiserfs and ext4 as a system partition to rule out
 that it's
  an ext4-specific thing (currently I'm running reiserfs on my system-partition).

 Thanks !

 Matt



OK guys,

I've updated my system to latest glibc 2.12.1-r3 (on gentoo) and gcc
hardened 4.5.1-r1 with 1.4 patchset which also uses pie (that one
should fix problems with graphite)

not many system changes besides that,

with those it worked fine with 2.6.36 and I couldn't observe any
filesystem corruption



the bad news is: I'm again seeing corruption (!) [on ext4, on the /
(root) partition]:

I was re-emerging/re-installing stuff - pretty trivial stuff actually
(which worked fine in the past): emerging gnome-base programs (gconf,
librsvg, nautilus, gnome-mount, gnome-vfs, gvfs, imagemagick,
xine-lib) and some others: terminal (from xfce), vtwm, rman, vala
(library), xclock, xload, atk, gtk+, vte

during that I noticed some corruption and programs kept failing to
configure/compile, saying that g++ was missing; I re-extracted gcc
(of which I had previously made a backup tarball), which seemed to help
for some time until programs again failed with some corrupted files
from gcc

so I re-emerged gcc (compiling it) and after it had finished, the same
error occurred that I had already written about in a previous email:
the content of /etc/env.d/03opengl got corrupted - but NOT the whole file:

normally it's
# Configuration file for eselect
# This file has been automatically generated.
LDPATH=
OPENGL_PROFILE=
-- where the path to the graphics-drivers and the opengl profile is written;
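
(an intact file looks roughly like this; the driver name and library
path are only an example:

  # Configuration file for eselect
  # This file has been automatically generated.
  LDPATH="/usr/lib64/opengl/xorg-x11/lib"
  OPENGL_PROFILE="xorg-x11" )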

in this case of the corruption the file contained only garbage symbols


I have no clue how this file could be connected with gcc


=== so the No.1 trigger of this kind of corruption - where files end up
empty, missing, or with corrupted content (at least for me) - is
compiling software which is part of the system (e.g. emerge -e
system);

the system is Gentoo ~amd64; with binutils 2.20.51.0.12 (afaik this
one has changed from 2.20.51.0.10 to 2.20.51.0.12 since my last
report); gcc 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) --
works fine with 2.6.36 and 2.6.36.1

I'm not sure whether benchmarks would have the same impact



the kernel currently running is 2.6.37-rc4 with the [PATCH v5] dm
crypt: scale to multiple CPUs

besides that additional patchsets are applied (I apologize that it's
not only plain vanilla with the dm-crypt patch):
* Prevent kswapd dumping excessive amounts of memory in response to
high-order allocation
* ext4: coordinate data-only flush requests sent by fsync
* vmscan: protect executable page from inactive list scan
* writeback livelock fixes v2

I originally had hoped that the patch mentioned in ext4: coordinate
data-only flush requests sent by fsync, namely: md: Call
blk_queue_flush() to establish flush/fua, and the additional changes &
fixes in 2.6.37-rc4, would once and for all fix the problems, but it didn't

I'm also using the the writeback livelock fixes and the dm-crypt scale
to multiple CPUs with 2.6.36 so those generally work fine

so it has to be something that changed from 2.6.36 to 2.6.37 within
dm-crypt, or other parts that get stressed and break during usage of
the [PATCH v5] dm crypt: scale to multiple CPUs patch

the other included patches surely won't be the cause for that (100%).

Filesystem corruption only seems to occur on the / (root) where the
system resides -

Fortunately I haven't encountered any corruption on my /home partition
which also uses ext4 and during rsync'ing from /home to other data
partitions with ext4 and xfs (I don't want to try to seriously corrupt
any of my data so I played it safe from the beginning and didn't use
anything heavy such as virtualmachines, etc.) - browsing the web,
using firefox & chromium, amarok, etc. worked fine so far

the system is in a pretty new state - which means I extracted it
from a tarball out of a liveCD environment

Re: dm-crypt barrier support is effective

2010-12-01 Thread Matt
On Wed, Dec 1, 2010 at 5:52 PM, Mike Snitzer snit...@redhat.com wrote:
 On Wed, Dec 01 2010 at 11:05am -0500,
 Matt jackdac...@gmail.com wrote:

 On Mon, Nov 15, 2010 at 12:24 AM, Matt jackdac...@gmail.com wrote:
  On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz mb...@redhat.com wrote:
  On 11/14/2010 10:49 PM, Matt wrote:
  only with the dm-crypt scaling patch I could observe the data-corruption
 
  even with v5 I sent on Friday?
 
  Are you sure that it is not related to some fs problem in 2.6.37-rc1?
 
  If it works on 2.6.36 without problems, it is probably problems somewhere
  else (flush/fua conversion was trivial here - DM is still doing full flush
  and there are no other changes in code IMHO.)
 
  Milan
 
 
  Hi Milan,
 
  I'm aware of your new v5 patch (which should include several
  improvements (or potential fixes in my case) over the v3 patch)
 
  as I already wrote my schedule unfortunately currently doesn't allow
  me to test it
 
  * in the case of no corruption it would be nice to have 2.6.37-rc* running 
  :)
 
  * in the case of data corruption that would mean restoring my system -
  since it's my production box and right now I don't have a fallback at
  reach
  at earliest I could give it a shot at the beginning of December. Then
  I could also test reiserfs and ext4 as a system partition to rule out
  that it's
  a ext4-specific thing (currently I'm running reiserfs on my 
  system-partition).
 
  Thanks !
 
  Matt
 


 OK guys,

 I've updated my system to latest glibc 2.12.1-r3 (on gentoo) and gcc
 hardened 4.5.1-r1 with 1.4 patchset which also uses pie (that one
 should fix problems with graphite)

 not much system changes besides that,

 with those it worked fine with 2.6.36 and I couldn't observe any
 filesystem corruption

 So dm-crypt cpu scalability v5 with 2.6.36 worked fine.

 the bad news is: I'm again seeing corruption (!) [on ext4, on the /
 (root) partition]:

 ...

 === so the No.1 trigger of this kind of corruption where files are
 empty, missing or the content gets corrupted (at least for me) is
 compiling software which is part of the system (e.g. emerge -e
 system);

 the system is Gentoo ~amd64; with binutils 2.20.51.0.12 (afaik this
 one has changed from 2.20.51.0.10 to 2.20.51.0.12 from my last
 report); gcc 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) --
 works fine with 2.6.36 and 2.6.36.1

 I'm not sure whether benchmarks would have the same impact

 Seems this emerge is a good test if it reliably induces the corruption.

 the kernel currently running is 2.6.37-rc4 with the [PATCH v5] dm
 crypt: scale to multiple CPUs

 besides that additional patchsets are applied (I apologize that it's
 not only plain vanilla with the dm-crypt patch):
 * Prevent kswapd dumping excessive amounts of memory in response to
 high-order allocation
 * ext4: coordinate data-only flush requests sent by fsync
 * vmscan: protect executable page from inactive list scan
 * writeback livelock fixes v2

 Have you actually experienced any of the issues the above patches are
 meant to address?  Seems you're applying patches guessing/hoping
 that they'll fix the dm-crypt corruption.

 I originally had hoped that the mentioned patch in ext4: coordinate
 data-only flush requests sent by fsync, namely: md: Call
 blk_queue_flush() to establish flush/fua and additional changes 
 fixes to 2.6.37-rc4 would once and for all fix problems but it didn't

 That md patch doesn't help DM at all.  And the ext4 coordination patch
 is completely bleeding edge and actually broken (especially as it relates to
 DM -- but that breakage is only a concern for request-based DM,
 e.g. DM-mpath), anyway see:
 https://www.redhat.com/archives/dm-devel/2010-November/msg00185.html

 I'm not sure which patches you're using for the ext4 fsync changes but
 please don't use them at all.  It is purely an optimization for
 extremely heavy fsync workloads and is only getting in the way at this
 point.

 I'm also using the the writeback livelock fixes and the dm-crypt scale
 to multiple CPUs with 2.6.36 so those generally work fine

 so it has be something that changed from 2.6.36-2.6.37 within
 dm-crypt or other parts that gets stressed and breaks during usage of
 the [PATCH v5] dm crypt: scale to multiple CPUs patch

 the other included patches surely won't be the cause for that (100%).

 Filesystem corruption only seems to occur on the / (root) where the
 system resides -

 We need better fault isolation; you've introduced enough change that it
 isn't helping zero in on what your particular problem is.  Milan has
 tested the latest version of the dm-crypt cpu scalability patch quite a
 bit and hasn't seen any corruption -- but clearly the corruption you're
 seeing is a real concern and we need to get to the bottom of it.

 I'd really appreciate it if you could just use Linus' latest linux-2.6
 tree plus Milan's latest patch (technically v6 even though it wasn't
 labeled as such): https://patchwork.kernel.org/patch/365542/

 Porting

Re: dm-crypt barrier support is effective (was: Re: DM-CRYPT: Scale to multiple CPUs v3 on 2.6.37-rc* ?)

2010-11-14 Thread Matt
 stable !

only with the dm-crypt scaling patch I could observe the data-corruption


Thanks !

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dm-crypt barrier support is effective

2010-11-14 Thread Matt
On Sun, Nov 14, 2010 at 10:54 PM, Milan Broz mb...@redhat.com wrote:
 On 11/14/2010 10:49 PM, Matt wrote:
 only with the dm-crypt scaling patch I could observe the data-corruption

 even with v5 I sent on Friday?

 Are you sure that it is not related to some fs problem in 2.6.37-rc1?

 If it works on 2.6.36 without problems, it is probably a problem somewhere
 else (flush/fua conversion was trivial here - DM is still doing full flush
 and there are no other changes in code IMHO.)

 Milan


Hi Milan,

I'm aware of your new v5 patch (which should include several
improvements (or potential fixes in my case) over the v3 patch)

as I already wrote my schedule unfortunately currently doesn't allow
me to test it

* in the case of no corruption it would be nice to have 2.6.37-rc* running :)

* in the case of data corruption that would mean restoring my system -
since it's my production box and right now I don't have a fallback at
reach
at earliest I could give it a shot at the beginning of December. Then
I could also test reiserfs and ext4 as a system partition to rule out
that it's
an ext4-specific thing (currently I'm running reiserfs on my system-partition).

Thanks !

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: checkpatch fixes in various files

2010-07-19 Thread Matt Lupfer
From: Matt Lupfer mrlup...@us.ibm.com

Fixes innocuous style issues identified by the checkpatch script.

Signed-off-by: Matt Lupfer mrlup...@us.ibm.com
Reviewed-by: Ben Chociej bccho...@us.ibm.com
Reviewed-by: Conor Scott crsc...@us.ibm.com
Reviewed-by: Steve French sfre...@us.ibm.com
---
 fs/btrfs/async-thread.c |2 +-
 fs/btrfs/disk-io.c  |4 ++--
 fs/btrfs/export.c   |3 ++-
 fs/btrfs/extent-tree.c  |8 
 fs/btrfs/extent_io.h|4 ++--
 fs/btrfs/extent_map.h   |8 
 fs/btrfs/free-space-cache.c |   20 +---
 fs/btrfs/inode.c|   27 +++
 fs/btrfs/ioctl.c|   19 +++
 fs/btrfs/locking.c  |4 ++--
 fs/btrfs/ordered-data.c |3 +--
 fs/btrfs/ordered-data.h |7 ---
 fs/btrfs/tree-log.c |4 ++--
 fs/btrfs/tree-log.h |3 ++-
 fs/btrfs/volumes.c  |4 ++--
 15 files changed, 63 insertions(+), 57 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 7ec1409..e142da3 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -260,7 +260,7 @@ static struct btrfs_work *get_next_work(struct btrfs_worker_thread *worker,
        struct btrfs_work *work = NULL;
        struct list_head *cur = NULL;

-       if(!list_empty(prio_head))
+       if (!list_empty(prio_head))
                cur = prio_head->next;

        smp_mb();
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 34f7c37..4513eaf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -243,8 +243,8 @@ static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
                       "failed on %llu wanted %X found %X "
                       "level %d\n",
                       root->fs_info->sb->s_id,
-                      (unsigned long long)buf->start, val, found,
-                      btrfs_header_level(buf));
+                      (unsigned long long)buf->start, val,
+                      found, btrfs_header_level(buf));
        }
        if (result != (char *)&inline_result)
                kfree(result);
diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 951ef09..e7e5463 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -223,7 +223,8 @@ static struct dentry *btrfs_get_parent(struct dentry *child)

        key.type = BTRFS_INODE_ITEM_KEY;
        key.offset = 0;
-       dentry = d_obtain_alias(btrfs_iget(root->fs_info->sb, &key, root, NULL));
+       dentry = d_obtain_alias(btrfs_iget(root->fs_info->sb, &key, root,
+                                          NULL));
        if (!IS_ERR(dentry))
                dentry->d_op = &btrfs_dentry_operations;
        return dentry;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a46b64d..1298500 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4578,9 +4578,8 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
                empty_cluster = 64 * 1024;
        }

-       if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD)) {
+       if ((data & BTRFS_BLOCK_GROUP_DATA) && btrfs_test_opt(root, SSD))
                last_ptr = &root->fs_info->data_alloc_cluster;
-       }

        if (last_ptr) {
                spin_lock(&last_ptr->lock);
@@ -4642,7 +4641,8 @@ have_block_group:
                if (unlikely(block_group->cached == BTRFS_CACHE_NO)) {
                        u64 free_percent;

-                       free_percent = btrfs_block_group_used(&block_group->item);
+                       free_percent = btrfs_block_group_used(
+                                               &block_group->item);
                        free_percent *= 100;
                        free_percent = div64_u64(free_percent,
                                                 block_group->key.offset);
@@ -7862,7 +7862,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)

        release_global_block_rsv(info);

-       while(!list_empty(&info->space_info)) {
+       while (!list_empty(&info->space_info)) {
                space_info = list_entry(info->space_info.next,
                                        struct btrfs_space_info,
                                        list);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 5691c7b..2ebfef0 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -184,8 +184,8 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 int clear_extent_bits(struct extent_io_tree *tree, u64 start, u64 end,
                      int bits, gfp_t mask);
 int clear_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
-                    int bits, int wake, int delete, struct extent_state **cached,
-                    gfp_t mask);
+                    int bits, int wake, int delete

Re: Copy/move btrfs volume

2010-07-01 Thread Matt Brown
On 07/01/2010 05:33 AM, Lubos Kolouch wrote:
 Daniel J Blueman, Thu, 01 Jul 2010 12:26:10 +0100:
 What is the correct way to do this?

 The only way to do this preserving duplication is to use hardlinks
 between duplicated files (which reference counts the inode), and use
 'rsync -H'.

 Dan

Hello,

With backed up files consisting of hard links, I usually use dd to copy
the file systems at the block level

# dd if=/dev/sda of=/dev/sdb bs=20M

and then expand the file system. This is because I found that tools like
rsync, while usually fast, are extremely slow when dealing with millions
of hard linked files.

This could also be used for btrfs to keep its snapshots.
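
(for ext4, the dd-then-grow sequence is roughly the following; device
names are examples, and the target must be at least as large as the
source:

# dd if=/dev/sda1 of=/dev/sdb1 bs=20M
# e2fsck -f /dev/sdb1
# resize2fs /dev/sdb1 )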

 A scenario - I have raid5 of say, 1TB HDDs. It contains many snapshots.
 Then, few years later, new machine is bought and there are, say, 5TB
 discs.
 ...
 Lubos

For me, I had to copy over BackupPC hardlinked files from a full disk to
a smaller disk, both using ext4, and I could not use dd. What normally
should have taken an hour, instead took almost a week. (Yes, I wanted to
use btrfs, but it had a hard link limit of 255 - don't know if it still
does.)

It would be nice to have a btrfs command that could rapidly copy over
the file system, snapshots, and all other file system info.

But what benefit would having a native btrfs 'copy/rsync' command have
over the dd/resize option?

Pros
- Files will be immediately checksumed on new disks, but this may not be
as important since a checksum/verify command will be implemented.
- Great 'feature' for copying files to new drives, and keeping
snapshots. Could even be used to export snapshots.
- I believe compressed files will have to be uncompressed and
recompressed, depending on when the file is checksummed (I may be wrong on
this one). This will actually be a con for slow and/or high-load machines.
- One command instead of many (dd - resize - verify).

Cons
- File system would still have to be unmounted, or at least read-only,
as I doubt the command will have rsync's update or delete abilities.
But, maybe it could.

Questionable
- May be faster than dd/resize, or it may be just as slow as rsync is
with hard links. And I am talking about dozens to thousands of
snapshots, and millions to billions of files.

Matt
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Mounting raid without a btrfsctl scan

2010-05-15 Thread Matt Brown
Hi,

Would it be possible and feasible to support mounting btrfs
raid/multi-device filesystems without having to run 'btrfsctl -a'?

Currently, as you may know, if one wants to attach a btrfs raid
filesystem to a system (usb, hotswap, reboot, etc), the user or program
has to run:

btrfsctl -a (or similar)
mount /dev/sdb1 /mount/point

While this works, it will require patching of various subsystems
involved with managing disks, such as udev, mkinitrd, dracut, hal, and
others. Each one will have to know to scan, then mount.
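
(for example, a udev rule along these lines could trigger the scan
automatically; the rule file name is made up, and whether btrfsctl is
the right long-term interface is an open question:

# /etc/udev/rules.d/64-btrfs.rules
ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="btrfs", RUN+="/sbin/btrfsctl -a" )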

For example, I have a system that has a btrfs raid1 as root. However, I
had to patch the initramfs generator (dracut) so that during boot it
would scan just before mounting the root filesystem.

I filed a bug with dracut, but the more I think of it, the more it seems
that either mount.btrfs should be taking care of this, or another part
of btrfs.

Any thoughts or plans on the matter?

Thanks,
Matt

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html