Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
.8+/-1.3e+04 files/s) and that shows in the runtime, which also drops from 3m57s to 3m22s. So regardless of what aim7 results we get from these changes, I'll be merging them pending review and further testing... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 3.16 102/305] xfs: xfs_iflush_cluster fails to abort on error

2016-08-14 Thread Dave Chinner
On Sat, Aug 13, 2016 at 06:42:51PM +0100, Ben Hutchings wrote: > 3.16.37-rc1 review patch. If anyone has any objections, please let me know. > > -- > > From: Dave Chinner > > commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream. > > When a fai

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-14 Thread Dave Chinner
On Sat, Aug 13, 2016 at 02:30:54AM +0200, Christoph Hellwig wrote: > On Fri, Aug 12, 2016 at 08:02:08PM +1000, Dave Chinner wrote: > > Which says "no change". Oh well, back to the drawing board... > > I don't see how it would change thing much - for all relevant calc

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-12 Thread Dave Chinner
On Fri, Aug 12, 2016 at 04:51:24PM +0800, Ye Xiaolong wrote: > On 08/12, Ye Xiaolong wrote: > >On 08/12, Dave Chinner wrote: > > [snip] > > >>lkp-folk: the patch I've just tested it attached below - can you > >>feed that through your test and see if it f

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 10:02:39PM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner wrote: > > > > That's why running aim7 as your "does the filesystem scale" > > benchmark is somewhat irrelevant to scaling applications on hig

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
her logging rates by doing this. That's why running aim7 as your "does the filesystem scale" benchmark is somewhat irrelevant to scaling applications on high performance systems these days - users with fast storage will be expecting to see that 1.9GB/s throughput from their app, not 600MB/s. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote: > On Thu, Aug 11, 2016 at 5:54 PM, Dave Chinner wrote: > > > > So, removing mark_page_accessed() made the spinlock contention > > *worse*. > > > > 36.51% [kernel] [k] _raw_spin_unlock_irqr

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Fri, Aug 12, 2016 at 10:54:42AM +1000, Dave Chinner wrote: > I'm now going to test Christoph's theory that this is an "overwrite > doing lots of block mapping" issue. More on that to follow. Ok, so going back to the profiles, I can say it's not an overwrite

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: > On Wed, Aug 10, 2016 at 05:33:20PM -0700, Huang, Ying wrote: > We need to know what is happening that is different - there's a good > chance the mapping trace events will tell us. Huang, can you get > a raw event tr

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-11 Thread Dave Chinner
design level - the mapping->tree_lock is a global serialisation point I'm now going to test Christoph's theory that this is an "overwrite doing lots of block mapping" issue. More on that to follow. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
n_unlock_irqrestore I don't think that this is the same as what aim7 is triggering as there's no XFS write() path allocation functions near the top of the profile to speak of. Still, I don't recall seeing this before... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Thu, Aug 11, 2016 at 10:36:59AM +0800, Ye Xiaolong wrote: > On 08/11, Dave Chinner wrote: > >On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote: > >> I need to see these events: > >> > >>xfs_file* > >>xfs_iomap* > >>

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote:
> I need to see these events:
>
> xfs_file*
> xfs_iomap*
> xfs_get_block*
>
> For both kernels.

An example trace from 4.8-rc1 running the command `xfs_io -f -c 'pwrite 0 512k -b 128k

Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
red_write: dev 253:32 ino 0x84 size 0x4 offset 0x4 count 0x2
xfs_io-2946 [001] 253971.751236: xfs_iomap_found: dev 253:32 ino 0x84 size 0x4 offset 0x4 count 131072 type invalid startoff 0x0 startblock 24 blockcount 0x60
xfs_io-2946 [001] 253971.751381: xfs_file_buffered_write: dev 253:32 ino 0x84 size 0x4 offset 0x6 count 0x2
xfs_io-2946 [001] 253971.751415: xfs_iomap_prealloc_size: dev 253:32 ino 0x84 prealloc blocks 128 shift 0 m_writeio_blocks 16
xfs_io-2946 [001] 253971.751425: xfs_iomap_alloc: dev 253:32 ino 0x84 size 0x4 offset 0x6 count 131072 type invalid startoff 0x60 startblock -1 blockcount 0x90

That's the output I need for the complete test - you'll need to use a better recording mechanism than this (e.g. trace-cmd record, trace-cmd report) because it will generate a lot of events. Compress the two report files (they'll be large) and send them to me offlist. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
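A concrete way to record and report those events with trace-cmd, per the suggestion above; the scratch file path is illustrative, and the event glob syntax assumes a reasonably recent trace-cmd:

    trace-cmd record -e 'xfs_file*' -e 'xfs_iomap*' -e 'xfs_get_block*' \
        xfs_io -f -c 'pwrite 0 512k -b 128k' /mnt/scratch/testfile
    trace-cmd report > trace.report

Run once on each kernel and compress the two resulting report files.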

Re: [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

2016-08-10 Thread Dave Chinner
't spin on at all. We really need instruction level perf profiles to understand this - I don't have a machine with this many cpu cores available locally, so I'm not sure I'm going to be able to make any progress tracking it down in the short term. Maybe the lkp team has more in-depth cpu usage profiles they can share? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

[GIT PULL] xfs: reverse mapping support for 4.8-rc1

2016-08-06 Thread Dave Chinner
create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.h
create mode 100644 fs/xfs/xfs_rmap_item.c
create mode 100644 fs/xfs/xfs_rmap_item.h
create mode 100644 fs/xfs/xfs_trans_rmap.c

-- Dave Chinner da...@fromorbit.com

Re: [bug, 4.8] /proc/meminfo: counter values are very wrong

2016-08-05 Thread Dave Chinner
On Fri, Aug 05, 2016 at 09:59:35PM +1000, Dave Chinner wrote: > On Fri, Aug 05, 2016 at 11:54:17AM +0100, Mel Gorman wrote: > > On Fri, Aug 05, 2016 at 09:11:10AM +1000, Dave Chinner wrote: > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index

Re: [bug, 4.8] /proc/meminfo: counter values are very wrong

2016-08-05 Thread Dave Chinner
On Fri, Aug 05, 2016 at 11:54:17AM +0100, Mel Gorman wrote: > On Fri, Aug 05, 2016 at 09:11:10AM +1000, Dave Chinner wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index fb975cec3518..baa97da3687d 100644 > > > --- a/mm/page_alloc.c > > >

Re: [bug, 4.8] /proc/meminfo: counter values are very wrong

2016-08-04 Thread Dave Chinner
On Thu, Aug 04, 2016 at 01:34:58PM +0100, Mel Gorman wrote: > On Thu, Aug 04, 2016 at 01:24:09PM +0100, Mel Gorman wrote: > > On Thu, Aug 04, 2016 at 03:10:51PM +1000, Dave Chinner wrote: > > > Hi folks, > > > > > > I just noticed a whacky memory usage prof

[bug, 4.8] /proc/meminfo: counter values are very wrong

2016-08-03 Thread Dave Chinner
be freed and removed from the page cache. According to the per-node counters, that is not happening and there are gigabytes of invalidated pages still sitting on the active LRUs. Something is broken. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/3] Add a new field to struct shrinker

2016-07-28 Thread Dave Chinner
On Thu, Jul 28, 2016 at 11:25:13AM +0100, Mel Gorman wrote: > On Thu, Jul 28, 2016 at 03:49:47PM +1000, Dave Chinner wrote: > > Seems you're all missing the obvious. > > > > Add a tracepoint for a shrinker callback that includes a "name" > > fie

Re: [PATCH 1/3] Add a new field to struct shrinker

2016-07-27 Thread Dave Chinner
that includes a "name" field, have the shrinker callback fill it out appropriately, e.g. in the superblock shrinker:

trace_shrinker_callback(shrinker, shrink_control, sb->s_type->name);

And generic code that doesn't want to put a specific context name in there can simply call:

trace_shrinker_callback(shrinker, shrink_control, __func__);

And now you know exactly what shrinker is being run. No need to add names to any structures; it's defined at the call site so it is flexible, and if you're not using tracepoints it has no overhead. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
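A minimal sketch of such a tracepoint, assuming the event name and field layout shown here (the usual TRACE_SYSTEM/CREATE_TRACE_POINTS header boilerplate is omitted):

    TRACE_EVENT(shrinker_callback,
            TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
                     const char *name),
            TP_ARGS(shr, sc, name),
            TP_STRUCT__entry(
                    __field(void *, shrinker)
                    __field(unsigned long, nr_to_scan)
                    __string(name, name)
            ),
            TP_fast_assign(
                    __entry->shrinker = shr;
                    __entry->nr_to_scan = sc->nr_to_scan;
                    __assign_str(name, name);
            ),
            TP_printk("shrinker=%p name=%s nr_to_scan=%lu",
                      __entry->shrinker, __get_str(name),
                      __entry->nr_to_scan)
    );

Because the name comes from the call site, no shrinker structure needs to grow a field, which is the point being made above.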

[GIT PULL] xfs: update for 4.8-rc1

2016-07-26 Thread Dave Chinner
xfs: rearrange xfs_bmap_add_free parameters
xfs: convert list of extents to free into a regular list
xfs: refactor btree maxlevels computation

Dave Chinner (14):
xfs: reduce lock hold times in buffer writeback
Merge branch 'fs-4.8-iomap-infrastructure' into fo

Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists

2016-07-20 Thread Dave Chinner
e remote. So it's really only a per-cpu structure for list addition. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: linux-next: manual merge of the xfs tree with Linus' tree

2016-07-20 Thread Dave Chinner
[PATCH] in the subject line don't get the immediate attention of my mail filters, so I didn't see it immediately. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [BUG] Slab corruption during XFS writeback under memory pressure

2016-07-19 Thread Dave Chinner
On Tue, Jul 19, 2016 at 02:22:47PM -0700, Calvin Owens wrote: > On 07/18/2016 07:05 PM, Calvin Owens wrote: > >On 07/17/2016 11:02 PM, Dave Chinner wrote: > >>On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote: > >>>On Fri, Jul 15, 2016 at 05:18:

Re: [BUG] Slab corruption during XFS writeback under memory pressure

2016-07-17 Thread Dave Chinner
On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote: > On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote: > > Hello all, > > > > I've found a nasty source of slab corruption. Based on seeing similar > > symptoms > > on boxes at Faceboo

Re: [BUG] Slab corruption during XFS writeback under memory pressure

2016-07-16 Thread Dave Chinner
argv[1], O_RDWR|O_CREAT, 0644);
> if (fd == -1) {
>         perror("Can't open");
>         return 1;
> }
>
> if (!fork()) {
>         count = atol(argv[2]);
>
>         while (1) {
>                 for (i = 0; i < count; i++)
>                         if (write(fd, crap, CHUNK) != CHUNK)
>                                 perror("Eh?");
>
>                 fsync(fd);
>                 ftruncate(fd, 0);
>         }
> }

H. Truncate is used, but only after fsync. If the truncate is removed, does the problem go away? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
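For reference, a self-contained reconstruction of the quoted reproducer; the includes, the CHUNK size, the crap buffer and the parent's wait() are assumptions filled in around the truncated snippet:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define CHUNK 4096

    static char crap[CHUNK];

    int main(int argc, char **argv)
    {
            long count, i;
            int fd;

            if (argc < 3) {
                    fprintf(stderr, "usage: %s <file> <count>\n", argv[0]);
                    return 1;
            }

            fd = open(argv[1], O_RDWR|O_CREAT, 0644);
            if (fd == -1) {
                    perror("Can't open");
                    return 1;
            }

            if (!fork()) {
                    count = atol(argv[2]);

                    /* write 'count' chunks, sync them, then throw them away */
                    while (1) {
                            for (i = 0; i < count; i++)
                                    if (write(fd, crap, CHUNK) != CHUNK)
                                            perror("Eh?");

                            fsync(fd);
                            ftruncate(fd, 0);
                    }
            }
            wait(NULL);
            return 0;
    }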

Re: [PATCH 00/31] Move LRU page reclaim from zones to nodes v8

2016-07-11 Thread Dave Chinner
On Mon, Jul 11, 2016 at 10:02:24AM +0100, Mel Gorman wrote: > On Mon, Jul 11, 2016 at 10:47:57AM +1000, Dave Chinner wrote: > > > I had tested XFS with earlier releases and noticed no major problems > > > so later releases tested only one filesystem. Given the changes sin

Re: Hang due to nfs letting tasks freeze with locked inodes

2016-07-10 Thread Dave Chinner
On Fri, Jul 08, 2016 at 01:05:40PM +, Trond Myklebust wrote: > > On Jul 8, 2016, at 08:55, Trond Myklebust > > wrote: > >> On Jul 8, 2016, at 08:48, Seth Forshee > >> wrote: On Fri, Jul 08, 2016 at > >> 09:53:30AM +1000, Dave Chinner wrote:

Re: [PATCH 00/31] Move LRU page reclaim from zones to nodes v8

2016-07-10 Thread Dave Chinner
On Fri, Jul 08, 2016 at 10:52:03AM +0100, Mel Gorman wrote: > On Fri, Jul 08, 2016 at 09:27:13AM +1000, Dave Chinner wrote: > > . > > > This series is not without its hazards. There are at least three areas > > > that I'm concerned with even though I cou

Re: Hang due to nfs letting tasks freeze with locked inodes

2016-07-07 Thread Dave Chinner
(just like NFS) and hence sys_sync() isn't sufficient to quiesce a filesystem's operations. But I'm used to being ignored on this topic (for almost 10 years, now!). Indeed, it's been made clear in the past that I know absolutely nothing about what is needed to be done to safely suspend filesystem operations... :/ Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 00/31] Move LRU page reclaim from zones to nodes v8

2016-07-07 Thread Dave Chinner
results on XFS for the tests you ran on ext4. It might also be worth running some highly concurrent inode cache benchmarks (e.g. the 50-million inode, 16-way concurrent fsmark tests) to see what impact heavy slab cache pressure has on shrinker behaviour and system balance... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb

2016-06-28 Thread Dave Chinner
On Tue, Jun 28, 2016 at 10:13:32AM +0100, Steven Whitehouse wrote: > Hi, > > On 28/06/16 03:08, Dave Chinner wrote: > >On Fri, Jun 24, 2016 at 02:50:11PM -0500, Bob Peterson wrote: > >>This patch adds a new prune_icache_sb function for the VFS slab > >>shrinker

Re: [PATCH 2/2] GFS2: Add a gfs2-specific prune_icache_sb

2016-06-27 Thread Dave Chinner
erblock shrinker for the above reasons - it's far too easy for people to get badly wrong. If there are specific limitations on how inodes can be freed, then move the parts of inode *freeing* that cause problems to a different context via the ->evict/destroy callouts and trigger that external context processing on demand. That external context can just do bulk "if it is on the list then free it" processing, because the reclaim policy has already been executed to place that inode on the reclaim list. This is essentially what XFS does, but it also uses the ->nr_cached_objects/->free_cached_objects() callouts in the superblock shrinker to provide the reclaim rate feedback mechanism required to throttle incoming memory allocations. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
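Sketched in code, the wiring being described looks like the following; the two s_op callouts are the real VFS API of that era, while the gfs2_* functions and my_* helpers are hypothetical:

    static long gfs2_nr_cached_objects(struct super_block *sb,
                                       struct shrink_control *sc)
    {
            /* report how many inodes are queued awaiting reclaim */
            return my_count_reclaimable_inodes(sb);
    }

    static long gfs2_free_cached_objects(struct super_block *sb,
                                         struct shrink_control *sc)
    {
            /*
             * Free up to sc->nr_to_scan inodes that ->evict/destroy
             * already placed on the internal reclaim list.
             */
            return my_reclaim_inodes(sb, sc->nr_to_scan);
    }

    static const struct super_operations gfs2_super_ops = {
            /* ... */
            .nr_cached_objects      = gfs2_nr_cached_objects,
            .free_cached_objects    = gfs2_free_cached_objects,
    };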

Re: [PATCH 1/2] vfs: Add hooks for filesystem-specific prune_icache_sb

2016-06-27 Thread Dave Chinner
but do not destroy/free it - you simply queue it to an internal list and then do the cleanup/freeing in your own time? i.e. why do you need a special callout just to defer freeing to another thread when we already have hooks that enable you to do this? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 2/2] xfs: map KM_MAYFAIL to __GFP_RETRY_HARD

2016-06-15 Thread Dave Chinner
not required." To then map KM_MAYFAIL to a flag that implies the allocation will internally retry exceptionally hard to prevent failure seems wrong. IOWs, KM_MAYFAIL means XFS is just asking for normal allocator behaviour here, so I'm not sure what problem this change is actually solving, and it's not clear from the description. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH-tip 6/6] xfs: Enable reader optimistic spinning for DAX inodes

2016-06-14 Thread Dave Chinner
is much rarer. As it is, I'm *extremely* paranoid when it comes to changes to core locking like this. Performance is secondary to correctness, and we need much more than just a few benchmarks to verify there aren't locking bugs being introduced. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-06-02 Thread Dave Chinner
On Thu, Jun 02, 2016 at 02:44:30PM +0200, Holger Hoffstätte wrote: > On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote: > > > > Am 31.05.2016 um 09:31 schrieb Dave Chinner: > >> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG > >> wrote

Re: Internal error xfs_trans_cancel

2016-06-01 Thread Dave Chinner
g the problem now. > > I was able to reproduce it again with the same steps. Hmmm, Ok. I've been running the lockperf test and kernel builds all day on a filesystem that is identical in shape and size to yours (i.e. xfs_info output is the same) but I haven't reproduced it yet. Is it possible to get a metadump image of your filesystem to see if I can reproduce it on that? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
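For reference, one way to produce such an image (device and file names are examples; xfs_metadump obfuscates file names by default and wants the filesystem unmounted):

    umount /dev/vdb
    xfs_metadump -g /dev/vdb fs.metadump    # -g shows progress
    # the recipient can recreate a filesystem image from it with:
    xfs_mdrestore fs.metadump fs.img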

Re: Internal error xfs_trans_cancel

2016-06-01 Thread Dave Chinner
urrent affinity list: 0-15
pid 9597's new affinity list: 0,4,8,12
sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor: Directory nonexistent
posix01 -n 8 -l 100
posix02 -n 8 -l 100
posix03 -n 8 -i 100
$

So, I've just removed those tests from your script. I'll see if I have any luck with reproducing the problem now. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: Internal error xfs_trans_cancel

2016-06-01 Thread Dave Chinner
FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F You didn't run out of space or something unusual like that? Does 'xfs_repair -n ' report any errors? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
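The no-modify check being asked for looks like this (the device path is an example; the filesystem must be unmounted first):

    umount /dev/vdb
    xfs_repair -n /dev/vdb    # -n: report problems only, change nothing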

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-05-31 Thread Dave Chinner
he XFS code appears to be handling the dirty page that is being passed to it correctly. We'll work out what needs to be done to get rid of the warning for this case, whether it be a mm/ change or an XFS change. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-05-30 Thread Dave Chinner
On Tue, May 31, 2016 at 12:59:04PM +0900, Minchan Kim wrote: > On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote: > > On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: > > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > > > B

Re: shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-05-30 Thread Dave Chinner
On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote: > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote: > > [adding lkml and linux-mm to the cc list] > > > > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG > > wrote: >

shrink_active_list/try_to_release_page bug? (was Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage)

2016-05-30 Thread Dave Chinner
es relating to writeback and memory reclaim. It might be worth trying as a workaround for now. MM-folk - is this analysis correct? If so, why is shrink_active_list() calling try_to_release_page() on dirty pages? Is this just an oversight or is there some problem that this is trying to work around? It seems trivial to fix to me (add a !PageDirty check), but I don't know why the check is there in the first place... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
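The trivial fix being suggested, sketched against the buffer-stripping block in shrink_active_list() of that era's mm/vmscan.c (placement paraphrased from the discussion, not a tested patch):

    if (unlikely(buffer_heads_over_limit)) {
            /* don't strip buffers from dirty pages - writeback needs them */
            if (page_has_private(page) && !PageDirty(page) &&
                trylock_page(page)) {
                    if (page_has_private(page))
                            try_to_release_page(page, 0);
                    unlock_page(page);
            }
    }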

Re: [GIT PULL] xfs: updates for 4.7-rc1

2016-05-29 Thread Dave Chinner
On Thu, May 26, 2016 at 07:05:11PM -0700, Linus Torvalds wrote: > On Thu, May 26, 2016 at 5:13 PM, Dave Chinner wrote: > > On Thu, May 26, 2016 at 10:19:13AM -0700, Linus Torvalds wrote: > >> > >> i'm ok with the late branches, it's not like xfs has been a pro

Re: [GIT PULL] xfs: updates for 4.7-rc1

2016-05-26 Thread Dave Chinner
On Thu, May 26, 2016 at 10:19:13AM -0700, Linus Torvalds wrote: > On Wed, May 25, 2016 at 11:13 PM, Dave Chinner wrote: > > > > Just yell if this is not OK and I'll drop those branches for this > > merge and resend the pull request > > i'm ok with the

[GIT PULL] xfs: updates for 4.7-rc1

2016-05-25 Thread Dave Chinner
e kmem_realloc
xfs: fix warning in xfs_finish_page_writeback for non-debug builds

Dave Chinner (20):
xfs: Don't wrap growfs AGFL indexes
xfs: build bios directly in xfs_add_to_ioend
xfs: don't release bios on completion immediately
xfs: remove xfs_fs_evict_

Re: [GIT PULL] y2038 changes for vfs

2016-05-25 Thread Dave Chinner
he patchset *exactly* like Linus is now suggesting, I walked away and haven't looked at your patches since. Is it any wonder that no other filesystem maintainer has bothered to waste their time on this since? Linus - I'd suggest these VFS timestamp patches need to go through Al's VFS tree. That way we don't get unreviewed VFS infrastructure changes going into your tree via a door that nobody was paying attention to... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: sharing page cache pages between multiple mappings

2016-05-19 Thread Dave Chinner
up. At > that point we switch to sharing pages with the read-write copy. Unless I'm missing something here (quite possible!), I'm not sure we can fix that problem with page cache sharing or reflink. It implies we are sharing pages in a downwards direction - private overlay pages/mappings from multiple inodes would need to be shared with a single underlying shared read-only inode, and I lack the imagination to see how that works... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2] locking/rwsem: Add reader-owned state to the owner field

2016-05-18 Thread Dave Chinner
king contexts via ASSERT(xfs_isilocked()) calls. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
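The annotation pattern referred to, as it appears throughout XFS (ip being the inode in question):

    /* caller must hold the inode lock exclusively */
    ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));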

Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote: > Patch below should fix the deadlock. The test has been running for several hours without failure using this patch, so I'd say this fixes the problem... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Wed, May 18, 2016 at 07:46:17PM +0800, Xiong Zhou wrote: > > On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote: > > On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote: > > > Hi, > > > > > > On Wed, May 18, 2016 at 03:56:34PM +1000, Dav

Re: Linux-next parallel cp workload hang

2016-05-18 Thread Dave Chinner
On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote: > Hi, > > On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote: > > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote: > > > Hi, > > > > > > Parallel cp workload (xfstest

Re: Linux-next parallel cp workload hang

2016-05-17 Thread Dave Chinner
lock, but it is not obvious what that may be yet. Can you reproduce this with CONFIG_XFS_DEBUG=y set? If you can, and it doesn't trigger any warnings or asserts, can you then try to reproduce it while tracing the following events:

xfs_buf_lock
xfs_buf_lock_done
xfs_buf_trylock
xfs_buf_unlock

That way we might be able to see if there's an unexpected buffer locking/state pattern occurring when the hang occurs. Also, if you run on slower storage, does the hang get harder or easier to hit? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
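One way to capture those events while reproducing, assuming trace-cmd is available:

    trace-cmd record -e xfs_buf_lock -e xfs_buf_lock_done \
                     -e xfs_buf_trylock -e xfs_buf_unlock
    # reproduce the hang, interrupt with ^C, then inspect:
    trace-cmd report | less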

Re: [PATCH v3 2/5] block: Add bdev_supports_dax() for dax mount checks

2016-05-09 Thread Dave Chinner
patch should replace blkdev_dax_capable(), or just reuse that > >> existing routine, or am I missing something? > > > > Good question. bdev_supports_dax() is a helper function tailored for the > > filesystem's mount -o dax case. While blkdev_dax_capable() is similar, it > > does not need error messages like "device does not support dax" since it > > implicitly enables dax when capable. So, I think we can keep > > blkdev_dax_capable(), but change it to call bdev_direct_access() so that > > actual check is performed in a single place. > > Sounds good to me. Can you name them consistently then? i.e. blkdev_dax_supported() and blkdev_dax_capable()? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/6] statx: Add a system call to make enhanced file info available

2016-05-08 Thread Dave Chinner
[ OT, but I'll reply anyway :P ] On Fri, May 06, 2016 at 02:29:23PM -0400, J. Bruce Fields wrote: > On Thu, May 05, 2016 at 08:56:02AM +1000, Dave Chinner wrote: > > In the latest XFS filesystem format, we randomise the generation > > value during every inode allocation to m

Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems

2016-05-05 Thread Dave Chinner
On Thu, May 05, 2016 at 11:24:35PM +0100, Djalal Harouni wrote: > On Thu, May 05, 2016 at 10:23:14AM +1000, Dave Chinner wrote: > > On Wed, May 04, 2016 at 04:26:46PM +0200, Djalal Harouni wrote: > > > This is version 2 of the VFS:userns support portable root filesystems > &

Re: [PATCH 1/6] statx: Add a system call to make enhanced file info available

2016-05-05 Thread Dave Chinner
y-handle capability available > > > > ...the last bit seems to indicate that we don't really need this > > anyway, as most userland servers now work with filehandles from the > > kernel. > > > > Maybe leave it out for now? It can always be added later. > > Yeah... probably a good idea. Fine by me. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems

2016-05-04 Thread Dave Chinner
On Wed, May 04, 2016 at 06:44:14PM -0700, Andy Lutomirski wrote: > On Wed, May 4, 2016 at 5:23 PM, Dave Chinner wrote: > > On Wed, May 04, 2016 at 04:26:46PM +0200, Djalal Harouni wrote: > >> This is version 2 of the VFS:userns support portable root filesystems > >> R

Re: [PATCH v5 0/2] ext4: Improve parallel I/O performance on NVDIMM

2016-05-04 Thread Dave Chinner
cs of direct IO, because otherwise a single writer prevents any IO concurrency and that's a bigger problem for DAX than traditional storage due to the access speed and bandwidth available. This was always intended to be fixed by the introduction of proper range locking for IO, not by revertin

Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems

2016-05-04 Thread Dave Chinner
in VFS means "virtual" and has nothing to do with disks or persistent storage formats. Indeed, let's convert the UID to "on-disk" format for a network filesystem client . > * Add XFS support. What is the problem here? Next question: how does this work with uid/gid based quotas? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 3/3] xfs: Add alignment check for DAX mount

2016-05-04 Thread Dave Chinner
is > specified. > > Signed-off-by: Toshi Kani > Cc: Dave Chinner > Cc: Dan Williams > Cc: Ross Zwisler > Cc: Christoph Hellwig > Cc: Boaz Harrosh > --- > fs/xfs/xfs_super.c | 23 +++ > 1 file changed, 19 insertions(+), 4 deletions(-) > >

Re: [PATCH 1/6] statx: Add a system call to make enhanced file info available

2016-05-04 Thread Dave Chinner
ct of updating something requested. I would suggest that exposing them from the NFS server is something we most definitely don't want to do because they are the only thing that keeps remote users from guessing filehandles with ease. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-03 Thread Dave Chinner
On Tue, May 03, 2016 at 10:28:15AM -0700, Dan Williams wrote: > On Mon, May 2, 2016 at 6:51 PM, Dave Chinner wrote: > > On Mon, May 02, 2016 at 04:25:51PM -0700, Dan Williams wrote: > [..] > > Yes, I know, and it doesn't answer any of the questions I just > > asked.

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-03 Thread Dave Chinner
ehaviour present. The only guarantee for persistence that an app will be able to rely on is msync(). > But I don't see how that > direction is getting turned into an argument against msync() efficiency. Promoting a model that works around inefficiency rather than solving it is no

Re: [PATCH 0/2] scop GFP_NOFS api

2016-05-03 Thread Dave Chinner
On Sun, May 01, 2016 at 08:19:44AM +1000, NeilBrown wrote: > On Sat, Apr 30 2016, Dave Chinner wrote: > > Indeed, blocking the superblock shrinker in reclaim is a key part of > > balancing inode cache pressure in XFS. If the shrinker starts > > hitting dirty inodes, it bl

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-05-03 Thread Dave Chinner
On Tue, May 03, 2016 at 05:38:23PM +0200, Michal Hocko wrote: > On Sat 30-04-16 09:40:08, Dave Chinner wrote: > > On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote: > [...] > > > - was it > > > "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-02 Thread Dave Chinner
m filesytems. Encoding cache flushing for data integrity into the userspace applications assumes that such future pmem-based storage will have identical persistence requirements to the existing hardware. This, to me, seems very unlikely to be the case (especially when considering different platforms (e.g. power, ARM)) and so, again, application developers are likely to have to fall back to using a kernel provided data integrity primitive they know they can rely on (i.e. msync()). Cheers, Dave. -- Dave Chinner da...@fromorbit.com
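The portable pattern being advocated, as a minimal userspace sketch; the file path and data are illustrative:

    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            const size_t len = 4096;
            int fd = open("/mnt/pmem/data", O_RDWR);
            void *p;

            if (fd == -1)
                    err(1, "open");
            p = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            memcpy(p, "persist me", 11);

            /* the kernel-provided primitive the application can rely on */
            if (msync(p, len, MS_SYNC) == -1)
                    err(1, "msync");
            return 0;
    }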

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-02 Thread Dave Chinner
On Mon, May 02, 2016 at 04:25:51PM -0700, Dan Williams wrote: > On Mon, May 2, 2016 at 4:04 PM, Dave Chinner wrote: > > On Mon, May 02, 2016 at 11:18:36AM -0400, Jeff Moyer wrote: > >> Dave Chinner writes: > >> > >> > On Mon, Apr 25, 2016 at 11:53:13PM +

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-02 Thread Dave Chinner
On Mon, May 02, 2016 at 10:53:25AM -0700, Dan Williams wrote: > On Mon, May 2, 2016 at 8:18 AM, Jeff Moyer wrote: > > Dave Chinner writes: > [..] > >> We need some form of redundancy and correction in the PMEM stack to > >> prevent single sector errors from

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-05-02 Thread Dave Chinner
On Mon, May 02, 2016 at 11:18:36AM -0400, Jeff Moyer wrote: > Dave Chinner writes: > > > On Mon, Apr 25, 2016 at 11:53:13PM +, Verma, Vishal L wrote: > >> On Tue, 2016-04-26 at 09:25 +1000, Dave Chinner wrote: > > You're assuming that only the DAX aware app

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner
in place, I'd then make the changes to the generic superblock shrinker code to enable finer grained reclaim and optimise the XFS shrinkers to make use of it... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 0/2] scop GFP_NOFS api

2016-04-29 Thread Dave Chinner
hread can't keep up with all of the allocation pressure that occurs. e.g. a 20-core intel CPU with local memory will be seen as a single node and so will have a single kswapd thread to do reclaim. There's a massive imbalance between maximum reclaim rate and maximum allocation rate in situations like this. If we want memory reclaim to run faster, we need to be able to do more work *now*, not defer it to a context with limited execution resources. i.e. IMO deferring more work to a single reclaim thread per node is going to limit memory reclaim scalability and performance, not improve it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-29 Thread Dave Chinner
On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote: > On Fri 29-04-16 07:51:45, Dave Chinner wrote: > > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote: > > > [Trim the CC list] > > > On Wed 27-04-16 08:58:45, Dave Chinner wrote: > > &

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-28 Thread Dave Chinner
On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote: > [Trim the CC list] > On Wed 27-04-16 08:58:45, Dave Chinner wrote: > [...] > > Often these are to silence lockdep warnings (e.g. commit b17cb36 > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-27 Thread Dave Chinner
On Wed, Apr 27, 2016 at 10:03:11AM +0200, Michal Hocko wrote: > On Wed 27-04-16 08:58:45, Dave Chinner wrote: > > On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote: > > > From: Michal Hocko > > > > > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT

Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

2016-04-27 Thread Dave Chinner
On Wed, Apr 27, 2016 at 08:31:38PM +0200, Lucas Stach wrote: > Am Dienstag, den 26.04.2016, 09:08 +1000 schrieb Dave Chinner: > [...] > > > > > > > > > > > That said, I'm not sure whether there's a notable benefit of > > > > idling >

Re: [PATCH 1/2] mm: add PF_MEMALLOC_NOFS

2016-04-26 Thread Dave Chinner
't actually care about in XFS at all. That way I can carry all the XFS changes in the XFS tree and not have to worry about when this stuff gets merged or conflicts with the rest of the work that is being done to the mm/ code and whatever tree that eventually lands in... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

2016-04-26 Thread Dave Chinner
ing to restart the flood of false positive lockdep warnings we've silenced over the years, so perhaps lockdep needs to be made smarter as well... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-04-26 Thread Dave Chinner
On Mon, Apr 25, 2016 at 09:18:42PM -0700, Dan Williams wrote: > On Mon, Apr 25, 2016 at 7:56 PM, Dave Chinner wrote: > > On Mon, Apr 25, 2016 at 06:45:08PM -0700, Dan Williams wrote: > >> > I haven't seen any design/documentation for infrastructure at the > >

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-04-25 Thread Dave Chinner
On Mon, Apr 25, 2016 at 06:45:08PM -0700, Dan Williams wrote: > On Mon, Apr 25, 2016 at 5:11 PM, Dave Chinner wrote: > > On Mon, Apr 25, 2016 at 04:43:14PM -0700, Dan Williams wrote: > [..] > >> Maybe I missed something, but all these assumptions are already > >> pre

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-04-25 Thread Dave Chinner
On Mon, Apr 25, 2016 at 11:53:13PM +, Verma, Vishal L wrote: > On Tue, 2016-04-26 at 09:25 +1000, Dave Chinner wrote: > > [...] > > - It checks badblocks and discovers it's files have lost data > Lots of hand-waving here. How doe

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-04-25 Thread Dave Chinner
On Mon, Apr 25, 2016 at 04:43:14PM -0700, Dan Williams wrote: > On Mon, Apr 25, 2016 at 4:25 PM, Dave Chinner wrote: > > On Mon, Apr 25, 2016 at 05:14:36PM +, Verma, Vishal L wrote: > >> On Mon, 2016-04-25 at 01:31 -0700, h...@infradead.org wrote: > >> > On S

Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

2016-04-25 Thread Dave Chinner
d that then assumes that the filesystem will zero blocks if they get reused to clear errors on that LBA sector mapping before they are accessible again to userspace. It seems to me that there are a number of assumptions being made across multiple layers here. Maybe I've missed something - can you point me to the design/architecture description so I can see how the "app does data recovery itself" dance is supposed to work? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: idle aild if the AIL is pushed up to the target LSN

2016-04-25 Thread Dave Chinner
orker requests the complete AIL to be > pushed out, then it goes back to sleep indefinitely until the log fills > up again. The behaviour suggests that your filesystem is not idle. The filesystem takes up to 90s to be marked idle (the log needs to be covered, and the state machine takes 3x30s cycles to transition to the idle "covered" state). If you want the filesystem to idle quickly, then run the log worker more frequently to get the target updated more quickly. This will also speed up the log covering state machine as well. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
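The log worker frequency is controlled by the xfssyncd_centisecs sysctl (default 3000, i.e. 30s), so speeding up idling looks like:

    # run the XFS log worker every 3 seconds instead of every 30
    echo 300 > /proc/sys/fs/xfs/xfssyncd_centisecs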

Re: [PATCH] fs: add the FIGETFROZEN ioctl call

2016-04-22 Thread Dave Chinner
On Fri, Apr 22, 2016 at 11:53:48PM +0200, Florian Margaine wrote: > On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner wrote: > >> A way to query freeze state might be nice, I think, but yeah, it's > >> racy, so you can't depend on it - but it might be useful in the &

Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called

2016-04-19 Thread Dave Chinner
On Mon, Apr 18, 2016 at 03:46:46PM -0400, Waiman Long wrote: > On 04/15/2016 06:19 PM, Dave Chinner wrote: > >On Fri, Apr 15, 2016 at 01:17:41PM -0400, Waiman Long wrote: > >>On 04/15/2016 04:17 AM, Dave Chinner wrote: > >>>On Thu, Apr 14, 2016 at 12:21:13PM -0400,

Re: [PATCH] fs: add the FIGETFROZEN ioctl call

2016-04-18 Thread Dave Chinner
On Mon, Apr 18, 2016 at 11:20:22AM -0400, Eric Sandeen wrote: > > > On 4/14/16 10:17 PM, Dave Chinner wrote: > > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote: > >> This lets userland get the filesystem freezing status, aka whether the > >> fil

Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called

2016-04-15 Thread Dave Chinner
On Fri, Apr 15, 2016 at 01:17:41PM -0400, Waiman Long wrote: > On 04/15/2016 04:17 AM, Dave Chinner wrote: > >On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote: > >>On 04/13/2016 11:16 PM, Dave Chinner wrote: > >>>On Tue, Apr 12, 2016 at 02:12:54PM -0400,

Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called

2016-04-15 Thread Dave Chinner
On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote: > On 04/13/2016 11:16 PM, Dave Chinner wrote: > >On Tue, Apr 12, 2016 at 02:12:54PM -0400, Waiman Long wrote: > >>When performing direct I/O, the current ext4 code does > >>not pass in the DIO_SKIP_DIO_C

Re: [PATCH] fs: add the FIGETFROZEN ioctl call

2016-04-14 Thread Dave Chinner
de(filp)->i_sb;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	return sb->s_writers.frozen;

This makes the internal freeze implementation states part of the userspace ABI. This needs an API that is separate from the internal implementation... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
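One way to decouple the ABI from s_writers.frozen, sketched purely to illustrate the point (the enum and its values are hypothetical, not from the patch):

    enum fs_frozen_state {
            FS_THAWED   = 0,
            FS_FREEZING = 1,
            FS_FROZEN   = 2,
    };

    static int ioctl_getfrozen(struct file *filp)
    {
            struct super_block *sb = file_inode(filp)->i_sb;

            if (!capable(CAP_SYS_ADMIN))
                    return -EPERM;

            /* map internal freeze states to a stable userspace enum */
            switch (sb->s_writers.frozen) {
            case SB_UNFROZEN:
                    return FS_THAWED;
            case SB_FREEZE_COMPLETE:
                    return FS_FROZEN;
            default:
                    return FS_FREEZING;     /* any intermediate level */
            }
    }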

Re: [PATCH v3 1/2] ext4: Pass in DIO_SKIP_DIO_COUNT flag if inode_dio_begin() called

2016-04-13 Thread Dave Chinner
ce bypassing the DIO accounting will cause AIO writes to race with truncate. Same AIO vs truncate problem occurs with the indirect read case you modified to skip the direct IO layer accounting. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
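For context, a sketch of the accounting being bypassed; inode_dio_begin/end/wait are the real VFS helpers of that era, while the surrounding functions are illustrative:

    static void my_dio_submit(struct inode *inode)
    {
            inode_dio_begin(inode); /* bump i_dio_count before issuing IO */
            /* ... submit the async direct IO ... */
    }

    static void my_dio_complete(struct inode *inode)
    {
            inode_dio_end(inode);   /* drop i_dio_count, wake any waiter */
    }

    static void my_truncate(struct inode *inode)
    {
            inode_dio_wait(inode);  /* block until i_dio_count reaches zero */
            /* ... now safe to change the file size ... */
    }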

Re: [PATCH] fs: return EPERM on immutable inode

2016-04-05 Thread Dave Chinner
igned-off-by: Eryu Guan > --- > > I noticed this when running LTP on overlayfs, setxattr03 failed due to > unexpected EACCES on immutable inode. This should be in the commit message itself, rather than "EPERM looks more reasonable". Other than that, change seems fine to me.

Re: [PATCHSET v3][RFC] Make background writeback not suck

2016-03-31 Thread Dave Chinner
On Thu, Mar 31, 2016 at 09:25:33PM -0600, Jens Axboe wrote: > On 03/31/2016 06:46 PM, Dave Chinner wrote: > >>>virtio in guest, XFS direct IO -> no-op -> scsi in host. > >> > >>That has write back caching enabled on the guest, correct? > > > >No

Re: [PATCHSET v3][RFC] Make background writeback not suck

2016-03-31 Thread Dave Chinner
so on. Throttling policy decisions belong above the block layer, even though the throttle mechanism itself is in the block layer. FWIW, this is analogous to REQ_READA, which tells the block layer that a read is not important and can be discarded if there is too much load. Policy is set at the layer that knows whether the IO can be discarded safely, the mechanism is implemented at a lower layer that knows about load, scheduling and other things the higher layers know nothing about. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCHSET v3][RFC] Make background writeback not suck

2016-03-31 Thread Dave Chinner
On Thu, Mar 31, 2016 at 09:29:30PM -0600, Jens Axboe wrote: > On 03/31/2016 06:56 PM, Dave Chinner wrote: > >I'm not changing the host kernels - it's a production machine and so > >it runs long uptime testing of stable kernels. (e.g. catch slow > >memory lea

Re: [PATCHSET v3][RFC] Make background writeback not suck

2016-03-31 Thread Dave Chinner
to note whether the block throttling has any noticeable difference in behaviour when compared to just having a very shallow request queue. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCHSET v3][RFC] Make background writeback not suck

2016-03-31 Thread Dave Chinner
round, but we need some way of conveying > that information to the backend. I'm not changing the host kernels - it's a production machine and so it runs long uptime testing of stable kernels. (e.g. catch slow memory leaks, etc). So if you've disabled throttling in the guest, I can't test the throttling changes. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
