.8+/-1.3e+04 files/s) and that shows in the runtime which also
drops from 3m57s to 3m22s.
So regardless of what aim7 results we get from these changes, I'll
be merging them pending review and further testing...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sat, Aug 13, 2016 at 06:42:51PM +0100, Ben Hutchings wrote:
> 3.16.37-rc1 review patch. If anyone has any objections, please let me know.
>
> --
>
> From: Dave Chinner
>
> commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream.
>
> When a fai
On Sat, Aug 13, 2016 at 02:30:54AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 12, 2016 at 08:02:08PM +1000, Dave Chinner wrote:
> > Which says "no change". Oh well, back to the drawing board...
>
> I don't see how it would change things much - for all relevant calc
On Fri, Aug 12, 2016 at 04:51:24PM +0800, Ye Xiaolong wrote:
> On 08/12, Ye Xiaolong wrote:
> >On 08/12, Dave Chinner wrote:
>
> [snip]
>
> >>lkp-folk: the patch I've just tested it attached below - can you
> >>feed that through your test and see if it f
On Thu, Aug 11, 2016 at 10:02:39PM -0700, Linus Torvalds wrote:
> On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner wrote:
> >
> > That's why running aim7 as your "does the filesystem scale"
> > benchmark is somewhat irrelevant to scaling applications on hig
her logging rates by doing this
That's why running aim7 as your "does the filesystem scale"
benchmark is somewhat irrelevant to scaling applications on high
performance systems these days - users with fast storage will be
expecting to see that 1.9GB/s throughput from their app, not
600MB/s
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Aug 11, 2016 at 07:27:52PM -0700, Linus Torvalds wrote:
> On Thu, Aug 11, 2016 at 5:54 PM, Dave Chinner wrote:
> >
> > So, removing mark_page_accessed() made the spinlock contention
> > *worse*.
> >
> > 36.51% [kernel] [k] _raw_spin_unlock_irqr
On Fri, Aug 12, 2016 at 10:54:42AM +1000, Dave Chinner wrote:
> I'm now going to test Christoph's theory that this is an "overwrite
> doing lots of block mapping" issue. More on that to follow.
Ok, so going back to the profiles, I can say it's not an overwrite
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote:
> On Wed, Aug 10, 2016 at 05:33:20PM -0700, Huang, Ying wrote:
> We need to know what is happening that is different - there's a good
> chance the mapping trace events will tell us. Huang, can you get
> a raw event tr
design level - the mapping->tree_lock is a global serialisation
point
I'm now going to test Christoph's theory that this is an "overwrite
doing lots of block mapping" issue. More on that to follow.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
n_unlock_irqrestore
I don't think that this is the same as what aim7 is triggering as
there's no XFS write() path allocation functions near the top of the
profile to speak of. Still, I don't recall seeing this before...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Aug 11, 2016 at 10:36:59AM +0800, Ye Xiaolong wrote:
> On 08/11, Dave Chinner wrote:
> >On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote:
> >> I need to see these events:
> >>
> >>xfs_file*
> >>xfs_iomap*
> >>
On Thu, Aug 11, 2016 at 11:16:12AM +1000, Dave Chinner wrote:
> I need to see these events:
>
> xfs_file*
> xfs_iomap*
> xfs_get_block*
>
> For both kernels. An example trace from 4.8-rc1 running the command
> `xfs_io -f -c 'pwrite 0 512k -b 128k
red_write: dev 253:32 ino 0x84 size 0x4 offset 0x4 count 0x2
xfs_io-2946 [001] 253971.751236: xfs_iomap_found: dev 253:32 ino 0x84 size 0x4 offset 0x4 count 131072 type invalid startoff 0x0 startblock 24 blockcount 0x60
xfs_io-2946 [001] 253971.751381: xfs_file_buffered_write: dev 253:32 ino 0x84 size 0x4 offset 0x6 count 0x2
xfs_io-2946 [001] 253971.751415: xfs_iomap_prealloc_size: dev 253:32 ino 0x84 prealloc blocks 128 shift 0 m_writeio_blocks 16
xfs_io-2946 [001] 253971.751425: xfs_iomap_alloc: dev 253:32 ino 0x84 size 0x4 offset 0x6 count 131072 type invalid startoff 0x60 startblock -1 blockcount 0x90
That's the output I need for the complete test - you'll need to use
a better recording mechanism than this (e.g. trace-cmd record,
trace-cmd report) because it will generate a lot of events. Compress
the two report files (they'll be large) and send them to me offlist.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
't spin on at all.
We really need instruction level perf profiles to understand
this - I don't have a machine with this many cpu cores available
locally, so I'm not sure I'm going to be able to make any progress
tracking it down in the short term. Maybe the lkp team has more
in-depth cpu usage profiles they can share?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.h
create mode 100644 fs/xfs/xfs_rmap_item.c
create mode 100644 fs/xfs/xfs_rmap_item.h
create mode 100644 fs/xfs/xfs_trans_rmap.c
--
Dave Chinner
da...@fromorbit.com
On Fri, Aug 05, 2016 at 09:59:35PM +1000, Dave Chinner wrote:
> On Fri, Aug 05, 2016 at 11:54:17AM +0100, Mel Gorman wrote:
> > On Fri, Aug 05, 2016 at 09:11:10AM +1000, Dave Chinner wrote:
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index
On Fri, Aug 05, 2016 at 11:54:17AM +0100, Mel Gorman wrote:
> On Fri, Aug 05, 2016 at 09:11:10AM +1000, Dave Chinner wrote:
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index fb975cec3518..baa97da3687d 100644
> > > --- a/mm/page_alloc.c
> > >
On Thu, Aug 04, 2016 at 01:34:58PM +0100, Mel Gorman wrote:
> On Thu, Aug 04, 2016 at 01:24:09PM +0100, Mel Gorman wrote:
> > On Thu, Aug 04, 2016 at 03:10:51PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > >
> > > I just noticed a whacky memory usage prof
be freed and removed from the
page cache. According to the per-node counters, that is not
happening and there are gigabytes of invalidated pages still sitting on
the active LRUs.
Something is broken
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Jul 28, 2016 at 11:25:13AM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2016 at 03:49:47PM +1000, Dave Chinner wrote:
> > Seems you're all missing the obvious.
> >
> > Add a tracepoint for a shrinker callback that includes a "name"
> > fie
that includes a "name"
field, have the shrinker callback fill it out appropriately. e.g
in the superblock shrinker:
trace_shrinker_callback(shrinker, shrink_control, sb->s_type->name);
And generic code that doesn't want to put a specific context name in
there can simply call:
trace_shrinker_callback(shrinker, shrink_control, __func__);
And now you know exactly what shrinker is being run.
No need to add names to any structures, it's call-site defined so it's
flexible, and if you're not using tracepoints it has no overhead.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
xfs: rearrange xfs_bmap_add_free parameters
xfs: convert list of extents to free into a regular list
xfs: refactor btree maxlevels computation
Dave Chinner (14):
xfs: reduce lock hold times in buffer writeback
Merge branch 'fs-4.8-iomap-infrastructure' into fo
e
remote.
So it's really only a per-cpu structure for list addition
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
[PATCH] in the subject line don't get the immediate
attention of my mail filters, so I didn't see it immediately.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jul 19, 2016 at 02:22:47PM -0700, Calvin Owens wrote:
> On 07/18/2016 07:05 PM, Calvin Owens wrote:
> >On 07/17/2016 11:02 PM, Dave Chinner wrote:
> >>On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote:
> >>>On Fri, Jul 15, 2016 at 05:18:
On Sun, Jul 17, 2016 at 10:00:03AM +1000, Dave Chinner wrote:
> On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote:
> > Hello all,
> >
> > I've found a nasty source of slab corruption. Based on seeing similar
> > symptoms
> > on boxes at Faceboo
argv[1], O_RDWR|O_CREAT, 0644);
> if (fd == -1) {
> perror("Can't open");
> return 1;
> }
>
> if (!fork()) {
> count = atol(argv[2]);
>
> while (1) {
> for (i = 0; i < count; i++)
> if (write(fd, crap, CHUNK) != CHUNK)
> perror("Eh?");
>
> fsync(fd);
> ftruncate(fd, 0);
> }
Hmmm. Truncate is used, but only after fsync. If the truncate
is removed, does the problem go away?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Jul 11, 2016 at 10:02:24AM +0100, Mel Gorman wrote:
> On Mon, Jul 11, 2016 at 10:47:57AM +1000, Dave Chinner wrote:
> > > I had tested XFS with earlier releases and noticed no major problems
> > > so later releases tested only one filesystem. Given the changes sin
On Fri, Jul 08, 2016 at 01:05:40PM +, Trond Myklebust wrote:
> > On Jul 8, 2016, at 08:55, Trond Myklebust
> > wrote:
> >> On Jul 8, 2016, at 08:48, Seth Forshee
> >> wrote: On Fri, Jul 08, 2016 at
> >> 09:53:30AM +1000, Dave Chinner wrote:
> &g
On Fri, Jul 08, 2016 at 10:52:03AM +0100, Mel Gorman wrote:
> On Fri, Jul 08, 2016 at 09:27:13AM +1000, Dave Chinner wrote:
> > .
> > > This series is not without its hazards. There are at least three areas
> > > that I'm concerned with even though I cou
(just like NFS) and hence sys_sync() isn't sufficient to quiesce a
filesystem's operations.
But I'm used to being ignored on this topic (for almost 10 years,
now!). Indeed, it's been made clear in the past that I know
absolutely nothing about what is needed to be done to safely
suspend filesystem operations... :/
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
results on XFS for the
tests you ran on ext4. It might also be worth running some highly
concurrent inode cache benchmarks (e.g. the 50-million inode, 16-way
concurrent fsmark tests) to see what impact heavy slab cache
pressure has on shrinker behaviour and system balance...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Jun 28, 2016 at 10:13:32AM +0100, Steven Whitehouse wrote:
> Hi,
>
> On 28/06/16 03:08, Dave Chinner wrote:
> >On Fri, Jun 24, 2016 at 02:50:11PM -0500, Bob Peterson wrote:
> >>This patch adds a new prune_icache_sb function for the VFS slab
> >>shrinker
erblock shrinker for
the above reasons - it's far too easy for people to get badly wrong.
If there are specific limitations on how inodes can be freed, then
move the parts of inode *freeing* that cause problems to a different
context via the ->evict/destroy callouts and trigger that external
context processing on demand. That external context can just do bulk
"if it is on the list then free it" processing, because the reclaim
policy has already been executed to place that inode on the reclaim
list.
This is essentially what XFS does, but it also uses the
->nr_cached_objects/->free_cached_objects() callouts in the
superblock shrinker to provide the reclaim rate feedback mechanism
required to throttle incoming memory allocations.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
but do
not destroy/free it - you simply queue it to an internal list and
then do the cleanup/freeing in your own time?
i.e. why do you need a special callout just to defer freeing to
another thread when we already have hooks that enable you to do
this?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
not required." To then
map KM_MAYFAIL to a flag that implies the allocation will internally
retry to try exceptionally hard to prevent failure seems wrong.
IOWs, KM_MAYFAIL means XFS is just asking for normal allocator
behaviour here, so I'm not sure what problem this change is actually
solving, and it's not clear from the description
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
is much rarer
As it is, I'm *extremely* paranoid when it comes to changes to core
locking like this. Performance is secondary to correctness, and we
need much more than just a few benchmarks to verify there aren't
locking bugs being introduced
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Jun 02, 2016 at 02:44:30PM +0200, Holger Hoffstätte wrote:
> On 06/02/16 14:13, Stefan Priebe - Profihost AG wrote:
> >
> > Am 31.05.2016 um 09:31 schrieb Dave Chinner:
> >> On Tue, May 31, 2016 at 08:11:42AM +0200, Stefan Priebe - Profihost AG
> >> wrote
g the problem now.
>
> I was able to reproduce it again with the same steps.
Hmmm, Ok. I've been running the lockperf test and kernel builds all
day on a filesystem that is identical in shape and size to yours
(i.e. xfs_info output is the same) but I haven't reproduced it yet.
Is it possible to get a metadump image of your filesystem to see if
I can reproduce it on that?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
urrent affinity list: 0-15
pid 9597's new affinity list: 0,4,8,12
sh: 1: cannot create /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor:
Directory nonexistent
posix01 -n 8 -l 100
posix02 -n 8 -l 100
posix03 -n 8 -i 100
$
So, I've just removed those tests from your script. I'll see if I
have any luck with reproducing the problem now.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
You didn't run out of space or something unusual like that? Does
'xfs_repair -n ' report any errors?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
he XFS code appears to be handling the
dirty page that is being passed to it correctly. We'll work out what
needs to be done to get rid of the warning for this case, whether it
be a mm/ change or an XFS change.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, May 31, 2016 at 12:59:04PM +0900, Minchan Kim wrote:
> On Tue, May 31, 2016 at 12:55:09PM +1000, Dave Chinner wrote:
> > On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote:
> > > On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > > > B
On Tue, May 31, 2016 at 10:07:24AM +0900, Minchan Kim wrote:
> On Tue, May 31, 2016 at 08:36:57AM +1000, Dave Chinner wrote:
> > [adding lkml and linux-mm to the cc list]
> >
> > On Mon, May 30, 2016 at 09:23:48AM +0200, Stefan Priebe - Profihost AG
> > wrote:
>
es relating to writeback and
memory reclaim. It might be worth trying as a workaround for now.
MM-folk - is this analysis correct? If so, why is
shrink_active_list() calling try_to_release_page() on dirty pages?
Is this just an oversight or is there some problem that this is
trying to work around? It seems trivial to fix to me (add a
!PageDirty check), but I don't know why the check is there in the
first place...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, May 26, 2016 at 07:05:11PM -0700, Linus Torvalds wrote:
> On Thu, May 26, 2016 at 5:13 PM, Dave Chinner wrote:
> > On Thu, May 26, 2016 at 10:19:13AM -0700, Linus Torvalds wrote:
> >>
> >> i'm ok with the late branches, it's not like xfs has been a pro
On Thu, May 26, 2016 at 10:19:13AM -0700, Linus Torvalds wrote:
> On Wed, May 25, 2016 at 11:13 PM, Dave Chinner wrote:
> >
> > Just yell if this is not OK and I'll drop those branches for this
> > merge and resend the pull request
>
> i'm ok with the
e kmem_realloc
xfs: fix warning in xfs_finish_page_writeback for non-debug builds
Dave Chinner (20):
xfs: Don't wrap growfs AGFL indexes
xfs: build bios directly in xfs_add_to_ioend
xfs: don't release bios on completion immediately
xfs: remove xfs_fs_evict_
he patchset
*exactly* like Linus is now suggesting, I walked away and haven't
looked at your patches since. Is it any wonder that no other
filesystem maintainer has bothered to waste their time on this
since?
Linus - I'd suggest these VFS timestamp patches need to go through
Al's VFS tree. That way we don't get unreviewed VFS infrastructure
changes going into your tree via a door that nobody was paying
attention to...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
up. At
> that point we switch to sharing pages with the read-write copy.
Unless I'm missing something here (quite possible!), I'm not sure
we can fix that problem with page cache sharing or reflink. It
implies we are sharing pages in a downwards direction - private
overlay pages/mappings from multiple inodes would need to be shared
with a single underlying shared read-only inode, and I lack the
imagination to see how that works...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
king
contexts via ASSERT(xfs_isilocked()) calls
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, May 19, 2016 at 12:17:26AM +1000, Dave Chinner wrote:
> Patch below should fix the deadlock.
The test has been running for several hours without failure using
this patch, so I'd say this fixes the problem...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, May 18, 2016 at 07:46:17PM +0800, Xiong Zhou wrote:
>
> On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> > > Hi,
> > >
> > > On Wed, May 18, 2016 at 03:56:34PM +1000, Dav
On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> Hi,
>
> On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > Hi,
> > >
> > > Parallel cp workload (xfstest
lock, but it is
not obvious what that may be yet.
Can you reproduce this with CONFIG_XFS_DEBUG=y set? If you can, and
it doesn't trigger any warnings or asserts, can you then try to
reproduce it while tracing the following events:
xfs_buf_lock
xfs_buf_lock_done
xfs_buf_trylock
xfs_buf_unlock
So we might be able to see if there's an unexpected buffer
locking/state pattern occurring when the hang occurs?
Also, if you run on slower storage, does the hang get harder or
easier to hit?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
patch should replace blkdev_dax_capable(), or just reuse that
> >> existing routine, or am I missing something?
> >
> > Good question. bdev_supports_dax() is a helper function tailored for the
> > filesystem's mount -o dax case. While blkdev_dax_capable() is similar, it
> > does not need error messages like "device does not support dax" since it
> > implicitly enables dax when capable. So, I think we can keep
> > blkdev_dax_capable(), but change it to call bdev_direct_access() so that
> > actual check is performed in a single place.
>
> Sounds good to me.
Can you name them consistently then? i.e. blkdev_dax_supported() and
blkdev_dax_capable()?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
[ OT, but I'll reply anyway :P ]
On Fri, May 06, 2016 at 02:29:23PM -0400, J. Bruce Fields wrote:
> On Thu, May 05, 2016 at 08:56:02AM +1000, Dave Chinner wrote:
> > In the latest XFS filesystem format, we randomise the generation
> > value during every inode allocation to m
On Thu, May 05, 2016 at 11:24:35PM +0100, Djalal Harouni wrote:
> On Thu, May 05, 2016 at 10:23:14AM +1000, Dave Chinner wrote:
> > On Wed, May 04, 2016 at 04:26:46PM +0200, Djalal Harouni wrote:
> > > This is version 2 of the VFS:userns support portable root filesystems
> &
y-handle capability available
> >
> > ...the last bit seems to indicate that we don't really need this
> > anyway, as most userland servers now work with filehandles from the
> > kernel.
> >
> > Maybe leave it out for now? It can always be added later.
>
> Yeah... probably a good idea.
Fine by me.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, May 04, 2016 at 06:44:14PM -0700, Andy Lutomirski wrote:
> On Wed, May 4, 2016 at 5:23 PM, Dave Chinner wrote:
> > On Wed, May 04, 2016 at 04:26:46PM +0200, Djalal Harouni wrote:
> >> This is version 2 of the VFS:userns support portable root filesystems
> >> R
cs of direct IO, because otherwise a single writer
prevents any IO concurrency and that's a bigger problem for DAX than
traditional storage due to the access speed and bandwidth available.
This was always intended to be fixed by the introduction of proper
range locking for IO, not by revertin
in VFS means "virtual" and has nothing to do with
disks or persistent storage formats. Indeed, let's convert the UID
to "on-disk" format for a network filesystem client
.
> * Add XFS support.
What is the problem here?
Next question: how does this work with uid/gid based quotas?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
is
> specified.
>
> Signed-off-by: Toshi Kani
> Cc: Dave Chinner
> Cc: Dan Williams
> Cc: Ross Zwisler
> Cc: Christoph Hellwig
> Cc: Boaz Harrosh
> ---
> fs/xfs/xfs_super.c | 23 +++
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
>
ct of updating something requested.
I would suggest that exposing them from the NFS server is something
we most definitely don't want to do because they are the only thing
that keeps remote users from guessing filehandles with ease
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, May 03, 2016 at 10:28:15AM -0700, Dan Williams wrote:
> On Mon, May 2, 2016 at 6:51 PM, Dave Chinner wrote:
> > On Mon, May 02, 2016 at 04:25:51PM -0700, Dan Williams wrote:
> [..]
> > Yes, I know, and it doesn't answer any of the questions I just
> > asked.
ehaviour present. The only
guarantee for persistence that an app will be able to rely on is
msync().
> But I don't see how that
> direction is getting turned into an argument against msync() efficiency.
Promoting a model that works around inefficiency rather than solving
it is no
On Sun, May 01, 2016 at 08:19:44AM +1000, NeilBrown wrote:
> On Sat, Apr 30 2016, Dave Chinner wrote:
> > Indeed, blocking the superblock shrinker in reclaim is a key part of
> > balancing inode cache pressure in XFS. If the shrinker starts
> > hitting dirty inodes, it bl
On Tue, May 03, 2016 at 05:38:23PM +0200, Michal Hocko wrote:
> On Sat 30-04-16 09:40:08, Dave Chinner wrote:
> > On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> [...]
> > > - was it
> > > "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[
m filesystems.
Encoding cache flushing for data integrity into the userspace
applications assumes that such future pmem-based storage will have
identical persistence requirements to the existing hardware. This,
to me, seems very unlikely to be the case (especially when
considering different platforms (e.g. power, ARM)) and so, again,
application developers are likely to have to fall back to using a
kernel provided data integrity primitive they know they can rely on
(i.e. msync()).
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, May 02, 2016 at 04:25:51PM -0700, Dan Williams wrote:
> On Mon, May 2, 2016 at 4:04 PM, Dave Chinner wrote:
> > On Mon, May 02, 2016 at 11:18:36AM -0400, Jeff Moyer wrote:
> >> Dave Chinner writes:
> >>
> >> > On Mon, Apr 25, 2016 at 11:53:13PM +
On Mon, May 02, 2016 at 10:53:25AM -0700, Dan Williams wrote:
> On Mon, May 2, 2016 at 8:18 AM, Jeff Moyer wrote:
> > Dave Chinner writes:
> [..]
> >> We need some form of redundancy and correction in the PMEM stack to
> >> prevent single sector errors from
On Mon, May 02, 2016 at 11:18:36AM -0400, Jeff Moyer wrote:
> Dave Chinner writes:
>
> > On Mon, Apr 25, 2016 at 11:53:13PM +, Verma, Vishal L wrote:
> >> On Tue, 2016-04-26 at 09:25 +1000, Dave Chinner wrote:
> > You're assuming that only the DAX aware app
in place, I'd then make the changes to the generic
superblock shrinker code to enable finer grained reclaim and
optimise the XFS shrinkers to make use of it...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
hread can't keep up
with all of the allocation pressure that occurs. e.g. a 20-core
intel CPU with local memory will be seen as a single node and so
will have a single kswapd thread to do reclaim. There's a massive
imbalance between maximum reclaim rate and maximum allocation rate
in situations like this. If we want memory reclaim to run faster,
we need to be able to do more work *now*, not defer it to a context with
limited execution resources.
i.e. IMO deferring more work to a single reclaim thread per node is
going to limit memory reclaim scalability and performance, not
improve it.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > > [Trim the CC list]
> > > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > &
On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> [Trim the CC list]
> On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> [...]
> > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")
On Wed, Apr 27, 2016 at 10:03:11AM +0200, Michal Hocko wrote:
> On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko
> > >
> > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT
On Wed, Apr 27, 2016 at 08:31:38PM +0200, Lucas Stach wrote:
> Am Dienstag, den 26.04.2016, 09:08 +1000 schrieb Dave Chinner:
> [...]
> > >
> > > >
> > > > That said, I'm not sure whether there's a notable benefit of
> > > > idling
>
't actually care about in XFS at all. That way I can carry all
the XFS changes in the XFS tree and not have to worry about when
this stuff gets merged or conflicts with the rest of the work that
is being done to the mm/ code and whatever tree that eventually
lands in...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ing
to restart the flood of false positive lockdep warnings we've
silenced over the years, so perhaps lockdep needs to be made smarter
as well...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Apr 25, 2016 at 09:18:42PM -0700, Dan Williams wrote:
> On Mon, Apr 25, 2016 at 7:56 PM, Dave Chinner wrote:
> > On Mon, Apr 25, 2016 at 06:45:08PM -0700, Dan Williams wrote:
> >> > I haven't seen any design/documentation for infrastructure at the
> >
On Mon, Apr 25, 2016 at 06:45:08PM -0700, Dan Williams wrote:
> On Mon, Apr 25, 2016 at 5:11 PM, Dave Chinner wrote:
> > On Mon, Apr 25, 2016 at 04:43:14PM -0700, Dan Williams wrote:
> [..]
> >> Maybe I missed something, but all these assumptions are already
> >> pre
On Mon, Apr 25, 2016 at 11:53:13PM +, Verma, Vishal L wrote:
> On Tue, 2016-04-26 at 09:25 +1000, Dave Chinner wrote:
> >
> <>
>
> > >
> > > - It checks badblocks and discovers its files have lost data
> > Lots of hand-waving here. How doe
On Mon, Apr 25, 2016 at 04:43:14PM -0700, Dan Williams wrote:
> On Mon, Apr 25, 2016 at 4:25 PM, Dave Chinner wrote:
> > On Mon, Apr 25, 2016 at 05:14:36PM +, Verma, Vishal L wrote:
> >> On Mon, 2016-04-25 at 01:31 -0700, h...@infradead.org wrote:
> >> > On S
d that then assumes the filesystem will zero blocks if
they get reused to clear errors on that LBA sector mapping before
they are accessible again to userspace..
It seems to me that there are a number of assumptions being made
across multiple layers here. Maybe I've missed something - can you
point me to the design/architecture description so I can see how
the "app does data recovery itself" dance is supposed to work?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
orker requests the complete AIL to be
> pushed out, then it goes back to sleep indefinitely until the log fills
> up again.
The behaviour suggests that your filesystem is not idle. The
filesystem takes up to 90s to be marked idle (the log needs to be
covered, and the state machine takes 3x30s cycles to transition to
the idle "covered" state).
If you want the filesystem to idle quickly, then run the log worker
more frequently to get the target updated more quickly. This will
also speed up the log covering state machine.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Apr 22, 2016 at 11:53:48PM +0200, Florian Margaine wrote:
> On Tue, Apr 19, 2016 at 1:06 AM, Dave Chinner wrote:
> >> A way to query freeze state might be nice, I think, but yeah, it's
> >> racy, so you can't depend on it - but it might be useful in the &
On Mon, Apr 18, 2016 at 03:46:46PM -0400, Waiman Long wrote:
> On 04/15/2016 06:19 PM, Dave Chinner wrote:
> >On Fri, Apr 15, 2016 at 01:17:41PM -0400, Waiman Long wrote:
> >>On 04/15/2016 04:17 AM, Dave Chinner wrote:
> >>>On Thu, Apr 14, 2016 at 12:21:13PM -0400,
On Mon, Apr 18, 2016 at 11:20:22AM -0400, Eric Sandeen wrote:
>
>
> On 4/14/16 10:17 PM, Dave Chinner wrote:
> > On Thu, Apr 14, 2016 at 09:57:07AM +0200, Florian Margaine wrote:
> >> This lets userland get the filesystem freezing status, aka whether the
> >> fil
On Fri, Apr 15, 2016 at 01:17:41PM -0400, Waiman Long wrote:
> On 04/15/2016 04:17 AM, Dave Chinner wrote:
> >On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote:
> >>On 04/13/2016 11:16 PM, Dave Chinner wrote:
> >>>On Tue, Apr 12, 2016 at 02:12:54PM -0400,
On Thu, Apr 14, 2016 at 12:21:13PM -0400, Waiman Long wrote:
> On 04/13/2016 11:16 PM, Dave Chinner wrote:
> >On Tue, Apr 12, 2016 at 02:12:54PM -0400, Waiman Long wrote:
> >>When performing direct I/O, the current ext4 code does
> >>not pass in the DIO_SKIP_DIO_C
de(filp)->i_sb;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + return sb->s_writers.frozen;
This makes the internal freeze implementation states part of the
userspace ABI. This needs an API that is separate from the internal
implementation...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ce bypassing the DIO
accounting will cause AIO writes to race with truncate.
Same AIO vs truncate problem occurs with the indirect read case you
modified to skip the direct IO layer accounting.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
igned-off-by: Eryu Guan
> ---
>
> I noticed this when running LTP on overlayfs, setxattr03 failed due to
> unexpected EACCES on immutable inode.
This should be in the commit message itself, rather than "EPERM
looks more reasonable".
Other than that, change seems fine to me.
On Thu, Mar 31, 2016 at 09:25:33PM -0600, Jens Axboe wrote:
> On 03/31/2016 06:46 PM, Dave Chinner wrote:
> >>>virtio in guest, XFS direct IO -> no-op -> scsi in host.
> >>
> >>That has write back caching enabled on the guest, correct?
> >
> >No
so on.
Throttling policy decisions belong above the block layer, even
though the throttle mechanism itself is in the block layer.
FWIW, this is analogous to REQ_READA, which tells the block layer
that a read is not important and can be discarded if there is too
much load. Policy is set at the layer that knows whether the IO can
be discarded safely, the mechanism is implemented at a lower layer
that knows about load, scheduling and other things the higher layers
know nothing about.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Mar 31, 2016 at 09:29:30PM -0600, Jens Axboe wrote:
> On 03/31/2016 06:56 PM, Dave Chinner wrote:
> >I'm not changing the host kernels - it's a production machine and so
> >it runs long uptime testing of stable kernels. (e.g. catch slow
> >memory lea
to note whether the block throttling has any
noticeable difference in behaviour when compared to just having a
very shallow request queue
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
round, but we need some way of conveying
> that information to the backend.
I'm not changing the host kernels - it's a production machine and so
it runs long uptime testing of stable kernels. (e.g. catch slow
memory leaks, etc). So if you've disabled throttling in the guest, I
can't test the throttling changes.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com