it within
their filesystem via ioctls like everyone else does.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Dave Chinner
Several subsystems use the same construct for LRU lists - a list
head, a spin lock and an item count. They also use exactly the same
code for adding and removing items from the LRU. Create a generic
type for these LRU lists.
This is the beginning of generic, node aware LRUs
From: Dave Chinner
The per-ag inode cache radix trees are not walked by the shrinkers
any more, so there is no need for a special walker that contained
heuristics to prevent multiple shrinker instances from colliding
with each other. Hence we can just remove that and convert the code
to use the
From: Dave Chinner
Convert the remaining couple of random shrinkers in the tree to the
new API.
Signed-off-by: Dave Chinner
---
arch/x86/kvm/mmu.c | 35 +--
net/sunrpc/auth.c | 45 +++--
2 files changed, 56
From: Dave Chinner
There are no more users of this API, so kill it dead, dead, dead and
quietly bury the corpse in a shallow, unmarked grave in a dark
forest deep in the hills...
Signed-off-by: Dave Chinner
---
include/linux/shrinker.h | 15 +--
include/trace/events
From: Dave Chinner
Pass the node of the current zone being reclaimed to shrink_slab(),
allowing the shrinker control nodemask to be set appropriately for
node aware shrinkers.
Signed-off-by: Dave Chinner
---
fs/drop_caches.c |1 +
include/linux/shrinker.h |3 +++
mm/memory
From: Dave Chinner
XFS currently only tracks inodes for reclaim via tag bits in the
inode cache radix tree. While this is awesome for background reclaim
because it allows inodes to be reclaimed in ascending disk offset
order, it sucks for direct memory reclaim which really is trying to
free the
From: Dave Chinner
Convert the driver shrinkers to the new API. Most changes are
compile tested only because I either don't have the hardware or it's
staging stuff.
FWIW, the md and android code is pretty good, but the rest of it
makes me want to claw my eyes out. The amount of bro
From: Dave Chinner
Before we split up the dcache_lru_lock, the unused dentry counter
needs to be made independent of the global dcache_lru_lock. Convert
it to per-cpu counters to do this.
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
---
fs/dcache.c | 17 ++---
1
From: Dave Chinner
Convert the filesystem shrinkers to use the new API, and standardise
some of the behaviours of the shrinkers at the same time. For
example, nr_to_scan means the number of objects to scan, not the
number of objects to free.
I refactored the CIFS idmap shrinker a little - it
From: Dave Chinner
With the dentry LRUs being per-sb structures, there is no real need
for a global dentry_lru_lock. The locking can be made more
fine-grained by moving to a per-sb LRU lock, isolating the LRU
operations of different filesystems completely from each other.
Signed-off-by: Dave
From: Dave Chinner
Convert the buftarg LRU to use the new generic LRU list and take
advantage of the functionality it supplies to make the buffer cache
shrinker node aware.
Signed-off-by: Dave Chinner
---
fs/xfs/xfs_buf.c | 162 +-
fs/xfs
From: Dave Chinner
Signed-off-by: Dave Chinner
---
fs/dcache.c| 171 +---
fs/super.c | 10 +--
include/linux/fs.h | 15 +++--
3 files changed, 82 insertions(+), 114 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index
From: Dave Chinner
Now that we have an LRU list API, we can start to enhance the
implementation. This splits the single LRU list into per-node lists
and locks to enhance scalability. Items are placed on lists
according to the node the memory belongs to. To make scanning the
lists efficient
From: Dave Chinner
Signed-off-by: Dave Chinner
---
fs/inode.c | 173 +---
fs/super.c | 11 ++--
include/linux/fs.h |6 +-
3 files changed, 75 insertions(+), 115 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index
From: Dave Chinner
One of the big problems with modifying the way the dcache shrinker
and LRU implementation works is that the LRU is abused in several
ways. One of these is shrink_dentry_list().
Basically, we can move a dentry off the LRU onto a different list
without doing any accounting
Hi Glauber,
Here's a working version of my patchset for generic LRU lists and
NUMA-aware shrinkers.
There are several parts to this patch set. The NUMA aware shrinkers
are based on having a generic node-based LRU list implementation,
and there are subsystems that need to be converted to use these
From: Dave Chinner
Convert the XFS dquot lru to use the list_lru construct and convert
the shrinker to being node aware.
Signed-off-by: Dave Chinner
---
fs/xfs/xfs_dquot.c |7 +-
fs/xfs/xfs_qm.c| 307 ++--
fs/xfs/xfs_qm.h|4
From: Dave Chinner
Convert superblock shrinker to use the new count/scan API, and
propagate the API changes through to the filesystem callouts. The
filesystem callouts already use a count/scan API, so it's just
changing counters to longs to match the VM API.
This requires the dentry and
From: Dave Chinner
Now that the shrinker is passing a nodemask in the scan control
structure, we can pass this to the generic LRU list code to
isolate reclaim to the lists on matching nodes.
This requires a small amount of refactoring of the LRU list API,
which might be best split out into
From: Dave Chinner
The current shrinker callout API uses a single shrinker call for
multiple functions. To determine the function, a special magical
value is passed in a parameter to change the behaviour. This
complicates the implementation and return value specification for
the different
On Wed, Nov 28, 2012 at 01:13:11AM +, Chris Wilson wrote:
> On Wed, 28 Nov 2012 10:14:44 +1100, Dave Chinner wrote:
> > +/*
> > + * XXX: (dchinner) This is one of the worst cases of shrinker abuse I've
> > seen.
> > + *
> > + * i915_gem_purge() expe
On Wed, Nov 28, 2012 at 12:21:54PM +0400, Glauber Costa wrote:
> On 11/28/2012 07:17 AM, Dave Chinner wrote:
> > On Wed, Nov 28, 2012 at 01:13:11AM +, Chris Wilson wrote:
> >> On Wed, 28 Nov 2012 10:14:44 +1100, Dave Chinner
> >> wrote:
> >>> +/*
>
's new
> SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still
> on my mind (in particular, the !PageUptodate-ness of pages fallocated but
> still unwritten).
>
> [a...@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
> Signed-off-by: Hugh Dickins
> ---
mmap(NULL, 4294975488, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f60afcf3000
munmap(0x7f61afcf5000, 2147491840) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
2147483648) = 2147479552
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
On Thu, Nov 29, 2012 at 02:29:33PM +0400, Glauber Costa wrote:
> On 11/29/2012 01:28 AM, Dave Chinner wrote:
> > On Wed, Nov 28, 2012 at 12:21:54PM +0400, Glauber Costa wrote:
> >> On 11/28/2012 07:17 AM, Dave Chinner wrote:
> >>> On Wed, Nov 28, 2012 at 01:13:
On Thu, Nov 29, 2012 at 11:02:24AM -0800, Andi Kleen wrote:
> Dave Chinner writes:
> >
> > Comments, thoughts and flames all welcome.
>
> Doing the reclaim per CPU sounds like a big change in the VM balance.
It's per node, not per CPU. And AFAICT, it hasn't chang
ith mpage_readpages(), so it's not just direct IO that has
this problem
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ead to
deadlocks. Here's another that's been known for ages:
http://oss.sgi.com/archives/xfs/2011-08/msg00168.html
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Mar 13, 2013 at 10:00:51PM -0400, Peter Hurley wrote:
> On Wed, 2013-03-13 at 14:23 +1100, Dave Chinner wrote:
> > We don't care about the ordering between multiple concurrent
> > metadata modifications - what matters is whether the ongoing data IO
> > around
very disk has
its own filesystem), knowing which filesystem(s) are getting stuck
in writeback from the sysrq-w or hangcheck output is pretty damn
important
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ema we need for filesystems to
behave sanely. i.e. shouldn't we be aiming to simplify things
as we rework locking rather than make them more complex?
IOWs, I think the "it's a mapping range lock" approach is not the
right level to be providing IO exclusion semantics. After all, it
On Wed, Feb 06, 2013 at 08:25:34PM +0100, Jan Kara wrote:
> On Wed 06-02-13 10:25:12, Dave Chinner wrote:
> > On Mon, Feb 04, 2013 at 01:38:31PM +0100, Jan Kara wrote:
> > > On Thu 31-01-13 16:07:57, Andrew Morton wrote:
> > > > > c) i_mutex doesn't allow any
> }
> - return 255; /* invalid */
> + return FILEID_INVALID; /* invalid */
> }
I think you can drop the "/* invalid */" comment from there now as
it is redundant with this change.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ops cleared before being returned to the new user, and
newly allocated xfs_bufs are zeroed before being initialised. I
really need to know what you are doing to be able to get to the
bottom of it
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Sat, Feb 23, 2013 at 07:06:10AM +, Tony Lu wrote:
> >From: Dave Chinner [mailto:da...@fromorbit.com]
> >On Fri, Feb 22, 2013 at 08:12:52AM +, Tony Lu wrote:
> >> I encountered the following panic when using xfs partitions as rootfs,
> >> which
> >>
120k 0 0
> >none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
> >none /run/user tmpfs
> >rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
> >/dev/sda6 /home ext4 rw,noatime,discard 0 0
^^^
I'd say that's your problem
the log is corrupt and that implies a
deeper problem
> And also there is code in xlog_write_log_records() which handles
> non-sector-align reads and writes.
Yes, it does handle it, but that doesn't mean that it is correct to
pass unaligned block ranges to it.
Cheers,
Dave.
--
Dave Chi
kernel and determine if you can still
reproduce the problem on your system - that way we'll know if the
bug really has been fixed or not
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
than 2-3TB.
Perhaps it would make testing 1-2TB ext4 filesystems fast enough
that you could do it regularly...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
; new i_projid field.
>
> Cc: Ben Myers
> Cc: Alex Elder
> Cc: Dave Chinner
> Signed-off-by: "Eric W. Biederman"
> ---
> fs/xfs/xfs_icache.c |2 +-
> fs/xfs/xfs_inode.c|6 +-
> fs/xfs/xfs_inode.h|7 ++-
> fs/xfs/xfs_ioctl
inode->i_gid = ip->i_d.di_gid;
Which further emphasises the layer violation...
> switch (inode->i_mode & S_IFMT) {
> case S_IFBLK:
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index cf5b1d0..a9e07dd 100644
> --- a/fs/xfs/xfs_itabl
ng of project IDs like this should not be
>> converted at all as it has nothing at all to do with the
>> namespaces.
Please drop this patch or replace it with a simple patch that passes
the project ID as an xfs_dqid_t (i.e. a flat, 32 bit quota
identifier) instead so you can kill t
s_eof_blocks to define them all
as xfs_dqid_t and convert them in place to the type that is
compatible with the XFS core use of these fields (i.e. comparing
them with the on-disk inode uid/gid/prid values).
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
nto
> xfs's user namespace.
> - Replace uses of fa->fsx_projid with projid throughout
> xfs_ioctl_setattr.
>
> Cc: Ben Myers
> Cc: Alex Elder
> Cc: Dave Chinner
> Signed-off-by: "Eric W. Biederman"
> ---
> fs/xfs/xfs_ioctl.c | 26
On Sun, Feb 17, 2013 at 05:11:02PM -0800, Eric W. Biederman wrote:
> From: "Eric W. Biederman"
>
> Cc: Ben Myers
> Cc: Alex Elder
> Cc: Dave Chinner
> Signed-off-by: "Eric W. Biederman"
> ---
> fs/xfs/xfs_qm.c|6 +++---
> fs/xf
stent and litters id/namespace conversions all over
the place, so I don't think these changes are necessary.
Hence I'd say just do absolute minimum needed for the
is_superquota() checks to work and leave all the kqid ->
xfs_dqid_t+type conversion at the boundary of the quota subsystem
wher
On Sun, Feb 17, 2013 at 05:25:43PM -0800, Eric W. Biederman wrote:
> Dave Chinner writes:
>
> > On Wed, Feb 13, 2013 at 10:13:16AM -0800, Eric W. Biederman wrote:
> >
> >> The crazy thing is that is that xfs appears to
> >> directly write their incor
ng two locks:
Nobody should be doing reverse dentry-to-name lookups in a quantity
sufficient for it to become a performance limiting factor. What is
the Oracle DB actually using this path for?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Feb 21, 2013 at 11:13:27PM -0500, Waiman Long wrote:
> On 02/21/2013 07:13 PM, Andi Kleen wrote:
> >Dave Chinner writes:
> >
> >>On Tue, Feb 19, 2013 at 01:50:55PM -0500, Waiman Long wrote:
> >>>It was found that the Oracle database software issue
oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=f14e527f411712f89178c31370b5d733ea1d0280
FWIW, I think your change might need work - there's the possibility
that is can round up the length beyond the end of the log if we ask
to read up to the last sector of the log (i.e. blkno + blklen ==
end
hese operations can be found
using fsx. The issue here, however, involves memory reclaim
interactions and so is not something fsx can reproduce in isolation. :/
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Mar 11, 2013 at 11:43:34PM -0700, Michel Lespinasse wrote:
> Hi Dave,
>
> On Mon, Mar 11, 2013 at 7:36 PM, Dave Chinner wrote:
> > On Sun, Mar 10, 2013 at 10:17:42PM -0700, Michel Lespinasse wrote:
> >> - since all readers are woken at once, you might see
init_user_ns) ? qid : -1;
> + default:
> + BUG();
> + }
> + return kqid;
> +}
> +
> +static inline struct kqid make_kqid_invalid(int type)
> +{
> + struct kqid kqid;
> +
> + kqid.type = type;
> + switch (type) {
> + case USRQUOTA:
>
On Mon, Aug 27, 2012 at 06:05:06PM -0400, Naoya Horiguchi wrote:
> On Mon, Aug 27, 2012 at 08:26:07AM +1000, Dave Chinner wrote:
> > On Fri, Aug 24, 2012 at 01:24:16PM -0400, Naoya Horiguchi wrote:
> > > Let me explain more to clarify my whole scenario. If a memory error
>
On Wed, Aug 29, 2012 at 02:31:26AM -0700, Eric W. Biederman wrote:
>
> Dave thanks for taking the time to take a detailed look at this code.
>
> Dave Chinner writes:
>
> > On Tue, Aug 28, 2012 at 12:09:56PM -0700, Eric W. Biederman wrote:
> >>
> >> Ad
On Wed, Aug 29, 2012 at 02:32:04PM +0900, Jun'ichi Nomura wrote:
> On 08/29/12 11:59, Dave Chinner wrote:
> > On Mon, Aug 27, 2012 at 06:05:06PM -0400, Naoya Horiguchi wrote:
> >> And yes, I understand it's ideal, but many applications choose not to
> >> do t
gh stack?
I mean, we can deal with it like the ia32 4k stack issue was dealt
with (i.e. ignore those stupid XFS people, that's an XFS bug), or
we can face the reality that storage stacks have become so complex
that 8k is no longer a big enough stack for a modern system
Cheers,
Dave.
--
[] iterate_supers+0xe1/0xf0
> > [75716.705798] [] sys_sync+0x2b/0x60
> > [75716.705802] [] system_call_fastpath+0x1a/0x1f
> > [75836.701197] INFO: task sync:8790 blocked for more than 120 seconds.
Which simply says that writeback of the dirty data at the time of
the sync call has
orkqueue to avoid stack overflows, then the context
switches are going to cause significant performance regressions for
high IOPS workloads. I don't really like either situation.
So while you are discussing stack issues, think a little about the
bigger picture outside of the immediate
aren't used. xfs_dbg
does not exist - the function is xfs_debug(). The compiler won't
catch that until the macro is used, so only add the macros which are
needed for this patch series.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
etter at the caller site, not in
the function itself. i.e. if we get a NULLAGNUMBER returned, the
caller decided whether to emit an error message or not.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
}
> + return 0;
> }
> }
>
> +out_spc:
> + *inop = NULLFSINO;
> + return ENOSPC;
> out_alloc:
> *IO_agbp = NULL;
> return xfs_dialloc_ag(tp, agbp, parent, inop);
Default behaviour on a loo
't a massive increase in
size as a result of this. If we do start to use ratelimiting in lots
of places in XFS, then we might have to revisit this, but it's OK
for now.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Oct 02, 2012 at 10:52:05AM +0200, Lukáš Czerner wrote:
> On Mon, 1 Oct 2012, Jeff Moyer wrote:
> > Date: Mon, 01 Oct 2012 12:52:19 -0400
> > From: Jeff Moyer
> > To: Lukas Czerner
> > Cc: Jens Axboe , linux-kernel@vger.kernel.org,
> > Dave Chinn
ix_fadvise() can't really specify everything we'd
> want to for an SSD cache).
Similar discussions about posix_fadvise() are being had for marking
ranges of files as volatile (i.e. useful for determining what can be
evicted from a cache when space reclaim is required).
http
On Tue, Oct 02, 2012 at 07:41:10PM -0700, Kent Overstreet wrote:
> On Wed, Oct 03, 2012 at 11:28:25AM +1000, Dave Chinner wrote:
> > On Tue, Oct 02, 2012 at 05:20:29PM -0700, Kent Overstreet wrote:
> > > On Tue, Oct 02, 2012 at 01:41:17PM -0400, Jeff Moyer wrote:
> > >
files are
opened (e.g. via udev rules). Hence you need to explain why you need
to change the default block device readahead on the fly, and why
fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
windows to the new defaults.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
single indirect node (covering block pointers for 4 MB), plus 256
> separate block pointers (covering the last megabyte), and a 5 GB file
> can be represented using 1 double-indirect node and 256 indirect nodes,
> and each of them can still be followed by direct "tail" data and
ix this race by using a read lock around I/O paths and write lock
> around block size changing, but normal rw semaphore cause cache line
> bouncing when taken for read by multiple processors and I/O performance
> degradation because of it is measurable.
This doesn't sound like a new
On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
> Hi Dave,
> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner wrote:
> > On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
> >> Hi,
> >> Recently we ran into the bug that an opened file's ra_p
On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner wrote:
> > On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
> >> Hi Dave,
> >> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner wrote:
> >>
On Thu, Oct 25, 2012 at 10:09:31AM -0400, Mikulas Patocka wrote:
>
>
> On Wed, 24 Oct 2012, Dave Chinner wrote:
>
> > On Fri, Oct 19, 2012 at 06:54:41PM -0400, Mikulas Patocka wrote:
> > >
> > >
> > > On Fri, 19 Oct 2012, Peter Zijlstra wrote:
>
ages here as it is
not a set-and-forget value. e.g. shrink_readahead_size_eio() can
reduce ra_pages as a result of IO errors. Hence if you have had io
errors, telling the kernel that you are now going to do sequential
IO should reset the readahead to the maximum ra_pages value
supported
Cheers,
On Thu, Nov 01, 2012 at 04:30:10PM -0500, Ben Myers wrote:
> Hi Dave,
>
> On Tue, Oct 30, 2012 at 09:26:13AM +1100, Dave Chinner wrote:
> > On Mon, Oct 29, 2012 at 09:03:15PM +0100, Torsten Kaiser wrote:
> > > After experiencing a hang of all IO yesterday (
> > >
equirements before discussing how or what to
implement. Indeed, discussion should really focus on getting the
core, in-memory infrastructure sorted out first before trying to
expand the functionality further...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
of megabytes
> If bio isn't discard
> aligned, what device will do?
Up to the device.
> Further, why driver handles alignment/granularity
> if device will ignore misaligned request.
When you send a series of sequential unaligned requests, the device
may ignore them all. Hence you en
he optimisation. I can't evaluate the
merit of this change without data telling me it is worthwhile, and
it's a lot of code to churn for no benefit
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks))
return -XFS_ERROR(EINVAL);
> start = BTOBB(range.start);
> end = start + BTOBBT(range.len) - 1;
> minlen = BTOBB(max_t(u64, granularity, range.minlen));
And that will prevent the overflow in BTOBB() just as effectively...
Che
On Tue, Jul 31, 2012 at 08:55:59AM +0800, majianpeng wrote:
> On 2012-07-31 05:42 Dave Chinner Wrote:
> >On Mon, Jul 30, 2012 at 03:14:28PM +0800, majianpeng wrote:
> >> When exec bio_alloc, the bi_rw is zero.But after calling bio_add_page,
> >> it will use bi_rw.
>
eating a new one.
So I don't think this is a good idea at all...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Sep 24, 2012 at 10:42:21AM +0800, Guo Chao wrote:
> On Sat, Sep 22, 2012 at 08:49:12AM +1000, Dave Chinner wrote:
>
> > On Fri, Sep 21, 2012 at 05:31:02PM +0800, Guo Chao wrote:
> > > This patchset optimizes several places which take the per inode spin lock.
> >
On Mon, Sep 24, 2012 at 02:12:05PM +0800, Guo Chao wrote:
> On Mon, Sep 24, 2012 at 02:23:43PM +1000, Dave Chinner wrote:
> > On Mon, Sep 24, 2012 at 10:42:21AM +0800, Guo Chao wrote:
> > > On Sat, Sep 22, 2012 at 08:49:12AM +1000, Dave Chinner wrote:
> > >
> >
On Mon, Sep 24, 2012 at 03:08:52PM +0800, Guo Chao wrote:
> On Mon, Sep 24, 2012 at 04:28:12PM +1000, Dave Chinner wrote:
> > > Ah, this is intended to be a code clean patchset actually. I thought these
> > > locks are redundant in an obvious and trivial manner. If, o
asure enterprise level NFS
servers.
5. Are the improvements consistent across different filesystem
types? We've had writeback changes in the past cause improvements
on one filesystem but significant regressions on others. I'd
suggest that you need to present results for ext4, XFS and btrfs so
that we have a decent idea of what we can expect from the change to
the generic code.
Yeah, I'm asking a lot of questions. That's because the generic
writeback code is extremely important to performance and the impact
of a change cannot be evaluated from a single test.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
em associated with this hot_range_item */
> + struct hot_inode_item *hot_inode;
> + /* starting offset of this range */
> + u64 start;
> + /* length of this range */
> + u64 len;
What units?
u64 start; /* start offset in bytes */
u64 len /* l
all
the abstraction down to something much simpler, say:
int hot_inode_update_freqs()
{
he = hot_inode_item_find(tree, inum, null)
if (!he) {
new_he = allocate()
if (!new_he)
return -ENOMEM;
tracking hot inodes or not.
This then means the hot inode tracking for the superblock can be
initialised by the filesystem as part of its fill_super method,
along with the filesystem specific code that will use the hot
tracking information the VFS gathers
Cheers,
Dave.
--
Dave Chinner
On Tue, Sep 25, 2012 at 04:59:55PM +0800, Guo Chao wrote:
> On Mon, Sep 24, 2012 at 06:26:54PM +1000, Dave Chinner wrote:
> > @@ -783,14 +783,19 @@ static void __wait_on_freeing_inode(struct inode
> > *inode);
> > static struct inode *find_inode(s
esting hasn't been taught
> what it is, so strerror(EHWPOISON) returns "Unknown error 133". We
> could simply allow open(2) and stat(2) return this error, although I
> wonder if we're just better off defining a new error code.
If we are going to add special new "file co
at is a win. But
.
> >> +EXPORT_SYMBOL_GPL(hot_update_freqs);
... it's an exported function, so it can't be inline or static, so
using "inline" is wrong whatever way you look at it. ;)
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
lockdep output that implicate GFP_KERNEL allocations from vm_map_ram
in GFP_NOFS conditions as the potential cause....
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
xfs: don't vmap inode cluster buffers during free
From: Dave Chinner
Signed-off-by: Dave Chinner
---
fs/xfs/xfs_inode.c
[add the linux-mm cc I forgot to add before sending]
On Tue, Oct 30, 2012 at 09:26:13AM +1100, Dave Chinner wrote:
> On Mon, Oct 29, 2012 at 09:03:15PM +0100, Torsten Kaiser wrote:
> > After experiencing a hang of all IO yesterday (
> > http://marc.info/?l=linux-kernel&m=13514
data. Hence *anything* that
the kernel wants to store on permanent storage should be using
xattrs because then the application has complete control of what is
stored without caring about what filesystem it is storing it on.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
s) then such a threshold might
only be appropriate for caches that are not memcg controlled. In
that case, handling it in the shrinker infrastructure itself is a
much better idea than hacking thresholds into individual shrinker
callouts.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
optimisations should be done at the
point where they are issued - any attempt to further optimise them
by adding delays down in the stack to aggregate FUA operations will
only increase latency of the operations that the issuer want to have
complete as fast as possible
Cheers,
Dave.
--
Dave Ch
>setattr(). If it really matters, I'll
just open code file_remove_suid() into XFS like ocfs2 does just so
we don't get that warning being emitted by trinity.
FWIW, buffered IO on XFS - the normal case for most operations -
holds the i_mutex over the call to file_remove_
On Mon, Jul 08, 2013 at 02:53:52PM +0200, Michal Hocko wrote:
> On Thu 04-07-13 18:36:43, Michal Hocko wrote:
> > On Wed 03-07-13 21:24:03, Dave Chinner wrote:
> > > On Tue, Jul 02, 2013 at 02:44:27PM +0200, Michal Hocko wrote:
> > > > On Tue 02-07-1
4
>
> So there's no many *known* users of this features ... but it's more
> important
> not to break *unknown* users of it.
There are commercial products (i.e. proprietary, closed source) that
use it. SGI has one (DMF) and there are a couple of other backup
program
files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
Can you please check if the patch attached to this mail:
http://marc.info/?l=linux-kernel&m=137276874