Hi!
> > > Which shouldn't be true. There is no fundamental reason why
> > > ordinary writes should be slower than O_DIRECT.
> >
> > Again, there IS a reason: O_DIRECT eliminates the cpu overhead of the
> > kernel-user copy,
>
> You assume that ordinary read()/write() is *required* to do the copy
Phillip Susi wrote:
[]
> You seem to have missed the point of this thread. Denis Vlasenko's
> message that you replied to simply pointed out that they are
> semantically equivalent, so O_DIRECT can be dropped provided that O_SYNC
> + madvise could be fixed to perform as well. Several people inclu
On Tue, Jan 30, 2007 at 06:07:14PM -0500, Phillip Susi wrote:
> It most certainly matters where the error happened because "you are
> screwed" is not an acceptable outcome in a mission critical application.
An I/O error is not an acceptable outcome in a mission critical app,
all mission critical
Andrea Arcangeli wrote:
When you have I/O errors during _writes_ (not Read!!) the raid must
kick the disk out of the array before the OS ever notices. And if it's
software raid that you're using, the OS should kick out the disk
before your app ever notices any I/O error. When the write I/O error
On Tue, Jan 30, 2007 at 08:57:20PM +0100, Andrea Arcangeli wrote:
> Please try yourself, it's simple enough:
>
> time dd if=/dev/hda of=/dev/null bs=16M count=100
> time dd if=/dev/hda of=/dev/null bs=16M count=100 iflag=sync
sorry, reading won't help much to exercise sync ;). But t
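Andrea's point above suggests testing the write side instead. A minimal sketch of such a test; the scratch-file path and sizes are illustrative, not from the thread (the original used /dev/hda), and GNU dd's `oflag`/`status` options are assumed:

```shell
# Write-side variant of the dd test quoted above: reads cannot exercise
# sync, so write to a scratch file instead (wrap in `time` to compare).
dd if=/dev/zero of=/tmp/ddtest bs=1M count=16 status=none             # buffered
dd if=/dev/zero of=/tmp/ddtest bs=1M count=16 oflag=sync status=none  # synced per block
stat -c %s /tmp/ddtest   # both runs write the same 16 MiB
```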
On Tue, Jan 30, 2007 at 01:50:41PM -0500, Phillip Susi wrote:
> It should return the number of bytes successfully written before the
> error, giving you the location of the first error. Also using smaller
> individual writes ( preferably issued in parallel ) also allows the
> problem spot to be
Andrea Arcangeli wrote:
On Tue, Jan 30, 2007 at 10:36:03AM -0500, Phillip Susi wrote:
Did you intentionally drop this reply off list?
No.
Then I'll restore the lkml to the cc list.
No, it doesn't... or at least can't report WHERE the error is.
O_SYNC doesn't report where the error is either.
On Monday 29 January 2007 18:00, Andrea Arcangeli wrote:
> On Sun, Jan 28, 2007 at 06:03:08PM +0100, Denis Vlasenko wrote:
> > I still don't see much difference between O_SYNC and O_DIRECT write
> > semantic.
>
> O_DIRECT is about avoiding the copy_user between cache and userland,
> when working with devices that run faster than RAM (think >=100M/sec, quite standard).
On Sun, Jan 28, 2007 at 06:03:08PM +0100, Denis Vlasenko wrote:
> I still don't see much difference between O_SYNC and O_DIRECT write
> semantic.
O_DIRECT is about avoiding the copy_user between cache and userland,
when working with devices that run faster than RAM (think >=100M/sec,
quite standard).
Denis Vlasenko wrote:
I still don't see much difference between O_SYNC and O_DIRECT write
semantic.
Yes, if you change the normal io paths to properly support playing
vmsplice games ( which have a number of corner cases ) to get the zero
copy, and support madvise() and O_SYNC to control cachi
On Sunday 28 January 2007 16:30, Bill Davidsen wrote:
> Denis Vlasenko wrote:
> > On Saturday 27 January 2007 15:01, Bodo Eggert wrote:
> >> Denis Vlasenko <[EMAIL PROTECTED]> wrote:
> >>> On Friday 26 January 2007 19:23, Bill Davidsen wrote:
> Denis Vlasenko wrote:
> > On Thursday 25 Janu
On Sunday 28 January 2007 16:18, Bill Davidsen wrote:
> Denis Vlasenko wrote:
> > On Friday 26 January 2007 19:23, Bill Davidsen wrote:
> >> Denis Vlasenko wrote:
> >>> On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
> Phillip Susi wrote:
>
> [...]
>
> But even single-threaded I/O but in large quantities benefits from O_DIRECT
Denis Vlasenko wrote:
On Saturday 27 January 2007 15:01, Bodo Eggert wrote:
Denis Vlasenko <[EMAIL PROTECTED]> wrote:
On Friday 26 January 2007 19:23, Bill Davidsen wrote:
Denis Vlasenko wrote:
On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
But even single-threaded I/O but in large quantities benefits from O_DIRECT
Denis Vlasenko wrote:
On Friday 26 January 2007 19:23, Bill Davidsen wrote:
Denis Vlasenko wrote:
On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
Phillip Susi wrote:
[...]
But even single-threaded I/O but in large quantities benefits from O_DIRECT
significantly, and I poi
On Saturday 27 January 2007 15:01, Bodo Eggert wrote:
> Denis Vlasenko <[EMAIL PROTECTED]> wrote:
> > On Friday 26 January 2007 19:23, Bill Davidsen wrote:
> >> Denis Vlasenko wrote:
> >> > On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
>
> >> >> But even single-threaded I/O but in large quantities benefits from O_DIRECT
Denis Vlasenko <[EMAIL PROTECTED]> wrote:
> On Friday 26 January 2007 19:23, Bill Davidsen wrote:
>> Denis Vlasenko wrote:
>> > On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
>> >> But even single-threaded I/O but in large quantities benefits from
>> >> O_DIRECT significantly, and I poi
On Friday 26 January 2007 19:23, Bill Davidsen wrote:
> Denis Vlasenko wrote:
> > On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
> >> Phillip Susi wrote:
> >>> Denis Vlasenko wrote:
> You mean "You can use aio_write" ?
> >>> Exactly. You generally don't use O_DIRECT without aio. Combining the two is what gives the big win.
On Friday 26 January 2007 18:05, Phillip Susi wrote:
> Denis Vlasenko wrote:
> > Which shouldn't be true. There is no fundamental reason why
> > ordinary writes should be slower than O_DIRECT.
>
> Again, there IS a reason: O_DIRECT eliminates the cpu overhead of the
> kernel-user copy,
You assume that ordinary read()/write() is *required* to do the copy
Denis Vlasenko wrote:
On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
Phillip Susi wrote:
Denis Vlasenko wrote:
You mean "You can use aio_write" ?
Exactly. You generally don't use O_DIRECT without aio. Combining the
two is what gives the big win.
Well, it's not only aio. Multithreaded I/O also helps a lot
Denis Vlasenko wrote:
Which shouldn't be true. There is no fundamental reason why
ordinary writes should be slower than O_DIRECT.
Again, there IS a reason: O_DIRECT eliminates the cpu overhead of the
kernel-user copy, and when coupled with multithreading or aio, allows
the IO queues to be kept full.
Mark Lord wrote:
You guys need to backup in this thread.
Every example of O_DIRECT here could be replaced with
calls to mmap(), msync(), and madvise() (or posix_fadvise).
In addition to being at least as fast as O_DIRECT,
these have the added benefit of using the page cache (avoiding reads for data already present
Mark Lord wrote:
> You guys need to backup in this thread.
>
> Every example of O_DIRECT here could be replaced with
> calls to mmap(), msync(), and madvise() (or posix_fadvise).
No. How about handling IO errors? There is no practical way for it with
mmap().
> In addition to being at least as fa
You guys need to backup in this thread.
Every example of O_DIRECT here could be replaced with
calls to mmap(), msync(), and madvise() (or posix_fadvise).
In addition to being at least as fast as O_DIRECT,
these have the added benefit of using the page cache
(avoiding reads for data already present
Denis Vlasenko wrote:
Well, I too currently work with Oracle.
Apparently the people who wrote the damn thing have a very, eh, Oracle-centric
world-view. "We want direct writes to the disk. Period." Why? Does it
make sense? Are there better ways? - nothing. They think they know better.
I fear you are tak
On Thursday 25 January 2007 21:45, Michael Tokarev wrote:
> Phillip Susi wrote:
> > Denis Vlasenko wrote:
> >> You mean "You can use aio_write" ?
> >
> > Exactly. You generally don't use O_DIRECT without aio. Combining the
> > two is what gives the big win.
>
> Well, it's not only aio. Multithreaded I/O also helps a lot
Phillip Susi wrote:
> Denis Vlasenko wrote:
>> You mean "You can use aio_write" ?
>
> Exactly. You generally don't use O_DIRECT without aio. Combining the
> two is what gives the big win.
Well, it's not only aio. Multithreaded I/O also helps a lot -- all this,
say, to utilize a raid array with
Denis Vlasenko wrote:
You mean "You can use aio_write" ?
Exactly. You generally don't use O_DIRECT without aio. Combining the
two is what gives the big win.
On Thursday 25 January 2007 20:28, Phillip Susi wrote:
> > Ahhh shit, are you saying that fdatasync will wait until writes
> > *by all other processes* to this file will hit the disk?
> > Is that true?
>
> I think all processes yes, but certainly all writes to this file by this
> process. That
Denis Vlasenko wrote:
If you opened a file and are doing only O_DIRECT writes, you
*always* have your written data flushed, by each write().
How is it different from writes done using
"normal" write() + fdatasync() pairs?
Because you can do writes async, but not fdatasync ( unless there is an
On Thursday 25 January 2007 16:44, Phillip Susi wrote:
> Denis Vlasenko wrote:
> > I will still disagree on this point (on point "use O_DIRECT, it's faster").
> > There is no reason why O_DIRECT should be faster than "normal" read/write
> > to large, aligned buffer. If O_DIRECT is faster on today's kernel,
Denis Vlasenko wrote:
I will still disagree on this point (on point "use O_DIRECT, it's faster").
There is no reason why O_DIRECT should be faster than "normal" read/write
to large, aligned buffer. If O_DIRECT is faster on today's kernel,
then Linux' read()/write() can be optimized more.
Ahh bu
On Monday 22 January 2007 17:17, Phillip Susi wrote:
> > You do not need to know which read() exactly failed due to bad disk.
> > Filename and offset from the start is enough. Right?
> >
> > So, SIGIO/SIGBUS can provide that, and if your handler is of
> > void (*sa_sigaction)(int, siginfo_t *,
Denis Vlasenko wrote:
The difference is that you block exactly when you try to access
data which is not there yet, not sooner (potentially much sooner).
If application (e.g. database) needs to know whether data is _really_ there,
it should use aio_read (or something better, something which doesn
Andrea Arcangeli wrote:
> Linus may be right that perhaps one day the CPU will be so much faster
> than disk that such a copy will not be measurable and then O_DIRECT
> could be downgraded to O_STREAMING or an fadvise. If such a day will
> come by, probably that same day Dr. Tanenbaum will be final
Denis Vlasenko wrote:
What will happen if we just make open ignore O_DIRECT? ;)
And then anyone who feels sad about is advised to do it
like described here:
http://lkml.org/lkml/2002/5/11/58
Then database and other high performance IO users will be broken. Most
of Linus's rant there is bein
Hello everyone,
This is a long thread about O_DIRECT surprisingly without a single
bugreport in it, that's a good sign that O_DIRECT is starting to work
well in 2.6 too ;)
On Fri, Jan 12, 2007 at 02:47:48PM -0800, Andrew Morton wrote:
> On Fri, 12 Jan 2007 15:35:09 -0700
> Erik Andersen <[EMAIL P
On Sunday 21 January 2007 13:09, Michael Tokarev wrote:
> Denis Vlasenko wrote:
> > On Saturday 20 January 2007 21:55, Michael Tokarev wrote:
> >> Denis Vlasenko wrote:
> >>> On Thursday 11 January 2007 18:13, Michael Tokarev wrote:
> example, which isn't quite possible now from userspace. But
Denis Vlasenko wrote:
> On Saturday 20 January 2007 21:55, Michael Tokarev wrote:
>> Denis Vlasenko wrote:
>>> On Thursday 11 January 2007 18:13, Michael Tokarev wrote:
example, which isn't quite possible now from userspace. But as long as
O_DIRECT actually writes data before returning from write() call
On Saturday 20 January 2007 21:55, Michael Tokarev wrote:
> Denis Vlasenko wrote:
> > On Thursday 11 January 2007 18:13, Michael Tokarev wrote:
> >> example, which isn't quite possible now from userspace. But as long as
> >> O_DIRECT actually writes data before returning from write() call (as it
>
Denis Vlasenko wrote:
> On Thursday 11 January 2007 18:13, Michael Tokarev wrote:
>> example, which isn't quite possible now from userspace. But as long as
>> O_DIRECT actually writes data before returning from write() call (as it
>> seems to be the case at least with a normal filesystem on a real
On Sunday 14 January 2007 10:11, Nate Diller wrote:
> On 1/12/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Most applications don't get the kind of performance analysis that
> Digeo was doing, and even then, it's rather lucky that we caught that.
> So I personally think it'd be best for libc or s
On Thursday 11 January 2007 18:13, Michael Tokarev wrote:
> example, which isn't quite possible now from userspace. But as long as
> O_DIRECT actually writes data before returning from write() call (as it
> seems to be the case at least with a normal filesystem on a real block
> device - I don't t
On Thursday 11 January 2007 16:50, Linus Torvalds wrote:
>
> On Thu, 11 Jan 2007, Nick Piggin wrote:
> >
> > Speaking of which, why did we obsolete raw devices? And/or why not just
> > go with a minimal O_DIRECT on block device support? Not a rhetorical
> > question -- I wasn't involved in the di
On Tue, 16 Jan 2007, Arjan van de Ven wrote:
> On Tue, 2007-01-16 at 21:26 +0100, Bodo Eggert wrote:
> > Helge Hafting <[EMAIL PROTECTED]> wrote:
> > > Michael Tokarev wrote:
> > >> But seriously - what about just disallowing non-O_DIRECT opens together
> > >> with O_DIRECT ones ?
> > >>
> > >
I think one problem with mmap/msync is that they can't maintain
i_size atomically like regular write does. So one needs to
implement one's own i_size management in userspace.
thanks, Alex
> Side note: the only reason O_DIRECT exists is because database people are
> too used to it, because other OS's
On Tue, 2007-01-16 at 21:26 +0100, Bodo Eggert wrote:
> Helge Hafting <[EMAIL PROTECTED]> wrote:
> > Michael Tokarev wrote:
>
> >> But seriously - what about just disallowing non-O_DIRECT opens together
> >> with O_DIRECT ones ?
> >>
> > Please do not create a new local DOS attack.
> > I open s
On 1/12/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
On Thu, 11 Jan 2007, Roy Huang wrote:
>
> On an embedded system, limiting page cache can relieve memory
> fragmentation. There is a patch against 2.6.19, which limits every
> opened file's page cache and total pagecache. When the limit is reached, it
> will release the page cache over the limit.
Helge Hafting <[EMAIL PROTECTED]> wrote:
> Michael Tokarev wrote:
>> But seriously - what about just disallowing non-O_DIRECT opens together
>> with O_DIRECT ones ?
>>
> Please do not create a new local DOS attack.
> I open some important file, say /etc/resolv.conf
> with O_DIRECT and just sit
On Fri, 12 January 2007 00:19:45 +0800, Aubrey wrote:
>
> Yes for desktop and server, but maybe not for embedded systems, especially
> for no-mmu Linux. In many embedded system cases, the whole system is
> running in RAM, including the file system. So it's not necessary to use the
> page cache anymore. Page
Michael Tokarev wrote:
Chris Mason wrote:
[]
I recently spent some time trying to integrate O_DIRECT locking with
page cache locking. The basic theory is that instead of using
semaphores for solving O_DIRECT vs buffered races, you put something
into the radix tree (I call it a placeholder) t
Bill Davidsen <[EMAIL PROTECTED]> wrote:
> My point is, that there is code to handle sparse data now, without
> O_DIRECT involved, and if O_DIRECT bypasses that, it's not a problem
> with the idea of O_DIRECT, the kernel has a security problem.
The idea of O_DIRECT is to bypass the pagecache, and
On Sat, 13 Jan 2007, Bill Davidsen wrote:
> Bodo Eggert wrote:
>
> > (*) This would allow fadvise_size(), too, which could reduce fragmentation
> > (and give an early warning on full disks) without forcing e.g. fat to
> > zero all blocks. OTOH, fadvise_size() would allow users to reserve
Michael Tokarev wrote:
Bill Davidsen wrote:
If I got it right (and please someone tell me if I *really* got it right!),
the problem is elsewhere.
Suppose you have a filesystem, not at all related to databases and stuff.
Your usual root filesystem, with your /etc/ /var and so on directories.
On 1/12/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
On Fri, 12 Jan 2007 15:35:09 -0700
Erik Andersen <[EMAIL PROTECTED]> wrote:
> On Fri Jan 12, 2007 at 05:09:09PM -0500, Linus Torvalds wrote:
> > I suspect a lot of people actually have other reasons to avoid caches.
> >
> > For example, the re
Bill Davidsen wrote:
> Linus Torvalds wrote:
>>
[]
>> But what O_DIRECT does right now is _not_ really sensible, and the
>> O_DIRECT propeller-heads seem to have some problem even admitting that
>> there _is_ a problem, because they don't care.
>
> You say that as if it were a failing. Currently
Linus Torvalds wrote:
On Sat, 13 Jan 2007, Michael Tokarev wrote:
(No, really - this load isn't entirely synthetic. It's a typical database
workload - random I/O all over, on a large file. If it can, it combines
several I/Os into one, by requesting more than a single block at a time,
but overall it is random
Bodo Eggert wrote:
(*) This would allow fadvise_size(), too, which could reduce fragmentation
(and give an early warning on full disks) without forcing e.g. fat to
zero all blocks. OTOH, fadvise_size() would allow users to reserve the
complete disk space without his filesizes reflect
Linus Torvalds <[EMAIL PROTECTED]> wrote:
> On Sat, 13 Jan 2007, Michael Tokarev wrote:
>> (No, really - this load isn't entirely synthetic. It's a typical database
>> workload - random I/O all over, on a large file. If it can, it combines
>> several I/Os into one, by requesting more than a single block at a time,
Bill Davidsen wrote:
The point is that if you want to be able to allocate at all, sometimes
you will have to write dirty pages, garbage collect, and move or swap
programs. The hardware is just too limited to do something less painful,
and the user can't see memory to do things better. Linus is
On Fri, 12 Jan 2007 15:35:09 -0700
Erik Andersen <[EMAIL PROTECTED]> wrote:
> On Fri Jan 12, 2007 at 05:09:09PM -0500, Linus Torvalds wrote:
> > I suspect a lot of people actually have other reasons to avoid caches.
> >
> > For example, the reason to do O_DIRECT may well not be that you want to
On Fri Jan 12, 2007 at 05:09:09PM -0500, Linus Torvalds wrote:
> I suspect a lot of people actually have other reasons to avoid caches.
>
> For example, the reason to do O_DIRECT may well not be that you want to
> avoid caching per se, but simply because you want to limit page cache
> activity.
Linus Torvalds wrote:
>
> On Sat, 13 Jan 2007, Michael Tokarev wrote:
>>> At that point, O_DIRECT would be a way of saying "we're going to do
>>> uncached accesses to this pre-allocated file". Which is a half-way
>>> sensible thing to do.
>> Half-way?
>
> I suspect a lot of people actually have
On Sat, 13 Jan 2007, Michael Tokarev wrote:
> >
> > At that point, O_DIRECT would be a way of saying "we're going to do
> > uncached accesses to this pre-allocated file". Which is a half-way
> > sensible thing to do.
>
> Half-way?
I suspect a lot of people actually have other reasons to avoi
Zan Lynx wrote:
> On Sat, 2007-01-13 at 00:03 +0300, Michael Tokarev wrote:
> [snip]
>> And sure thing, withOUT O_DIRECT, the whole system is almost dead under this
>> load - because everything is thrown away from the cache, even caches of /bin
>> /usr/bin etc... ;) (For that, fadvise() seems to h
On Sat, 2007-01-13 at 00:03 +0300, Michael Tokarev wrote:
[snip]
> And sure thing, withOUT O_DIRECT, the whole system is almost dead under this
> load - because everything is thrown away from the cache, even caches of /bin
> /usr/bin etc... ;) (For that, fadvise() seems to help a bit, but not a lot
Linus Torvalds wrote:
[]
> My point is that you can get basically ALL THE SAME GOOD BEHAVIOUR without
> having all the BAD behaviour that O_DIRECT adds.
*This* point I got from the beginning, once I tried to think how it all
is done internally (I never thought about that, because I'm not a kernel
On Sat, 13 Jan 2007, Michael Tokarev wrote:
>
> (No, really - this load isn't entirely synthetic. It's a typical database
> workload - random I/O all over, on a large file. If it can, it combines
> several I/Os into one, by requesting more than a single block at a time,
> but overall it is random
Michael Tokarev wrote:
> Michael Tokarev wrote:
> By the way. I just ran - for fun - a read test of a raid array.
>
> Reading blocks of size 512 kbytes, starting at random places on a 400 GB
> array, doing 64 threads.
>
> O_DIRECT: 336.73 MB/sec.
> !O_DIRECT: 146.00 MB/sec.
And when turning off r
Michael Tokarev wrote:
[]
> After all the explanations, I still don't see anything wrong with the
> interface itself. O_DIRECT isn't "different semantics" - we're still
> writing and reading some data. Yes, O_DIRECT and non-O_DIRECT usages
> somewhat contradicts with each other, but there are oth
Chris Mason wrote:
[]
> I recently spent some time trying to integrate O_DIRECT locking with
> page cache locking. The basic theory is that instead of using
> semaphores for solving O_DIRECT vs buffered races, you put something
> into the radix tree (I call it a placeholder) to keep the page cache
On Fri, Jan 12, 2007 at 10:06:22AM -0800, Linus Torvalds wrote:
> > looking at the splice(2) api it seems like it'll be difficult to implement
> > O_DIRECT pread/pwrite from userland using splice... so there'd need to be
> > some help there.
>
> You'd use vmsplice() to put the write buffers int
On Thu, 11 Jan 2007, dean gaudet wrote:
>
> it seems to me that if splice and fadvise and related things are
> sufficient for userland to take care of things "properly" then O_DIRECT
> could be changed into splice/fadvise calls either by a library or in the
> kernel directly...
The problem i
Linus Torvalds wrote:
>>>> O_DIRECT is still crazily racy versus pagecache operations.
>>>
>>> Yes. O_DIRECT is really fundamentally broken. There's just no way to fix
>>> it sanely.
>>
>> How about aliasing O_DIRECT to POSIX_FADV_NOREUSE (sortof) ?
>
> That is what I think some users could do. If
Linus Torvalds wrote:
>>OK, madvise() used with mmap'ed file allows to have reads from a file
>>with zero-copy between kernel/user buffers and don't pollute cache
>>memory unnecessarily. But how about writes? How is to do zero-copy
>>writes to a file and don't pollute cache memory without using O_D
dean gaudet wrote:
it seems to me that if splice and fadvise and related things are
sufficient for userland to take care of things "properly" then O_DIRECT
could be changed into splice/fadvise calls either by a library or in the
kernel directly...
No, because the semantics are entirely different.
Hua Zhong wrote:
The other problem besides the inability to handle IO errors is that
mmap()+msync() is synchronous. You need to go async to keep
the pipelines full.
msync(addr, len, MS_ASYNC); doesn't do what you want?
No, because there is no notification of completion. In fact, does this
Aubrey wrote:
On 1/12/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
Linus Torvalds wrote:
>
> On Fri, 12 Jan 2007, Nick Piggin wrote:
>
>> We are talking about fragmentation. And limiting pagecache to try to
>> avoid fragmentation is a bandaid, especially when the problem can be solved
>> (no
On Thu, 11 Jan 2007, Linus Torvalds wrote:
> On Thu, 11 Jan 2007, Viktor wrote:
> >
> > OK, madvise() used with mmap'ed file allows to have reads from a file
> > with zero-copy between kernel/user buffers and don't pollute cache
> > memory unnecessarily. But how about writes? How is to do zero-co
On 1/12/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
Linus Torvalds wrote:
>
> On Fri, 12 Jan 2007, Nick Piggin wrote:
>
>> We are talking about fragmentation. And limiting pagecache to try to
>> avoid fragmentation is a bandaid, especially when the problem can be solved
>> (not just papered ove
On Fri, 12 Jan 2007, Nick Piggin wrote:
>
> Yeah *smallish* higher order allocations are fine, and we use them all the
> time for things like stacks or networking.
>
> But Aubrey (who somehow got removed from the cc list) wants to do order 9
> allocations from userspace in his nommu environment
Nick Piggin wrote:
Linus Torvalds wrote:
Very basic issue: the perfect is the enemy of the good. Claiming that
there is a "proper solution" is usually a total red herring. Quite
often there isn't, and the "paper over" is actually not papering over,
it's quite possibly the best solution there
Linus Torvalds wrote:
On Fri, 12 Jan 2007, Nick Piggin wrote:
We are talking about fragmentation. And limiting pagecache to try to
avoid fragmentation is a bandaid, especially when the problem can be solved
(not just papered over, but solved) in userspace.
It's not clear that the prob
On Fri, 12 Jan 2007, Nick Piggin wrote:
>
> We are talking about fragmentation. And limiting pagecache to try to
> avoid fragmentation is a bandaid, especially when the problem can be solved
> (not just papered over, but solved) in userspace.
It's not clear that the problem _can_ be solved
Bill Davidsen wrote:
Nick Piggin wrote:
Aubrey wrote:
Exactly, and the *real* fix is to modify userspace not to make > PAGE_SIZE
mallocs[*] if it is to be nommu friendly. It is the kernel hacks to do things
like limit cache size that are the bandaids.
Tuning the system to work appropriat
Limiting total page cache can be considered first. Only if total page
cache overrun limit, check whether the file overrun its per-file
limit. If it is true, release partial page cache and wake up kswapd at
the same time.
On 1/12/07, Aubrey <[EMAIL PROTECTED]> wrote:
On 1/11/07, Roy Huang <[EMAIL
Aubrey wrote:
On 1/11/07, Roy Huang <[EMAIL PROTECTED]> wrote:
On an embedded system, limiting page cache can relieve memory
fragmentation. There is a patch against 2.6.19, which limits every
opened file's page cache and total pagecache. When the limit is reached, it
will release the page cache over the limit.
Nick Piggin wrote:
Aubrey wrote:
On 1/11/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
What you _really_ want to do is avoid large mallocs after boot, or use
a CPU with an mmu. I don't think nommu linux was ever intended to be a
simple drop in replacement for a normal unix kernel.
Is there a
On 1/11/07, Roy Huang <[EMAIL PROTECTED]> wrote:
On an embedded system, limiting page cache can relieve memory
fragmentation. There is a patch against 2.6.19, which limits every
opened file's page cache and total pagecache. When the limit is reached, it
will release the page cache over the limit.
Th
linux-os (Dick Johnson) wrote:
On Wed, 10 Jan 2007, Aubrey wrote:
Hi all,
Opening a file with the O_DIRECT flag gives un-buffered read/write access.
So if I need un-buffered access, I have to change all of my
applications to add this flag. What's more, some scripts like "cp
oldfile newfile" sti
> The other problem besides the inability to handle IO errors is that
> mmap()+msync() is synchronous. You need to go async to keep
> the pipelines full.
msync(addr, len, MS_ASYNC); doesn't do what you want?
> Now if someone wants to implement an aio version of msync and
> mlock, that might do
Michael Tokarev wrote:
Linus Torvalds wrote:
On Thu, 11 Jan 2007, Viktor wrote:
OK, madvise() used with mmap'ed file allows to have reads from a file
with zero-copy between kernel/user buffers and don't pollute cache
memory unnecessarily. But how about writes? How is to do zero-copy
writes to a
On Thu, 2007-01-11 at 11:00 -0800, Linus Torvalds wrote:
>
> On Thu, 11 Jan 2007, Trond Myklebust wrote:
> >
> > For NFS, the main feature of interest when it comes to O_DIRECT is
> > strictly uncached I/O. Replacing it with POSIX_FADV_NOREUSE won't help
> > because it can't guarantee that the page will be thrown out of the page
> > cache before some second process
On Thu, 11 Jan 2007, Trond Myklebust wrote:
>
> For NFS, the main feature of interest when it comes to O_DIRECT is
> strictly uncached I/O. Replacing it with POSIX_FADV_NOREUSE won't help
> because it can't guarantee that the page will be thrown out of the page
> cache before some second process
On Thu, 2007-01-11 at 09:04 -0800, Linus Torvalds wrote:
> That is what I think some users could do. If the main issue with O_DIRECT
> is the page cache allocations, if we instead had better (read: "any")
> support for POSIX_FADV_NOREUSE, one class of reasons O_DIRECT usage would
> just go away.
On Thu, 11 Jan 2007, Alan wrote:
>
> Well you can - it's called SG_IO and that really does get the OS out of
> the way. O_DIRECT gets crazy when you stop using it on devices directly
> and use it on files
Well, on a raw disk, O_DIRECT is fine too, but yeah, you might as well
use SG_IO at that point.
> space, just as an example) is wrong in the first place, but the really
> subtle problems come when you realize that you can't really just "bypass"
> the OS.
Well you can - it's called SG_IO and that really does get the OS out of
the way. O_DIRECT gets crazy when you stop using it on devices directly
and use it on files
Linus Torvalds wrote:
>
> On Thu, 11 Jan 2007, Viktor wrote:
>> OK, madvise() used with mmap'ed file allows to have reads from a file
>> with zero-copy between kernel/user buffers and don't pollute cache
>> memory unnecessarily. But how about writes? How is to do zero-copy
>> writes to a file and
On Thu, 11 Jan 2007, Xavier Bestel wrote:
> On Thursday, 11 January 2007 at 07:50 -0800, Linus Torvalds wrote:
> > > O_DIRECT is still crazily racy versus pagecache operations.
> >
> > Yes. O_DIRECT is really fundamentally broken. There's just no way to fix
> > it sanely.
>
> How about aliasing
On Thursday, 11 January 2007 at 07:50 -0800, Linus Torvalds wrote:
> > O_DIRECT is still crazily racy versus pagecache operations.
>
> Yes. O_DIRECT is really fundamentally broken. There's just no way to fix
> it sanely.
How about aliasing O_DIRECT to POSIX_FADV_NOREUSE (sortof) ?
Xav
On Thu, 11 Jan 2007, Roy Huang wrote:
>
> On an embedded system, limiting page cache can relieve memory
> fragmentation. There is a patch against 2.6.19, which limits every
> opened file's page cache and total pagecache. When the limit is reached, it
> will release the page cache over the limit.
I do