Re: O_DIRECT please; Sybase 12.5

2001-07-05 Thread Andrew Morton

Andrea Arcangeli wrote:
> 
> On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> > ext3 journals data.  That's unique and it breaks things (or rather,
> > things break it).   It'd be trivial to support O_DIRECT in ext3's
> > writeback mode (metadata-only), but nobody uses that.
> 
> I thought everybody uses metadata-only to avoid killing data-write
> performance.

ext3 has three modes:

data=journal

Data is journalled.  Yes, this slows things down
significantly.

data=ordered

The default mode and the most popular.  All data is written
to disk prior to a commit.  Write throughput is good, and
you don't have uninitialised data in your files after a
crash.

data=writeback

Metadata-only.   Better write throughput (in dbench, anyway),
but only metadata integrity is preserved after a crash. ie:
fsck says the fs is fine, but files can (and almost always do)
contain random stuff after a crash.

Ordered data mode is really nice.  It's not magical though - for example,
if you reset the machine during a kernel build, a subsequent `make' will
fail because you have a number of .o files which have zero length.
That's the length they happened to have when the machine went down.

For ordered-data mode we need to keep track of all the buffers which
are associated with a transaction's journalled metadata and ensure that
they are written out before the transaction commits.  That is done with
a little structure which hangs off ->b_private.
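
A toy model of that bookkeeping, as a sketch only. The struct and
function names below (toy_buffer, toy_transaction, toy_track_data,
toy_commit) are made up for illustration; they are not ext3's or the
kernel's real structures, and the fixed-size array merely stands in for
the per-buffer structure hanging off ->b_private.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DATA_BUFFERS 16

/* Hypothetical stand-in for a data buffer; not the kernel's buffer_head. */
struct toy_buffer {
    int dirty;                  /* 1 while the data has not reached disk */
};

/* Hypothetical transaction that remembers the data buffers it depends on. */
struct toy_transaction {
    struct toy_buffer *data[MAX_DATA_BUFFERS];
    size_t ndata;
    int committed;
};

/* Record that buf must reach disk before t commits. Returns 0, or -1 if full. */
int toy_track_data(struct toy_transaction *t, struct toy_buffer *buf)
{
    if (t->ndata == MAX_DATA_BUFFERS)
        return -1;
    t->data[t->ndata++] = buf;
    return 0;
}

/* Pretend to write one buffer to disk. */
void toy_write_out(struct toy_buffer *buf)
{
    buf->dirty = 0;
}

/* Ordered commit: flush every tracked data buffer, then commit the metadata. */
void toy_commit(struct toy_transaction *t)
{
    for (size_t i = 0; i < t->ndata; i++)
        if (t->data[i]->dirty)
            toy_write_out(t->data[i]);

    /* Invariant of ordered mode: no tracked data is still dirty at commit. */
    for (size_t i = 0; i < t->ndata; i++)
        assert(!t->data[i]->dirty);

    t->committed = 1;
    t->ndata = 0;
}
```

The point of the ordering is only that toy_commit() never marks the
transaction committed while a tracked buffer is still dirty.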

> So I thought it was ok to at first support O_DIRECT only
> for metadata journaling, doing that should be a three liner as you said
> and that is what I expected.

Yup.  metadata-only journalling is all-round much, much simpler.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: O_DIRECT please; Sybase 12.5

2001-07-05 Thread Andrea Arcangeli

On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> ext3 journals data.  That's unique and it breaks things (or rather,
> things break it).   It'd be trivial to support O_DIRECT in ext3's
> writeback mode (metadata-only), but nobody uses that.

I thought everybody uses metadata-only to avoid killing data-write
performance. So I thought it was ok to at first support O_DIRECT only
for metadata journaling, doing that should be a three liner as you said
and that is what I expected.

> > From a quick look it seems that we'll need fs-private implementations
> of generic_direct_IO() and brw_kiovec() at least.

brw_kiovec is called by generic_direct_IO, so yes, all you need is a
private generic_direct_IO implementation to deal with the journaled data
writes.

> I'll take a closer look.

OK, thanks!

Andrea



Re: O_DIRECT please; Sybase 12.5

2001-07-05 Thread Andrew Morton

Andrea Arcangeli wrote:
> 
> Andrew Morton took care of ext3 O_DIRECT support (included in the ext3
> patch and conditional on the #ifdef KERNEL_HAS_O_DIRECT that he asked me to
> add to the latest o_direct patches). (You know O_DIRECT is 99% common
> code, so supporting a new fs is almost a no-brainer.)

Sorry, haven't looked at that yet.

ext3 journals data.  That's unique and it breaks things (or rather,
things break it).   It'd be trivial to support O_DIRECT in ext3's
writeback mode (metadata-only), but nobody uses that.

From a quick look it seems that we'll need fs-private implementations
of generic_direct_IO() and brw_kiovec() at least.

I'll take a closer look.




Re: O_DIRECT please; Sybase 12.5

2001-07-05 Thread Andrea Arcangeli

On Fri, Jun 29, 2001 at 10:50:15AM +0100, Alan Cox wrote:
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris.  That's
> > a serious performance problem."
> 
> That's something you'd have to benchmark. It depends on a very large number
> of factors including whether the database uses mmap, the average I/O size
> and the like

Correct, here are the benchmarks:

http://boudicca.tux.org/hypermail/linux-kernel/2001week17/1175.html

http://boudicca.tux.org/hypermail/linux-kernel/2001week17/att-1175/01-directio.png

Of course, the huge improvement is also partly due to the broken VM in
the buffered-IO case.

Andrea



Re: O_DIRECT please; Sybase 12.5

2001-07-05 Thread Andrea Arcangeli

On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."  
> 
> Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
>  http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?

All 2.4 kernels in SuSE 7.2 ship with O_DIRECT enabled by default for
ext2; just open your files with O_DIRECT as luser and there you go.
Today I got in my inbox a patch from Chris Wedgwood for reiserfs, and
Andrew Morton took care of ext3 O_DIRECT support (included in the ext3
patch and conditional on the #ifdef KERNEL_HAS_O_DIRECT that he asked me to
add to the latest o_direct patches). (You know O_DIRECT is 99% common
code, so supporting a new fs is almost a no-brainer.)
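
For illustration, a minimal userspace sketch of what "open your files
with O_DIRECT" looks like. The helper name direct_roundtrip, the
4096-byte alignment, and the return-code convention are assumptions of
this sketch, not part of the patch; the real alignment requirement
depends on the filesystem and the device, and filesystems without
support reject the flag with EINVAL.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* fallback so the sketch compiles; I/O is then buffered */
#endif

#define ALIGN 4096              /* assumed alignment for buffer, length, and offset */

/*
 * Write one aligned block to path with O_DIRECT and read it back.
 * Returns 0 on success, 1 if O_DIRECT is refused with EINVAL,
 * and -1 on any other error.
 */
int direct_roundtrip(const char *path)
{
    void *buf = NULL;
    ssize_t n;
    int fd, rc = -1;

    assert(path != NULL);

    /* O_DIRECT wants the user buffer aligned, as raw I/O does. */
    if (posix_memalign(&buf, ALIGN, ALIGN) != 0)
        return -1;
    memset(buf, 'x', ALIGN);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0) {
        rc = (errno == EINVAL) ? 1 : -1;
        goto out_free;
    }

    n = pwrite(fd, buf, ALIGN, 0);
    if (n != ALIGN) {
        rc = (n < 0 && errno == EINVAL) ? 1 : -1;
        goto out_close;
    }

    memset(buf, 0, ALIGN);
    n = pread(fd, buf, ALIGN, 0);
    if (n != ALIGN) {
        rc = (n < 0 && errno == EINVAL) ? 1 : -1;
        goto out_close;
    }

    /* The data read back must match what was written. */
    rc = (((char *)buf)[0] == 'x' && ((char *)buf)[ALIGN - 1] == 'x') ? 0 : -1;

out_close:
    close(fd);
    unlink(path);
out_free:
    free(buf);
    return rc;
}
```

Calling direct_roundtrip() on a scratch file on the filesystem in
question is a quick way to see whether the flag is accepted there.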

I will send the o_direct patch to Linus for 2.4 too, but possibly this is
2.5 material. Either way I will fully support it for 2.4, as it is rock
solid and you can just use it in production, the same thing that
everybody has to do for rawio in 2.2.

I will release a new patch against 2.4.7pre2 in the next aa patchkit as
soon as I have finished synchronizing my tree.

Andrea



Re: O_DIRECT please; Sybase 12.5

2001-07-03 Thread Stephen C. Tweedie

Hi,

On Tue, Jul 03, 2001 at 08:10:39AM -0700, Daryll Strauss wrote:

> I recall hearing about a problem with the md device and raw IO. It was
> something about the block sizes not matching causing performance
> problems. Has anything been done to improve those issues?

The problem is a combination of two things.  First, raw IO is always
fully synchronous, so with raw IO (and O_DIRECT) you are, in effect,
explicitly instructing the kernel not to do any readahead.  That makes
it hard to keep two disks running in parallel with soft raid if you
are using small IOs, obviously.

Secondly, raw IO pins buffers in physical memory, and to avoid
causing serious VM problems due to having too much unswappable memory
pinned by arbitrary applications, the current raw IO driver limits the
pinned chunk size to 64k.  That, combined with the sequential nature
of raw IO, can limit performance, certainly.

Raw IO is quite capable of running with larger chunk sizes, but we
really need a kernel limiter of some description to prevent users from
using this mechanism to pin massive amounts of memory for raw IO at
once.  There are several candidate mechanisms for that, but none in
the main kernel right now.
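
To put rough numbers on the above, a small sketch. The 64k constant is
the chunk limit just mentioned; the pin_limiter accounting is purely
hypothetical and is not one of the candidate mechanisms, only an
illustration of what "a kernel limiter of some description" has to
track.

```c
#include <assert.h>
#include <stddef.h>

#define RAW_CHUNK_BYTES (64 * 1024)   /* the 64k pinned-chunk limit */

/* Number of synchronous chunks needed to move `bytes` through raw I/O. */
size_t raw_chunks_needed(size_t bytes)
{
    return (bytes + RAW_CHUNK_BYTES - 1) / RAW_CHUNK_BYTES;
}

/* Hypothetical per-user accounting of pinned memory. */
struct pin_limiter {
    size_t limit;               /* maximum bytes this user may pin at once */
    size_t pinned;              /* bytes currently pinned; never above limit */
};

/* Try to pin `bytes`; returns 1 on success, 0 if it would exceed the limit. */
int pin_try(struct pin_limiter *pl, size_t bytes)
{
    if (bytes > pl->limit - pl->pinned)
        return 0;
    pl->pinned += bytes;
    return 1;
}

/* Release a previous pin of `bytes`. */
void pin_release(struct pin_limiter *pl, size_t bytes)
{
    assert(bytes <= pl->pinned);
    pl->pinned -= bytes;
}
```

With the 64k limit a 1 MB request is split into 16 chunks, each issued
only after the previous one has completed. A larger chunk size reduces
that count, which is why some form of accounting like pin_try() would be
needed before the limit could be raised.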

Cheers,
 Stephen



Re: O_DIRECT please; Sybase 12.5

2001-07-03 Thread Daryll Strauss

On Tue, Jul 03, 2001 at 10:42:53AM +0100, Stephen C. Tweedie wrote:
> On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> 
> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
> 
> All block devices support raw IO --- the raw IO mechanism talks to the
> device driver through the normal kernel-internal block IO entry
> points.
> 
> > I'd prefer O_DIRECT :-(
> 
> Andrea Arcangeli has already posted patches you can try for ext2.  The
> functionality isn't in the mainline kernel yet, though.

I recall hearing about a problem with the md device and raw IO. It was
something about the block sizes not matching causing performance
problems. Has anything been done to improve those issues?

- |Daryll



Re: O_DIRECT please; Sybase 12.5

2001-07-03 Thread Stephen C. Tweedie

Hi,

On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).

All block devices support raw IO --- the raw IO mechanism talks to the
device driver through the normal kernel-internal block IO entry
points.

> I'd prefer O_DIRECT :-(

Andrea Arcangeli has already posted patches you can try for ext2.  The
functionality isn't in the mainline kernel yet, though.

--Stephen



Re: O_DIRECT please; Sybase 12.5

2001-06-29 Thread Steve Lord


XFS supports O_DIRECT on linux, has done for a while.

Steve

> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."  
> 
> Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
>  http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?
> 
> And what are the chances Sybase will support that flag any time
> soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux
> that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
> for NT and Linux.  I'm downloading now; it's a biggie.
> 
> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(
> 
> Hope somebody can give me encouraging news.
> 
> Thanks,
> Dan





Re: O_DIRECT please; Sybase 12.5

2001-06-29 Thread Mike Harrold

> 
> Alan Cox wrote:
> > 
> > > the boss say "If Linux makes Sybase go through the page cache on
> > > reads, maybe we'll just have to switch to Solaris.  That's
> > > a serious performance problem."
> > 
> > That's something you'd have to benchmark. It depends on a very large number
> > of factors including whether the database uses mmap, the average I/O size
> > and the like
> 
> I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
> on our application once we've come up to speed on basic performance
> issues (we're database newbies).

Quite obviously. One of the primary things a DBA is supposed to do is ensure
that the disk is accessed as *few* times as possible. What size database do
you have? How much memory does the machine have? How much memory does the
database have? How many engines is the database running?

We can take this off-list if you want, but disk I/O shouldn't really be an
issue for any database as long as other parameters are set correctly. Sybase
recommends raw devices *not* because they are faster, but because it's the
only way that they (Sybase) can guarantee the data is actually written to
disk (legal liability, etc.).

/Mike (Sybase DBA)




Re: O_DIRECT please; Sybase 12.5

2001-06-29 Thread Andi Kleen

Dan Kegel <[EMAIL PROTECTED]> writes:
> 
> And what are the chances Sybase will support that flag any time
> soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux

If Sybase always submits its buffers block-aligned (the same requirement
as for raw IO), you can do it with a simple LD_PRELOAD hack.
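
A sketch of the precondition such a shim depends on. The function name
request_is_block_aligned and the idea of passing the block size as a
parameter are assumptions of this example; the shim itself would
interpose open() and OR in O_DIRECT, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Returns 1 if the buffer address, the length, and the file offset are
 * all multiples of block_size, 0 otherwise.
 * block_size must be a non-zero power of two (for example 512).
 */
int request_is_block_aligned(const void *buf, size_t len, off_t offset,
                             size_t block_size)
{
    uint64_t mask;

    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    mask = (uint64_t)block_size - 1;

    return ((uint64_t)(uintptr_t)buf & mask) == 0 &&
           ((uint64_t)len & mask) == 0 &&
           ((uint64_t)offset & mask) == 0;
}
```

If the application ever issues a request for which this returns 0, the
hack cannot simply be applied blindly.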

I hacked sapdb (which has source available, unlike Sybase) to do direct IO,
and at least it does not seem to hurt.

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

LVM makes raw partitions much less painful than they used to be. It is
basically a file system of raw partitions, allowing you to move and resize
them.

-Andi



Re: O_DIRECT please; Sybase 12.5

2001-06-29 Thread Dan Kegel

Alan Cox wrote:
> 
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris.  That's
> > a serious performance problem."
> 
> That's something you'd have to benchmark. It depends on a very large number
> of factors including whether the database uses mmap, the average I/O size
> and the like

I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
on our application once we've come up to speed on basic performance
issues (we're database newbies).
 
> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
> > I'd prefer O_DIRECT :-(
> 
> We already support raw direct I/O to devices themselves so they should support
> that - if not then Oracle I believe already does.

Haven't seen Sybase talk about O_DIRECT.  Not sure we want to
pony up the Sybase license fees.  (I'm still in denial about
databases in general, and hope I can switch to PostgreSQL
at some point.)

BTW, 
http://eval.veritas.com/webfiles/whitepapers/sybaseedition/sybase14_performance_paper.pdf
seems to show that raw beats O_DIRECT hands down on Solaris.
Will that hold on Linux, or is your (forthcoming?) O_DIRECT
higher performance than the one on Solaris?

Thanks,
Dan



Re: O_DIRECT please; Sybase 12.5

2001-06-29 Thread Alan Cox

> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris.  That's
> a serious performance problem."

That's something you'd have to benchmark. It depends on a very large number
of factors including whether the database uses mmap, the average I/O size
and the like

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

We already support raw direct I/O to devices themselves so they should support
that - if not then Oracle I believe already does.




O_DIRECT please; Sybase 12.5

2001-06-29 Thread Dan Kegel

At work I had to sit through a meeting where I heard
the boss say "If Linux makes Sybase go through the page cache on
reads, maybe we'll just have to switch to Solaris.  That's
a serious performance problem."
All I could say was "I expect Linux will support O_DIRECT
soon, and Sybase will support that within a year."  

Er, so did I promise too much?  Andrea mentioned O_DIRECT recently
( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
 http://lwn.net/2001/0510/bigpage.php3 )
Is it supported yet in 2.4, or is this a 2.5 thing?

And what are the chances Sybase will support that flag any time
soon?  I just read on news://forums.sybase.com/sybase.public.ase.linux
that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
for NT and Linux.  I'm downloading now; it's a biggie.

It supports raw partitions, which is good; that might satisfy my
boss (although the administration will be a pain, and I'm not
sure whether it's really supported by Dell RAID devices).
I'd prefer O_DIRECT :-(

Hope somebody can give me encouraging news.

Thanks,
Dan