Re: O_DIRECT please; Sybase 12.5
Andrea Arcangeli wrote:
>
> On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> > ext3 journals data. That's unique and it breaks things (or rather,
> > things break it). It'd be trivial to support O_DIRECT in ext3's
> > writeback mode (metadata-only), but nobody uses that.
>
> I thought everybody uses metadata-only to avoid killing data-write
> performance.

ext3 has three modes:

data=journal
    Data is journalled. Yes, this slows things down significantly.

data=ordered
    The default mode and the most popular. All data is written to disk
    prior to a commit. Write throughput is good, and you don't have
    uninitialised data in your files after a crash.

data=writeback
    Metadata-only. Better write throughput (in dbench, anyway), but only
    metadata integrity is preserved after a crash. ie: fsck says the fs
    is fine, but files can (and almost always do) contain random stuff
    after a crash.

Ordered data mode is really nice. It's not magical though - for example,
if you reset the machine during a kernel build, a subsequent `make' will
fail because you have a number of .o files which have zero length. That's
the length they happened to have when the machine went down.

For ordered-data mode we need to keep track of all the buffers which are
associated with a transaction's journalled metadata and ensure that they
are written out before the transaction commits. That is done with a little
structure which hangs off ->b_private.

> So I thought it was ok to at first support O_DIRECT only
> for metadata journaling, doing that should be a three liner as you said
> and that is what I expected.

Yup. metadata-only journalling is all-round much, much simpler.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
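
For readers who want to try the three modes listed above, here is a minimal sketch of selecting one at mount time through mount(2). The option strings data=journal, data=ordered and data=writeback come from the message; the enum, the helper names ext3_data_option() and mount_ext3(), and whatever device and mount point the caller passes in are hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical labels for the three journalling modes described above. */
enum ext3_data_mode {
    EXT3_DATA_JOURNAL,    /* data and metadata go through the journal */
    EXT3_DATA_ORDERED,    /* data is written to disk before the commit */
    EXT3_DATA_WRITEBACK   /* metadata-only journalling */
};

/* Map a mode to the mount option string; NULL for an unknown value. */
const char *ext3_data_option(enum ext3_data_mode mode)
{
    switch (mode) {
    case EXT3_DATA_JOURNAL:   return "data=journal";
    case EXT3_DATA_ORDERED:   return "data=ordered";
    case EXT3_DATA_WRITEBACK: return "data=writeback";
    }
    return NULL;
}

/*
 * Mount `dev` on `dir` as ext3 with the requested data mode.
 * Needs root; returns 0 on success and -1 on error, like mount(2).
 */
int mount_ext3(const char *dev, const char *dir, enum ext3_data_mode mode)
{
    const char *opt = ext3_data_option(mode);

    if (opt == NULL)
        return -1;
    return mount(dev, dir, "ext3", 0, opt);
}
```

The same choice is normally made with the -o option of mount(8) or in /etc/fstab; the C form is shown only to keep one language throughout.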
Re: O_DIRECT please; Sybase 12.5
On Fri, Jul 06, 2001 at 12:28:15AM +1000, Andrew Morton wrote:
> ext3 journals data. That's unique and it breaks things (or rather,
> things break it). It'd be trivial to support O_DIRECT in ext3's
> writeback mode (metadata-only), but nobody uses that.

I thought everybody uses metadata-only to avoid killing data-write
performance. So I thought it was ok to at first support O_DIRECT only for
metadata journaling, doing that should be a three liner as you said and
that is what I expected.

> From a quick look it seems that we'll need fs-private implementations
> of generic_direct_IO() and brw_kiovec() at least.

brw_kiovec is called by generic_direct_IO, so yes, all you need is a
private generic_direct_IO implementation to deal with the journaled data
writes.

> I'll take a closer look.

OK, thanks!

Andrea
Re: O_DIRECT please; Sybase 12.5
Andrea Arcangeli wrote:
>
> Andrew Morton took care of ext3 O_DIRECT support (included into the ext3
> patch and conditional to #ifdef KERNEL_HAS_O_DIRECT that he asked me to
> add to the latest o_direct patches). (you know O_DIRECT is 99% common
> code, so supporting new fs is almost a no brainer)

Sorry, haven't looked at that yet.

ext3 journals data. That's unique and it breaks things (or rather,
things break it). It'd be trivial to support O_DIRECT in ext3's
writeback mode (metadata-only), but nobody uses that.

From a quick look it seems that we'll need fs-private implementations
of generic_direct_IO() and brw_kiovec() at least.

I'll take a closer look.
Re: O_DIRECT please; Sybase 12.5
On Fri, Jun 29, 2001 at 10:50:15AM +0100, Alan Cox wrote:
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris. That's
> > a serious performance problem."
>
> Thats something you'd have to benchmark. It depends on a very large number
> of factors including whether the database uses mmap, the average I/O size
> and the like

correct, here the benchmarks:

http://boudicca.tux.org/hypermail/linux-kernel/2001week17/1175.html
http://boudicca.tux.org/hypermail/linux-kernel/2001week17/att-1175/01-directio.png

of course the huge improvement is also because of broken VM in the
buffered-io case.

Andrea
Re: O_DIRECT please; Sybase 12.5
On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris. That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."
>
> Er, so did I promise too much? Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
> http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?

all 2.4 kernel in SuSE 7.2 ships with O_DIRECT enabled by default for
ext2, just open your files with O_DIRECT as luser and there you go.

Today I got in my inbox a patch from Chris Wedgwood for reiserfs, and
Andrew Morton took care of ext3 O_DIRECT support (included into the ext3
patch and conditional to #ifdef KERNEL_HAS_O_DIRECT that he asked me to
add to the latest o_direct patches). (you know O_DIRECT is 99% common
code, so supporting new fs is almost a no brainer)

I will send the o_direct patch to Linus for 2.4 too but possibly this is
2.5 material, however I will fully support it for 2.4 too indeed as it is
rock solid and you can just use it in production, same thing that
everybody has to do for rawio in 2.2.

I will release a new patch soon against 2.4.7pre2 in the next aa patchkit
as soon as I finished to synchronize my tree.

Andrea
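
A minimal sketch of what "just open your files with O_DIRECT" looks like from user space, assuming a glibc system where _GNU_SOURCE exposes the flag. The function name read_direct() and the 4096-byte DIO_ALIGN constant are assumptions made for illustration; the real alignment requirement depends on the kernel, the filesystem and the device.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment for buffer and length; not taken from the thread. */
#define DIO_ALIGN 4096

/*
 * Read up to `len` bytes from the start of `path`, bypassing the page cache.
 * `len` must be a non-zero multiple of DIO_ALIGN. On success, stores an
 * aligned buffer in *out (the caller frees it) and returns the byte count.
 * Returns -1 on any error and leaves *out untouched.
 */
ssize_t read_direct(const char *path, size_t len, void **out)
{
    void *buf = NULL;
    ssize_t n;
    int fd;

    if (len == 0 || len % DIO_ALIGN != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;   /* also fails here if the filesystem lacks O_DIRECT */

    /* The user buffer itself must be aligned, not only the length. */
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }

    n = read(fd, buf, len);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *out = buf;
    return n;
}
```

A caller would pass a path on a filesystem that supports the flag and free the returned buffer when done. A filesystem without O_DIRECT support typically makes the open() fail, so the error path is worth testing first.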
Re: O_DIRECT please; Sybase 12.5
Hi,

On Tue, Jul 03, 2001 at 08:10:39AM -0700, Daryll Strauss wrote:
> I recall hearing about a problem with the md device and raw IO. It was
> something about the block sizes not matching causing performance
> problems. Has anything been done to improve those issues?

The problem is a combination of two things.

First, raw IO is always fully synchronous, so with raw IO (and O_DIRECT)
you are, in effect, explicitly instructing the kernel not to do any
readahead. That makes it hard to keep two disks running in parallel with
soft raid if you are using small IOs, obviously.

Secondly, raw IO pins buffers in physical memory, and to avoid causing
serious VM problems due to having too much unswappable memory pinned by
arbitrary applications, the current raw IO driver limits the pinned chunk
size to 64k. That, combined with the sequential nature of raw IO, can
limit performance, certainly.

Raw IO is quite capable of running with larger chunk sizes, but we really
need a kernel limiter of some description to prevent users from using this
mechanism to pin massive amounts of memory for raw IO at once. There are
several candidate mechanisms for that, but none in the main kernel right
now.

Cheers,
 Stephen
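
To put a number on the 64k limit described above, here is a small sketch that counts how many pinned chunks one large request breaks into. The 64k figure is from the message; the name raw_io_requests() and the RAW_IO_CHUNK macro are hypothetical, and the real driver's splitting logic is not reproduced here.

```c
#include <stddef.h>

/* Pinned chunk size quoted in the message above. */
#define RAW_IO_CHUNK (64 * 1024)

/*
 * Number of chunks needed to move `total_bytes` when each chunk is capped
 * at `chunk_bytes` (ceiling division). Returns 0 for a zero chunk size.
 */
size_t raw_io_requests(size_t total_bytes, size_t chunk_bytes)
{
    if (chunk_bytes == 0)
        return 0;
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}
```

With these assumptions a 1 MiB read becomes 16 chunks issued one after another, which is why a single small-IO stream struggles to keep two soft-raid disks busy at once.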
Re: O_DIRECT please; Sybase 12.5
On Tue, Jul 03, 2001 at 10:42:53AM +0100, Stephen C. Tweedie wrote:
> On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
>
> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
>
> All block devices support raw IO --- the raw IO mechanism talks to the
> device driver through the normal kernel-internal block IO entry
> points.
>
> > I'd prefer O_DIRECT :-(
>
> Andrea Arcangeli has already posted patches you can try for ext2. The
> functionality isn't in the mainline kernel yet, though.

I recall hearing about a problem with the md device and raw IO. It was
something about the block sizes not matching causing performance
problems. Has anything been done to improve those issues?

- |Daryll
Re: O_DIRECT please; Sybase 12.5
Hi,

On Fri, Jun 29, 2001 at 02:39:00AM -0700, Dan Kegel wrote:
> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).

All block devices support raw IO --- the raw IO mechanism talks to the
device driver through the normal kernel-internal block IO entry points.

> I'd prefer O_DIRECT :-(

Andrea Arcangeli has already posted patches you can try for ext2. The
functionality isn't in the mainline kernel yet, though.

--Stephen
Re: O_DIRECT please; Sybase 12.5
XFS supports O_DIRECT on linux, has done for a while.

Steve

> At work I had to sit through a meeting where I heard
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris. That's
> a serious performance problem."
> All I could say was "I expect Linux will support O_DIRECT
> soon, and Sybase will support that within a year."
>
> Er, so did I promise too much? Andrea mentioned O_DIRECT recently
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
> http://lwn.net/2001/0510/bigpage.php3 )
> Is it supported yet in 2.4, or is this a 2.5 thing?
>
> And what are the chances Sybase will support that flag any time
> soon? I just read on news://forums.sybase.com/sybase.public.ase.linux
> that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
> for NT and Linux. I'm downloading now; it's a biggie.
>
> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(
>
> Hope somebody can give me encouraging news.
>
> Thanks,
> Dan
Re: O_DIRECT please; Sybase 12.5
> > Alan Cox wrote:
> >
> > > the boss say "If Linux makes Sybase go through the page cache on
> > > reads, maybe we'll just have to switch to Solaris. That's
> > > a serious performance problem."
> >
> > Thats something you'd have to benchmark. It depends on a very large number
> > of factors including whether the database uses mmap, the average I/O size
> > and the like
>
> I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
> on our application once we've come up to speed on basic performance
> issues (we're database newbies).

Quite obviously. One of the primary things a DBA is supposed to do is
ensure that the disk is accessed as *few* times as possible.

What size database do you have?
How much memory does the machine have?
How much memory does the database have?
How many engines is the database running?

We can take this off-list if you want, but disk I/O shouldn't really be an
issue for any database as long as other parameters are set correctly.

Sybase recommends raw devices *not* because they are faster, but because
it's the only way that they (Sybase) can guarantee the data is actually
written to disk (legal liability, etc.).

/Mike (Sybase DBA)
Re: O_DIRECT please; Sybase 12.5
Dan Kegel <[EMAIL PROTECTED]> writes:
>
> And what are the chances Sybase will support that flag any time
> soon? I just read on news://forums.sybase.com/sybase.public.ase.linux

When Sybase always submits its buffers block aligned (same requirement as
for raw io) you can do it with a simple LD_PRELOAD hack. I hacked sapdb
(which has source available unlike sybase) to do direct IO and it seems to
not hurt at least.

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

LVM makes raw partitions much less worse than they used to be. It is
basically a file system of raw partitions; allowing you to move and resize
them.

-Andi
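
The LD_PRELOAD approach above only works if every request the application issues is already block aligned. Here is a minimal sketch of that precondition, with an assumed helper name dio_request_ok() and a caller-supplied block size. It is not the actual shim: an interposer would additionally wrap open() to OR in O_DIRECT, and that part is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Return true if a read or write of `len` bytes at file offset `offset`
 * using user buffer `buf` meets the usual direct-IO alignment rule:
 * buffer address, offset and length are all multiples of `blksz`.
 * `blksz` must be a non-zero power of two (for example 512 or 4096).
 */
bool dio_request_ok(const void *buf, off_t offset, size_t len, size_t blksz)
{
    if (blksz == 0 || (blksz & (blksz - 1)) != 0)
        return false;                 /* not a power of two */
    if (offset < 0)
        return false;

    return (uintptr_t)buf % blksz == 0 &&
           (uint64_t)offset % blksz == 0 &&
           len % blksz == 0;
}
```

A debugging build of such a shim could call this check inside its read and write wrappers and log any request that fails it, which shows quickly whether the application really does submit aligned buffers.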
Re: O_DIRECT please; Sybase 12.5
Alan Cox wrote:
>
> > the boss say "If Linux makes Sybase go through the page cache on
> > reads, maybe we'll just have to switch to Solaris. That's
> > a serious performance problem."
>
> Thats something you'd have to benchmark. It depends on a very large number
> of factors including whether the database uses mmap, the average I/O size
> and the like

I'll probably benchmark raw vs. non-raw I/O with Sybase ASE 12.5
on our application once we've come up to speed on basic performance
issues (we're database newbies).

> > It supports raw partitions, which is good; that might satisfy my
> > boss (although the administration will be a pain, and I'm not
> > sure whether it's really supported by Dell RAID devices).
> > I'd prefer O_DIRECT :-(
>
> We already support raw direct I/O to devices themselves so they should support
> that - if not then Oracle I believe already does.

Haven't seen Sybase talk about O_DIRECT. Not sure we want to pony
up the Sybase license fees. (I'm still in denial about databases in
general, and hope I can switch to PostgreSQL at some point.)

BTW,
http://eval.veritas.com/webfiles/whitepapers/sybaseedition/sybase14_performance_paper.pdf
seems to show that raw beats O_DIRECT hands down on Solaris.
Will that hold on Linux, or is your (forthcoming?) O_DIRECT higher
performance than the one on Solaris?

Thanks,
Dan
Re: O_DIRECT please; Sybase 12.5
> the boss say "If Linux makes Sybase go through the page cache on
> reads, maybe we'll just have to switch to Solaris. That's
> a serious performance problem."

Thats something you'd have to benchmark. It depends on a very large number
of factors including whether the database uses mmap, the average I/O size
and the like

> It supports raw partitions, which is good; that might satisfy my
> boss (although the administration will be a pain, and I'm not
> sure whether it's really supported by Dell RAID devices).
> I'd prefer O_DIRECT :-(

We already support raw direct I/O to devices themselves so they should support
that - if not then Oracle I believe already does.
O_DIRECT please; Sybase 12.5
At work I had to sit through a meeting where I heard
the boss say "If Linux makes Sybase go through the page cache on
reads, maybe we'll just have to switch to Solaris. That's
a serious performance problem."
All I could say was "I expect Linux will support O_DIRECT
soon, and Sybase will support that within a year."

Er, so did I promise too much? Andrea mentioned O_DIRECT recently
( http://marc.theaimsgroup.com/?l=linux-kernel&m=99253913516599&w=2,
http://lwn.net/2001/0510/bigpage.php3 )
Is it supported yet in 2.4, or is this a 2.5 thing?

And what are the chances Sybase will support that flag any time
soon? I just read on news://forums.sybase.com/sybase.public.ase.linux
that Sybase ASE 12.5 was released today, and a 60 day eval is downloadable
for NT and Linux. I'm downloading now; it's a biggie.

It supports raw partitions, which is good; that might satisfy my
boss (although the administration will be a pain, and I'm not
sure whether it's really supported by Dell RAID devices).
I'd prefer O_DIRECT :-(

Hope somebody can give me encouraging news.

Thanks,
Dan