Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tuesday February 19, [EMAIL PROTECTED] wrote:
> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> > First, I still don't understand why in God's sake barriers are
> > "working" while regular cache flushes are not. Almost no
> > consumer-grade hard drive supports write barriers, but they all
> > support regular cache flushes, and the latter should be enough
> > (while not the most speed-optimal) to ensure data safety. Why
> > require write cache disable (like in the XFS FAQ) instead of going
> > the flush-cache-when-appropriate (as opposed to
> > write-barrier-when-appropriate) way?
>
> Devil's advocate:
>
> Why should we need to support multiple different block layer APIs
> to do the same thing? Surely any hardware that doesn't support
> barrier operations can emulate them with cache flushes when it
> receives a barrier I/O from the filesystem.

The simple answer to "why multiple APIs" is "different performance
trade-offs".

If barriers are implemented at the end of the pipeline, they can
presumably be reasonably cheap. If they have to be implemented at the
top of the pipeline, thus stalling the whole pipeline, they are likely
to be more expensive.

A filesystem may be able to mitigate the expense if it knows something
about the purpose of the data. E.g. ext3 in data=writeback mode could
wait only for journal writes to complete before submitting the
(would-be) barrier write of the commit block, and would not bother to
wait for data writes.

However, consistent APIs are also a good thing. I would easily accept
an argument that a BIO_RW_BARRIER request must *always* be correctly
ordered around all other requests to the same device. If a layered
device cannot get the service it requires from lower-level devices, it
must do that flush/write/wait itself. That should be paired with a way
for the upper levels to find out how efficient barriers are.

I guess the three levels of barrier efficiency are:

 1/ handled above the elevator - least efficient
 2/ handled between the elevator and the device (by a 'flush request')
    - medium
 3/ handled inside the device (e.g. an ordered SCSI request) - most
    efficient

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Monday February 18, [EMAIL PROTECTED] wrote:
> I'll put it even more strongly. My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and the write cache, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Doesn't this speed gain come at a correctness cost?

Barriers aren't only about flushing the write cache. They are also
about preventing re-ordering, both at the "elevator" layer and inside
the device. If the device does CTQ, it could re-order requests, could
it not? Then two writes sent from the fs could make it to media in the
wrong order, and a power failure in between could corrupt your data.

Or am I misunderstanding something?

(Of course this only applies to XFS, where disabling barriers means
"hope for the best", as opposed to ext3, where disabling barriers means
"don't trust the device, impose explicit ordering".)

NeilBrown
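[Editor's note] The elevator-level half of the ordering constraint Neil
describes can be sketched in a few lines of self-contained userspace C.
This is only an illustrative model (the `struct req` and
`elevator_sort` names are invented, not kernel API): plain writes
within a batch may be re-sorted by sector, but nothing ever moves
across a barrier.

```c
#include <assert.h>

/* Toy model of an elevator queue: requests carry a target sector, and
 * the scheduler may sort a batch of plain writes by sector to minimise
 * seeks, but a request must never be moved across a barrier. */
struct req { long sector; int barrier; };

void elevator_sort(struct req *q, int n)
{
    int start = 0;
    for (int i = 0; i <= n; i++) {
        if (i == n || q[i].barrier) {
            /* insertion-sort the batch [start, i) by sector */
            for (int a = start + 1; a < i; a++) {
                struct req key = q[a];
                int b = a - 1;
                while (b >= start && q[b].sector > key.sector) {
                    q[b + 1] = q[b];
                    b--;
                }
                q[b + 1] = key;
            }
            start = i + 1;   /* the barrier itself stays put */
        }
    }
}
```

Running this on the queue {5, 1, barrier@3, 4, 2} reorders each batch
but leaves the barrier in place between them, which is exactly the
guarantee a CTQ/NCQ drive with the cache enabled does *not* give you
when barriers are disabled.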
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Jeremy Higdon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
>> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
>>> First, I still don't understand why in God's sake barriers are
>>> "working" while regular cache flushes are not. Almost no
>>> consumer-grade hard drive supports write barriers, but they all
>>> support regular cache flushes, and the latter should be enough
>>> (while not the most speed-optimal) to ensure data safety. Why
>>> require write cache disable (like in the XFS FAQ) instead of going
>>> the flush-cache-when-appropriate (as opposed to
>>> write-barrier-when-appropriate) way?
>>
>> Devil's advocate:
>>
>> Why should we need to support multiple different block layer APIs to
>> do the same thing? Surely any hardware that doesn't support barrier
>> operations can emulate them with cache flushes when it receives a
>> barrier I/O from the filesystem.
>>
>> Also, given that disabling the write cache still allows CTQ/NCQ to
>> operate effectively, and that in most cases WCD+CTQ is as fast as
>> WCE+barriers, the simplest thing to do is turn off volatile write
>> caches and not require any extra software kludges for safe
>> operation.
>
> I'll put it even more strongly. My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and the write cache, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.
>
> The only time write cache + barriers is significantly faster is when
> doing single threaded data writes, such as direct I/O, or if CTQ/NCQ
> is not enabled, or the drive does a poor job at it.
>
> jeremy

It would be interesting to compare numbers. In the large, single
threaded write case, what I have measured is roughly 2x faster writes
with barriers/write cache enabled on S-ATA/ATA class drives. I think
that this case alone is a fairly common one.

For very small file sizes, I have seen write cache off beat barriers +
write cache enabled as well, but barriers start outperforming write
cache disabled when you get up to moderate sizes (I need to rerun the
tests to get precise numbers/cross-over data).

The type of workload is also important. In the test cases that I ran,
the application needs to fsync() each file, so we beat up on the
barrier code pretty heavily.

ric
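[Editor's note] The fsync()-per-file pattern Ric describes is easy to
reproduce from userspace; the helper below (the function name is mine,
for illustration only) is the shape of the hot loop in such a
benchmark. On a barrier-capable filesystem/device stack, the fsync()
is what ends up driving a barrier or cache flush down to the drive.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write one small file and force it to stable storage.  Returns 0 on
 * success, -1 on any failure. */
int write_and_sync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);   /* data and metadata are on media (or in the
                           drive's cache, if barriers are off) */
}
```

Calling this in a loop over many small files is the metadata-intensive,
flush-heavy workload where the WCD+CTQ vs. WCE+barriers trade-off shows
up most clearly.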
Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 02:39:00AM +, Alasdair G Kergon wrote:
> > For example, how safe xfs is if barriers are not supported or
> > turned off?
>
> The last time we tried xfs with dm it didn't seem to notice
> -EOPNOTSUPP everywhere it should => recovery may find corruption.

Bug reports, please. What we don't know about, we can't fix.

As of this commit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0bfefc46dc028df60120acdb92062169c9328769

XFS should be handling all cases of -EOPNOTSUPP for barrier I/Os. If
you are still having problems, please let us know.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
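[Editor's note] The retry logic Dave refers to follows a common
pattern, sketched here in self-contained userspace C (the `fakedev`
struct and `submit_write()` are stand-ins invented for the sketch, not
the actual XFS code): attempt the barrier write once, and if the device
stack reports -EOPNOTSUPP, remember that and reissue the write as a
plain one.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a device stack that may or may not honour barriers. */
struct fakedev { int supports_barriers; int barriers_enabled; };

static int submit_write(struct fakedev *d, int barrier)
{
    if (barrier && !d->supports_barriers)
        return -EOPNOTSUPP;    /* what a layered device may report */
    return 0;
}

/* Try a barrier write; on -EOPNOTSUPP, disable barriers for this
 * device and retry the same write without the barrier flag. */
int write_commit_record(struct fakedev *d)
{
    if (d->barriers_enabled) {
        int err = submit_write(d, 1);
        if (err != -EOPNOTSUPP)
            return err;
        d->barriers_enabled = 0;   /* don't try barriers again */
    }
    return submit_write(d, 0);     /* plain write, weaker guarantees */
}
```

The point of the thread is the branch after the fallback: once
`barriers_enabled` is cleared, the filesystem is running without
ordering guarantees, so failing to notice the -EOPNOTSUPP (rather than
the fallback itself) is what risks corruption at recovery time.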
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
> My complaint about having to support them within dm when more than
> one device is involved is because any efficiencies disappear: you
> can't send further I/O to any one device until all the other devices
> have completed their barrier (or else later I/O to that device could
> overtake the barrier on another device). And then I argue that it
> would be better

I was wondering: would it help DM to have the concept of a "barrier
window"? As in, "this barrier is only effective for this group of
requests". With such a concept DM would need to stall only inside the
groups, and could possibly even issue such barrier groups in parallel,
couldn't it?

I'm sure you guys have all thought far more about barriers than I ever
did; if that idea came up before, why was it dismissed?

-Andi
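[Editor's note] Andi's "barrier window" idea (hypothetical - no such
concept exists in the block layer) can be stated as an ordering
predicate. The sketch below, in self-contained C with invented names,
shows the difference: a classic barrier orders all in-flight I/O, while
a windowed barrier only stalls requests tagged with its own group.

```c
#include <assert.h>

/* A request tagged with the group (window) it belongs to. */
struct breq { int group; int is_barrier; };

/* Must request b (issued after a) wait for a to complete?  With a
 * classic barrier everything orders around it; with a windowed
 * barrier, only requests in the same group stall. */
int must_wait(const struct breq *a, const struct breq *b, int windowed)
{
    if (!a->is_barrier && !b->is_barrier)
        return 0;                    /* plain writes may be reordered */
    if (!windowed)
        return 1;                    /* classic barrier orders all I/O */
    return a->group == b->group;     /* window: same group only */
}
```

Under this model, requests outside the window keep flowing past the
barrier, which is exactly the parallelism Andi is asking about; the
open question in the thread is whether a filesystem could actually
tag its I/O with such groups safely.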
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Jeremy Higdon wrote:
[]
> I'll put it even more strongly. My experience is that disabling write
> cache plus disabling barriers is often much faster than enabling both
> barriers and the write cache, when doing metadata intensive
> operations, as long as you have a drive that is good at CTQ/NCQ.

Now, this is VERY interesting, at least for me (and is off-topic in
this thread) -- which drive(s) are good at NCQ? I tried numerous SATA
(NCQ is about SATA, right? :) drives, but NCQ either does nothing in
terms of performance or hurts.

Yesterday we ordered another drive from Hitachi (their "raid edition"
thing) -- will try it tomorrow, but I've no hope here, as it's some 5th
or 6th model/brand already. (Ol'good SCSI drives, even 10 years old,
show a large difference when TCQ is enabled...)

Thanks!
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> > First, I still don't understand why in God's sake barriers are
> > "working" while regular cache flushes are not. Almost no
> > consumer-grade hard drive supports write barriers, but they all
> > support regular cache flushes, and the latter should be enough
> > (while not the most speed-optimal) to ensure data safety. Why
> > require write cache disable (like in the XFS FAQ) instead of going
> > the flush-cache-when-appropriate (as opposed to
> > write-barrier-when-appropriate) way?
>
> Devil's advocate:
>
> Why should we need to support multiple different block layer APIs
> to do the same thing? Surely any hardware that doesn't support
> barrier operations can emulate them with cache flushes when it
> receives a barrier I/O from the filesystem.
>
> Also, given that disabling the write cache still allows CTQ/NCQ to
> operate effectively and that in most cases WCD+CTQ is as fast as
> WCE+barriers, the simplest thing to do is turn off volatile write
> caches and not require any extra software kludges for safe operation.

I'll put it even more strongly. My experience is that disabling write
cache plus disabling barriers is often much faster than enabling both
barriers and the write cache, when doing metadata intensive operations,
as long as you have a drive that is good at CTQ/NCQ.

The only time write cache + barriers is significantly faster is when
doing single threaded data writes, such as direct I/O, or if CTQ/NCQ is
not enabled, or the drive does a poor job at it.

jeremy
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 02:56:43AM +, Alasdair G Kergon wrote:
> On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> > Surely any hardware that doesn't support barrier operations can
> > emulate them with cache flushes when it receives a barrier I/O from
> > the filesystem
>
> My complaint about having to support them within dm when more than
> one device is involved is because any efficiencies disappear: you
> can't send further I/O to any one device until all the other devices
> have completed their barrier (or else later I/O to that device could
> overtake the barrier on another device).

Right - it's a horrible performance hit.

But - how is what you describe any different to the filesystem doing:

	- flush block device
	- issue I/O
	- wait for completion
	- flush block device

around any I/O that it would otherwise simply tag as a barrier? That
serialisation at the filesystem layer is a horrible, horrible
performance hit.

And then there's the fact that we can't implement that in XFS, because
all the barrier I/Os we issue are asynchronous. We'd basically have to
serialise all metadata operations, and now we are talking about far
worse performance hits than implementing barrier emulation in the block
device.

Also, it's instructive to look at the implementation of
blkdev_issue_flush() - the API one is supposed to use to trigger a full
block device flush. It doesn't work on DM/MD either, because it uses a
no-I/O barrier bio:

	bio->bi_end_io = bio_end_empty_barrier;
	bio->bi_private = &wait;
	bio->bi_bdev = bdev;
	submit_bio(1 << BIO_RW_BARRIER, bio);
	wait_for_completion(&wait);

So, if the underlying block device doesn't support barriers, there's no
point in changing the filesystem to issue flushes, either...

> And then I argue that it would be better for the filesystem to have
> the information that these are not hardware barriers so it has the
> opportunity of tuning its behaviour (e.g. flushing less often because
> it's a more expensive operation).

There is generally no option from the filesystem POV to "flush less".
Either we use barrier I/Os where we need to and are safe with volatile
caches, or we corrupt filesystems with volatile caches when power loss
occurs. There is no in-between where "flushing less" will save us from
corruption.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
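[Editor's note] The flush/write/wait sequence Dave describes can be
modelled with counter stubs (all names below are invented for the
sketch) to make the cost visible: emulating a single barrier this way
costs two full-device flushes plus a synchronous wait on the write in
between.

```c
#include <assert.h>

/* Counters stand in for a real device so the I/O pattern is visible. */
struct stubdev { int flushes; int writes; };

static int flush_device(struct stubdev *d)   { d->flushes++; return 0; }
static int write_and_wait(struct stubdev *d) { d->writes++;  return 0; }

/* Filesystem-level emulation of one barrier write: flush out
 * everything earlier, write synchronously, then flush again so the
 * write itself is on media. */
int emulated_barrier_write(struct stubdev *d)
{
    int err;
    if ((err = flush_device(d)) != 0)     /* drain earlier writes */
        return err;
    if ((err = write_and_wait(d)) != 0)   /* the would-be barrier I/O */
        return err;
    return flush_device(d);               /* make the write stable */
}
```

Two flushes and a full stall per barrier is the serialisation Dave
calls "horrible" - and since XFS issues its barrier I/Os
asynchronously, it cannot even sit in a loop like this without
serialising all metadata operations first.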
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote:
> Surely any hardware that doesn't support barrier operations can
> emulate them with cache flushes when it receives a barrier I/O from
> the filesystem

My complaint about having to support them within dm when more than one
device is involved is because any efficiencies disappear: you can't
send further I/O to any one device until all the other devices have
completed their barrier (or else later I/O to that device could
overtake the barrier on another device). And then I argue that it
would be better for the filesystem to have the information that these
are not hardware barriers so it has the opportunity of tuning its
behaviour (e.g. flushing less often because it's a more expensive
operation).

Alasdair
--
[EMAIL PROTECTED]
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Mon, Feb 18, 2008 at 08:52:10AM -0500, Ric Wheeler wrote:
> I understand that. Most of the time, dm or md devices are composed of
> uniform components which will uniformly support (or not) the cache
> flush commands used by barriers.

As a dm developer, it's "almost none of the time", because trivial
configurations aren't the ones that require lots of testing effort.

Let's stop arguing over "most of the time" :-) As Andi points out,
there are certainly enough real-world users of "single linear or crypt
target using one physical device" for it to be worth our supporting it.

Alasdair
--
[EMAIL PROTECTED]
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> > > Implement barrier support for single device DM devices
> > Thanks. We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just
> > pass them down like that.
> I wonder if it's worth the effort to try to implement this.

The decision got taken to allocate barrier bios to implement the basic
flush, so dm has little choice in this matter now. (If you're going to
implement barriers for flush, you might as well implement them more
generally.)

Maybe I should spell this out more clearly for those who weren't
tracking this block layer change: AFAIK you cannot currently flush a
device-mapper block device without doing some jiggery-pokery.

> For example, how safe xfs is if barriers are not supported or turned
> off?

The last time we tried xfs with dm it didn't seem to notice -EOPNOTSUPP
everywhere it should => recovery may find corruption.

Alasdair
--
[EMAIL PROTECTED]
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote:
> First, I still don't understand why in God's sake barriers are
> "working" while regular cache flushes are not. Almost no
> consumer-grade hard drive supports write barriers, but they all
> support regular cache flushes, and the latter should be enough (while
> not the most speed-optimal) to ensure data safety. Why require write
> cache disable (like in the XFS FAQ) instead of going the
> flush-cache-when-appropriate (as opposed to
> write-barrier-when-appropriate) way?

Devil's advocate:

Why should we need to support multiple different block layer APIs to do
the same thing? Surely any hardware that doesn't support barrier
operations can emulate them with cache flushes when it receives a
barrier I/O from the filesystem.

Also, given that disabling the write cache still allows CTQ/NCQ to
operate effectively, and that in most cases WCD+CTQ is as fast as
WCE+barriers, the simplest thing to do is turn off volatile write
caches and not require any extra software kludges for safe operation.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Michael Tokarev wrote:
> Ric Wheeler wrote:
>> Alasdair G Kergon wrote:
>>> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
>>>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
>>>>> I wonder if it's worth the effort to try to implement this.
>>>
>>> My personal view (which seems to be in the minority) is that it's a
>>> waste of our development time *except* in the (rare?) cases similar
>>> to the ones Andi is talking about.
>>
>> Using working barriers is important for normal users when you really
>> care about data loss and have normal drives in a box. We do power
>> fail testing on boxes (with reiserfs and ext3) and can definitely
>> see a lot of file system corruption eliminated over power failures
>> when barriers are enabled properly.
>>
>> It is not unreasonable for some machines to disable barriers to get
>> a performance boost, but I would not do that when you are storing
>> things you really need back.
>
> The talk here is about something different - about supporting
> barriers on md/dm devices, i.e., on pseudo-devices which use multiple
> real devices as components (software RAIDs etc). In this "world" it's
> nearly impossible to support barriers if there is more than one
> underlying component device; barriers only work if there's only one
> component. And the talk is about supporting barriers only in a
> "minority" of cases - mostly for the simplest device-mapper case
> only, NOT covering raid1 or other "fancy" configurations.

I understand that. Most of the time, dm or md devices are composed of
uniform components which will uniformly support (or not) the cache
flush commands used by barriers.

>> Of course, you don't need barriers when you either disable the write
>> cache on the drives or use a battery backed RAID array which gives
>> you a write cache that will survive power outages...
>
> Two things here.
>
> First, I still don't understand why in God's sake barriers are
> "working" while regular cache flushes are not. Almost no
> consumer-grade hard drive supports write barriers, but they all
> support regular cache flushes, and the latter should be enough (while
> not the most speed-optimal) to ensure data safety. Why require write
> cache disable (like in the XFS FAQ) instead of going the
> flush-cache-when-appropriate (as opposed to
> write-barrier-when-appropriate) way?

Barriers have different flavors, but can be composed of "cache"
flushes, which are supported on all drives that I have seen (S-ATA and
ATA) for many years now. That is the flavor of barriers that we test
with S-ATA & ATA drives.

The issue is that without flushing/invalidating (or some other way of
controlling the behavior of your storage), the file system has no way
to make sure that all data is on persistent & non-volatile media.

> And second, "surprisingly", battery-backed RAID write caches tend to
> fail too, sometimes... ;) Usually, such a battery is enough to keep
> the data in memory for several hours only (since many RAID
> controllers use regular RAM for memory caches, which requires some
> power to keep its state). I came across this issue the hard way, and
> realized that only very few persons around me who manage raid systems
> even know about this problem - that the battery-backed cache is only
> for some time... For example, power failed at evening, and by
> tomorrow morning, batteries are empty already. Or, with better
> batteries, think about a weekend... ;) (I've seen some vendors now
> use flash-based backing store for caches instead, which should ensure
> far better results here.)
>
> /mjt

That is why you need to get a good array, not just a simple controller
;-) Most arrays do not use batteries to hold up the write cache; they
use the batteries to move any cached data to non-volatile media in the
time that the batteries hold up. You could certainly get this kind of
behavior from the flash scheme you describe above as well...

ric
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Ric Wheeler wrote: > Alasdair G Kergon wrote: >> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote: >>> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: I wonder if it's worth the effort to try to implement this. >> >> My personal view (which seems to be in the minority) is that it's a >> waste of our development time *except* in the (rare?) cases similar to >> the ones Andi is talking about. > > Using working barriers is important for normal users when you really > care about data loss and have normal drives in a box. We do power fail > testing on boxes (with reiserfs and ext3) and can definitely see a lot > of file system corruption eliminated over power failures when barriers > are enabled properly. > > It is not unreasonable for some machines to disable barriers to get a > performance boost, but I would not do that when you are storing things > you really need back. The talk here is about something different - about supporting barriers on md/dm devices, i.e., on pseudo-devices which uses multiple real devices as components (software RAIDs etc). In this "world" it's nearly impossible to support barriers if there are more than one underlying component device, barriers only works if there's only one component. And the talk is about supporting barriers only in "minority" of cases - mostly for simplest device-mapper case only, NOT covering any raid1 or other "fancy" configurations. > Of course, you don't need barriers when you either disable the write > cache on the drives or use a battery backed RAID array which gives you a > write cache that will survive power outages... Two things here. First, I still don't understand why in God's sake barriers are "working" while regular cache flushes are not. Almost no consumer-grade hard drive supports write barriers, but they all support regular cache flushes, and the latter should be enough (while not the most speed-optimal) to ensure data safety. 
Why require disabling the write cache (as the XFS FAQ does) instead of going the flush-cache-when-appropriate (as opposed to write-barrier-when-appropriate) way? And second, "surprisingly", battery-backed RAID write caches tend to fail too, sometimes... ;) Usually, such a battery is enough to keep the data in memory for several hours only (since many RAID controllers use regular RAM for their caches, which requires some power to keep its state) -- I came across this issue the hard way, and realized that only very few people around me who manage RAID systems even know about this problem - that the battery-backed cache lasts only for some time... For example, power fails in the evening, and by the next morning the batteries are already empty. Or, with better batteries, think about a weekend... ;) (I've seen some vendors now use flash-based backing store for caches instead, which should ensure far better results here.) /mjt -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Alasdair G Kergon wrote: On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote: On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: I wonder if it's worth the effort to try to implement this. My personal view (which seems to be in the minority) is that it's a waste of our development time *except* in the (rare?) cases similar to the ones Andi is talking about. Using working barriers is important for normal users when you really care about data loss and have normal drives in a box. We do power fail testing on boxes (with reiserfs and ext3) and can definitely see a lot of file system corruption eliminated over power failures when barriers are enabled properly. It is not unreasonable for some machines to disable barriers to get a performance boost, but I would not do that when you are storing things you really need back. Of course, you don't need barriers when you either disable the write cache on the drives or use a battery backed RAID array which gives you a write cache that will survive power outages... ric -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Michael Tokarev wrote: Ric Wheeler wrote: Alasdair G Kergon wrote: On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote: On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: I wonder if it's worth the effort to try to implement this. My personal view (which seems to be in the minority) is that it's a waste of our development time *except* in the (rare?) cases similar to the ones Andi is talking about. Using working barriers is important for normal users when you really care about data loss and have normal drives in a box. We do power fail testing on boxes (with reiserfs and ext3) and can definitely see a lot of file system corruption eliminated over power failures when barriers are enabled properly. It is not unreasonable for some machines to disable barriers to get a performance boost, but I would not do that when you are storing things you really need back. The talk here is about something different - about supporting barriers on md/dm devices, i.e., on pseudo-devices which uses multiple real devices as components (software RAIDs etc). In this world it's nearly impossible to support barriers if there are more than one underlying component device, barriers only works if there's only one component. And the talk is about supporting barriers only in minority of cases - mostly for simplest device-mapper case only, NOT covering any raid1 or other fancy configurations. I understand that. Most of the time, dm or md devices are composed of uniform components which will uniformly support (or not) the cache flush commands used by barriers. Of course, you don't need barriers when you either disable the write cache on the drives or use a battery backed RAID array which gives you a write cache that will survive power outages... Two things here. First, I still don't understand why in God's sake barriers are working while regular cache flushes are not. 
Almost no consumer-grade hard drive supports write barriers, but they all support regular cache flushes, and the latter should be enough (while not the most speed-optimal) to ensure data safety. Why to require write cache disable (like in XFS FAQ) instead of going the flush-cache-when-appropriate (as opposed to write-barrier- when-appropriate) way? Barriers have different flavors, but can be composed of cache flushes which are supported on all drives that I have seen (S-ATA and ATA) for many years now. That is the flavor of barriers that we test with S-ATA ATA drives. The issue is that without flushing/invalidating (or other way of controlling the behavior of your storage), the file system has no way to make sure that all data is on persistent non-volatile media. And second, surprisingly, battery-backed RAID write caches tends to fail too, sometimes... ;) Usually, such a battery is enough to keep the data in memory for several hours only (sine many RAID controllers uses regular RAM for memory caches, which requires some power to keep its state), -- I come across this issue the hard way, and realized that only very few persons around me who manages raid systems even knows about this problem - that the battery-backed cache is only for some time... For example, power failed at evening, and by tomorrow morning, batteries are empty already. Or, with better batteries, think about a weekend... ;) (I've seen some vendors now uses flash-based backing store for caches instead, which should ensure far better results here). /mjt That is why you need to get a good array, not just a simple controller ;-) Most arrays do not use batteries to hold up the write cache, they use the batteries to move any cached data to non-volatile media in the time that the batteries hold up. You could certainly get this kind of behavior from the flash scheme you describe above as well... 
ric -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote: First, I still don't understand why in God's sake barriers are working while regular cache flushes are not. Almost no consumer-grade hard drive supports write barriers, but they all support regular cache flushes, and the latter should be enough (while not the most speed-optimal) to ensure data safety. Why to require write cache disable (like in XFS FAQ) instead of going the flush-cache-when-appropriate (as opposed to write-barrier-when-appropriate) way? Devil's advocate: Why should we need to support multiple different block layer APIs to do the same thing? Surely any hardware that doesn't support barrier operations can emulate them with cache flushes when it receives a barrier I/O from the filesystem. Also, given that disabling the write cache still allows CTQ/NCQ to operate effectively and that in most cases WCD+CTQ is as fast as WCE+barriers, the simplest thing to do is turn off volatile write caches and not require any extra software kludges for safe operation. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: Alasdair G Kergon wrote: On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote: Implement barrier support for single device DM devices Thanks. We've got some (more-invasive) dm patches in the works that attempt to use flushing to emulate barriers where we can't just pass them down like that. I wonder if it's worth the effort to try to implement this. The decision got taken to allocate barrier bios to implement the basic flush so dm has little choice in this matter now. (If you're going to implement barriers for flush, you might as well implement them more generally.) Maybe I should spell this out more clearly for those who weren't tracking this block layer change: AFAIK you cannot currently flush a device-mapper block device without doing some jiggery-pokery. For example, how safe xfs is if barriers are not supported or turned off? The last time we tried xfs with dm it didn't seem to notice -EOPNOTSUPP everywhere it should => recovery may find corruption. Alasdair -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Mon, Feb 18, 2008 at 08:52:10AM -0500, Ric Wheeler wrote: I understand that. Most of the time, dm or md devices are composed of uniform components which will uniformly support (or not) the cache flush commands used by barriers. As a dm developer, it's almost none of the time because trivial configurations aren't the ones that require lots of testing effort. Let's stop arguing over "most of the time" :-) As Andi points out, there are certainly enough real-world users of a single linear or crypt target using one physical device for it to be worth our supporting it. Alasdair -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote: Surely any hardware that doesn't support barrier operations can emulate them with cache flushes when they receive a barrier I/O from the filesystem My complaint about having to support them within dm when more than one device is involved is because any efficiencies disappear: you can't send further I/O to any one device until all the other devices have completed their barrier (or else later I/O to that device could overtake the barrier on another device). And then I argue that it would be better for the filesystem to have the information that these are not hardware barriers so it has the opportunity of tuning its behaviour (e.g. flushing less often because it's a more expensive operation). Alasdair -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 02:56:43AM +, Alasdair G Kergon wrote: On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote: Surely any hardware that doesn't support barrier operations can emulate them with cache flushes when they receive a barrier I/O from the filesystem My complaint about having to support them within dm when more than one device is involved is because any efficiencies disappear: you can't send further I/O to any one device until all the other devices have completed their barrier (or else later I/O to that device could overtake the barrier on another device). Right - it's a horrible performance hit. But - how is what you describe any different to the filesystem doing: - flush block device - issue I/O - wait for completion - flush block device around any I/O that it would otherwise simply tag as a barrier? That serialisation at the filesystem layer is a horrible, horrible performance hit. And then there's the fact that we can't implement that in XFS because all the barrier I/Os we issue are asynchronous. We'd basically have to serialise all metadata operations and now we are talking about far worse performance hits than implementing barrier emulation in the block device. Also, it's instructive to look at the implementation of blkdev_issue_flush() - the API one is supposed to use to trigger a full block device flush. It doesn't work on DM/MD either, because it uses a no-I/O barrier bio: bio->bi_end_io = bio_end_empty_barrier; bio->bi_private = &wait; bio->bi_bdev = bdev; submit_bio(1 << BIO_RW_BARRIER, bio); wait_for_completion(&wait); So, if the underlying block device doesn't support barriers, there's no point in changing the filesystem to issue flushes, either... And then I argue that it would be better for the filesystem to have the information that these are not hardware barriers so it has the opportunity of tuning its behaviour (e.g. flushing less often because it's a more expensive operation). 
There is generally no option from the filesystem POV to flush less. Either we use barrier I/Os where we need to and are safe with volatile caches, or we corrupt filesystems with volatile caches when power loss occurs. There is no in-between where flushing less will save us from corruption. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
Jeremy Higdon wrote: [] I'll put it even more strongly. My experience is that disabling write cache plus disabling barriers is often much faster than enabling both barriers and write cache, when doing metadata intensive operations, as long as you have a drive that is good at CTQ/NCQ. Now, and it's VERY interesting at least for me (and is off-topic in this thread) -- which drive(s) are good at NCQ? I tried numerous SATA (NCQ is about SATA, right? :) drives, but NCQ either does nothing in terms of performance or hurts. Yesterday we ordered another drive from Hitachi (their RAID edition thing) -- will try it tomorrow, but I've no hope here as it's some 5th or 6th model/brand already. (Ol' good SCSI drives, even 10 years old, show a large difference when TCQ is enabled...) Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Tue, Feb 19, 2008 at 09:16:44AM +1100, David Chinner wrote: On Mon, Feb 18, 2008 at 04:24:27PM +0300, Michael Tokarev wrote: First, I still don't understand why in God's sake barriers are working while regular cache flushes are not. Almost no consumer-grade hard drive supports write barriers, but they all support regular cache flushes, and the latter should be enough (while not the most speed-optimal) to ensure data safety. Why to require write cache disable (like in XFS FAQ) instead of going the flush-cache-when-appropriate (as opposed to write-barrier- when-appropriate) way? Devil's advocate: Why should we need to support multiple different block layer APIs to do the same thing? Surely any hardware that doesn't support barrier operations can emulate them with cache flushes when they receive a barrier I/O from the filesystem Also, given that disabling the write cache still allows CTQ/NCQ to operate effectively and that in most cases WCD+CTQ is as fast as WCE+barriers, the simplest thing to do is turn off volatile write caches and not require any extra software kludges for safe operation. I'll put it even more strongly. My experience is that disabling write cache plus disabling barriers is often much faster than enabling both barriers and write cache enabled, when doing metadata intensive operations, as long as you have a drive that is good at CTQ/NCQ. The only time write cache + barriers is significantly faster is when doing single threaded data writes, such as direct I/O, or if CTQ/NCQ is not enabled, or the drive does a poor job at it. jeremy -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: > Alasdair G Kergon wrote: > > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote: > >> Implement barrier support for single device DM devices > > > > Thanks. We've got some (more-invasive) dm patches in the works that > > attempt to use flushing to emulate barriers where we can't just > > pass them down like that. > > I wonder if it's worth the effort to try to implement this. > > As far as I understand (*), if a filesystem realizes that the > underlying block device does not support barriers, it will > switch to using regular flushes instead No, typically the filesystems won't issue flushes, either. > - isn't it the same > thing as you're trying to do on an MD level? > > Note that a filesystem must understand barriers/flushes on > underlying block device, since many disk drives don't support > barriers anyway. > > (*) this is, in fact, an interesting question. I still can't > find complete information about this. For example, how safe > xfs is if barriers are not supported or turned off? Is it > "less safe" than with barriers? Will it use regular cache > flushes if barriers are not here? Try reading at the XFS FAQ: http://oss.sgi.com/projects/xfs/faq/#wcache Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
> And don't RH distributions install with LVM by default these days? > For those it should be the standard case too on all systems with > only a single disk. Yes - I make a point of turning it off ;) Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 02:12:29PM +, Alasdair G Kergon wrote: > On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote: > > On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: > > > I wonder if it's worth the effort to try to implement this. > > My personal view (which seems to be in the minority) is that it's a > waste of our development time *except* in the (rare?) cases similar to At least for my machines it is the standard case; it is not rare. And don't RH distributions install with LVM by default these days? For those it should be the standard case too on all systems with only a single disk. The other relatively simple case I plan to look into (in fact I already wrote something, but it's not postable yet) is dm-crypt on single device. But it's a little more complicated than the simple dm-linear case. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote: > On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: > > I wonder if it's worth the effort to try to implement this. My personal view (which seems to be in the minority) is that it's a waste of our development time *except* in the (rare?) cases similar to the ones Andi is talking about. But the decision has already been made for us in the block layer: dm is now pretty much required to support (zero-length) barriers. Unfortunately we didn't get this finished in time for 2.6.25, but we intend to get it done for 2.6.26: Woe betide any callers today that don't handle EOPNOTSUPP! Alasdair -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote: > Alasdair G Kergon wrote: > > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote: > >> Implement barrier support for single device DM devices > > > > Thanks. We've got some (more-invasive) dm patches in the works that > > attempt to use flushing to emulate barriers where we can't just > > pass them down like that. > > I wonder if it's worth the effort to try to implement this. DM in theory has some more knowledge for optimization. e.g. for example if it knows that a stream of requests hits only a single device then it can just pass the barriers through again and only flush when there is really a request dependency between different devices. File systems can't do it that fine grained; it's either all or nothing. I don't know if doing it fine grained will much difference in performance though. The only way to find out would be to try it. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
Alasdair G Kergon wrote: > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote: >> Implement barrier support for single device DM devices > > Thanks. We've got some (more-invasive) dm patches in the works that > attempt to use flushing to emulate barriers where we can't just > pass them down like that. I wonder if it's worth the effort to try to implement this. As far as I understand (*), if a filesystem realizes that the underlying block device does not support barriers, it will switch to using regular flushes instead - isn't it the same thing as you're trying to do on an MD level? Note that a filesystem must understand barriers/flushes on underlying block device, since many disk drives don't support barriers anyway. (*) this is, in fact, an interesting question. I still can't find complete information about this. For example, how safe xfs is if barriers are not supported or turned off? Is it "less safe" than with barriers? Will it use regular cache flushes if barriers are not here? Ditto for ext3fs, but here, barriers are not enabled by default. /mjt -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote: > Implement barrier support for single device DM devices Thanks. We've got some (more-invasive) dm patches in the works that attempt to use flushing to emulate barriers where we can't just pass them down like that. Alasdair -- [EMAIL PROTECTED] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Implement barrier support for single device DM devices
Implement barrier support for single device DM devices

This patch implements barrier support in DM for the common case of
dm linear just remapping a single underlying device. In this case we
can safely pass the barrier through because there can be no
reordering between devices.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 drivers/md/dm-linear.c |    1 +
 drivers/md/dm-table.c  |   27 ++-
 drivers/md/dm.c        |   14 --
 drivers/md/dm.h        |    2 ++
 4 files changed, 33 insertions(+), 11 deletions(-)

Index: linux/drivers/md/dm-table.c
===
--- linux.orig/drivers/md/dm-table.c
+++ linux/drivers/md/dm-table.c
@@ -38,6 +38,9 @@ struct dm_table {
 	sector_t *highs;
 	struct dm_target *targets;
 
+	unsigned single_device : 1;
+	unsigned barrier_supported : 1;
+
 	/*
 	 * Indicates the rw permissions for the new logical
 	 * device. This should be a combination of FMODE_READ
@@ -584,12 +587,21 @@ EXPORT_SYMBOL_GPL(dm_set_device_limits);
 int dm_get_device(struct dm_target *ti, const char *path, sector_t start,
 		  sector_t len, int mode, struct dm_dev **result)
 {
-	int r = __table_get_device(ti->table, ti, path,
+	struct dm_table *t = ti->table;
+	int r = __table_get_device(t, ti, path,
 				   start, len, mode, result);
 	if (!r)
 		dm_set_device_limits(ti, (*result)->bdev);
 
+	if (!r) {
+		/* Only got single device? */
+		if (t->devices.next->next == &t->devices)
+			t->single_device = 1;
+		else
+			t->single_device = 0;
+	}
+
 	return r;
 }
 
@@ -1023,6 +1035,16 @@ struct mapped_device *dm_table_get_md(st
 	return t->md;
 }
 
+int dm_table_barrier_ok(struct dm_table *t)
+{
+	return t->single_device && t->barrier_supported;
+}
+
+void dm_table_support_barrier(struct dm_table *t)
+{
+	t->barrier_supported = 1;
+}
+
 EXPORT_SYMBOL(dm_vcalloc);
 EXPORT_SYMBOL(dm_get_device);
 EXPORT_SYMBOL(dm_put_device);
@@ -1033,3 +1055,5 @@ EXPORT_SYMBOL(dm_table_get_md);
 EXPORT_SYMBOL(dm_table_put);
 EXPORT_SYMBOL(dm_table_get);
 EXPORT_SYMBOL(dm_table_unplug_all);
+EXPORT_SYMBOL(dm_table_barrier_ok);
+EXPORT_SYMBOL(dm_table_support_barrier);

Index: linux/drivers/md/dm.c
===
--- linux.orig/drivers/md/dm.c
+++ linux/drivers/md/dm.c
@@ -801,7 +801,10 @@ static int __split_bio(struct mapped_dev
 	ci.map = dm_get_table(md);
 	if (unlikely(!ci.map))
 		return -EIO;
-
+	if (unlikely(bio_barrier(bio) && !dm_table_barrier_ok(ci.map))) {
+		bio_endio(bio, -EOPNOTSUPP);
+		return 0;
+	}
 	ci.md = md;
 	ci.bio = bio;
 	ci.io = alloc_io(md);
@@ -837,15 +840,6 @@ static int dm_request(struct request_que
 	int rw = bio_data_dir(bio);
 	struct mapped_device *md = q->queuedata;
 
-	/*
-	 * There is no use in forwarding any barrier request since we can't
-	 * guarantee it is (or can be) handled by the targets correctly.
-	 */
-	if (unlikely(bio_barrier(bio))) {
-		bio_endio(bio, -EOPNOTSUPP);
-		return 0;
-	}
-
 	down_read(&md->io_lock);
 	disk_stat_inc(dm_disk(md), ios[rw]);

Index: linux/drivers/md/dm.h
===
--- linux.orig/drivers/md/dm.h
+++ linux/drivers/md/dm.h
@@ -116,6 +116,8 @@ void dm_table_unplug_all(struct dm_table
  * To check the return value from dm_table_find_target().
  */
 #define dm_target_is_valid(t) ((t)->table)
+int dm_table_barrier_ok(struct dm_table *t);
+void dm_table_support_barrier(struct dm_table *t);
 
 /*-
  * A registry of target types.

Index: linux/drivers/md/dm-linear.c
===
--- linux.orig/drivers/md/dm-linear.c
+++ linux/drivers/md/dm-linear.c
@@ -52,6 +52,7 @@ static int linear_ctr(struct dm_target *
 		ti->error = "dm-linear: Device lookup failed";
 		goto bad;
 	}
+	dm_table_support_barrier(ti->table);
 	ti->private = lc;
 
 	return 0;

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> Implement barrier support for single device DM devices

Thanks. We've got some (more-invasive) dm patches in the works that
attempt to use flushing to emulate barriers where we can't just pass
them down like that.

Alasdair
--
[EMAIL PROTECTED]
Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> Alasdair G Kergon wrote:
> > On Fri, Feb 15, 2008 at 01:08:21PM +0100, Andi Kleen wrote:
> > > Implement barrier support for single device DM devices
> > Thanks. We've got some (more-invasive) dm patches in the works that
> > attempt to use flushing to emulate barriers where we can't just pass
> > them down like that.
> I wonder if it's worth the effort to try to implement this.

DM in theory has some more knowledge for optimization. For example, if it
knows that a stream of requests hits only a single device, it can just
pass the barriers through again and only flush when there is really a
request dependency between different devices. File systems can't do it
that fine-grained; for them it's all or nothing.

I don't know if doing it fine-grained will make much difference in
performance though. The only way to find out would be to try it.

-Andi
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
> On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> > I wonder if it's worth the effort to try to implement this.

My personal view (which seems to be in the minority) is that it's a
waste of our development time *except* in the (rare?) cases similar to
the ones Andi is talking about.

But the decision has already been made for us in the block layer: dm is
now pretty much required to support (zero-length) barriers.
Unfortunately we didn't get this finished in time for 2.6.25, but we
intend to get it done for 2.6.26: Woe betide any callers today that
don't handle EOPNOTSUPP!

Alasdair
--
[EMAIL PROTECTED]
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
On Fri, Feb 15, 2008 at 02:12:29PM +, Alasdair G Kergon wrote:
> On Fri, Feb 15, 2008 at 03:20:10PM +0100, Andi Kleen wrote:
> > On Fri, Feb 15, 2008 at 04:07:54PM +0300, Michael Tokarev wrote:
> > > I wonder if it's worth the effort to try to implement this.
> My personal view (which seems to be in the minority) is that it's a
> waste of our development time *except* in the (rare?) cases similar to

At least for my machines it is the standard case; it is not rare. And
don't RH distributions install with LVM by default these days? For those
it should be the standard case too on all systems with only a single
disk.

The other relatively simple case I plan to look into (in fact I already
wrote something, but it's not postable yet) is dm-crypt on a single
device. But it's a little more complicated than the simple dm-linear
case.

-Andi
Re: [dm-devel] Re: [PATCH] Implement barrier support for single device DM devices
> And don't RH distributions install with LVM by default these days? For
> those it should be the standard case too on all systems with only a
> single disk.

Yes - I make a point of turning it off ;)

Alan