Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))
On Wed, Dec 12, 2001 at 04:22:05PM +1030, Greg Lehey wrote: On Tuesday, 11 December 2001 at 3:11:21 +0100, Bernd Walter wrote: striped: If you have 512-byte stripes and have 2 disks. You access 64k, which is put onto the disks as 2 32k transactions. Only if your software optimizes the transfers. There are reasons why it should not. Without optimization, you get 128 individual transfers. If the software does not, we end up with 128 transactions anyway, which is not very good because of the overhead of each of them. UFS does a more or less good job of this.

Linear speed could be about twice the speed of a single drive. But this is more theoretical than real today. The average transaction size per disk decreases with a growing number of spindles, and you get more transaction overhead. Also, the voice coil technology used in drives for many years now adds a random amount of time to the access time, which invalidates some of the spindle sync potential. Plus it may break some benefits of the precaching mechanisms in drives. I'm almost sure there is no real performance gain with modern drives.

The real problem with this scenario is that you're missing a couple of points: 1. Typically it's not the latency that matters. If you have to wait a few ms longer, that's not important. What's interesting is the case of a heavily loaded system, where the throughput is much more important than the latency. Agreed - especially because we don't wait for writes, as most are async. 2. Throughput is the data transferred per unit time. There's active transfer time, nowadays in the order of 500 µs, and positioning time, in the order of 6 ms. Clearly the fewer positioning operations, the better. This means that you should want to put most transfers on a single spindle, not a single stripe. To do this, you need big stripes. In the general case, yes.

raid5: For a write you have two read transactions and two writes. This is the way Vinum does it. There are other possibilities: 1.
Always do full-stripe writes. Then you don't need to read the old contents. Which isn't that good with the big stripes we usually want. 2. Cache the parity blocks. This is an optimization which I think would be very valuable, but which Vinum doesn't currently perform. I thought of connecting the parity to the wait lock. If there's a waiter for the same parity data, it's not dropped. This way we don't waste memory but still have an effect.

There are easier things to do to raise performance. Ever wondered why people claim Vinum's RAID-5 writes are slow? The answer is astonishingly simple: Vinum does stripe-based locking, while UFS tries to lay out data on mostly ascending sectors. What happens here is that the first write has to wait for two reads and two writes. If we have an ascending write, it has to wait for the first write to finish, because the stripe is still locked. The first is unlocked after both physical writes are on disk. Now we start our two reads, which are (thanks to the drives' precache) most likely in the drives' cache - then we write. The problem here is that physical writes get serialized and the drive has to wait a complete rotation between each. Not if the data is in the drive cache. This example was for writing. Reads get precached by the drive and have a very good chance of being in the cache. It doesn't matter on IDE disks, because if you have write cache enabled, the write gets acked from the cache and not the media. If write cache is disabled, writes get serialized anyway.

If we had fine-grained locking which only locks the accessed sectors in the parity, we would be able to have more than a single ascending write transaction onto a single drive. Hmm. This is something I hadn't thought about. Note that sequential writes to a RAID-5 volume don't go to sequential addresses on the spindles; they will work up to the end of the stripe on one spindle, then start on the next spindle at the start of the stripe.
You can do that as long as the address ranges in the parity block don't overlap, but the larger the stripe, the greater the likelihood of overlap. This might also explain the following observed behaviour: 1. RAID-5 writes slow down when the stripe size reaches 256 kB or so. I don't know if this happens on all disks, but I've seen it often enough. I would guess it happens when the stripe size is bigger than the preread cache the drives use. This would mean we have less of a chance to get parity data out of the drive cache. 2. rawio write performance is better than ufs write performance. rawio does truly random transfers, where ufs is a mixture. The current problem is to increase linear write performance. I don't see a chance that rawio benefits from it, but ufs will. Do you feel like changing the locking code? It shouldn't be that much work, and I'd be interested to
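The two-reads/two-writes sequence discussed in this exchange can be sketched as follows. This is an illustrative fragment, not Vinum's code; the function name and the buffer-based interface are assumptions, and in the driver the four steps are four separate disk transactions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the RAID-5 "small write" discussed above: two reads and
 * two writes per partial-stripe update.  Illustrative only - not
 * Vinum's code.
 */
static void
raid5_small_write(unsigned char *data_blk,	/* read 1: old data */
    unsigned char *parity_blk,			/* read 2: old parity */
    const unsigned char *new_data, size_t len)
{
	size_t i;

	/* New parity = old parity XOR old data XOR new data. */
	for (i = 0; i < len; i++)
		parity_blk[i] ^= data_blk[i] ^ new_data[i];

	/* Write 1: the new data.  (Write 2 is the caller flushing
	 * parity_blk back to the parity subdisk.) */
	memcpy(data_blk, new_data, len);
}
```

After the call the parity buffer again equals the XOR of all data blocks in the stripe, so reconstruction of a dead subdisk still works.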
Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))
On Wednesday, 12 December 2001 at 12:53:37 +0100, Bernd Walter wrote: On Wed, Dec 12, 2001 at 04:22:05PM +1030, Greg Lehey wrote: On Tuesday, 11 December 2001 at 3:11:21 +0100, Bernd Walter wrote: striped: If you have 512-byte stripes and have 2 disks. You access 64k, which is put onto the disks as 2 32k transactions. Only if your software optimizes the transfers. There are reasons why it should not. Without optimization, you get 128 individual transfers. If the software does not, we end up with 128 transactions anyway, which is not very good because of the overhead of each of them. Correct. UFS does a more or less good job of this. Well, it requires a lot of moves. Vinum *could* do this, but for the reasons specified below, there's no need.

raid5: For a write you have two read transactions and two writes. This is the way Vinum does it. There are other possibilities: 1. Always do full-stripe writes. Then you don't need to read the old contents. Which isn't that good with the big stripes we usually want. Correct. That's why most RAID controllers limit stripe size to something sub-optimal: it simplifies the code to do full-stripe writes. 2. Cache the parity blocks. This is an optimization which I think would be very valuable, but which Vinum doesn't currently perform. I thought of connecting the parity to the wait lock. If there's a waiter for the same parity data, it's not dropped. This way we don't waste memory but still have an effect. That's a possibility, though it doesn't directly address parity block caching. The problem is that by the time you find another lock, you've already performed part of the parity calculation, and probably part of the I/O transfer. But it's an interesting consideration.

If we had fine-grained locking which only locks the accessed sectors in the parity, we would be able to have more than a single ascending write transaction onto a single drive. Hmm. This is something I hadn't thought about.
Note that sequential writes to a RAID-5 volume don't go to sequential addresses on the spindles; they will work up to the end of the stripe on one spindle, then start on the next spindle at the start of the stripe. You can do that as long as the address ranges in the parity block don't overlap, but the larger the stripe, the greater the likelihood of overlap. This might also explain the following observed behaviour: 1. RAID-5 writes slow down when the stripe size reaches 256 kB or so. I don't know if this happens on all disks, but I've seen it often enough. I would guess it happens when the stripe size is bigger than the preread cache the drives use. This would mean we have less of a chance to get parity data out of the drive cache. Yes, this was one of the possibilities we considered. 2. rawio write performance is better than ufs write performance. rawio does truly random transfers, where ufs is a mixture. The current problem is to increase linear write performance. I don't see a chance that rawio benefits from it, but ufs will. Well, rawio doesn't need to benefit. It's supposed to be a neutral observer, but in this case it's not doing too well.

Do you feel like changing the locking code? It shouldn't be that much work, and I'd be interested to see how much performance difference it makes. I put it onto my todo list. Thanks.

Note that there's another possible optimization here: delay the writes by a certain period of time and coalesce them if possible. I haven't finished thinking about the implications. That's exactly what the ufs clustering and softupdates do. If it doesn't fit modern drives anymore, it should get tuned there. This doesn't have too much to do with modern drives; it's just as applicable to 70s drives. Whenever a write hits a driver, there is a waiter for it: either a softdep, a memory freeing or an application doing a sync transfer. I'm almost sure delaying writes will harm performance in upper layers. I'm not so sure.
Full stripe writes, where needed, are *much* faster than partial stripe writes. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
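The fine-grained parity locking proposed in this exchange reduces to a range-overlap test: a second write may start as soon as the sector range it touches in the parity is disjoint from every range already locked. A minimal sketch with a hypothetical helper, not Vinum code; ranges are half-open [start, start + len) in sectors.

```c
#include <assert.h>

/*
 * Hypothetical conflict test for fine-grained parity locking: two
 * writes may be in flight together iff the parity sector ranges they
 * touch do not overlap.  With Vinum's current stripe-granularity
 * locking, any two writes to the same stripe serialize.
 */
static int
ranges_overlap(unsigned a_start, unsigned a_len,
    unsigned b_start, unsigned b_len)
{
	return a_start < b_start + b_len && b_start < a_start + a_len;
}
```

Under this test the ascending UFS writes from Bernd's example - say sectors 0-7 followed by 8-15 - are disjoint and could proceed together instead of waiting a full rotation apart.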
Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))
On Thu, Dec 13, 2001 at 12:47:53PM +1030, Greg Lehey wrote: On Thursday, 13 December 2001 at 3:06:14 +0100, Bernd Walter wrote: Currently if we have two writes in two stripes each, all initiated before the first finished, the drive has to seek between the two stripes, as the second write to the same stripe has to wait. I'm not sure I understand this. The stripes are on different drives, after all. Let's assume a 256k-striped single-plex volume with 3 subdisks. We get a layout like this:

sd1     sd2     sd3
256k    256k    parity
256k    parity  256k
parity  256k    256k
256k    256k    parity
...     ...     ...

Now we write blocks 1, 10, 1040 and 1045 on the volume. All writes are initiated at the same time. It would be good to write first 1, then 10, then 1040 and finally 1045. What we currently see is write 1, then 1040, then 10 and finally 1045. This is because we can't write 10 until 1 is finished, but we can already start with 1040 because it's independent. The result is avoidable seeking in subdisk 1. Back to the 256k performance breakdown you described: because of this we not only have unneeded seeks on the drive, we also get a different use pattern on the drive cache. Once the locks are untangled, it is necessary to re-examine the situation, as the drive cache may behave differently. -- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]
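The serialization Bernd describes can be checked with a toy mapping for this layout: 3 subdisks, 256 kB (512-sector) stripes, parity rotating upward from the last subdisk. This is an illustrative sketch, not Vinum's mapping code; the constants and names are assumptions matching the diagram above.

```c
#include <assert.h>

/* Toy address mapping for a 3-subdisk RAID-5 plex with 256 kB stripes
 * and parity rotating up from the last subdisk - illustration only. */
#define NDISKS		3
#define STRIPE_SECTORS	512			/* 256 kB / 512 bytes */
#define DATA_PER_STRIPE	((NDISKS - 1) * STRIPE_SECTORS)

struct loc {
	int	 disk;				/* 0 = sd1, 1 = sd2, 2 = sd3 */
	unsigned stripe;
};

static struct loc
map_sector(unsigned vsec)			/* volume sector number */
{
	struct loc l;
	unsigned chunk, parity_disk;

	l.stripe = vsec / DATA_PER_STRIPE;
	chunk = (vsec % DATA_PER_STRIPE) / STRIPE_SECTORS;
	parity_disk = NDISKS - 1 - l.stripe % NDISKS;
	/* Data chunks skip over the parity subdisk in each stripe. */
	l.disk = chunk < parity_disk ? (int)chunk : (int)chunk + 1;
	return l;
}
```

Blocks 1 and 10 map to stripe 0 of sd1 while 1040 and 1045 map to stripe 1 of sd1, so with stripe-granularity locking the observed order 1, 1040, 10, 1045 bounces the sd1 head between the two stripes.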
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5, where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well, but random-I/O writes go down the drain if you aren't spindle synced, no matter what the stripe size, Can you explain this? I don't see it. In FreeBSD, just about all I/O goes to buffer cache. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes.

In a former life I was involved with a HB striping product for SysVr2 that had a slightly modified filesystem that 'knew' when it was working on a striped disk. And as it knew, it avoided posting I/Os that crossed stripes. W/ -- | / o / /_ _ email: [EMAIL PROTECTED] |/|/ / / /( (_) Bulte Arnhem, The Netherlands
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Tue, Dec 11, 2001 at 03:34:37PM +0100, Wilko Bulte wrote: On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. In a former life I was involved with a HB striping product for SysVr2 that had a slightly modified filesystem that 'knew' when it was working on a striped disk. And as it knew, it avoided posting I/Os that crossed stripes. Here are some real-world statistics with UFS softupdates:

Plex d1.p0:    Size: 8736473088 bytes (8331 MB)
               Subdisks: 3
               State: up
               Organization: striped  Stripe size: 256 kB
               Part of volume d1
    Reads:             83546
    Bytes read:    258429952 (246 MB)
    Average read:       3093 bytes
    Writes:           100109
    Bytes written: 818750464 (780 MB)
    Average write:      8178 bytes
    Multiblock:          279 (0%)
    Multistripe:          82 (0%)
    Subdisk 0: d1.p0.s0  state: up  size 2912157696 (2777 MB)
    Subdisk 1: d1.p0.s1  state: up  size 2912157696 (2777 MB)
    Subdisk 2: d1.p0.s2  state: up  size 2912157696 (2777 MB)

You can easily see that the number of multistripe transactions is unnoticeably low. Here is another case with 64k stripe size:

Plex d7.p0:    Size: 36419796992 bytes (34732 MB)
               Subdisks: 2
               State: up
               Organization: striped  Stripe size: 64 kB
               Part of volume d7
    Reads:            934001
    Bytes read:   3718752768 (3546 MB)
    Average read:       3981 bytes
    Writes:           220293
    Bytes written: 3702993920 (3531 MB)
    Average write:     16809 bytes
    Multiblock:        50037 (4%)
    Multistripe:       25047 (2%)
    Subdisk 0: d7.p0.s0  state: up  size 18209898496 (17366 MB)
    Subdisk 1: d7.p0.s1  state: up  size 18209898496 (17366 MB)

You can see that even in this absolutely extreme situation the number of multistripe transactions is still very low. But a value of 384k would be a much better value for other reasons.
You may want to compare the multistripe number with the multiblock number, and yes, it doesn't look that good anymore, but you can also see that the change from 64k to 256k gets much better results, while the average transaction size is 5865 bytes for the first case and 6429 bytes for the second - not that different. Most of my plexes are concat anyway. -- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]
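The low multistripe percentages are consistent with a simple bound: with a uniformly random starting sector, a transfer crosses a stripe boundary only if it starts within (transfer size - 1) sectors of the end of a stripe. A sketch with a hypothetical helper; this is an upper bound, and UFS block alignment keeps the observed numbers below it.

```c
#include <assert.h>

/*
 * Upper-bound estimate of the fraction of transfers that cross a
 * stripe boundary, assuming uniformly random sector alignment.
 * Hypothetical helper, for illustrating the statistics above.
 */
static double
crossing_fraction(unsigned xfer_sec, unsigned stripe_sec)
{
	if (xfer_sec == 0)
		return 0.0;
	if (xfer_sec > stripe_sec)
		return 1.0;			/* cannot fit in one stripe */
	return (double)(xfer_sec - 1) / stripe_sec;
}
```

For the ~8 kB (16-sector) average writes above, the bound is 15/512, about 3%, on a 256 kB stripe versus 15/128, about 12%, on a 64 kB stripe - the same direction as the observed 0% versus 2%.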
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote: On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5, where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well, but random-I/O writes go down the drain if you aren't spindle synced, no matter what the stripe size, Can you explain this? I don't see it. In FreeBSD, just about all I/O goes to buffer cache. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. In a former life I was involved with a HB striping product for SysVr2 that had a slightly modified filesystem that 'knew' when it was working on a striped disk. And as it knew, it avoided posting I/Os that crossed stripes. So what did it do with user requests which crossed stripes? Greg -- See complete headers for address and phone numbers
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Wed, Dec 12, 2001 at 09:00:34AM +1030, Greg Lehey wrote: On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote: On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: .. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. In a former life I was involved with a HB striping product for SysVr2 that had a slightly modified filesystem that 'knew' when it was working on a striped disk. And as it knew, it avoided posting I/Os that crossed stripes. So what did it do with user requests which crossed stripes? Memory is dim, but I think the fs code created a second i/o to the driver layer. So the fs never sent out an i/o that the driver layer had to break up. In case of a pre-fetch while reading I think the f/s would just pre-fetch until the stripe border and not bother sending out a second i/o down. In the end all of this benchmarked quite favorably. Note that this was 386/486 era, with the classic SysV filesystem. -- | / o / /_ _ email: [EMAIL PROTECTED] |/|/ / / /( (_) Bulte Arnhem, The Netherlands
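The scheme Wilko describes - the filesystem splitting its own requests so the striping driver never sees a stripe-crossing i/o - amounts to clipping each request at the next stripe boundary. A rough sketch; the names and the two-piece interface are hypothetical:

```c
#include <assert.h>

/* Hypothetical request descriptor, in sectors. */
struct io {
	unsigned start;
	unsigned len;
};

/*
 * Clip req at the next multiple of stripe_sec.  Returns the number of
 * pieces (1 or 2); the second piece, if any, is stored in *rest and
 * would be issued as a separate i/o, as the SysVr2 filesystem did.
 */
static int
clip_at_stripe(struct io *req, unsigned stripe_sec, struct io *rest)
{
	unsigned left = stripe_sec - req->start % stripe_sec;

	if (req->len <= left)
		return 1;			/* fits within one stripe */
	rest->start = req->start + left;
	rest->len = req->len - left;
	req->len = left;
	return 2;
}
```

A pre-fetch, by the same logic, would simply stop at the boundary (piece 1) and never issue piece 2.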
Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))
On Tuesday, 11 December 2001 at 3:11:21 +0100, Bernd Walter wrote: On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5, where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well, but random-I/O writes go down the drain if you aren't spindle synced, no matter what the stripe size, Can you explain this? I don't see it. In FreeBSD, just about all I/O goes to buffer cache. After waiting for the drives and not for vinum parity blocks. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes.

striped: If you have 512-byte stripes and have 2 disks. You access 64k, which is put onto the disks as 2 32k transactions. Only if your software optimizes the transfers. There are reasons why it should not. Without optimization, you get 128 individual transfers. The wait time for the complete transaction is the worst of both, which is more than the average of a single disk. Agreed. With spindle synchronisation the access times for both disks are believed to be identical and you get the same as with a single disk. Correct. Linear speed could be about twice the speed of a single drive. But this is more theoretical than real today. The average transaction size per disk decreases with a growing number of spindles and you get more transaction overhead.
Also, the voice coil technology used in drives for many years now adds a random amount of time to the access time, which invalidates some of the spindle sync potential. Plus it may break some benefits of precaching mechanisms in drives. I'm almost sure there is no real performance gain with modern drives. The real problem with this scenario is that you're missing a couple of points: 1. Typically it's not the latency that matters. If you have to wait a few ms longer, that's not important. What's interesting is the case of a heavily loaded system, where the throughput is much more important than the latency. 2. Throughput is the data transferred per unit time. There's active transfer time, nowadays in the order of 500 µs, and positioning time, in the order of 6 ms. Clearly the fewer positioning operations, the better. This means that you should want to put most transfers on a single spindle, not a single stripe. To do this, you need big stripes.

raid5: For a write you have two read transactions and two writes. This is the way Vinum does it. There are other possibilities: 1. Always do full-stripe writes. Then you don't need to read the old contents. 2. Cache the parity blocks. This is an optimization which I think would be very valuable, but which Vinum doesn't currently perform. There are easier things to do to raise performance. Ever wondered why people claim Vinum's RAID-5 writes are slow? The answer is astonishingly simple: Vinum does stripe-based locking, while UFS tries to lay out data on mostly ascending sectors. What happens here is that the first write has to wait for two reads and two writes. If we have an ascending write, it has to wait for the first write to finish, because the stripe is still locked. The first is unlocked after both physical writes are on disk. Now we start our two reads, which are (thanks to the drives' precache) most likely in the drives' cache - then we write.
The problem here is that physical writes get serialized and the drive has to wait a complete rotation between each. Not if the data is in the drive cache. If we had fine-grained locking which only locks the accessed sectors in the parity, we would be able to have more than a single ascending write transaction onto a single drive. Hmm. This is something I hadn't thought about. Note that sequential writes to a RAID-5 volume don't go to sequential addresses on the spindles; they will work up to the end of the stripe on one spindle, then start on the next spindle at the start of the stripe. You can do that as long as the address ranges in the parity block don't overlap, but the larger the stripe, the greater the likelihood of overlap. This might also explain the following observed behaviour: 1. RAID-5 writes slow down when the stripe size reaches 256 kB or so. I don't know if this happens
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Tuesday, 11 December 2001 at 23:41:51 +0100, Wilko Bulte wrote: On Wed, Dec 12, 2001 at 09:00:34AM +1030, Greg Lehey wrote: On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote: On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: .. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. In a former life I was involved with a HB striping product for SysVr2 that had a slightly modified filesystem that 'knew' when it was working on a striped disk. And as it knew, it avoided posting I/Os that crossed stripes. So what did it do with user requests which crossed stripes? Memory is dim, but I think the fs code created a second i/o to the driver layer. So the fs never sent out an i/o that the driver layer had to break up. That's what Vinum does. In case of a pre-fetch while reading I think the f/s would just pre-fetch until the stripe border and not bother sending out a second i/o down. Yes, that's reasonable. In the end all of this benchmarked quite favorably. Note that this was 386/486 era, with the classic SysV filesystem. I don't think that UFS would behave that differently, just faster :-) Greg -- See complete headers for address and phone numbers
Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)
Still, it's my opinion that these BIOSes are simply broken: Joerg's personal opinion can go take a hike. The reality of the situation is that this table is required, and we're going to put it there. The reality of the situation is far from being clear. The only thing I can see is that you're trying to dictate things without adequate justification. You should reconsider that attitude. You can't substantiate your argument by closing your eyes, Greg. There's a wealth of evidence against your stance, and frankly, none that supports it other than myopic bigotry (I don't want to do this because Microsoft use this format). Are you going to stop using all of the other techniques that we share with them? -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Peter Wemm wrote: No, it isn't ignored, BIOS'es know that fdisk partitions end on cylinder boundaries, and therefore can intuit what the expected geometry is for the disk in question. And you call that a design? I call it a poor hack, nothing else. The restriction to whatever the BIOS believes to be a cylinder boundary is one of my gripes with fdisk tables; you obviously missed that (or you don't argue about it -- can i take this as silent agreement?). It imposes a geometry that is not even remotely there, with the drawbacks that a number of sectors can never be assigned (OK, no big deal these days), but even worse, disks are non-portable between different BIOSes that perform different intuition about how to obtain the geometry from those poorly chosen values that are included in fdisk tables. /The/ major advantage of DD mode was that all BIOSes (so far :) at least agree on how to access block 0 and the adjacent blocks, so starting our own system there makes those disks portable. [...] The problem is that the int13 code only allowed for 255 heads, and the fake end of disk entry that is unconditionally in /boot/boot1 specified an ending head number 255 (ie: 256 heads). When this gets put into a byte register it is truncated to zero and we get divide by zero errors. I've read this, and yes, i never argued about fixing /that/. Since those values chosen by our grandfather Bill Jolitz have been just `magic' numbers only, it's unfortunate they eventually turned out to be such bad magic about a decade later. We can just as easily have bootable-DD mode with a real MBR and have freebsd start on sector #2 instead of overlapping boot1 and mbr. Probably, i think i could live with that. I'd rather that we be specific about this. If somebody wants ad2e or da2e then they should not be using *any* fdisk tables at all. Ie: block 0 should be empty. That disk wouldn't boot at all, you know that. Yes, i prefer my disks to be called da0a...daNP. 
But to be honest, see my other article: i never argued to make this the default or a recommended strategy in any form. It should only remain intact at all. Back to the subject, the current warning however, is pointless, and has the major drawback of potentially hiding important console messages. The console buffer is 32K these days. You'd have to have around 300 disks to have any real effect on the kernel. You're narrow-minded here, Peter, this time in about the same way as Windoze is narrow-minded: All the world's a graphical console produced by XXX. No, all the world's not like that. You might consider my pcvt console obsolete, OK, but did you ever think about a plain VT220 on a serial console? They don't have /any/ scrollback buffer. (And you can't even stop the output with ^S while FreeBSD is booting.) Also, i think that:

uriah /boot/kernel/kernel: da0: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 5 times
uriah /boot/kernel/kernel: da1: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 34 times
uriah /boot/kernel/kernel: da2: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 34 times

...73 of those silly messages are just beyond any form of usefulness. Either we hide this completely behind bootverbose (back to the root of this thread) since it bears no information at all (i already knew what is written there, since it was my deliberate decision, and it could not have happened unless it was my deliberate decision), or we at least ensure any of those messages is emitted at most once per drive. -- cheers, Jörg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/ NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-)
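The divide-by-zero quoted earlier in the thread (the fake entry's ending head of 255, meaning 256 heads, truncated in a byte register) can be illustrated in two lines; the function is purely a demonstration, not BIOS code:

```c
#include <assert.h>

/*
 * Illustration of the int13 failure mode described above: the fdisk
 * entry's ending head number N means "N + 1 heads", and 255 + 1 = 256
 * does not fit in a byte-wide register, so it wraps to 0 - and any
 * later CHS computation that divides by the head count divides by zero.
 */
static unsigned char
bios_nheads(unsigned ending_head)
{
	return (unsigned char)(ending_head + 1);	/* truncated to 8 bits */
}
```

This is why limiting the fake entry's ending head to 254 (255 heads) sidesteps the problem.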
Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)
On Monday, 10 December 2001 at 0:17:14 -0800, Mike Smith wrote: Still, it's my opinion that these BIOSes are simply broken: Joerg's personal opinion can go take a hike. The reality of the situation is that this table is required, and we're going to put it there. The reality of the situation is far from being clear. The only thing I can see is that you're trying to dictate things without adequate justification. You should reconsider that attitude. You can't substantiate your argument by closing your eyes, Greg. No, of course not. I also can't substantiate my arguments by sticking my fingers down my throat and shouting "dangerously dedicated!". But then, I wasn't doing either. Read back this thread for the evidence I have given and which you apparently choose to ignore. There's a wealth of evidence against your stance, Possibly, you just haven't shown it. What we know so far is that there are some kludges in the boot loader which can confuse BIOSes; peter went into some detail earlier on IRC. Only, they apply both to systems with Microsoft partitions and those without. And there are reports that some Adaptec host adaptors (or, presumably, their BIOSes) can't handle our particular boot blocks. It's possible, as peter suggests, that this is a fixable bug, but every time I mention it, I get shouted down. And yes, like Jörg, I don't care enough. I'm not saying ditch the Microsoft partition table, I'm saying don't ditch disks without the Microsoft partition table. Note also that, although this is so dangerous, it has never bitten me on any system. and frankly, none that supports it other than myopic bigotry (I don't want to do this because Microsoft use this format). None that you care to remember. Are you going to stop using all of the other techniques that we share with them? No. See above. What is it about this particular topic that brings out such irrational emotions in you and others?
Greg -- See complete headers for address and phone numbers
Re: cvs commit: src/sys/kern subr_diskmbr.c
Ah, the thread which would not die... 8^). Joerg Wunsch wrote: /The/ major advantage of DD mode was that all BIOSes (so far :) at least agree on how to access block 0 and the adjacent blocks, so starting our own system there makes those disks portable. I guarantee you that there are a number of controllers which have different ideas of how to do soft sector sparing _at the controller level_ rather than at the drive level. Disks created with such controllers aren't portable, since they depend on controller state information, which may not be valid from controller to controller, depending on the controller settings (I killed a disk by not having the WD1007 soft sector sparing jumper set the same in the machine I put it in as in the machine I took it out of... 8^)). I've read this, and yes, i never argued about fixing /that/. Since those values chosen by our grandfather Bill Jolitz have been just `magic' numbers only, it's unfortunate they eventually turned out to be such bad magic about a decade later. Yeah, we should pick new magic. It's bound to die again in the future, though, once what's magic changes out from under us again... 8^(. -- Terry
Re: Dangerously Dedicated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)
Greg Lehey wrote: What is it about this particular topic that brings out such irrational emotions in you and others? Everyone who has been around for any length of time has been bitten on the arse by it at one time or another, I think. I remember Alfred made a Lapbrick out of a system a while back ;^). -- Terry
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Peter Wemm wrote: Can you please clarify for me what specifically you do not like.. Is it: - the cost of 32K of disk space on an average disk these days? (and if so, is reducing that to one sector instead of 62 sufficient?) The idea of a geometry that does not even remotely resemble the actual geometry and only causes additional hassles, like disks not being portable between controllers that have a different idea of that geometry (since the design of this table is missing an actual field to specify the geometry). Incidentally, it's only what you call intuition that finally stumbled across the 10-year-old Jolitz fake fdisk values. So IOW, it took the BIOS vendors ten years to produce a BIOS that would break on it :), and the breakage (division by 0) happened only because they needed black magic in order to infer a geometry value that was short-sightedly never specified in the table itself. - you don't like typing s1 in the device name? Aesthetically, yes, this one too. :) disklabel -rw ad2 auto is one form. That should not use fdisk at all. This is quite fine, and nobody wants that to go away. Good to hear. Well, actually i always use disklabel -Brw daN auto, partly because this sequence is wired into my fingers, and since i believe that having more bootstrappable disks couldn't harm. ;-) As laid out in another message, i eventually got into the habit of even including a root partition mirror on each disk as well. So each of my disks should be able to boot a single-user FreeBSD. I advocate that the bootable form (where boot1.s is expected to do the job of both the mbr *and* the partition boot) is evil and should at the very least be fixed. Fixing is OK to me. I think to recognize the dummy fdisk table of DD mode, it would be totally sufficient to verify slice 4 being labelled with 50000 blocks, and the other slices being labelled 0. We do not support any physical disk anymore that is only 25 MB in size :). 
So all the remaining (INT 0x13 bootstrap) values could be anything -- even something that most BIOSes would recognize as a valid fdisk table. It should be something that is explicitly activated, and not something that you get whether you want it or not. I don't fully understand that. DD mode has always been an explicit decision. Even in the above, the specification of -B explicitly tells it to install that bootstrap. As David O'Brien wrote: Its design is antique. Or rather: it's missing a design. Jorg, why not just buy an Alpha or Sun Blade and run FreeBSD on it?? I don't see much value in an Alpha. Maybe a Sun some day, who knows? As i understand it now, the UltraSparc port is not quite at that stage, but i'm willing to experiment with it when i find a bit of time and documentation on how to get started. I've got access to a good number of Suns here at work, and i think there are even a number of colleagues who would prefer FreeBSD over Solaris on them. If FreeBSD had been ready for it, i could have tested it on the new V880 machine that was just announced recently. :) (We were the first ones worldwide to show it at a trade fair here, about 24 hours after the official announcement...) -- cheers, Jorg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/ NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-)
Re: cvs commit: src/sys/kern subr_diskmbr.c
In message [EMAIL PROTECTED], Peter Wemm writes: The problem is, that you **are** using fdisk tables, you have no choice. DD mode included a *broken* fdisk table that specified an illegal geometry. ... This is why it is called dangerous. BTW, I presume you are aware of the way sysinstall creates DD MBRs; it does not use the 50000-sector slice 4 method, but sets up slice 1 to cover the entire disk including the MBR, with c/h/s entries corresponding to the real start and end of the disk, e.g.:

    cylinders=3544 heads=191 sectors/track=53 (10123 blks/cyl)
    ...
    The data for partition 1 is:
    sysid 165,(FreeBSD/NetBSD/386BSD)
        start 0, size 35885168 (17522 Meg), flag 80 (active)
        beg: cyl 0/ head 0/ sector 1;
        end: cyl 1023/ head 190/ sector 53
    The data for partition 2 is: UNUSED
    The data for partition 3 is: UNUSED
    The data for partition 4 is: UNUSED

Otherwise the disk layout is the same as disklabel's DD. I suspect that this approach is much less illegal than disklabel's MBRs, although I do remember seeing an HP PC that disliked it. I wonder if a reasonable compromise is to make disklabel use this system for DD disks instead of the bogus 50000-sector slice? Obviously, it should also somehow not install a partition table unless boot1 is being used as the MBR, and the fdisk -I method should be preferred. Ian
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Terry Lambert wrote: Joerg Wunsch wrote: /The/ major advantage of DD mode was that all BIOSes (so far :) at least agree on how to access block 0 and the adjacent blocks, so starting our own system there makes those disks portable. I guarantee you that there are a number of controllers which have different ideas of how to do soft sector sparing _at the controller level_ rather than at the drive level. We have dropped support for ESDI controllers long since. :-) Seriously, all the disks we are supporting these days do bad block replacement at the drive level. -- cheers, Jorg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-)
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: I guarantee you that there are a number of controllers which have different ideas of how to do soft sector sparing _at the controller level_ rather than at the drive level. We have dropped support for ESDI controllers long since. :-) Seriously, all the disks we are supporting these days do bad block replacement at the drive level. Adaptec 1742 is supported, though it took a long enough time to find its way into CAM. Same for the NCR 810. For certain applications, also, you _want_ to turn off the automatic bad sector sparing: it's incompatible with spindle sync, for example, where you want to spare all drives or none, so that the spindles don't desynchronize on a sparing hit for one drive, but not another. -- Terry
Re: Dangerously Dedicated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)
What is it about this particular topic that brings out such irrational emotions in you and others? Because you define as irrational those opinions that don't agree with your own. I don't consider my stance irrational at all, and I find your leaps past logic and common sense quite irrational in return. -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: I guarantee you that there are a number of controllers which have different ideas of how to do soft sector sparing _at the controller level_ rather than at the drive level. We have dropped support for ESDI controllers long since. :-) Seriously, all the disks we are supporting these days do bad block replacement at the drive level. Adaptec 1742 is supported, though it took a long enough time to find its way into CAM. Same for the NCR 810. Neither of which does controller-level sparing. For certain applications, also, you _want_ to turn off the automatic bad sector sparing: it's incompatible with spindle sync, for example, where you want to spare all drives or none, so that the spindles don't desynchronize on a sparing hit for one drive, but not another. Spindle sync is an anachronism these days; asynchronous behaviour (write-behind in particular) is all the rage. You'd be hard-pressed to find drives that even support it anymore. -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E
Re: cvs commit: src/sys/kern subr_diskmbr.c
:Spindle sync is an anachronism these days; asynchronous behaviour :(write-behind in particular) is all the rage. You'd be hard-pressed to :find drives that even support it anymore. Whoa! Say what? I think you are totally incorrect here, Mike. Spindle sync is not an anachronism. You can't get good RAID{0,2,3,4,5} performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5 where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
On Mon, Dec 10, 2001 at 10:13:20AM -0800, Matthew Dillon wrote: :Spindle sync is an anachronism these days; asynchronous behaviour :(write-behind in particular) is all the rage. You'd be hard-pressed to :find drives that even support it anymore. Whoa! Say what? I think you are totally incorrect here, Mike. Spindle sync is not an anachronism. You can't get good RAID{0,2,3,4,5} For RAID3 that is true. For the other ones... performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5 where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. -- | / o / /_ _ email: [EMAIL PROTECTED] |/|/ / / /( (_) Bulte Arnhem, The Netherlands
Re: cvs commit: src/sys/kern subr_diskmbr.c
: performance without it - for reading OR writing. It doesn't matter : so much for RAID{1,10}, but it matters a whole lot for something like : RAID-5 where the difference between a spindle-synced read or write : and a non-spindle-synced read or write can be upwards of 35%. : :If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well but random I/O goes down the drain for writes if you aren't spindle synced, no matter what the stripe size, and will go down the drain for reads if you cross a stripe - something that is quite common I think. -Matt Matthew Dillon [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
:For RAID3 that is true. For the other ones... : : performance without it - for reading OR writing. It doesn't matter : so much for RAID{1,10}, but it matters a whole lot for something like : RAID-5 where the difference between a spindle-synced read or write : and a non-spindle-synced read or write can be upwards of 35%. : :If you have RAID5 with I/O sizes that result in full-stripe operations. : :-- :| / o / /_ _email: [EMAIL PROTECTED] :|/|/ / / /( (_) BulteArnhem, The Netherlands Well, for reads a non-stripe-crossing op would still work reasonably well. But for writes, less than full-stripe operations without spindle sync are going to be terrible due to the read-before-write requirement (to calculate parity). The disk cache is useless in that case. -Matt
Re: cvs commit: src/sys/kern subr_diskmbr.c
: Well, for reads a non-stripe-crossing op would still work reasonably : well. But for writes less than full-stripe operations without : spindle sync are going to be terrible due to the read-before-write : requirement (to calculate parity). The disk cache is useless in that : case. : :You obviously weren't reading the previous thread on RAID5 checksum :calculation, I see. 8) I don't see a thread on RAID-5 checksumming. What was the subject? -Matt
Re: cvs commit: src/sys/kern subr_diskmbr.c
On 09-Dec-01 Joerg Wunsch wrote: As Peter Wemm wrote: There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for: co-operation with other operating systems in the same machine. None of my machines uses anything other than FreeBSD.) Well, since they are a de facto part of the PC architecture they are also good so that you don't break BIOSes. -- John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/ Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: cvs commit: src/sys/kern subr_diskmbr.c
Ian Dowse wrote: In message [EMAIL PROTECTED], Peter Wemm writes: The problem is, that you **are** using fdisk tables, you have no choice. DD mode included a *broken* fdisk table that specified an illegal geometry. ... This is why it is called dangerous. BTW, I presume you are aware of the way sysinstall creates DD MBRs; it does not use the 50000-sector slice 4 method, but sets up slice 1 to cover the entire disk including the MBR, with c/h/s entries corresponding to the real start and end of the disk, e.g.: cylinders=3544 heads=191 sectors/track=53 (10123 blks/cyl) ... The data for partition 1 is: sysid 165,(FreeBSD/NetBSD/386BSD) start 0, size 35885168 (17522 Meg), flag 80 (active) beg: cyl 0/ head 0/ sector 1; end: cyl 1023/ head 190/ sector 53 The data for partition 2 is: UNUSED The data for partition 3 is: UNUSED The data for partition 4 is: UNUSED Otherwise the disk layout is the same as disklabel's DD. I suspect that this approach is much less illegal than disklabel's MBRs although I do remember seeing an HP PC that disliked it. I wonder if a reasonable compromise is to make disklabel use this system for DD disks instead of the bogus 50000-sector slice? Obviously, it should also somehow not install a partition table unless boot1 is being used as the MBR, and the fdisk -I method should be preferred. Yes, that is much safer, however there are certain BIOSes that have interesting quirks that the MBR has to work around. The problem when overlapping the mbr and boot1 into the same block is that we no longer have the space to do that. boot1.s has got *3* bytes free. For example, we don't have space to fix the case where the drive number is passed through incorrectly to the mbr. Some older Intel boards have this problem (Phoenix-derived BIOS). See boot0's setdrv option. Also (and I think this is more likely to be the problem you ran into) many newer PCs are looking at the partition tables for a suspend-to-disk partition or a FAT filesystem with a suspend-to-disk dump file. 
For better or worse, PC architecture dictates that boot disk partitions start and end on cylinder boundaries (except for the first one, which starts on the second head in the first cylinder). When we break those rules, we have to be prepared for the consequences. However, there is light at the end of the tunnel. EFI GPT is pretty clean. It supports up to something like 16384 partitions and it has all the useful stuff we could possibly want, including unique IDs, no CHS (pure 64-bit LBA), volume tags (you can name partitions etc), and so on. It is clean enough that we could almost get away with doing away with disklabel as well. Coming soon to a PC near you. (http://developer.intel.com/technology/efi/index.htm) Cheers, -Peter -- Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] All of this is for nothing if we don't go to the stars - JMS/B5
RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5 where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well but random I/O goes down the drain for writes if you aren't spindle synced, no matter what the stripe size, Can you explain this? I don't see it. In FreeBSD, just about all I/O goes to buffer cache. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. Greg -- See complete headers for address and phone numbers
Dangerously Dedicated (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 16:59:28 -0800, Peter Wemm wrote: Joerg Wunsch wrote: Mike Smith [EMAIL PROTECTED] wrote: I'd love to never hear those invalid, unuseful, misleading opinions from you again. ETOOMANYATTRIBUTES? :-) As long as you keep the feature of DD mode intact, i won't argue. If people feel like creating disks that aren't portable to another controller, they should be able to. I don't like this idea. We can just as easily have a bootable DD mode with a real MBR and have FreeBSD start on sector #2 instead of overlapping boot1 and the mbr. This would seem to be a reasonable alternative. This costs only one sector instead of 64 sectors (a whopping 32K, I'm sure that is going to break the bank on today's disks). Well, the real question is the space wasted at the end, which can be up to a megabyte. Still not going to kill you, but it's aesthetically displeasing. I'd rather that we be specific about this. If somebody wants ad2e or da2e then they should not be using *any* fdisk tables at all. Ie: block 0 should be empty. The problem is that if you put /boot/boot1 in there, then suddenly it looks like an fdisk disk and we have to have ugly magic to detect it and prevent the fake table from being used. I would prefer that the fdisk table come out of /boot/boot1 so that we don't have to have it by default, and we use fdisk to install the DD magic table if somebody wants to make it bootable. So where would you put the bootstrap? In sector 2? Greg -- See complete headers for address and phone numbers
Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote: On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote: performance without it - for reading OR writing. It doesn't matter so much for RAID{1,10}, but it matters a whole lot for something like RAID-5 where the difference between a spindle-synced read or write and a non-spindle-synced read or write can be upwards of 35%. If you have RAID5 with I/O sizes that result in full-stripe operations. Well, 'more than one disk' operations anyway, for random I/O. Caching takes care of sequential I/O reasonably well but random I/O goes down the drain for writes if you aren't spindle synced, no matter what the stripe size, Can you explain this? I don't see it. In FreeBSD, just about all I/O goes to buffer cache. After waiting for the drives and not for vinum parity blocks. and will go down the drain for reads if you cross a stripe - something that is quite common I think. I think this is what Mike was referring to when talking about parity calculation. In any case, going across a stripe boundary is not a good idea, though of course it can't be avoided. That's one of the arguments for large stripes. striped: If you have 512-byte stripes and 2 disks, a 64k access is put onto the disks as 2 32k transactions. The wait time for the complete transaction is the worst of both, which is more than the average of a single disk. With spindle synchronisation the access times for both disks are believed to be identical and you get the same as with a single disk. Linear speed could be about twice the speed of a single drive. But this is more theoretical today than real. The average transaction size per disk decreases with a growing number of spindles and you get more transaction overhead. Also, the voice coil technology used in drives for many years now adds a random amount of time to the access time, which invalidates some of the spindle sync potential. 
Plus it may break some benefits of precaching mechanisms in drives. I'm almost sure there is no real performance gain with modern drives. raid5: For a write you have two read transactions and two writes. The two reads are at the same position on two different spindles, and there the same access time situation exists as in the case above. We don't have the problem with decreased transaction sizes. But we have the same problem with seek time and modern disks as in the case above, plus we have the problem that the drives are not exactly equally loaded, which randomizes the access times again. I doubt that we have a performance gain with modern disks in the general case, but there might be some special uses. The last drives I saw which could do spindle sync were the IBM DCHS series. There are easier ways to raise performance. Ever wondered why people claim Vinum's raid5 writes are slow? The answer is astonishingly simple: Vinum does stripe-based locking, while the ufs tries to lay out data in mostly ascending sectors. What happens here is that the first write has to wait for two reads and two writes. If we have an ascending write it has to wait for the first write to finish, because the stripe is still locked. The first is unlocked after both physical writes are on disk. Now we start our two reads, which are (thanks to the drive's precache) most likely in the drive's cache - then we write. The problem here is that physical writes get serialized and the drive has to wait a complete rotation between each. If we had fine-grained locking which only locks the accessed sectors in the parity, we would be able to have more than a single ascending write transaction onto a single drive. Ideally the stripe size is bigger than the maximum number of parallel ascending writes the OS does on the volume. 
-- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
On Mon, Dec 10, 2001 at 10:49:28AM -0800, Matthew Dillon wrote: :For RAID3 that is true. For the other ones... : : performance without it - for reading OR writing. It doesn't matter : so much for RAID{1,10}, but it matters a whole lot for something like : RAID-5 where the difference between a spindle-synced read or write : and a non-spindle-synced read or write can be upwards of 35%. : :If you have RAID5 with I/O sizes that result in full-stripe operations. : :-- :| / o / /_ _ email: [EMAIL PROTECTED] :|/|/ / / /( (_) Bulte Arnhem, The Netherlands Well, for reads a non-stripe-crossing op would still work reasonably well. But for writes less than full-stripe operations without spindle sync are going to be terrible due to the read-before-write requirement (to calculate parity). The disk cache is useless in that case. Modern disks do prereads, and writes are streamed by tagged command queueing, which invalidates this for linear access. For non-linear access the synchronisation is shadowed partly by different seek times and different load on the spindles. The chance that the data and parity spindles have the heads on the same track is extremely small for random access. With 15000 RPM drives the maximum rotational delay is 4ms and the average is 2ms, which gives you a maximum of only 1ms to gain under ideal conditions - which we don't have, as I stated above. -- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Peter Wemm wrote: Yes, that is much safer, however there are certain BIOSes that have interesting quirks that the MBR has to work around. The problem when overlapping the mbr and boot1 into the same block is that we no longer have the space to do that. boot1.s has got *3* bytes free. Too bad. Peter, do you care to update the section about DD mode (and its dangers) in the FAQ after all this discussion? I could probably do it, too (the original entry is mine), but i would only have to quote your arguments anyway. Also (and I think this is more likely to be the problem you ran into) many newer PCs are looking at the partition tables for a suspend-to-disk partition or a FAT filesystem with a suspend-to-disk dump file. Seems i really love my Toshiba (Libretto) that simply hibernates to the last nnn MB of the physical disk. ;-) (I have reserved a FreeBSD partition as a placeholder for the hibernation data.) However, there is light at the end of the tunnel. EFI GPT is pretty clean. Good to hear. While this sounds like dedicated disks will be gone then :), at least the format looks rational enough. It supports up to something like 16384 partitions ... It would be interesting to see how Windoze will arrange for 16K drive letters. :-)) The day vinum is up and ready to also cover the root FS, i won't need /any/ partition at all anymore. ;-) As Greg Lehey wrote: ...73 of those silly messages are just beyond any form of usefulness. Hadn't we agreed to do this? I'm certainly in favour of the bootverbose approach. I can't remember any agreement so far. But thinking a bit more about it, it sounds like the best solution to me, too. The only other useful option would be to restrict the message to once per drive, but that'll cost an additional per-drive flag, which is probably too much effort just for that message. -- cheers, Jorg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/ NIC: JW11-RIPE Never trust an operating system you don't have sources for. 
;-)
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
: IBM DTLA drives are known to rotate fast enough near the spindle : that the sustained write speed exceeds the ability of the controller : electronics to keep up, and results in crap being written to disk. I would assume it's actually the tracks FURTHEST from the spindle.. With ZBR, anything is possible. Wouldn't the linear speed be faster closer to the spindle at 7200 RPM than at the edge? The stunning ignorance being displayed in this thread appears to have reached an all-time low. *sigh* -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
: Wouldn't the linear speed be faster closer to the spindle at 7200 RPM : than at the edge? : :The stunning ignorance being displayed in this thread appears to have :reached an all-time low. : :*sigh* Ah, another poor soul who didn't read the first sentence of tuning(7). -Matt
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
David W. Chapman Jr. wrote: : IBM DTLA drives are known to rotate fast enough near the spindle : that the sustained write speed exceeds the ability of the controller : electronics to keep up, and results in crap being written to disk. I would assume it's actually the tracks FURTHEST from the spindle.. Wouldn't the linear speed be faster closer to the spindle at 7200 RPM than at the edge? Linear speed is highest at the edge, but magnetic domain density is higher at the spindle, for a uniform rotation rate. I think that the electronics ended up being designed for the average rate. PS: The encoding frequency is higher at the spindle, as well. -- Terry
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Peter Wemm wrote: There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for: co-operation with other operating systems in the same machine. None of my machines uses anything other than FreeBSD.) -- cheers, Jorg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/ NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-)
Re: cvs commit: src/sys/kern subr_diskmbr.c
All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for, co-operation with other operating systems in the same machine. None of my machines uses anything else than FreeBSD.) There are very good reasons NOT to use DD mode if you use certain types of Adaptec SCSI controllers - they simply won't boot from DD. Aside from that, FreeBSD needs to have *one* recommendation for disks; anything else creates too much confusion. It is certainly my impression that the recommendation has been NOT to use DD for the IA32 architecture for quite a while now. (The other day a coworker of mine wanted to use DD for some IBM DTLA disks, because he'd heard that the disks performed better that way - something to do with scatter-gather not working right unless you used DD. I'm highly skeptical about this since I have my own measurements from IBM DTLA disks partitioned the normal way, i.e. NOT DD, and they show the disks performing extremely well. Anybody else want to comment on this?) Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
On 09-Dec-2001 [EMAIL PROTECTED] wrote: (The other day a coworker of mine wanted to use DD for some IBM DTLA disks, because he'd heard that the disks performed better that way - something to do with scatter-gather not working right unless you used DD. I'm highly skeptical about this since I have my own measurements from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they show the disks performing extremely well. Anybody else want to comment on this?) Sounds like an Old Wives Tale to me. I don't understand the need some people have for using something that is labelled as DANGEROUS. No, it won't hurt your cats but you may lose hair from using it, and for what benefit? NONE! --- Daniel O'Connor software and network engineer for Genesis Software - http://www.gsoft.com.au The nice thing about standards is that there are so many of them to choose from. -- Andrew Tanenbaum
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Daniel O'Connor wrote: I don't understand the need some people have for using something that is labelled as DANGEROUS. Historically, it hasn't been labelled that, it only later became common terminology for it -- in the typical half-joking manner. No, it won't hurt your cats but you may lose hair from using it, and for what benefit? NONE! See my other reply about fdisk tables: they are a misdesign from the beginning. The single most wanted feature it buys you is the ability to completely forget the term `geometry' with your disks: the very first sectors of a disk always have the same BIOS int 0x13 representation, regardless of what your BIOS/controller thinks the `geometry' might be. Thus, those disks are basically portable between controller BIOSes. (Modulo those newer broken BIOSes that believe eggs must be smarter than hens -- see my other article for an opinion.) -- cheers, Jörg
Re: cvs commit: src/sys/kern subr_diskmbr.c
As [EMAIL PROTECTED] wrote: There are very good reasons NOT to use DD mode if you use certain types of Adaptec SCSI controllers - they simply won't boot from DD. Never seen. All my SCSI controllers so far booted from my disks (obviously :). I figure from Peter's comment in that piece of code that the original (386BSD 0.0 inherited) DD mode fake fdisk table apparently had some poor (faked) values inside that could upset some BIOSes. That's bad, and IMHO we should fix what can be fixed, but without dropping the feature entirely (see below). personal opinion Still, it's my opinion that these BIOSes are simply broken: interpretation of the fdisk table has always been in the realm of the boot block itself. The BIOS should decide whether a disk is bootable or not by looking at the 0x55aa signature at the end, nothing else. Think of the old OnTrack Disk Manager that extended the fdisk table to 16 slots -- nothing the BIOS could ever even handle. It was in the realm of the boot block to interpret it. /personal opinion Aside from that, FreeBSD needs to have *one* recommendation for disks, anything else creates too much confusion. DD mode has never been a recommendation. It is for those who know what it means. I'm only against the idea of silently dropping support for it... fdisk tables are something that was designed in the previous millennium, and I think nobody is going to argue that they are rather a mis-design from the beginning (or even no design at all, but an ad-hoc implementation). Two different values for the same thing (which could become conflicting, thus making disks unportable between different controllers), not enough value space to even remotely cover the disks of our millennium, enforcement of something they call `geometry' which isn't even remotely related to the disks' geometry anymore at all, far too few possible entries anyway, ...
FreeBSD is in a position where it doesn't strictly require the existence of such an obsolete implementation detail, so we should leave users the freedom of decision. Again, it has never been the recommendation (well, at least not after 386BSD 0.0 :), and I normally wouldn't recommend it to the innocent user. But we should not break it either. (The other day a coworker of mine wanted to use DD for some IBM DTLA disks, because he'd heard that the disks performed better that way - something to do with scatter-gather not working right unless you used DD. [...]) As much as I personally prefer DD mode: that's an urban legend. -- cheers, Jörg
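Joerg's point that a BIOS should only look at the 0x55aa signature can be illustrated with a short parser over the well-known MBR layout: four 16-byte fdisk slots at offset 446, boot signature at offset 510. This is a sketch; the synthetic sector built at the bottom is an assumption for demonstration, not a dump of any real disk:

```python
import struct

MBR_SIZE = 512
TABLE_OFFSET = 446        # fdisk table: four 16-byte slots
SIGNATURE_OFFSET = 510    # 0x55 0xAA boot signature

def parse_mbr(sector: bytes):
    """Return (bootable, entries) for a 512-byte MBR sector.

    bootable is True iff the 0x55AA signature is present; entries lists
    (boot_flag, type, start_lba, num_sectors) for each non-empty slot.
    """
    assert len(sector) == MBR_SIZE
    bootable = sector[SIGNATURE_OFFSET:] == b"\x55\xaa"
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i : TABLE_OFFSET + 16 * (i + 1)]
        flag, ptype = raw[0], raw[4]
        start_lba, num_sectors = struct.unpack("<II", raw[8:16])
        if ptype != 0:
            entries.append((flag, ptype, start_lba, num_sectors))
    return bootable, entries

# Synthetic example: one active FreeBSD slice (type 0xA5), signature set.
sector = bytearray(MBR_SIZE)
entry = bytes([0x80, 0, 1, 0, 0xA5, 0xFE, 0xFF, 0xFF]) + struct.pack("<II", 63, 1000000)
sector[TABLE_OFFSET:TABLE_OFFSET + 16] = entry
sector[SIGNATURE_OFFSET:] = b"\x55\xaa"

bootable, entries = parse_mbr(bytes(sector))
```

Note that a "dangerously dedicated" disk passes the signature test just as well; it is the geometry values inside the slots, not the signature, that some BIOSes choke on.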
Re: cvs commit: src/sys/kern subr_diskmbr.c
As Peter Wemm wrote: There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for, co-operation with other operating systems in the same machine. None of my machines uses anything else than FreeBSD.) Since I tire of seeing people hit this ignorant opinion in the list archives, I'll just offer the rational counterpoints. - The MBR partition table is not obsolete, it's a part of the PC architecture specification. - You omit the fact that many peripheral device vendors' BIOS code looks for the MBR partition table, and will fail if it's not present or incorrect. You do realise that DD mode does include an (invalid) MBR partition table (now valid, courtesy of a long-needed fix), right? I'd love to never hear those invalid, unuseful, misleading opinions from you again. -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. Fritz Todt] V I C T O R Y N O T V E N G E A N C E
Re: cvs commit: src/sys/kern subr_diskmbr.c
(The other day a coworker of mine wanted to use DD for some IBM DTLA disks, because he'd heard that the disks performed better that way - something to do with scatter-gather not working right unless you used DD. I'm highly skeptical about this since I have my own measurements from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they show the disks performing extremely well. Anybody else want to comment on this?) Since scatter-gather has nothing to do with the disk (it's a feature of the disk controller's interface to host memory), I think this coworker of yours is delusional.
Re: cvs commit: src/sys/kern subr_diskmbr.c
Mike Smith [EMAIL PROTECTED] wrote: - The MBR partition table is not obsolete, it's a part of the PC architecture specification. Its design is antique. Or rather: it's missing a design. See other mail for the reasons. For FreeBSD, it's obsolete since we don't need to rely on fdisk slices. (Or rather: it's optional. We can make good use of it when it's there, but we don't need to insist on it being there.) You do realise that DD mode does include an (invalid) MBR partition table (now valid, courtesy of a long-needed fix), right? Yes, of course, one that is basically ignored by everything. It has always been there, back since 386BSD 0.1. 386BSD 0.0 didn't recognize fdisk tables at all, but could only live on a disk of its own (as any other BSD before used to). I'd love to never hear those invalid, unuseful, misleading opinions from you again. ETOOMANYATTRIBUTES? :-) As long as you keep the feature of DD mode intact, I won't argue. If people feel like creating disks that aren't portable to another controller, they should do so. I don't like this idea. But to be honest, see my other article: I never argued to make this the default or a recommended strategy in any form. It should only remain intact at all. Back to the subject: the current warning, however, is pointless, and has the major drawback of potentially hiding important console messages. -- cheers, Jörg
Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 22:52:58 +1030, Daniel O'Connor wrote: On 09-Dec-2001 [EMAIL PROTECTED] wrote: (The other day a coworker of mine wanted to use DD for some IBM DTLA disks, because he'd heard that the disks performed better that way - something to do with scatter-gather not working right unless you used DD. I'm highly skeptical about this since I have my own measurements from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they show the disks performing extremely well. Anybody else want to comment on this?) Sounds like an Old Wives Tale to me. I don't understand the need some people have for using something that is labelled as DANGEROUS. I don't understand the need some people have for labelling something as DANGEROUS when it works nearly all the time. We don't have many disks which are shared between different platforms, but that will change. As you know, I have the ability to hot swap disks between an RS/6000 platform and an ia32 platform. The RS/6000 disks will never have a Microsoft partition table on them. They will have BSD partition tables on them. Why call this dangerous? Greg -- See complete headers for address and phone numbers
Dangerously Dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 12:15:19 -0800, Mike Smith wrote: As Peter Wemm wrote: There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for, co-operation with other operating systems in the same machine. None of my machines uses anything else than FreeBSD.) Since I tire of seeing people hit this ignorant opinion in the list archives, I'll just offer the rational counterpoints. - The MBR partition table is not obsolete, it's a part of the PC architecture specification. And if it's part of the PC architecture specification, it can't be obsolete? I don't see any contradiction here. - You omit the fact that many peripheral device vendors' BIOS code looks for the MBR partition table, and will fail if it's not present or incorrect. What do you mean by peripheral device? I've never heard of disk drives having a BIOS. If you're talking about host adaptors, it's you who omit what Jörg says about it. No, on the contrary, he went into some detail on this point: On Sunday, 9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote: personal opinion Still, it's my opinion that these BIOSes are simply broken: interpretation of the fdisk table has always been in the realm of the boot block itself. The BIOS should decide whether a disk is bootable or not by looking at the 0x55aa signature at the end, nothing else. Think of the old OnTrack Disk Manager that extended the fdisk table to 16 slots -- nothing the BIOS could ever even handle. It was in the realm of the boot block to interpret it. /personal opinion I agree with Jörg on this. I'd love to never hear those invalid, unuseful, misleading opinions from you again.
I'd love to never have to see this level of invective poured onto what was previously a calm discussion. Greg
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: As Peter Wemm wrote: There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. All my disks have bootblocks and (spare) boot partitions. All the bootblocks are DD mode. I don't see any point in using obsolete fdisk tables. (There's IMHO only one purpose obsolete fdisk tables are good for, co-operation with other operating systems in the same machine. None of my machines uses anything else than FreeBSD.) The problem is that you **are** using fdisk tables, you have no choice. DD mode included a *broken* fdisk table that specified an illegal geometry. This illegal geometry was the reason why Thinkpad laptops would wedge solid when you installed FreeBSD on them. This illegal geometry is the reason why FreeBSD disks wedge any EFI system solid unless you remove the illegal geometry tables. This illegal geometry causes divide by zero errors in a handful of scsi bioses from Adaptec. This illegal geometry causes divide by zero errors in a handful of scsi bioses from NCR/Symbios. This is why it is called dangerous. Cheers, -Peter -- Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] All of this is for nothing if we don't go to the stars - JMS/B5
Re: cvs commit: src/sys/kern subr_diskmbr.c
:This illegal geometry causes divide by zero errors in a handful of scsi :bioses from Adaptec. : :This illegal geometry causes divide by zero errors in a handful of scsi :bioses from NCR/Symbios. : :This is why it is called dangerous. : :Cheers, :-Peter :-- :Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Handful? I'm taking my life in my hands if I DD a DELL machine. BEWM! As I found out the hard way about a year ago. (Probably the Adaptec firmware but every Dell rack-mount has one so...). The machines wouldn't boot again until I pulled the physical drives and then camcontrol rescan'd them in after a CD boot. Joy. This is why I fixed disklabel -B to operate properly on slices and added a whole section to the end of 'man disklabel' to describe how to do it. -Matt
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: Mike Smith [EMAIL PROTECTED] wrote: - The MBR partition table is not obsolete, it's a part of the PC architecture specification. Its design is antique. Or rather: it's missing a design. See other mail for the reasons. For FreeBSD, it's obsolete since we don't need to rely on fdisk slices. (Or rather: it's optional. We can make good use of it when it's there, but we don't need to insist on it being there.) No, it isn't. We specifically have a copy of both the broken and fixed fdisk tables in the kernel and do a bcmp() to see if the fdisk table that is included in /boot/boot1 **unconditionally** is in fact the dangerously dedicated table. If it is, then we specifically reject it or we end up with a disk size of 25MB (5 sectors). You do realise that DD mode does include an (invalid) MBR partition table (now valid, courtesy of a long-needed fix), right? Yes, of course, one that is basically ignored by everything. It has always been there, back since 386BSD 0.1. 386BSD 0.0 didn't recognize fdisk tables at all, but could only live on a disk of its own (as any other BSD before used to). No, it isn't ignored: BIOSes know that fdisk partitions end on cylinder boundaries, and therefore can intuit what the expected geometry is for the disk in question. It does this in order to allow the CHS int 0x13 calls to work. The problem is that the int13 code only allowed for 255 heads, and the fake end-of-disk entry that is unconditionally in /boot/boot1 specified an ending head number of 255 (ie: 256 heads). When this gets put into a byte register it is truncated to zero and we get divide by zero errors. I'd love to never hear those invalid, unuseful, misleading opinions from you again. ETOOMANYATTRIBUTES? :-) As long as you keep the feature of DD mode intact, I won't argue. If people feel like creating disks that aren't portable to another controller, they should do so. I don't like this idea.
We can just as easily have bootable-DD mode with a real MBR and have FreeBSD start on sector #2 instead of overlapping boot1 and the MBR. This costs only one sector instead of 64 sectors (a whopping 32K, I'm sure that is going to break the bank on today's disks). I'd rather that we be specific about this. If somebody wants ad2e or da2e then they should not be using *any* fdisk tables at all. Ie: block 0 should be empty. The problem is that if you put /boot/boot1 in there, then suddenly it looks like an fdisk disk and we have to have ugly magic to detect it and prevent the fake table from being used. I would prefer that the fdisk table come out of /boot/boot1 so that we don't have to have it by default, and we use fdisk to install the DD magic table if somebody wants to make it bootable. But to be honest, see my other article: I never argued to make this the default or a recommended strategy in any form. It should only remain intact at all. Back to the subject: the current warning, however, is pointless, and has the major drawback of potentially hiding important console messages. The console buffer is 32K these days. You'd have to have around 300 disks to have any real effect on the kernel message buffer. Cheers, -Peter
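Peter's truncation bug can be sketched numerically. A BIOS converting LBA to CHS divides by the head count; a partition entry whose ending head is 255 implies 256 heads, and 256 truncated into an 8-bit register becomes 0, so the division blows up. This is a hypothetical illustration of the arithmetic, not actual BIOS code:

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Classic int 0x13 style LBA -> CHS conversion."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1   # sectors are 1-based
    return cylinder, head, sector

# A geometry "intuited" from a partition ending on head 255 means 256 heads.
ending_head = 255
heads = ending_head + 1                  # 256: one more than fits in a byte
heads_in_byte_register = heads & 0xFF    # truncated to 0 in an 8-bit register

ok = lba_to_chs(10000, heads, 63)        # fine with the true 16-bit value
crashed = False
try:
    lba_to_chs(10000, heads_in_byte_register, 63)
except ZeroDivisionError:
    # This is the "divide by zero errors" the Adaptec/Symbios BIOSes hit.
    crashed = True
```

The fix Peter describes (an ending head of 254, i.e. 255 heads) keeps the value representable in the byte register, so the divisor never collapses to zero.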
Re: Dangerously Dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote: personal opinion Still, it's my opinion that these BIOSes are simply broken: Joerg's personal opinion can go take a hike. The reality of the situation is that this table is required, and we're going to put it there. End of story.
Re: Dangerously Dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 18:32:38 -0800, Mike Smith wrote: On Sunday, 9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote: personal opinion Still, it's my opinion that these BIOSes are simply broken: Joerg's personal opinion can go take a hike. The reality of the situation is that this table is required, and we're going to put it there. The reality of the situation is far from being clear. The only thing I can see is that you're trying to dictate things without adequate justification. You should reconsider that attitude. Greg
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
Greg Lehey wrote: [ ... IBM DTLA drives ... ] IBM DTLA drives are known to rotate fast enough near the spindle that the sustained write speed exceeds the ability of the controller electronics to keep up, and results in crap being written to disk. This is not often a problem with Windows, the FS of which fills sectors in towards the spindle, so you only hit the problem when you near the disk-full state. Do a Google/Tom's Hardware search to reassure yourself that I am not smoking anything. I don't understand the need some people have for using something that is labelled as DANGEROUS. I don't understand the need some people have for labelling something as DANGEROUS when it works nearly all the time. It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). We don't have many disks which are shared between different platforms, but that will change. As you know, I have the ability to hot swap disks between an RS/6000 platform and an ia32 platform. The RS/6000 disks will never have a Microsoft partition table on them. They will have BSD partition tables on them. Why call this dangerous? Your use is orthogonal to the most common expected usage, which is disks shared between OSs on a single platform, rather than disks shared between a single OS on multiple platforms. -- Terry
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: Mike Smith [EMAIL PROTECTED] wrote: - The MBR partition table is not obsolete, it's a part of the PC architecture specification. Its design is antique. Or rather: it's missing a design. See other mail for the reasons. For FreeBSD, it's obsolete since we don't need to rely on fdisk slices. (Or rather: it's optional. We can make good use of it when it's there, but we don't need to insist on it being there.) FWIW: The MBR layout is documented in great gory detail in chapter 6 of the PReP specification, which I believe is now available on line from the PowerPC folks, Apple, and Motorola, and also as an IBM redbook. It discusses everything, including the LBA fields, and sharing disks between PPC (running in Motorola byte order) and x86 machines (running a DOS-derived OS). -- Terry
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 18:46:24 -0800, Terry Lambert wrote: Greg Lehey wrote: [ ... IBM DTLA drives ... ] No, that wasn't me. IBM DTLA drives are known to rotate fast enough near the spindle that the sustained write speed exceeds the ability of the controller electronics to keep up, and results in crap being written to disk. What about the cache? This is not often a problem with Windows, the FS of which fills sectors in towards the spindle, so you only hit the problem when you near the disk-full state. This sounds very unlikely. Do a Google/Tom's Hardware search to reassure yourself that I am not smoking anything. I think I'd rather put the shoe on the other foot. This looks like high-grade crack. Who was smoking it? I don't understand the need some people have for using something that is labelled as DANGEROUS. I don't understand the need some people have for labelling something as DANGEROUS when it works nearly all the time. I *did* write this. It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). So all dedicated installations are dangerous? I would have to do that whether I had a Microsoft partition table or not if I had already used the entire disk for FreeBSD. We don't have many disks which are shared between different platforms, but that will change. As you know, I have the ability to hot swap disks between an RS/6000 platform and an ia32 platform. The RS/6000 disks will never have a Microsoft partition table on them. They will have BSD partition tables on them. Why call this dangerous? Your use is orthogonal to the most common expected usage, which is disks shared between OSs on a single platform, rather than disks shared between a single OS on multiple platforms. Expected usage is to install once and then never change it. Greg
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
Greg Lehey wrote: [ ... IBM DTLA drives ... ] No, that wasn't me. I didn't quote the full thing; that's what the brackets and ellipsis was for. IBM DTLA drives are known to rotate fast enough near the spindle that the sustained write speed exceeds the ability of the controller electronics to keep up, and results in crap being written to disk. What about the cache? Good point. The cache is known to not actually flush to disk when ordered to do so. See the EXT3FS article on www.ibm.com/developerworks for more details. This is not often a problem with Windows, the FS of which fills sectors in towards the spindle, so you only hit the problem when you near the disk-full state. This sounds very unlikely. I know, doesn't it? Good thing Tom's Hardware is so thorough, or we might never have known this, with everyone on the verge of discovering it simply dismissing it as very unlikely. 8^). Do a Google/Tom's Hardware search to reassure yourself that I am not smoking anything. I think I'd rather put the shoe on the other foot. This looks like high-grade crack. Who was smoking it? Tom's Hardware, IBM, CNET, Storage Review, etc.. http://www6.tomshardware.com/storage/00q3/000821/ibmdtla-07.html http://www.storage.ibm.com/hdd/prod/deskstar.htm http://computers.cnet.com/hardware/0-1092-418-1664463.html?pn=3&lb=2&ob=0&tag=st.co.1092.bottom.1664463-3 http://www.storagereview.com/welcome.pl?/http://www.storagereview.com/jive/sr/thread.jsp?forum=2&thread=12485 I suggest the search: http://google.yahoo.com/bin/query?p=DTLA+drive+problem&hc=0&hs=0 It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). So all dedicated installations are dangerous? I would have to do that whether I had a Microsoft partition table or not if I had already used the entire disk for FreeBSD. Yes. I don't understand your point.
Your use is orthogonal to the most common expected usage, which is disks shared between OSs on a single platform, rather than disks shared between a single OS on multiple platforms. Expected usage is to install once and then never change it. No, expected usage is to purchase a machine with an OS preinstalled, and then install FreeBSD/Linux/BeOS/other third party OS as an also ran, rather than the primary OS. -- Terry
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
Greg Lehey wrote: [ ... DTLA drives ... ] Do a Google/Tom's Hardware search to reassure yourself that I am not smoking anything. I think I'd rather put the shoe on the other foot. This looks like high-grade crack. Who was smoking it? For your further amusement, here is a pointer to the class action lawsuit against IBM on the 75GXP DTLA drives: http://www.tech-report.com/news_reply.x/3035/3/ It includes a pointer to the PDF of the complaint form. -- Terry
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
A google search for: deskstar 75gxp class action http://www.theregister.co.uk/content/54/22412.html http://www.pcworld.com/news/article/0,aid,67608,00.asp etc... So apparently my warning about these drives in 'man tuning' is still appropriate :-) -Matt : IBM DTLA drives are known to rotate fast enough near the spindle : that the sustained write speed exceeds the ability of the controller : electronics to keep up, and results in crap being written to disk. : : What about the cache? : :Good point. The cache is known to not actually flush to disk when :ordered to do so. See the EXT3FS article on www.ibm.com/developerworks :for more details. : : This is not often a problem with Windows, the FS of which fills : sectors in towards the spindle, so you only hit the problem when you : near the disk-full state. : : This sounds very unlikely. : :I know, doesn't it? Good thing Tom's Hardware is so thorough, or we :might never have known this, with everyone on the verge of discovering :it simply dismissing it as very unlikely. 8^). :...
Re: cvs commit: src/sys/kern subr_diskmbr.c
On Sun, Dec 09, 2001 at 11:00:19PM +0100, Joerg Wunsch wrote: Mike Smith [EMAIL PROTECTED] wrote: - The MBR partition table is not obsolete, it's a part of the PC architecture specification. Its design is antique. Or rather: it's missing a design. See other mail for the reasons. For FreeBSD, it's obsolete since we don't need to rely on fdisk slices. (Or rather: it's optional. We can make good use of it when it's there, but we don't need to insist on it being there.) Jörg, why not just buy an Alpha or Sun Blade and run FreeBSD on it?? You will get the traditional Unix workstation partitioning you so much long for. It really seems your arguments are nothing more than MBR's are a M$ and IBM PeeCee thing, and I hate anything PeeCee.
IBM DTLA drives (was: Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c))
Matthew Dillon wrote: : etc... So apparently my warning about these drives in 'man tuning' is : still appropriate :-) : :-Matt : : : IBM DTLA drives are known to rotate fast enough near the spindle : : that the sustained write speed exceeds the ability of the controller : : electronics to keep up, and results in crap being written to disk. : : :I would assume it is actually the tracks FURTHEST from the spindle.. This is the first I've heard of the alleged controller electronics performance problem. My understanding is that the failures are due to manufacturing problems, but people have apparently experienced software lockups as well. What is not in doubt is that there have been some severe problems with this model. Yes, there are two problems. The physical failure problem seems to be mostly restricted to the 75GXP. However the electronics/bandwidth/density/whatever-it-is problem is uniform across the entire DTLA line. We stopped using 75GXP's at work a while back, but we still regularly suffer from the electronics/bandwidth/whatever-it-is problem on 30G DTLA drives on a daily basis. Cheers, -Peter
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote: It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). I think it has more to do with the drive going on a new motherboard that might not boot with dangerously dedicated than the above. -- David W. Chapman Jr. [EMAIL PROTECTED] Raintree Network Services, Inc. www.inethouston.net [EMAIL PROTECTED] FreeBSD Committer www.FreeBSD.org
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
David W. Chapman Jr. wrote: On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote: It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). I think it has more to do with the drive being moved to a new motherboard that might not boot a dangerously dedicated disk than the above. The concept of dangerously dedicated significantly predates BIOSes being unable to boot such drives, whether because of antivirus checks or because of automatic fictitious geometry determination by Adaptec or NCR (now Symbios) controllers, which end up getting divide-by-zero errors when parsing the fictitious partition table that the FreeBSD dangerously dedicated mode includes in its boot block. In fact, I remember installing 386BSD dangerously dedicated on an ATT WGS 386 ESDI drive, back in 1992. -- Terry
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
: IBM DTLA drives are known to rotate fast enough near the spindle : that the sustained write speed exceeds the ability of the controller : electronics to keep up, and results in crap being written to disk. I would assume it's actually the tracks furthest from the spindle.. Wouldn't the linear speed be faster closer to the spindle at 7200 RPM than at the edge? -- David W. Chapman Jr. [EMAIL PROTECTED] Raintree Network Services, Inc. www.inethouston.net [EMAIL PROTECTED] FreeBSD Committer www.FreeBSD.org
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
David W. Chapman Jr. wrote: : IBM DTLA drives are known to rotate fast enough near the spindle : that the sustained write speed exceeds the ability of the controller : electronics to keep up, and results in crap being written to disk. I would assume it's actually the tracks furthest from the spindle.. Wouldn't the linear speed be faster closer to the spindle at 7200 RPM than at the edge? This particular tangent of the disk partitioning thread has gone *way* off topic. :-) Cheers, -Peter -- Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] All of this is for nothing if we don't go to the stars - JMS/B5
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
David W. Chapman Jr. wrote: On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote: It's because you have to reinstall, should you want to add a second OS at a later date (e.g. Linux, or Windows). I think it has more to do with the drive being moved to a new motherboard that might not boot a dangerously dedicated disk than the above. .. And the mere presence of one of these disks causing the BIOS to lock up at boot. Note that this is a particularly bad thing in laptops. There are three classes of behavior:
1) You luck out and it works.
2) You get a BIOS divide-by-zero fault when you *boot* off the disk. This shows up as a 'BTX fault'. If you check the lists, a good number of the BTX faults posted have int=0 (divide by zero) in them. The problem is more widespread than it appears.
3) You get a system lockup when booting the *computer* if *any* DD disk is attached anywhere at all. This is what killed the Thinkpad T20*, A20*, 600X etc. After all the yelling we did at IBM, it turned out to be FreeBSD's fault. It also happens on Dell systems. It kills all IA64 boxes if a FreeBSD/i386 disk is attached anywhere.
An additional problem is that because boot1 has an fdisk table embedded in it unconditionally, a FreeBSD partition *looks* like it has a recursive MBR in it. This is what is really bad and is what is killing us on newer systems. What really sucks is that there is **NO WAY** to remove it with the tools that we have except a hex editor. Cheers, -Peter -- Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] All of this is for nothing if we don't go to the stars - JMS/B5
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
+---[ David W. Chapman Jr. ]-- | : IBM DTLA drives are known to rotate fast enough near the spindle | : that the sustained write speed exceeds the ability of the controller | : electronics to keep up, and results in crap being written to disk. | | I would assume it's actually the tracks furthest from the spindle.. | | Wouldn't the linear speed be faster closer to the spindle at 7200 RPM | than at the edge? er, no. The circumference of a circle is 2 PI r, so as your distance from the spindle increases, the amount of physical real estate you're traversing increases. Since you are turning at a constant angular velocity, your linear velocity grows in proportion to your distance from the spindle: each unit of radius adds 2 PI (around 6.3, if you're not a maths person) units of path per revolution. Ever been at one of those carnivals where they have a spinning thing? It's easier to stay near the centre than near the edges, because you are moving a *lot* quicker at the edges. And just for the hell of it: if you have a 3 unit disc doing 1 RPM, at 1/2 unit out you're doing ~3 units/min; at one unit out, ~6 units/min; at two units out, ~12 units/min; at three, ~19 units/min. Multiply by 7200 for a 7200 RPM drive (and divide by 60 for per-second figures), and s/units/inches/. The outside of your disk is really moving. On drives with a constant number of sectors per track, the recording density at the outer edge is lighter than near the centre, which mitigates the speed somewhat. See also: artificial gravity in space stations/ships/objects -- Totally Holistic Enterprises Internet| | Andrew Milton The Internet (Aust) Pty Ltd | | ACN: 082 081 472 ABN: 83 082 081 472 | M:+61 416 022 411 | Carpe Daemon PO Box 837 Indooroopilly QLD 4068|[EMAIL PROTECTED]|
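The arithmetic above can be checked with a short calculation (a sketch only; the platter radii below are illustrative guesses for a 3.5-inch drive, not measured DTLA geometry):

```python
import math

def linear_speed(radius_in, rpm):
    """Tangential speed in inches/second at a given radius and spin rate."""
    return 2 * math.pi * radius_in * rpm / 60.0

# Illustrative radii: innermost track ~0.75", outermost ~1.75"
inner = linear_speed(0.75, 7200)
outer = linear_speed(1.75, 7200)
print(f"inner track: {inner:.0f} in/s, outer track: {outer:.0f} in/s")
# The outer track moves (1.75 / 0.75) ~= 2.3x faster, as the 2*pi*r
# argument predicts -- speed scales linearly with radius.
```

With these assumed radii the outer edge passes under the head at roughly 1300 inches per second versus roughly 570 at the hub, which is why (absent lighter recording density) the outer tracks would demand the highest sustained data rate from the drive electronics.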
Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)
On Sunday, 9 December 2001 at 22:44:52 -0800, Peter Wemm wrote: 3) You get a system lockup when booting the *computer* if *any* DD disk is attached anywhere at all. This is what killed the Thinkpad T20*, A20*, 600X etc. After all the yelling we did at IBM, it turned out to be FreeBSD's fault. It also happens on Dell systems. It kills all IA64 boxes if a FreeBSD/i386 disk is attached anywhere. What are you talking about? The IBM lockup was due to the presence of a *Microsoft partition* of type 0xn5, for any value of n. If these systems also lock up with a dedicated disk, it's due to some other bug. Greg -- See complete headers for address and phone numbers
Re: cvs commit: src/sys/kern subr_diskmbr.c
Bernd Walter [EMAIL PROTECTED] wrote: 32 times for each disk on boot, with almost 30 disks. Possibly it's triggered by vinum's drive scanning. Yep, same here (and it is triggered by vinum). What can I do about these messages? Remove it. It should not have been there in the first place, at least not without an if (bootverbose) ... in front of it. It isn't telling any news anyway, because you certainly already knew that your disks are using DD mode, and the last word is telling (ignored), which is the intended and expected action anyway. I do understand Peter's gripe about broken BIOSes that try to interpret fdisk tables (where the fdisk table is actually in the domain of the boot block itself). The comments tell a bit more about it. But adding pointless messages that flush the boot log and possibly hide important boot messages can't be goo. -- cheers, Jorg .-.-. --... ...-- -.. . DL8DTL http://www.sax.de/~joerg/ NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-)
Re: cvs commit: src/sys/kern subr_diskmbr.c
:boot block itself). The comments tell a bit more about it. But :adding pointless messages that flush the boot log and possibly hide :important boot messages can't be goo. : :-- :cheers, Jorg .-.-. --... ...-- -.. . DL8DTL Yes, Goo in the computer is wery, wery bad! -Matt
Re: cvs commit: src/sys/kern subr_diskmbr.c
Joerg Wunsch wrote: Bernd Walter [EMAIL PROTECTED] wrote: 32 times for each disk on boot, with almost 30 disks. Possibly it's triggered by vinum's drive scanning. Yep, same here (and it is triggered by vinum). What can I do about these messages? Remove it. It should not have been there in the first place, at least not without an if (bootverbose) ... in front of it. It isn't telling any news anyway, because you certainly already knew that your disks are using DD mode, and the last word is telling (ignored), which is the intended and expected action anyway. There shouldn't *be* bootblocks on non-boot disks. dd if=/dev/zero of=/dev/da$n count=1 Don't use disklabel -B -rw da$n auto. Use disklabel -rw da$n auto. Cheers, -Peter -- Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] All of this is for nothing if we don't go to the stars - JMS/B5
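The effect of Peter's dd command is simply to zero the disk's first sector, where boot1 (with its embedded fdisk table) lives. That can be sketched against an ordinary scratch file standing in for the raw device (a sketch; "disk.img" is a hypothetical image file, and on a real system you would of course point dd at /dev/da$n itself):

```python
SECTOR_SIZE = 512

def zero_first_sector(path):
    """Overwrite the first sector with zeros -- the same effect as
    `dd if=/dev/zero of=... count=1` on the boot block."""
    with open(path, "r+b") as f:
        f.write(b"\x00" * SECTOR_SIZE)

# Demo: a 4-sector scratch image standing in for the disk device.
with open("disk.img", "wb") as f:
    f.write(b"\x55" * 4 * SECTOR_SIZE)   # pretend boot block + label + data
zero_first_sector("disk.img")
with open("disk.img", "rb") as f:
    data = f.read()
print(data[:SECTOR_SIZE] == b"\x00" * SECTOR_SIZE)       # first sector cleared
print(data[SECTOR_SIZE:] == b"\x55" * 3 * SECTOR_SIZE)   # rest untouched
```

Only the first sector is touched; the sectors after it (including, on a real disk, the disklabel that a plain disklabel -rw writes) are left alone.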
Re: cvs commit: src/sys/kern subr_diskmbr.c
On Sat, Dec 08, 2001 at 05:09:11PM -0800, Peter Wemm wrote: Joerg Wunsch wrote: Bernd Walter [EMAIL PROTECTED] wrote: 32 times for each disk on boot, with almost 30 disks. Possibly it's triggered by vinum's drive scanning. Yep, same here (and it is triggered by vinum). What can I do about these messages? Remove it. It should not have been there in the first place, at least not without an if (bootverbose) ... in front of it. It isn't telling any news anyway, because you certainly already knew that your disks are using DD mode, and the last word is telling (ignored), which is the intended and expected action anyway. There shouldn't *be* bootblocks on non-boot disks. I usually have a /boot/loader.work and a /boot/kernel.work for updating. What is wrong with having spare bootblocks? In fact I have already needed them twice, and the disk space is unused anyway. -- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]
Re: cvs commit: src/sys/kern subr_diskmbr.c
On Wed, Nov 21, 2001 at 12:31:45AM -0800, Peter Wemm wrote: peter 2001/11/21 00:31:45 PST Modified files: sys/kern subr_diskmbr.c Log: Recognize the fixed geometry in boot1 so that DD disks are not interpreted as real fdisk tables (and fail). Revision Changes Path 1.53 +31 -6 src/sys/kern/subr_diskmbr.c Maybe I'm a bit late with this subject. I updated a machine yesterday and got these messages: da28: invalid primary partition table: Dangerously Dedicated (ignored) 32 times for each disk on boot, with almost 30 disks. Possibly it's triggered by vinum's drive scanning. OK, it was unnecessary to install bootblocks on dedicated disks other than the boot device, but it's a lot of noise for that. What can I do about these messages? -- B.Walter COSMO-Project http://www.cosmo-project.de [EMAIL PROTECTED] Usergroup [EMAIL PROTECTED]