Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))

2001-12-12 Thread Bernd Walter

On Wed, Dec 12, 2001 at 04:22:05PM +1030, Greg Lehey wrote:
 On Tuesday, 11 December 2001 at  3:11:21 +0100, Bernd Walter wrote:
  striped:
  If you have 512-byte stripes and 2 disks,
  you access 64k which is put into two 32k transactions onto the disks.
 
 Only if your software optimizes the transfers.  There are reasons why
 it should not.  Without optimization, you get 128 individual
 transfers.

If the software does not, we end up with 128 transactions anyway, which is
not very good because of the overhead for each of them.
UFS does a more or less good job of this.
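
As a rough illustration of the arithmetic here (a sketch, not Vinum code; it
assumes a contiguous, stripe-aligned request and classic round-robin striping,
so the chunks landing on one disk are adjacent and can be coalesced into a
single transfer per disk):

    /*
     * How many per-disk transfers a request generates for a given stripe
     * size, with and without coalescing of adjacent chunks on one disk.
     */
    #include <stdio.h>

    static void
    count_transfers(unsigned long request, unsigned long stripe, unsigned long disks)
    {
        unsigned long chunks = request / stripe;        /* stripe-sized pieces */
        unsigned long coalesced = chunks < disks ? chunks : disks;

        printf("%luk request, %lu-byte stripes, %lu disks: "
            "%lu raw transfers, %lu coalesced\n",
            request / 1024, stripe, disks, chunks, coalesced);
    }

    int
    main(void)
    {
        count_transfers(65536, 512, 2);         /* 128 raw, 2 coalesced */
        count_transfers(65536, 32768, 2);       /* 2 raw, 2 coalesced */
        return (0);
    }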

  Linear speed could be about twice the speed of a single drive.  But
  this is more theoretical today than real.  The average transaction
  size per disk decreases with a growing number of spindles and you get
  more transaction overhead.  Also the voice coil technology used in
  drives for many years adds a random amount of time to the access
  time, which invalidates some of the spindle sync potential.  Plus it
  may break some benefits of precaching mechanisms in drives.  I'm
  almost sure there is no real performance gain with modern drives.
 
 The real problem with this scenario is that you're missing a couple of
 points:
 
 1.  Typically it's not the latency that matters.  If you have to wait
 a few ms longer, that's not important.  What's interesting is the
 case of a heavily loaded system, where the throughput is much more
 important than the latency.

Agreed - especially because we don't wait for writes as most are async.

 2.  Throughput is the data transferred per unit time.  There's active
 transfer time, nowadays in the order of 500 µs, and positioning
 time, in the order of 6 ms.  Clearly the fewer positioning
 operations, the better.  This means that you should want to put
 most transfers on a single spindle, not a single stripe.  To do
 this, you need big stripes.

In the general case yes.
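
To put numbers on that (a back-of-envelope sketch using the figures quoted
above; the 32 kB payload per positioning operation is an assumption of mine,
not a measurement):

    /*
     * Effective throughput when every transfer pays a ~6 ms positioning
     * penalty, versus the pure media transfer rate.  Illustrative only.
     */
    #include <stdio.h>

    int
    main(void)
    {
        double seek_ms = 6.0, xfer_ms = 0.5, payload_kb = 32.0;

        printf("one positioning per 32 kB: %.1f MB/s\n",
            payload_kb / (seek_ms + xfer_ms) * 1000.0 / 1024.0);
        printf("transfer time only:        %.1f MB/s\n",
            payload_kb / xfer_ms * 1000.0 / 1024.0);
        return (0);
    }

The positioning time dominates by an order of magnitude, which is the whole
argument for keeping a transfer on one spindle rather than spreading it.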

  raid5:
  For a write you have two read transactions and two writes.
 
 This is the way Vinum does it.  There are other possibilities:
 
 1.  Always do full-stripe writes.  Then you don't need to read the old
 contents.

Which isn't that good with the big stripes we usually want.

 2.  Cache the parity blocks.  This is an optimization which I think
 would be very valuable, but which Vinum doesn't currently perform.

I thought of connecting the parity to the wait lock.
If there's a waiter for the same parity data it's not dropped.
This way we don't waste memory but still have an effect.
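
For reference, the read-modify-write sequence under discussion, as a sketch
(generic RAID-5 arithmetic, not Vinum's actual code path); a cached parity
block would remove the second of the two reads:

    /*
     * RAID-5 partial-stripe write: read old data and old parity (two
     * reads), then write new data and new parity (two writes), where
     *     new_parity = old_parity ^ old_data ^ new_data
     * The toy below keeps the "disk" in memory and checks the identity.
     */
    #include <stdio.h>
    #include <stdint.h>

    #define BLK 8                           /* toy block size */

    int
    main(void)
    {
        uint8_t d0[BLK] = "AAAAAAA";        /* data block being rewritten */
        uint8_t d1[BLK] = "BBBBBBB";        /* untouched data block */
        uint8_t newd0[BLK] = "XXXXXXX";     /* incoming data */
        uint8_t parity[BLK], newpar[BLK];
        int i;

        for (i = 0; i < BLK; i++)           /* parity before the write */
            parity[i] = d0[i] ^ d1[i];

        for (i = 0; i < BLK; i++)           /* read-modify-write update */
            newpar[i] = parity[i] ^ d0[i] ^ newd0[i];

        for (i = 0; i < BLK; i++)           /* must equal parity of new stripe */
            if (newpar[i] != (newd0[i] ^ d1[i]))
                return (1);
        printf("parity update consistent\n");
        return (0);
    }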

  There are easier things to raise performance.
  Ever wondered why people claim vinum's raid5 writes are slow?
  The answer is astonishingly simple:
  Vinum does stripe-based locking, while ufs tries to lay out data
  in mostly ascending sectors.
  What happens here is that the first write has to wait for two reads
  and two writes.
  If we have an ascending write it has to wait for the first write to
  finish, because the stripe is still locked.
  The first is unlocked after both physical writes are on disk.
  Now we start our two reads which are (thanks to the drive's precache)
  most likely in the drive's cache - then we write.
 
  The problem here is that physical writes get serialized and the drive
  has to wait a complete rotation between each.
 
 Not if the data is in the drive cache.

This example was for writing.
Reads get precached by the drive and have a very good chance of being
in the cache.
It doesn't matter on IDE disks, because if you have write cache enabled
the write gets acked from the cache and not the media.  If write cache
is disabled writes get serialized anyway.

  If we had fine-grained locking which only locks the accessed sectors
  of the parity we would be able to have more than a single ascending
  write transaction onto a single drive.
 
 Hmm.  This is something I hadn't thought about.  Note that sequential
 writes to a RAID-5 volume don't go to sequential addresses on the
 spindles; they will work up to the end of the stripe on one spindle,
 then start on the next spindle at the start of the stripe.  You can do
 that as long as the address ranges in the parity block don't overlap,
 but the larger the stripe, the greater the likelihood of this becomes.
 This might also explain the following observed behaviour:
 
 1.  RAID-5 writes slow down when the stripe size gets > 256 kB or so.
 I don't know if this happens on all disks, but I've seen it often
 enough.

I would guess it happens when the stripe size is bigger than the preread
cache the drive uses.
This would mean we have less chance of getting parity data out of the
drive cache.

 2.  rawio write performance is better than ufs write performance.
 rawio does truly random transfers, where ufs is a mixture.

The current problem is to increase linear write performance.
I don't see a chance that rawio benefits from it, but ufs will.

 Do you feel like changing the locking code?  It shouldn't be that much
 work, and I'd be interested to see how much performance difference it
 makes.

Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))

2001-12-12 Thread Greg Lehey

On Wednesday, 12 December 2001 at 12:53:37 +0100, Bernd Walter wrote:
 On Wed, Dec 12, 2001 at 04:22:05PM +1030, Greg Lehey wrote:
 On Tuesday, 11 December 2001 at  3:11:21 +0100, Bernd Walter wrote:
 striped:
 If you have 512-byte stripes and 2 disks,
 you access 64k which is put into two 32k transactions onto the disks.

 Only if your software optimizes the transfers.  There are reasons why
 it should not.  Without optimization, you get 128 individual
 transfers.

 If the software does not, we end up with 128 transactions anyway, which is
 not very good because of the overhead for each of them.

Correct.

 UFS does a more or less good job of this.

Well, it requires a lot of moves.  Vinum *could* do this, but for the
reasons specified below, there's no need.

 raid5:
 For a write you have two read transactions and two writes.

 This is the way Vinum does it.  There are other possibilities:

 1.  Always do full-stripe writes.  Then you don't need to read the old
 contents.

 Which isn't that good with the big stripes we usually want.

Correct.  That's why most RAID controllers limit stripe size to
something sub-optimal, because it simplifies the code to do
full-stripe writes.

 2.  Cache the parity blocks.  This is an optimization which I think
 would be very valuable, but which Vinum doesn't currently perform.

 I thought of connecting the parity to the wait lock.
 If there's a waiter for the same parity data it's not dropped.
 This way we don't waste memory but still have an effect.

That's a possibility, though it doesn't directly address parity block
caching.  The problem is that by the time you find another lock,
you've already performed part of the parity calculation, and probably
part of the I/O transfer.  But it's an interesting consideration.

 If we had fine-grained locking which only locks the accessed sectors
 of the parity we would be able to have more than a single ascending
 write transaction onto a single drive.

 Hmm.  This is something I hadn't thought about.  Note that sequential
 writes to a RAID-5 volume don't go to sequential addresses on the
 spindles; they will work up to the end of the stripe on one spindle,
 then start on the next spindle at the start of the stripe.  You can do
 that as long as the address ranges in the parity block don't overlap,
 but the larger the stripe, the greater the likelihood of this becomes.
 This might also explain the following observed behaviour:

 1.  RAID-5 writes slow down when the stripe size gets > 256 kB or so.
 I don't know if this happens on all disks, but I've seen it often
 enough.

 I would guess it happens when the stripe size is bigger than the preread
 cache the drive uses.
 This would mean we have less chance of getting parity data out of the
 drive cache.

Yes, this was one of the possibilities we considered.  

 2.  rawio write performance is better than ufs write performance.
 rawio does truly random transfers, where ufs is a mixture.

 The current problem is to increase linear write performance.
 I don't see a chance that rawio benefits from it, but ufs will.

Well, rawio doesn't need to benefit.  It's supposed to be a neutral
observer, but in this case it's not doing too well.

 Do you feel like changing the locking code?  It shouldn't be that much
 work, and I'd be interested to see how much performance difference it
 makes.

 I put it onto my todo list.

Thanks.

 Note that there's another possible optimization here: delay the writes
 by a certain period of time and coalesce them if possible.  I haven't
 finished thinking about the implications.

 That's exactly what the ufs clustering and softupdates do.
 If it doesn't fit modern drives anymore it should get tuned there.

This doesn't have too much to do with modern drives; it's just as
applicable to 70s drives.

 Whenever a write hits a driver there is a waiter for it.
 Either a softdep, a memory freeing or an application doing a sync
 transfer.
 I'm almost sure delaying writes will harm performance in upper layers.

I'm not so sure.  Full stripe writes, where needed, are *much* faster
than partial stripe writes.

Greg
--
See complete headers for address and phone numbers




Re: Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))

2001-12-12 Thread Bernd Walter

On Thu, Dec 13, 2001 at 12:47:53PM +1030, Greg Lehey wrote:
 On Thursday, 13 December 2001 at  3:06:14 +0100, Bernd Walter wrote:
  Currently if we have two writes in two stripes each, all initiated before
  the first finishes, the drive has to seek between the two stripes, as
  the second write to the same stripe has to wait.
 
 I'm not sure I understand this.  The stripes are on different drives,
 after all.

Let's assume a 256k striped single-plex volume with 3 subdisks.
We get a layout like this:

sd1     sd2     sd3
256k    256k    parity
256k    parity  256k
parity  256k    256k
256k    256k    parity
...     ...     ...

Now we write blocks 1, 10, 1040 and 1045 on the volume.
All writes are initiated at the same time.
It would be good to write first 1, then 10, then 1040 and finally 1045.
What we currently see is write 1, then 1040, then 10 and finally 1045.
This is because we can't write 10 unless 1 is finished, but we already
start with 1040 because it's independent.
The result is avoidable seeking in subdisk 1.
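
To make the ordering concrete (a toy model of my own, not Vinum code: 512-byte
blocks, 256k chunks, 3 subdisks, so all four blocks live on subdisk 1 as
described above; per-stripe locking serializes 1 and 10 even though they touch
different parity sectors, while 1040 in the next stripe is free to jump ahead):

    /*
     * Per-stripe locking conflicts whenever two writes hit the same
     * stripe; sector-range locking on the parity conflicts only when
     * the parity offsets actually overlap (single-block writes here).
     */
    #include <stdio.h>

    #define CHUNK_BLOCKS 512                /* 256 kB in 512-byte blocks */
    #define DATA_DISKS   2                  /* 3 subdisks, one holds parity */

    static void
    conflict(long a, long b)
    {
        long stripe_a = a / (CHUNK_BLOCKS * DATA_DISKS);
        long stripe_b = b / (CHUNK_BLOCKS * DATA_DISKS);
        long par_a = a % CHUNK_BLOCKS;      /* offset inside the parity chunk */
        long par_b = b % CHUNK_BLOCKS;
        int stripe_lock = (stripe_a == stripe_b);
        int range_lock = stripe_lock && (par_a == par_b);

        printf("blocks %4ld/%4ld: stripe lock %s, range lock %s\n", a, b,
            stripe_lock ? "conflicts" : "ok       ",
            range_lock ? "conflicts" : "ok");
    }

    int
    main(void)
    {
        conflict(1, 10);            /* same stripe, different parity sectors */
        conflict(1040, 1045);       /* same stripe, different parity sectors */
        conflict(1, 1040);          /* different stripes: independent anyway */
        return (0);
    }

With locking on the actual parity sector ranges, none of these writes would
have to wait on each other, and they could be issued in ascending order.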

Back to the 256k performance breakdown you described.
Because of this ordering we not only get unneeded seeks on the drive but
also a different use pattern on the drive cache.

Once the locks are untangled the situation needs to be verified again, as
the drive cache may behave differently.

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]





Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-11 Thread Wilko Bulte

On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
 On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:
 
  performance without it - for reading OR writing.  It doesn't matter
  so much for RAID{1,10},  but it matters a whole lot for something like
  RAID-5 where the difference between a spindle-synced read or write
  and a non-spindle-synched read or write can be upwards of 35%.
 
  If you have RAID5 with I/O sizes that result in full-stripe operations.
 
  Well, 'more than one disk' operations anyway, for random-I/O.  Caching
  takes care of sequential I/O reasonably well but random-I/O goes down
  the drain for writes if you aren't spindle synced, no matter what
  the stripe size,
 
 Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
 goes to buffer cache.
 
  and will go down the drain for reads if you cross a stripe -
  something that is quite common I think.
 
 I think this is what Mike was referring to when talking about parity
 calculation.  In any case, going across a stripe boundary is not a
 good idea, though of course it can't be avoided.  That's one of the
 arguments for large stripes.

In a former life I was involved with an HB striping product for SysVr2
that had a slightly modified filesystem that 'knew' when it was
working on a striped disk.  And because it knew, it avoided posting I/Os
that crossed stripes.

W/
-- 
|   / o / /_  _ email:  [EMAIL PROTECTED]
|/|/ / / /(  (_)  Bulte Arnhem, The Netherlands 




Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-11 Thread Bernd Walter

On Tue, Dec 11, 2001 at 03:34:37PM +0100, Wilko Bulte wrote:
 On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
  I think this is what Mike was referring to when talking about parity
  calculation.  In any case, going across a stripe boundary is not a
  good idea, though of course it can't be avoided.  That's one of the
  arguments for large stripes.
 
 In a former life I was involved with an HB striping product for SysVr2
 that had a slightly modified filesystem that 'knew' when it was
 working on a striped disk.  And because it knew, it avoided posting I/Os
 that crossed stripes.

Here are some real-world statistics with UFS softupdates:
Plex d1.p0: Size:   8736473088 bytes (8331 MB)
Subdisks: 3
State: up
Organization: striped   Stripe size: 256 kB
Part of volume d1
Reads: 83546
Bytes read: 258429952 (246 MB)
Average read:   3093 bytes
Writes:   100109
Bytes written: 818750464 (780 MB)
Average write:  8178 bytes
Multiblock:  279 (0%)
Multistripe:  82 (0%)

Subdisk 0:  d1.p0.s0
  state: up size  2912157696 (2777 MB)
Subdisk 1:  d1.p0.s1
  state: up size  2912157696 (2777 MB)
Subdisk 2:  d1.p0.s2
  state: up size  2912157696 (2777 MB)

You can easily see that the number of Multistripe transactions is
negligibly low.

Here is another case with 64k stripe size:
Plex d7.p0: Size:   36419796992 bytes (34732 MB)
Subdisks: 2
State: up
Organization: striped   Stripe size: 64 kB
Part of volume d7
Reads:934001
Bytes read:   3718752768 (3546 MB)
Average read:   3981 bytes
Writes:   220293
Bytes written: 3702993920 (3531 MB)
Average write: 16809 bytes
Multiblock:50037 (4%)
Multistripe:   25047 (2%)

Subdisk 0:  d7.p0.s0
  state: up size 18209898496 (17366 MB)
Subdisk 1:  d7.p0.s1
  state: up size 18209898496 (17366 MB)

You can see that even though we have an absolutely extreme situation the
number of multistripe transactions is still very low.
But 384k would be a much better value for other reasons.

You may want to compare the multistripe number with the multiblock number
and yes, it doesn't look that good anymore, but you also see that the
change from 64k to 256k gives much better results, while the average
transaction size is 5865 bytes for the first case and 6429 bytes for
the second - not that different.
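
A rough model of why the multistripe percentages stay this low (my own
back-of-envelope, assuming 8k filesystem blocks, block-aligned transfers and
uniform placement within a stripe; the observed numbers are lower still
because the averages mix in many single-block transfers):

    /*
     * Probability that a block-aligned transfer of size xfer crosses a
     * stripe boundary: (xfer - blk) / stripe for xfer <= stripe.
     * Assumptions as above; not derived from Vinum itself.
     */
    #include <stdio.h>

    static double
    cross_prob(double xfer, double blk, double stripe)
    {
        return (xfer <= blk ? 0.0 : (xfer - blk) / stripe);
    }

    int
    main(void)
    {
        double blk = 8192.0;

        printf("256k stripe, 8k writes:  %4.1f%%\n",
            100.0 * cross_prob(8192.0, blk, 262144.0));
        printf("64k stripe, 16k writes:  %4.1f%%\n",
            100.0 * cross_prob(16384.0, blk, 65536.0));
        return (0);
    }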

Most of my plexes are concat anyway.

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]





Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-11 Thread Greg Lehey

On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote:
 On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
 On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:

 performance without it - for reading OR writing.  It doesn't matter
 so much for RAID{1,10},  but it matters a whole lot for something like
 RAID-5 where the difference between a spindle-synced read or write
 and a non-spindle-synched read or write can be upwards of 35%.

 If you have RAID5 with I/O sizes that result in full-stripe operations.

 Well, 'more than one disk' operations anyway, for random-I/O.  Caching
 takes care of sequential I/O reasonably well but random-I/O goes down
 the drain for writes if you aren't spindle synced, no matter what
 the stripe size,

 Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
 goes to buffer cache.

 and will go down the drain for reads if you cross a stripe -
 something that is quite common I think.

 I think this is what Mike was referring to when talking about parity
 calculation.  In any case, going across a stripe boundary is not a
 good idea, though of course it can't be avoided.  That's one of the
 arguments for large stripes.

 In a former life I was involved with an HB striping product for SysVr2
 that had a slightly modified filesystem that 'knew' when it was
 working on a striped disk.  And because it knew, it avoided posting I/Os
 that crossed stripes.

So what did it do with user requests which crossed stripes?

Greg
--
See complete headers for address and phone numbers




Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-11 Thread Wilko Bulte

On Wed, Dec 12, 2001 at 09:00:34AM +1030, Greg Lehey wrote:
 On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote:
  On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
  On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:
 

..

  and will go down the drain for reads if you cross a stripe -
  something that is quite common I think.
 
  I think this is what Mike was referring to when talking about parity
  calculation.  In any case, going across a stripe boundary is not a
  good idea, though of course it can't be avoided.  That's one of the
  arguments for large stripes.
 
  In a former life I was involved with an HB striping product for SysVr2
  that had a slightly modified filesystem that 'knew' when it was
  working on a striped disk.  And because it knew, it avoided posting I/Os
  that crossed stripes.
 
 So what did it do with user requests which crossed stripes?

Memory is dim, but I think the fs code created a second i/o to the
driver layer. So the fs never sent out an i/o that the driver layer had
to break up. In case of a pre-fetch while reading I think the f/s 
would just pre-fetch until the stripe border and not bother sending
out a second i/o down. 

In the end all of this benchmarked quite favorably.
Note that this was 386/486 era, with the classic SysV filesystem.

-- 
|   / o / /_  _ email:  [EMAIL PROTECTED]
|/|/ / / /(  (_)  Bulte Arnhem, The Netherlands 




Vinum write performance (was: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c))

2001-12-11 Thread Greg Lehey

On Tuesday, 11 December 2001 at  3:11:21 +0100, Bernd Walter wrote:
 On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
 On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:

 performance without it - for reading OR writing.  It doesn't matter
 so much for RAID{1,10},  but it matters a whole lot for something like
 RAID-5 where the difference between a spindle-synced read or write
 and a non-spindle-synched read or write can be upwards of 35%.

 If you have RAID5 with I/O sizes that result in full-stripe operations.

 Well, 'more than one disk' operations anyway, for random-I/O.  Caching
 takes care of sequential I/O reasonably well but random-I/O goes down
 the drain for writes if you aren't spindle synced, no matter what
 the stripe size,

 Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
 goes to buffer cache.

 After waiting for the drives and not for vinum parity blocks.

 and will go down the drain for reads if you cross a stripe -
 something that is quite common I think.

 I think this is what Mike was referring to when talking about parity
 calculation.  In any case, going across a stripe boundary is not a
 good idea, though of course it can't be avoided.  That's one of the
 arguments for large stripes.

 striped:
 If you have 512-byte stripes and 2 disks,
 you access 64k which is put into two 32k transactions onto the disks.

Only if your software optimizes the transfers.  There are reasons why
it should not.  Without optimization, you get 128 individual
transfers.

 The wait time for the complete transaction is the worst of both,
 which is more than the average of a single disk.

Agreed.

 With spindle synchronisation the access times for both disks are
 believed to be identical and you get the same as with a single disk.

Correct.

 Linear speed could be about twice the speed of a single drive.  But
 this is more theoretical today than real.  The average transaction
 size per disk decreases with a growing number of spindles and you get
 more transaction overhead.  Also the voice coil technology used in
 drives for many years adds a random amount of time to the access
 time, which invalidates some of the spindle sync potential.  Plus it
 may break some benefits of precaching mechanisms in drives.  I'm
 almost sure there is no real performance gain with modern drives.

The real problem with this scenario is that you're missing a couple of
points:

1.  Typically it's not the latency that matters.  If you have to wait
a few ms longer, that's not important.  What's interesting is the
case of a heavily loaded system, where the throughput is much more
important than the latency.

2.  Throughput is the data transferred per unit time.  There's active
transfer time, nowadays in the order of 500 µs, and positioning
time, in the order of 6 ms.  Clearly the fewer positioning
operations, the better.  This means that you should want to put
most transfers on a single spindle, not a single stripe.  To do
this, you need big stripes.

 raid5:
 For a write you have two read transactions and two writes.

This is the way Vinum does it.  There are other possibilities:

1.  Always do full-stripe writes.  Then you don't need to read the old
contents.

2.  Cache the parity blocks.  This is an optimization which I think
would be very valuable, but which Vinum doesn't currently perform.

 There are easier things to raise performance.
 Ever wondered why people claim vinum's raid5 writes are slow?
 The answer is astonishingly simple:
 Vinum does stripe-based locking, while ufs tries to lay out data
 in mostly ascending sectors.
 What happens here is that the first write has to wait for two reads
 and two writes.
 If we have an ascending write it has to wait for the first write to
 finish, because the stripe is still locked.
 The first is unlocked after both physical writes are on disk.
 Now we start our two reads which are (thanks to the drive's precache)
 most likely in the drive's cache - then we write.

 The problem here is that physical writes get serialized and the drive
 has to wait a complete rotation between each.

Not if the data is in the drive cache.

 If we had fine-grained locking which only locks the accessed sectors
 of the parity we would be able to have more than a single ascending
 write transaction onto a single drive.

Hmm.  This is something I hadn't thought about.  Note that sequential
writes to a RAID-5 volume don't go to sequential addresses on the
spindles; they will work up to the end of the stripe on one spindle,
then start on the next spindle at the start of the stripe.  You can do
that as long as the address ranges in the parity block don't overlap,
but the larger the stripe, the greater the likelihood of this becomes.
This might also explain the following observed behaviour:

1.  RAID-5 writes slow down when the stripe size gets > 256 kB or so.
I don't know if this happens on all disks, but I've seen it often
enough.

Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-11 Thread Greg Lehey

On Tuesday, 11 December 2001 at 23:41:51 +0100, Wilko Bulte wrote:
 On Wed, Dec 12, 2001 at 09:00:34AM +1030, Greg Lehey wrote:
 On Tuesday, 11 December 2001 at 15:34:37 +0100, Wilko Bulte wrote:
 On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
 On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:


 ..

 and will go down the drain for reads if you cross a stripe -
 something that is quite common I think.

 I think this is what Mike was referring to when talking about parity
 calculation.  In any case, going across a stripe boundary is not a
 good idea, though of course it can't be avoided.  That's one of the
 arguments for large stripes.

 In a former life I was involved with an HB striping product for SysVr2
 that had a slightly modified filesystem that 'knew' when it was
 working on a striped disk.  And because it knew, it avoided posting I/Os
 that crossed stripes.

 So what did it do with user requests which crossed stripes?

 Memory is dim, but I think the fs code created a second i/o to the
 driver layer. So the fs never sent out an i/o that the driver layer had
 to break up.

That's what Vinum does.

 In case of a pre-fetch while reading I think the f/s would just
 pre-fetch until the stripe border and not bother sending out a
 second i/o down.

Yes, that's reasonable.

 In the end all of this benchmarked quite favorably.  Note that this
 was 386/486 era, with the classic SysV filesystem.

I don't think that UFS would behave that differently, just faster :-)

Greg
--
See complete headers for address and phone numbers




Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Mike Smith

  Still, it's my opinion that these BIOSes are simply broken:
 
  Joerg's personal opinion can go take a hike.  The reality of the
  situation is that this table is required, and we're going to put it there.
 
 The reality of the situation is far from being clear.  The only thing
 I can see is that you're trying to dictate things without adequate
 justification.  You should reconsider that attitude.

You can't substantiate your argument by closing your eyes, Greg.

There's a wealth of evidence against your stance, and frankly, none that 
supports it other than myopic bigotry (I don't want to do this because 
Microsoft use this format).  Are you going to stop using all of the 
other techniques that we share with them?


-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E






Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Joerg Wunsch

As Peter Wemm wrote:

 No, it isn't ignored, BIOS'es know that fdisk partitions end on
 cylinder boundaries, and therefore can intuit what the expected
 geometry is for the disk in question.

And you call that a design?  I call it a poor hack, nothing else.

The restriction to whatever the BIOS believes to be a cylinder
boundary is one of my gripes with fdisk tables; you obviously missed
that (or you don't argue about it -- can i take this as silent
agreement?).  It imposes a geometry that is not even remotely there,
with the drawbacks that a number of sectors can never be assigned (OK,
no big deal these days), but even worse, disks are non-portable
between different BIOSes that apply different intuition about how
to obtain the geometry from those poorly chosen values that are
included in fdisk tables.

/The/ major advantage of DD mode was that all BIOSes (so far :) at
least agree on how to access block 0 and the adjacent blocks, so
starting our own system there makes those disks portable.

 [...] The problem is that the int13 code only allowed for 255 heads,
 and the fake end of disk entry that is unconditionally in /boot/boot1
 specified an ending head number 255 (ie: 256 heads).  When this gets put
 into a byte register it is truncated to zero and we get divide by zero
 errors.

I've read this, and yes, i never argued about fixing /that/.  Since
those values chosen by our grandfather Bill Jolitz have been just
`magic' numbers only, it's unfortunate they eventually turned out to
be such bad magic about a decade later.

 We can just as easily have bootable-DD mode with a real MBR and have
 freebsd start on sector #2 instead of overlapping boot1 and mbr.

Probably, i think i could live with that.

 I'd rather that we be specific about this.  If somebody wants ad2e
 or da2e then they should not be using *any* fdisk tables at all.
 Ie: block 0 should be empty.

That disk wouldn't boot at all, you know that.

Yes, i prefer my disks to be called da0a...daNP.

 But to be honest, see my other article: i never argued to make this
 the default or a recommended strategy in any form.  It should merely
 remain available at all.  Back to the subject: the current warning,
 however, is pointless, and has the major drawback of potentially
 hiding important console messages.

 The console buffer is 32K these days.  You'd have to have around 300
 disks to have any real effect on the kernel.

You're narrow-minded here, Peter, this time in about the same way as
Windoze is narrow-minded: "All the world's a graphical console
produced by XXX."  No, all the world's not like that.  You might
consider my pcvt console obsolete, OK, but did you ever think about a
plain VT220 on a serial console?  They don't have /any/ scrollback
buffer.  (And you can't even stop the output with ^S while FreeBSD is
booting.)  Also, i think that:

uriah /boot/kernel/kernel: da0: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 5 times
uriah /boot/kernel/kernel: da1: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 34 times
uriah /boot/kernel/kernel: da2: invalid primary partition table: Dangerously Dedicated (ignored)
uriah last message repeated 34 times

...73 of those silly messages are just beyond any form of usefulness.
Either we hide this completely behind bootverbose (back to the root of
this thread) since it bears no information at all (i already knew what
is written there, since it was my deliberate decision, and it could
not have happened unless it was my deliberate decision), or we at least
ensure any of those messages is emitted at most once per drive.
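
A sketch of the "at most once per drive" variant (purely illustrative; this is
not the actual subr_diskmbr.c code, and the per-disk flag is made up):

    /*
     * Remember per drive whether the warning was already printed and
     * emit it once.  The 'warned' flag is a hypothetical addition to a
     * per-disk structure, not existing code.
     */
    #include <stdio.h>

    struct disk_softc {
        const char *name;
        int warned;                 /* hypothetical: warning already shown */
    };

    static void
    warn_dd_once(struct disk_softc *sc)
    {
        if (sc->warned)
            return;
        sc->warned = 1;
        printf("%s: invalid primary partition table: "
            "Dangerously Dedicated (ignored)\n", sc->name);
    }

    int
    main(void)
    {
        struct disk_softc da0 = { "da0", 0 };
        int i;

        for (i = 0; i < 35; i++)    /* 35 probes, but only one message */
            warn_dd_once(&da0);
        return (0);
    }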

-- 
cheers, Jörg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Greg Lehey

On Monday, 10 December 2001 at  0:17:14 -0800, Mike Smith wrote:
 Still, it's my opinion that these BIOSes are simply broken:

 Joerg's personal opinion can go take a hike.  The reality of the
 situation is that this table is required, and we're going to put it there.

 The reality of the situation is far from being clear.  The only thing
 I can see is that you're trying to dictate things without adequate
 justification.  You should reconsider that attitude.

 You can't substantiate your argument by closing your eyes, Greg.

No, of course not.  I also can't substantiate my arguments by sticking
my fingers down my throat and shouting "dangerously dedicated!".  But
then, I wasn't doing either.  Read back this thread for the evidence I
have given and which you apparently choose to ignore.

 There's a wealth of evidence against your stance,

Possibly, you just haven't shown it.  What we know so far is that
there are some kludges in the boot loader which can confuse BIOSes;
peter went into some detail earlier on IRC.  Only, they apply both to
systems with Microsoft partitions and those without.  And there are
reports that some Adaptec host adaptors (or, presumably, their BIOSes)
can't handle our particular boot blocks.  It's possible, as peter
suggests, that this is a fixable bug, but every time I mention it, I
get shouted down.  And yes, like Jörg, I don't care enough.  I'm not
saying ditch the Microsoft partition table, I'm saying don't ditch
disks without the Microsoft partition table.  Note also that,
although this is so dangerous, it has never bitten me on any system.

 and frankly, none that supports it other than myopic bigotry (I
 don't want to do this because Microsoft use this format).

None that you care to remember.

 Are you going to stop using all of the other techniques that we
 share with them?

No.  See above.

What is it about this particular topic that brings out such irrational
emotions in you and others?

Greg
--
See complete headers for address and phone numbers




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Terry Lambert

Ah, the thread which would not die... 8^).

Joerg Wunsch wrote:
 /The/ major advantage of DD mode was that all BIOSes (so far :) at
 least agree on how to access block 0 and the adjacent blocks, so
 starting our own system there makes those disks portable.

I guarantee you that there are a number of controllers which have
different ideas of how to do soft sector sparing _at the controller
level_ rather than at the drive level.

Disks created with such controllers aren't portable, since they
depend on controller state information, which may not be valid
from controller to controller, depending on the controller settings
(I killed a disk by not having the WD1007 soft sector sparing
jumper set the same in the machine I put it in as in the machine I
took it out of... 8^)).



 I've read this, and yes, i never argued about fixing /that/.  Since
 those values chosen by our grandfather Bill Jolitz have been just
 `magic' numbers only, it's unfortunate they eventually turned out to
 be such bad magic about a decade later.

Yeah, we should pick new magic.  It's bound to die again in the future,
though, once what's magic changes out from under us again... 8^(.

-- Terry




Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Terry Lambert

Greg Lehey wrote:
 What is it about this particular topic that brings out such irrational
 emotions in you and others?

Everyone who has been around for any length of time has been bitten
on the arse by it at one time or another, I think.  I remember
Alfred made a Lapbrick out of a system a while back ;^).

-- Terry




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Joerg Wunsch

As Peter Wemm wrote:

 Can you please clarify for me what specifically you do not like.. Is it:
 - the cost of 32K of disk space on an average disk these days?
   (and if so, is reducing that to one sector instead of 62 sufficient?)

The idea of a geometry that does not even remotely resemble the
actual geometry and only causes additional hassles, like disks being
not portable between controllers that have a different idea of that
geometry (since the design of this table is missing an actual field
to specify the geometry).  Incidentally, it's only what you call
intuition that finally stumbled across the 10-year-old Jolitz
fake fdisk values.  So IOW, it took the BIOS vendors ten years to
produce a BIOS that would break on it :), and the breakage (division
by 0) was only since they needed black magic in order to infer a
geometry value that was short-sightedly never specified in the table
itself.

 - you don't like typing s1 in the device name?

Aesthetically, yes, this one too. :)

 disklabel -rw ad2 auto is one form.  That should not use fdisk at all.
 This is quite fine, and nobody wants that to go away.

Good to hear.

Well, actually i always use disklabel -Brw daN auto, partly because
this sequence is wired into my fingers, and since i mentally DAbelieve
that having more bootstrappable disks couldn't harm. ;-)  As laid out
in another message, i eventually got the habit of even including a
root partition mirror on each disk as well.  So each of my disks should
be able to boot a single-user FreeBSD.

 I advocate that the bootable form (where boot1.s is expected to do the
 job of both the mbr *and* the partition boot) is evil and should at the very
 least be fixed.

Fixing is OK to me.  I think to recognize the dummy fdisk table of DD mode,
it would be totally sufficient to verify slice 4 being labelled with 50000
blocks, and the other slices being labelled 0.  We do not support any
physical disk anymore that is only 25 MB in size :).  So all the remaining
(INT 0x13 bootstrap) values could be anything -- even something that most
BIOSes would recognize as a valid fdisk table.
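
A sketch of that check (illustrative only, not the kernel's detection logic;
the 50000-sector size of the historic fake slice is assumed here):

    /*
     * Hypothetical test for the historic "dangerously dedicated" dummy
     * fdisk table: slice 4 claims 50000 sectors starting at 0 and the
     * other three slices are all zero.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct fake_slice {
        uint32_t start;
        uint32_t size;
    };

    static int
    looks_like_dd_table(const struct fake_slice s[4])
    {
        int i;

        for (i = 0; i < 3; i++)             /* slices 1-3 must be empty */
            if (s[i].start != 0 || s[i].size != 0)
                return (0);
        return (s[3].start == 0 && s[3].size == 50000);
    }

    int
    main(void)
    {
        struct fake_slice t[4] = { {0, 0}, {0, 0}, {0, 0}, {0, 50000} };

        printf("DD dummy table: %s\n", looks_like_dd_table(t) ? "yes" : "no");
        return (0);
    }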

  It should be something that is explicitly activated, and
 not something that you get whether you want it or not.

I don't fully understand that.  DD mode has always been an explicit
decision.  Even in the above, the specification of -B explicitly tells
to install that bootstrap.

As David O'Brien wrote:

  Its design is antique.  Or rather: it's missing a design.

 Jörg, why not just buy an Alpha or Sun Blade and run FreeBSD on it??

I don't see much value in an Alpha.  Maybe a Sun some day, who knows?
As i understand it now, the UltraSparc port is not quite at that stage,
but i'm willing to experiment with it when i find a bit of time and
documentation on how to get started.  I've got access to a good number of
Suns here at work, and i think there are even a number of colleagues
who would prefer FreeBSD over Solaris on them.  If FreeBSD had
been ready for it, i could have tested it on the new V880 machine that
was just announced recently. :)  (We were the first ones worldwide to
show it at a trade fair here, about 24 hours after the official
announcement...)

-- 
cheers, Jörg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Ian Dowse

In message [EMAIL PROTECTED], Peter Wemm writes:
The problem is, that you **are** using fdisk tables, you have no choice.
DD mode included a *broken* fdisk table that specified an illegal geometry.
...
This is why it is called dangerous.

BTW, I presume you are aware of the way sysinstall creates DD MBRs;
it does not use the 50000-sector slice 4 method, but sets up slice
1 to cover the entire disk including the MBR, with c/h/s entries
corresponding to the real start and end of the disk, e.g:

cylinders=3544 heads=191 sectors/track=53 (10123 blks/cyl)
...
The data for partition 1 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
start 0, size 35885168 (17522 Meg), flag 80 (active)
beg: cyl 0/ head 0/ sector 1;
end: cyl 1023/ head 190/ sector 53
The data for partition 2 is:
UNUSED
The data for partition 3 is:
UNUSED
The data for partition 4 is:
UNUSED

Otherwise the disk layout is the same as disklabel's DD.  I suspect
that this approach is much less illegal than disklabel's MBRs,
although I do remember seeing an HP PC that disliked it.  I wonder
if a reasonable compromise is to make disklabel use this system for
DD disks instead of the bogus 50000-sector slice?  Obviously, it
should also somehow not install a partition table unless boot1 is
being used as the MBR, and the fdisk -I method should be preferred.
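
For reference, a sketch of the 16-byte fdisk (MBR) partition entry behind the
output above; this mirrors the classic on-disk format rather than any
particular header file, and shows why heads stop at 255 and cylinders at 1023
(a real reader would also need the struct packed and little-endian handling):

    /*
     * Classic MBR partition entry.  The cylinder is 10 bits: its top two
     * bits live in the upper bits of the sector byte, so printed values
     * cap at cylinder 1023; the head is one byte, hence the 255/256-head
     * trouble mentioned elsewhere in the thread.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct mbr_part {
        uint8_t  flag;              /* 0x80 = active */
        uint8_t  start_head;
        uint8_t  start_sect;        /* bits 0-5 sector, bits 6-7 cyl high */
        uint8_t  start_cyl;         /* cylinder low 8 bits */
        uint8_t  type;              /* 0xa5 = FreeBSD */
        uint8_t  end_head;
        uint8_t  end_sect;
        uint8_t  end_cyl;
        uint32_t lba_start;         /* 32-bit LBA start */
        uint32_t lba_size;          /* 32-bit LBA size */
    };

    int
    main(void)
    {
        /* The sysinstall slice 1 entry above, re-encoded. */
        struct mbr_part p = {
            .flag = 0x80, .type = 0xa5,
            .start_head = 0, .start_sect = 1, .start_cyl = 0,
            .end_head = 190,
            .end_sect = (uint8_t)(53 | ((1023 >> 8) << 6)),
            .end_cyl = 1023 & 0xff,
            .lba_start = 0, .lba_size = 35885168,
        };

        printf("end: cyl %d/ head %d/ sector %d\n",
            ((p.end_sect >> 6) << 8) | p.end_cyl,
            p.end_head, p.end_sect & 0x3f);
        return (0);
    }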

Ian




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Joerg Wunsch

As Terry Lambert wrote:

 Joerg Wunsch wrote:
  /The/ major advantage of DD mode was that all BIOSes (so far :) at
  least agree on how to access block 0 and the adjacent blocks, so
  starting our own system there makes those disks portable.

 I guarantee you that there are a number of controllers which have
 different ideas of how to do soft sector sparing _at the controller
 level_ rather than at the drive level.

We have dropped support for ESDI controllers long since. :-)

Seriously, all the disks we are supporting these days do bad block
replacement at the drive level.

-- 
cheers, Jörg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Terry Lambert

Joerg Wunsch wrote:
  I guarantee you that there are a number of controllers which have
  different ideas of how to do soft sector sparing _at the controller
  level_ rather than at the drive level.
 
 We have dropped support for ESDI controllers long since. :-)
 
 Seriously, all the disks we are supporting these days do bad block
 replacement at the drive level.

Adaptec 1742 is supported, though it took a long enough time to
find its way into CAM.  Same for the NCR 810.

For certain applications, also, you _want_ to turn off the automatic
bad sector sparing: it's incompatible with spindle sync, for example,
where you want to spare all drives or none, so that the spindles don't
desynchronize on a sparing hit for one drive, but not another.

-- Terry




Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Mike Smith

 What is it about this particular topic brings out such irrational
 emotions in you and others?

Because you define as irrational those opinions that don't agree with 
your own.  I don't consider my stance irrational at all, and I find 
your leaps past logic and common sense quite irrational in return.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E






Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Mike Smith

 Joerg Wunsch wrote:
   I guarantee you that there are a number of controllers which have
   different ideas of how to do soft sector sparing _at the controller
   level_ rather than at the drive level.
  
  We have dropped support for ESDI controllers long since. :-)
  
  Seriously, all the disks we are supporting these days do bad block
  replacement at the drive level.
 
 Adaptec 1742 is supported, though it took a long enough time to
 find its way into CAM.  Same for the NCR 810.

Neither of which do controller-level sparing.

 For certain applications, also, you _want_ to turn off the automatic
 bad sector sparing: it's incompatible with spindle sync, for example,
 where you want to spare all drives or none, so that the spindles don't
 desynchronize on a sparing hit for one drive, but not another.

Spindle sync is an anachronism these days; asynchronous behaviour 
(write-behind in particular) is all the rage.  You'd be hard-pressed to 
find drives that even support it anymore.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E






Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Matthew Dillon

:Spindle sync is an anachronism these days; asynchronous behaviour 
:(write-behind in particular) is all the rage.  You'd be hard-pressed to 
:find drives that even support it anymore.

Woa!  Say what?  I think you are totally incorrect here Mike.
Spindle sync is not an anachronism.  You can't get good RAID{0,2,3,4,5}
performance without it - for reading OR writing.  It doesn't matter 
so much for RAID{1,10},  but it matters a whole lot for something like
RAID-5 where the difference between a spindle-synced read or write
and a non-spindle-synched read or write can be upwards of 35%.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Wilko Bulte

On Mon, Dec 10, 2001 at 10:13:20AM -0800, Matthew Dillon wrote:
 :Spindle sync is an anachronism these days; asynchronous behaviour 
 :(write-behind in particular) is all the rage.  You'd be hard-pressed to 
 :find drives that even support it anymore.
 
 Woa!  Say what?  I think you are totally incorrect here Mike.
 Spindle sync is not an anachronism.  You can't get good RAID{0,2,3,4,5}

For RAID3 that is true. For the other ones...

 performance without it - for reading OR writing.  It doesn't matter 
 so much for RAID{1,10},  but it matters a whole lot for something like
 RAID-5 where the difference between a spindle-synced read or write
 and a non-spindle-synched read or write can be upwards of 35%.

If you have RAID5 with I/O sizes that result in full-stripe operations.

-- 
|   / o / /_  _ email:  [EMAIL PROTECTED]
|/|/ / / /(  (_)  Bulte Arnhem, The Netherlands 




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Matthew Dillon


: performance without it - for reading OR writing.  It doesn't matter 
: so much for RAID{1,10},  but it matters a whole lot for something like
: RAID-5 where the difference between a spindle-synced read or write
: and a non-spindle-synched read or write can be upwards of 35%.
:
:If you have RAID5 with I/O sizes that result in full-stripe operations.

Well, 'more than one disk' operations anyway, for random-I/O.  Caching
takes care of sequential I/O reasonably well but random-I/O goes down
the drain for writes if you aren't spindle synced, no matter what
the stripe size, and will go down the drain for reads if you cross
a stripe - something that is quite common I think.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Matthew Dillon


:For RAID3 that is true. For the other ones...
:
: performance without it - for reading OR writing.  It doesn't matter 
: so much for RAID{1,10},  but it matters a whole lot for something like
: RAID-5 where the difference between a spindle-synced read or write
: and a non-spindle-synched read or write can be upwards of 35%.
:
:If you have RAID5 with I/O sizes that result in full-stripe operations.
:
:-- 
:|   / o / /_  _email:  [EMAIL PROTECTED]
:|/|/ / / /(  (_)  BulteArnhem, The Netherlands 

Well, for reads a non-stripe-crossing op would still work reasonably
well.  But for writes less than full-stripe operations without
spindle sync are going to be terrible due to the read-before-write
requirement (to calculate parity).  The disk cache is useless in that
case.

-Matt





Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Matthew Dillon


: Well, for reads a non-stripe-crossing op would still work reasonably
: well.  But for writes less than full-stripe operations without
: spindle sync are going to be terrible due to the read-before-write
: requirement (to calculate parity).  The disk cache is useless in that
: case.
:
:You obviously weren't reading the previous thread on RAID5 checksum 
:calculation, I see. 8)

I don't see a thread on raid-5 checksumming.  What was the subject?

-Matt




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread John Baldwin


On 09-Dec-01 Joerg Wunsch wrote:
 As Peter Wemm wrote:
 
 There shouldn't *be* bootblocks on non-boot disks.
 
 dd if=/dev/zero of=/dev/da$n count=1
 
 Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.
 
 All my disks have bootblocks and (spare) boot partitions.  All the
 bootblocks are DD mode.  I don't see any point in using obsolete fdisk
 tables.  (There's IMHO only one purpose obsolete fdisk tables are good
 for, co-operation with other operating systems in the same machine.
 None of my machines uses anything else than FreeBSD.)

Well, since they are a de facto part of the PC architecture they are also
good so that you don't break BIOS's.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Peter Wemm

Ian Dowse wrote:
 In message [EMAIL PROTECTED], Peter Wemm writes:
 The problem is, that you **are** using fdisk tables, you have no choice.
 DD mode included a *broken* fdisk table that specified an illegal geometry.
 ...
 This is why it is called dangerous.
 
 BTW, I presume you are aware of the way sysinstall creates DD MBRs;
 it does not use the 50000-sector slice 4 method, but sets up slice
 1 to cover the entire disk including the MBR, with c/h/s entries
 corresponding to the real start and end of the disk, e.g:
 
 cylinders=3544 heads=191 sectors/track=53 (10123 blks/cyl)
 ...
 The data for partition 1 is:
 sysid 165,(FreeBSD/NetBSD/386BSD)
 start 0, size 35885168 (17522 Meg), flag 80 (active)
 beg: cyl 0/ head 0/ sector 1;
 end: cyl 1023/ head 190/ sector 53
 The data for partition 2 is:
 UNUSED
 The data for partition 3 is:
 UNUSED
 The data for partition 4 is:
 UNUSED
 
 Otherwise the disk layout is the same as disklabel's DD.  I suspect
 that this approach is much less illegal than disklabel's MBRs,
 although I do remember seeing an HP PC that disliked it.  I wonder
 if a reasonable compromise is to make disklabel use this system for
 DD disks instead of the bogus 50000-sector slice?  Obviously, it
 should also somehow not install a partition table unless boot1 is
 being used as the MBR, and the fdisk -I method should be preferred.

Yes, that is much safer, however there are certain bioses that have
interesting quirks that the MBR has to work around.  The problem when
overlapping mbr and boot1 into the same block is that we no longer have the
space to do that.  boot1.s has got *3* bytes free.

For example, we don't have space to fix the case where the drive number is
passed through incorrectly to the mbr.  Some older Intel boards have this
problem (Phoenix derived bios).  See boot0's setdrv option.

Also (and I think this is more likely to be the problem you ran into) many
newer PC's are looking at the partition tables for a suspend-to-disk
partition or a FAT filesystem with a suspend-to-disk dump file.  For better
or worse, PC architecture dictates that boot disk partitions start and end
on cylinder boundaries (except for the first one which starts on the second
head in the first cylinder).  When we break those rules, we have to be
prepared for the consequences.

However, there is light at the end of the tunnel.  EFI GPT is pretty clean.
It supports up to something like 16384 partitions and it has all the useful
stuff we could possibly want including unique ID's, no CHS (pure 64 bit
LBA), volume tags (you can name partitions etc), and so on.  It is clean
enough that we could almost get away with doing away with disklabel as
well.  Coming soon to a PC near you.
(http://developer.intel.com/technology/efi/index.htm)
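
For the curious, a sketch of the GPT header being referred to (field layout as
in the EFI specification, from memory; on disk it is byte-packed and
little-endian, which a real reader would have to handle explicitly):

    /*
     * GUID Partition Table header, stored at LBA 1 with a backup copy at
     * the end of the disk.  The partition entry array (128-byte entries
     * by default) starts at partition_lba.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct gpt_header {
        char     signature[8];      /* "EFI PART" */
        uint32_t revision;
        uint32_t header_size;
        uint32_t header_crc32;
        uint32_t reserved;
        uint64_t current_lba;
        uint64_t backup_lba;
        uint64_t first_usable_lba;
        uint64_t last_usable_lba;
        uint8_t  disk_guid[16];
        uint64_t partition_lba;     /* start of the partition entry array */
        uint32_t num_entries;
        uint32_t entry_size;        /* usually 128 */
        uint32_t entries_crc32;
    };

    int
    main(void)
    {
        printf("in-memory struct size %zu (on-disk header is 92 bytes)\n",
            sizeof(struct gpt_header));
        return (0);
    }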

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5





RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Greg Lehey

On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:

 performance without it - for reading OR writing.  It doesn't matter
 so much for RAID{1,10},  but it matters a whole lot for something like
 RAID-5 where the difference between a spindle-synced read or write
 and a non-spindle-synched read or write can be upwards of 35%.

 If you have RAID5 with I/O sizes that result in full-stripe operations.

 Well, 'more than one disk' operations anyway, for random-I/O.  Caching
 takes care of sequential I/O reasonably well but random-I/O goes down
 the drain for writes if you aren't spindle synced, no matter what
 the stripe size,

Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
goes to buffer cache.

 and will go down the drain for reads if you cross a stripe -
 something that is quite common I think.

I think this is what Mike was referring to when talking about parity
calculation.  In any case, going across a stripe boundary is not a
good idea, though of course it can't be avoided.  That's one of the
arguments for large stripes.

Greg
--
See complete headers for address and phone numbers




Dangerously Dedicated (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Greg Lehey

On Sunday,  9 December 2001 at 16:59:28 -0800, Peter Wemm wrote:
 Joerg Wunsch wrote:
 Mike Smith [EMAIL PROTECTED] wrote:
 I'd love to never hear those invalid, unuseful, misleading opinions
 from you again.

 ETOOMANYATTRIBUTES? :-)

 As long as you keep the feature of DD mode intact, i won't argue.  If
 people feel like creating disks that aren't portable to another
 controller, they should do.  I don't like this idea.

 We can just as easily have bootable-DD mode with a real MBR and have
 freebsd start on sector #2 instead of overlapping boot1 and mbr.   

This would seem to be a reasonable alternative.  

 This costs only one sector instead of 64 sectors (a whopping 32K,
 I'm sure that is going to break the bank on today's disks).

Well, the real question is the space wasted at the end, which can be
up to a megabyte.  Still not going to kill you, but it's aesthetically
displeasing.

 I'd rather that we be specific about this.  If somebody wants ad2e
 or da2e then they should not be using *any* fdisk tables at all.
 Ie: block 0 should be empty.  The problem is that if you put
 /boot/boot1 in there, then suddenly it looks like a fdisk disk and
 we have to have ugly magic to detect it and prevent the fake table
 from being used.  I would prefer that the fdisk table come out of
 /boot/boot1 so that we don't have to have it by default, and we use
 fdisk to install the DD magic table if somebody wants to make it
 bootable.

So where would you put the bootstrap?  In sector 2?

Greg
--
See complete headers for address and phone numbers




Re: RAID performance (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Bernd Walter

On Tue, Dec 11, 2001 at 11:06:33AM +1030, Greg Lehey wrote:
 On Monday, 10 December 2001 at 10:30:04 -0800, Matthew Dillon wrote:
 
  performance without it - for reading OR writing.  It doesn't matter
  so much for RAID{1,10},  but it matters a whole lot for something like
  RAID-5 where the difference between a spindle-synced read or write
  and a non-spindle-synched read or write can be upwards of 35%.
 
  If you have RAID5 with I/O sizes that result in full-stripe operations.
 
  Well, 'more than one disk' operations anyway, for random-I/O.  Caching
  takes care of sequential I/O reasonably well but random-I/O goes down
  the drain for writes if you aren't spindle synced, no matter what
  the stripe size,
 
 Can you explain this?  I don't see it.  In FreeBSD, just about all I/O
 goes to buffer cache.

After waiting for the drives and not for vinum parity blocks.

  and will go down the drain for reads if you cross a stripe -
  something that is quite common I think.
 
 I think this is what Mike was referring to when talking about parity
 calculation.  In any case, going across a stripe boundary is not a
 good idea, though of course it can't be avoided.  That's one of the
 arguments for large stripes.

striped:
Say you have 512-byte stripes and 2 disks.
You access 64k, which is split into two 32k transactions, one per disk.
The wait time for the complete transaction is the worst of the two,
which is more than the average of a single disk.
With spindle synchronisation the access times of both disks are
believed to be identical and you get the same as with a single disk.
Linear speed could be about twice that of a single drive,
but today this is more theoretical than real.
The average transaction size per disk decreases as the number of
spindles grows, so you get more transaction overhead.
Also, the voice coil actuators used in drives for many years now add
a random amount of time to the access time, which invalidates some of
the spindle sync potential.
Plus it may defeat some benefits of the drives' precaching mechanisms.
I'm almost sure there is no real performance gain with modern drives.
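
A minimal C sketch (not vinum code; the constants and the function are
only illustrative) of the mapping just described: a 64k request over
512-byte stripes on 2 disks breaks into 128 chunks, 64 per disk,
unless something coalesces them into one 32k transfer per spindle.

    /* Sketch only: map a logical request onto a 2-disk plex with
     * 512-byte stripes, as in the example above. */
    #include <stdio.h>

    #define NDISKS     2
    #define STRIPESIZE 512                  /* bytes per stripe chunk */

    int
    main(void)
    {
        unsigned long offset = 0, length = 64 * 1024;
        unsigned long perdisk[NDISKS] = { 0 };

        for (unsigned long o = offset; o < offset + length; o += STRIPESIZE) {
            unsigned long stripe = o / STRIPESIZE;
            unsigned int disk = stripe % NDISKS;    /* which spindle */
            perdisk[disk] += STRIPESIZE;
        }
        for (int d = 0; d < NDISKS; d++)
            printf("disk %d: %luk in %lu chunks of %d bytes\n", d,
                perdisk[d] / 1024, perdisk[d] / STRIPESIZE, STRIPESIZE);
        /* Prints 32k in 64 chunks for each disk: 128 transactions in
         * total unless adjacent chunks are coalesced per disk. */
        return 0;
    }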

raid5:
For a write you have two read transactions and two writes.
The two reads are at the same position on two different spindles,
so the same access-time situation exists as in the case above.
We don't have the problem of decreased transaction sizes, but we do
have the same seek-time problem with modern disks as in the case
above, plus the problem that the drives are not exactly equally
loaded, which randomizes the access times again.
I doubt that spindle sync gives a performance gain with modern disks
in the general case, but there might be some special uses.
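
For reference, a minimal C sketch of the small-write path just
described (two reads, two writes, parity recomputed by XOR).  The
"disks" here are plain memory buffers and the names are only
illustrative; this is not vinum code.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK 512

    static uint8_t datadisk[BLOCK];         /* old data block */
    static uint8_t paritydisk[BLOCK];       /* matching parity block */

    static void
    raid5_small_write(const uint8_t *newdata)
    {
        uint8_t olddata[BLOCK], parity[BLOCK];

        memcpy(olddata, datadisk, BLOCK);   /* read 1: old data */
        memcpy(parity, paritydisk, BLOCK);  /* read 2: old parity */

        /* new parity = old parity XOR old data XOR new data */
        for (int i = 0; i < BLOCK; i++)
            parity[i] ^= olddata[i] ^ newdata[i];

        memcpy(datadisk, newdata, BLOCK);   /* write 1: new data */
        memcpy(paritydisk, parity, BLOCK);  /* write 2: new parity */
    }

    int
    main(void)
    {
        uint8_t newdata[BLOCK];

        memset(newdata, 0xa5, sizeof(newdata));
        raid5_small_write(newdata);
        printf("parity[0] after the update: 0x%02x\n", paritydisk[0]);
        return 0;
    }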

The last drives I saw that could do spindle sync were the IBM DCHS
series.


There are easier ways to raise performance.
Ever wondered why people claim vinum's raid5 writes are slow?
The answer is astonishingly simple:
Vinum does stripe-based locking, while UFS tries to lay out data in
mostly ascending sectors.
What happens here is that the first write has to wait for two reads
and two writes.
If we have an ascending write it has to wait for the first write to
finish, because the stripe is still locked.
The first is unlocked only after both physical writes are on disk.
Now we start our two reads, which (thanks to the drives' precaching)
are most likely in the drive cache - then we write.

The problem here is that the physical writes get serialized and the
drive has to wait a complete rotation between each of them.
If we had fine-grained locking which locks only the accessed sectors
of the parity, we could have more than one ascending write
transaction outstanding on a single drive.
Ideally the stripe size is bigger than the maximum number of parallel
ascending writes the OS issues on the volume.
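
An illustrative C sketch (not vinum's actual locking code) of the
difference: with a per-stripe lock any two requests in the same
stripe conflict, so the second ascending write has to wait; with a
lock on just the touched parity range, non-overlapping ascending
writes can be in flight together.

    #include <stdio.h>
    #include <stdint.h>

    struct plock {                  /* hypothetical parity lock request */
        uint64_t stripe;            /* stripe the request touches */
        uint64_t start, len;        /* touched byte range of the parity */
    };

    static int
    conflicts(const struct plock *a, const struct plock *b, int stripe_locking)
    {
        if (a->stripe != b->stripe)
            return (0);
        if (stripe_locking)
            return (1);             /* the whole stripe is one lock */
        return (a->start < b->start + b->len &&     /* ranges overlap? */
            b->start < a->start + a->len);
    }

    int
    main(void)
    {
        /* Two ascending 8k writes into the same stripe, back to back. */
        struct plock w1 = { 10, 0, 8192 };
        struct plock w2 = { 10, 8192, 8192 };

        printf("stripe locking: %s\n",
            conflicts(&w1, &w2, 1) ? "serialized" : "parallel");
        printf("range locking:  %s\n",
            conflicts(&w1, &w2, 0) ? "serialized" : "parallel");
        return 0;
    }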

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Bernd Walter

On Mon, Dec 10, 2001 at 10:49:28AM -0800, Matthew Dillon wrote:
 
 :For RAID3 that is true. For the other ones...
 :
 : performance without it - for reading OR writing.  It doesn't matter 
 : so much for RAID{1,10},  but it matters a whole lot for something like
 : RAID-5 where the difference between a spindle-synced read or write
 : and a non-spindle-synched read or write can be upwards of 35%.
 :
 :If you have RAID5 with I/O sizes that result in full-stripe operations.
 :
 :-- 
 :|   / o / /_  _  email:  [EMAIL PROTECTED]
 :|/|/ / / /(  (_)  Bulte  Arnhem, The Netherlands 
 
 Well, for reads a non-stripe-crossing op would still work reasonably
 well.  But for writes less then full-stripe operations without
 spindle sync are going to be terrible due to the read-before-write
 requirement (to calculate parity).  The disk cache is useless in that
 case.

Modern disks do pre-reads, and writes are streamed by tagged command
queueing, which invalidates this for linear access.
For non-linear access the synchronisation is partly shadowed by
different seek times and different load on the spindles.
The chance that the data and parity spindles have their heads on the
same track is vanishingly small for random access.
With 15000 rpm drives the maximum rotational delay is 4ms and the
average is 2ms, which leaves you a maximum of only 1ms to gain under
ideal conditions - which we don't have, as I stated above.
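
The rotational figures are easy to check; a couple of lines of C:

    /* 15000 rpm: one revolution takes 60000/15000 = 4 ms, so the
     * worst-case rotational delay is 4 ms and the average is 2 ms. */
    #include <stdio.h>

    int
    main(void)
    {
        double rpm = 15000.0;
        double rev_ms = 60.0 * 1000.0 / rpm;

        printf("revolution %.1f ms, max delay %.1f ms, avg delay %.1f ms\n",
            rev_ms, rev_ms, rev_ms / 2.0);
        return 0;
    }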

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-10 Thread Joerg Wunsch

As Peter Wemm wrote:

 Yes, that is much safer, however there are certain bioses that have
 interesting quirks that the MBR has to work around.  The problem
 when overlapping mbr and boot1 into the same block is that we no
 longer have the space to do that.  boot1.s has got *3* bytes free.

Too bad.

Peter, do you care to update the section about DD mode (and its
dangers) in the FAQ after all this discussion?  I could probably do
it, too (the original entry is mine), but i had to quote your
arguments only anyway.

 Also (and I think this is more likely to be the problem you ran
 into) many newer PC's are looking at the partition tables for a
 suspend-to-disk partition or a FAT filesystem with a suspend-to-disk
 dump file.

Seems i really love my Toshiba (Libretto) that simply hibernates to
the last nnn MB of the physical disk. ;-)  (I have reserved a FreeBSD
partition as a placeholder for the hibernation data.)

 However, there is light at the end of the tunnel.  EFI GPT is pretty
 clean.

Good to hear.  While this sounds like dedicated disks will be gone
then :), at least the format looks rational enough.

 It supports up to something like 16384 partitions ...

It would be interesting to see how Windoze will arrange for 16K
drive letters. :-))

The day vinum is up and ready to also cover the root FS, i won't need
/any/ partition at all anymore. ;-)

As Greg Lehey wrote:

  ...73 of those silly messages are just beyond any form of usefulness.

 Hadn't we agreed to do this?  I'm certainly in favour of the
 bootverbose approach.

I can't remember any agreement so far.  But thinking a bit more about
it, it sounds like the best solution to me, too.  The only other
useful option would be to restrict the message to once per drive, but
that'll cost an additional per-drive flag, which is probably too much
effort just for that message.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Mike Smith

   :  IBM DTLA drives are known to rotate fast enough near the spindle
   :  that the sustained write speed exceeds the ability of the controller
   :  electronics to keep up, and results in crap being written to disk.
  
  I would assume it is actually the tracks FURTHEST from the spindle..

With ZBR, anything is possible.

 Wouldn't the linear speed be faster closer to the spindle at 7200 RPM 
 than at the edge?

The stunning ignorance being displayed in this thread appears to have 
reached an all-time low.

*sigh*

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Matthew Dillon


: Wouldn't the linear speed be faster closer to the spindle at 7200 RPM 
: than at the edge?
:
:The stunning ignorance being displayed in this thread appears to have 
:reached an all-time low.
:
:*sigh*

Ah, another poor soul who didn't read the first sentence of
tuning(7).

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-10 Thread Terry Lambert

David W. Chapman Jr. wrote:
   :  IBM DTLA drives are known to rotate fast enough near the spindle
   :  that the sustained write speed exceeds the ability of the controller
   :  electronics to keep up, and results in crap being written to disk.
 
 
  I would assume it is actually the tracks FURTHEST from the spindle..
 
 Wouldn't the linear speed be faster closer to the spindle at 7200 RPM
 than at the edge?

Linear speed is highest at the edge, but magnetic domain density
is higher at the spindle, for a uniform rotation rate.

I think that the electronics ended up being designed for the
average rate.

PS: The encoding frequency is higher at the spindle, as well.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Joerg Wunsch

As Peter Wemm wrote:

 There shouldn't *be* bootblocks on non-boot disks.
 
 dd if=/dev/zero of=/dev/da$n count=1
 
 Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.

All my disks have bootblocks and (spare) boot partitions.  All the
bootblocks are DD mode.  I don't see any point in using obsolete fdisk
tables.  (There's IMHO only one purpose obsolete fdisk tables are good
for, co-operation with other operating systems in the same machine.
None of my machines uses anything else than FreeBSD.)

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread sthaug

 All my disks have bootblocks and (spare) boot partitions.  All the
 bootblocks are DD mode.  I don't see any point in using obsolete fdisk
 tables.  (There's IMHO only one purpose obsolete fdisk tables are good
 for, co-operation with other operating systems in the same machine.
 None of my machines uses anything else than FreeBSD.)

There are very good reasons NOT to use DD mode if you use certain types
of Adaptec SCSI controllers - they simply won't boot from DD.

Aside from that, FreeBSD needs to have *one* recommendation for disks,
anything else creates too much confusion. It is certainly my impression
that the recommendation has been NOT using DD for the IA32 architecture
for quite a while now.

(The other day a coworker of mine wanted to use DD for some IBM DTLA
disks, because he'd heard that the disks performed better that way -
something to do with scatter-gather not working right unless you used
DD. I'm highly skeptical about this since I have my own measurements
from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they
show the disks performing extremely well. Anybody else want to comment
on this?)

Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Daniel O'Connor


On 09-Dec-2001 [EMAIL PROTECTED] wrote:
  (The other day a coworker of mine wanted to use DD for some IBM DTLA
  disks, because he'd heard that the disks performed better that way -
  something to do with scatter-gather not working right unless you used
  DD. I'm highly skeptical about this since I have my own measurements
  from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they
  show the disks performing extremely well. Anybody else want to comment
  on this?)

Sounds like an Old Wives Tale to me.

I don't understand the need some people have for using something that is
labelled as DANGEROUS.

No, it won't hurt your cats but you may lose hair from using it, and for what
benefit? NONE!

---
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Joerg Wunsch

As Daniel O'Connor wrote:

 I don't understand the need some people have for using something
 that is labelled as DANGEROUS.

Historically, it hasn't been labelled that, it only later became
common terminology for it -- in the typical half-joking manner.

 No, it won't hurt your cats but you may lose hair from using it, and
 for what benefit? NONE!

See my other reply about fdisk tables: they are a misdesign from the
beginning.

The single most wanted feature it buys you is the ability to
completely forget the term `geometry' with your disks: the very first
sectors of a disk always have the same BIOS int 0x13 representation,
regardless of what your BIOS/controller thinks the `geometry' might
be.  Thus, those disks are basically portable between controller
BIOSes.  (Modulo those newer broken BIOSes that believe eggs must be
smarter than hens -- see my other article for an opinion.)

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Joerg Wunsch

As [EMAIL PROTECTED] wrote:

 There are very good reasons NOT to use DD mode if you use certain
 types of Adaptec SCSI controllers - they simply won't boot from DD.

Never seen.  All my SCSI controllers so far booted from my disks
(obviously :).

I figure from Peter's comment in that piece of code that the original
(386BSD 0.0 inherited) DD mode fake fdisk table apparently had some
poor (faked) values inside that could upset some BIOSes.  That's bad,
and IMHO we should fix what could be fixed, but without dropping that
feature entirely (see below).

personal opinion
Still, it's my opinion that these BIOSes are simply broken:
interpretation of the fdisk table has always been in the realm of the
boot block itself.  The BIOS should decide whether a disk is bootable
or not by looking at the 0x55aa signature at the end, nothing else.
Think of the old OnTrack Disk Manager that extended the fdisk table to
16 slots -- nothing the BIOS could ever even handle.  It was in the
realm of the boot block to interpret it.
/personal opinion
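
A minimal C sketch of the check described above: all the firmware
needs to look at in sector 0 is whether the last two of its 512 bytes
carry the 0x55 0xaa signature (the function name is mine).

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static int
    has_boot_signature(const uint8_t sector0[512])
    {
        return (sector0[510] == 0x55 && sector0[511] == 0xaa);
    }

    int
    main(void)
    {
        uint8_t sector0[512];

        memset(sector0, 0, sizeof(sector0));
        sector0[510] = 0x55;
        sector0[511] = 0xaa;
        printf("bootable: %s\n", has_boot_signature(sector0) ? "yes" : "no");
        return 0;
    }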

 Aside from that, FreeBSD needs to have *one* recommendation for
 disks, anything else creates too much confusion.

DD mode has never been a recommendation.  It is for those who know
what it means.  I'm only against the idea of silently dropping support
for it...  fdisk tables are something that was designed in the
previous millennium, and i think nobody is going to argue that they
are anything but a misdesign (or even no design at all, but an ad-hoc
implementation).  Two different values for the same thing (which could
become conflicting, thus making disks unportable between different
controllers), not enough value space to even remotely cover the disks
of our millennium, enforcement of something they call `geometry' which
isn't even remotely related to the disks' geometry anymore at all, far
too few possible entries anyway, ...  FreeBSD is in a position where
it doesn't strictly require the existence of such an obsolete
implementation detail, so we should leave users the freedom of
decision.

Again, it has never been the recommendation (well, at least not after
386BSD 0.0 :), and i normally wouldn't recommend it to the innocent
user.  But we should not break it either.

 (The other day a coworker of mine wanted to use DD for some IBM DTLA
 disks, because he'd heard that the disks performed better that way -
 something to do with scatter-gather not working right unless you
 used DD. [...])

As much as i personally prefer DD mode: that's an urban legend.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Mike Smith

 As Peter Wemm wrote:
 
  There shouldn't *be* bootblocks on non-boot disks.
  
  dd if=/dev/zero of=/dev/da$n count=1
  
  Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.
 
 All my disks have bootblocks and (spare) boot partitions.  All the
 bootblocks are DD mode.  I don't see any point in using obsolete fdisk
 tables.  (There's IMHO only one purpose obsolete fdisk tables are good
 for, co-operation with other operating systems in the same machine.
 None of my machines uses anything else than FreeBSD.)

Since I tire of seeing people hit this ignorant opinion in the list 
archives, I'll just offer the rational counterpoints.

 - The MBR partition table is not obsolete, it's a part of the PC 
   architecture specification.
 - You omit the fact that many peripheral device vendors' BIOS code looks 
   for the MBR partition table, and will fail if it's not present or 
   incorrect.

You do realise that DD mode does include an (invalid) MBR partition
table (now valid, courtesy of a long-needed fix), right?

I'd love to never hear those invalid, unuseful, misleading opinions from 
you again.


-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Mike Smith

 (The other day a coworker of mine wanted to use DD for some IBM DTLA
 disks, because he'd heard that the disks performed better that way -
 something to do with scatter-gather not working right unless you used
 DD. I'm highly skeptical about this since I have my own measurements
 from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they
 show the disks performing extremely well. Anybody else want to comment
 on this?)

Since scatter-gather has nothing to do with the disk (it's a feature of 
the disk controller's interface to host memory), I think this coworker of 
yours is delusional.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Joerg Wunsch

Mike Smith [EMAIL PROTECTED] wrote:

  - The MBR partition table is not obsolete, it's a part of the PC 
architecture specification.

Its design is antique.  Or rather: it's missing a design.  See other
mail for the reasons.  For FreeBSD, it's obsolete since we don't need
to rely on fdisk slices.  (Or rather: it's optional.  We can make good
use of it when it's there, but we don't need to insist on it being
there.)

 You do realise that DD mode does include a (invalid) MBR partition
 table (now valid, courtesy of a long-needed fix), right?

Yes, of course, one that is basically ignored by everything.  It has
always been there, back since 386BSD 0.1.  386BSD 0.0 didn't recognize
fdisk tables at all, but could only live on a disk of its own (as any
other BSD before it did).

 I'd love to never hear those invalid, unuseful, misleading opinions
 from you again.

ETOOMANYATTRIBUTES? :-)

As long as you keep the feature of DD mode intact, i won't argue.  If
people feel like creating disks that aren't portable to another
controller, they should do.  I don't like this idea.

But to be honest, see my other article: i never argued to make this
the default or a recommended strategy in any form.  It should simply
remain intact, that's all.  Back to the subject: the current warning,
however, is pointless, and has the major drawback of potentially
hiding important console messages.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Greg Lehey

On Sunday,  9 December 2001 at 22:52:58 +1030, Daniel O'Connor wrote:

 On 09-Dec-2001 [EMAIL PROTECTED] wrote:
  (The other day a coworker of mine wanted to use DD for some IBM DTLA
  disks, because he'd heard that the disks performed better that way -
  something to do with scatter-gather not working right unless you used
  DD. I'm highly skeptical about this since I have my own measurements
  from IBM DTLA disks partitioned the normal way, ie. NOT DD, and they
  show the disks performing extremely well. Anybody else want to comment
  on this?)

 Sounds like an Old Wives Tale to me.

 I don't understand the need some people have for using something that is
 labelled as DANGEROUS.

I don't understand the need some people have for labelling something
as DANGEROUS when it works nearly all the time.

We don't have many disks which are shared between different platforms,
but that will change.  As you know, I have the ability to hot swap
disks between an RS/6000 platform and an ia32 platform.  The RS/6000
disks will never have a Microsoft partition table on them.  They will
have BSD partition tables on them.  Why call this dangerous?

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Greg Lehey

On Sunday,  9 December 2001 at 12:15:19 -0800, Mike Smith wrote:
 As Peter Wemm wrote:

 There shouldn't *be* bootblocks on non-boot disks.

 dd if=/dev/zero of=/dev/da$n count=1

 Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.

 All my disks have bootblocks and (spare) boot partitions.  All the
 bootblocks are DD mode.  I don't see any point in using obsolete fdisk
 tables.  (There's IMHO only one purpose obsolete fdisk tables are good
 for, co-operation with other operating systems in the same machine.
 None of my machines uses anything else than FreeBSD.)

 Since I tire of seeing people hit this ignorant opinion in the list
 archives, I'll just offer the rational counterpoints.

  - The MBR partition table is not obsolete, it's a part of the PC
architecture specification.

And if it's part of the PC architecture specification, it can't be
obsolete?  I don't see any contradiction here.

  - You omit the fact that many peripheral device vendors' BIOS code looks
for the MBR partition table, and will fail if it's not present or
incorrect.

What do you mean by peripheral device?  I've never heard of disk
drives having a BIOS.  If you're talking about host adaptors, it's you
who omit what Jörg says about it:

No, on the contrary, he went into some detail on this point:

On Sunday,  9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote:

 personal opinion
 Still, it's my opinion that these BIOSes are simply broken:
 interpretation of the fdisk table has always been in the realm of the
 boot block itself.  The BIOS should decide whether a disk is bootable
 or not by looking at the 0x55aa signature at the end, nothing else.
 Think of the old OnTrack Disk Manager that extended the fdisk table to
 16 slots -- nothing the BIOS could ever even handle.  It was in the
 realm of the boot block to interpret it.
 /personal opinion

I agree with Jörg on this.

 I'd love to never hear those invalid, unuseful, misleading opinions
 from you again.

I'd love to never have to see this level of invective poured onto what
was previously a calm discussion.

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Peter Wemm

Joerg Wunsch wrote:
 As Peter Wemm wrote:
 
  There shouldn't *be* bootblocks on non-boot disks.
  
  dd if=/dev/zero of=/dev/da$n count=1
  
  Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.
 
 All my disks have bootblocks and (spare) boot partitions.  All the
 bootblocks are DD mode.  I don't see any point in using obsolete fdisk
 tables.  (There's IMHO only one purpose obsolete fdisk tables are good
 for, co-operation with other operating systems in the same machine.
 None of my machines uses anything else than FreeBSD.)

The problem is that you **are** using fdisk tables; you have no choice.
DD mode included a *broken* fdisk table that specified an illegal geometry.

This illegal geometry was the reason why Thinkpad laptops would wedge
solid when you installed FreeBSD on them.

This illegal geometry is the reason why FreeBSD disks wedge any EFI
system solid unless you remove the illegal geometry tables.

This illegal geometry causes divide by zero errors in a handful of scsi
bioses from Adaptec.

This illegal geometry causes divide by zero errors in a handful of scsi
bioses from NCR/Symbios.

This is why it is called dangerous.

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Matthew Dillon


:This illegal geometry causes divide by zero errors in a handful of scsi
:bioses from Adaptec.
:
:This illegal geometry causes divide by zero errors in a handful of scsi
:bioses from NCR/Symbios.
:
:This is why it is called dangerous.
:
:Cheers,
:-Peter
:--
:Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]

Handful?  I'm taking my life in my hands if I DD a DELL machine.  BEWM!
As I found out the hard way about a year ago.  (Probably the Adaptec 
firmware but every Dell rack-mount has one so...).  The machines wouldn't
boot again until I pulled the physical drives and then camcontrol 
rescan'd them in after a CD boot.  Joy.

This is why I fixed disklabel -B to operate properly on slices and 
added a whole section to the end of 'man disklabel' to describe how
to do it.

-Matt


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Peter Wemm

Joerg Wunsch wrote:
 Mike Smith [EMAIL PROTECTED] wrote:
 
   - The MBR partition table is not obsolete, it's a part of the PC 
 architecture specification.
 
 Its design is antique.  Or rather: it's missing a design.  See other
 mail for the reasons.  For FreeBSD, it's obsolete since we don't need
 to rely on fdisk slices.  (Or rather: it's optional.  We can make good
 use of it when it's there, but we don't need to insist on it being
 there.)

No, it isn't.  We specifically have a copy of both the broken and fixed
fdisk tables in the kernel and do a bcmp() to see if the fdisk table that
is included in /boot/boot1 **unconditionally** is in fact the dangerously
dedicated table.  If it is, then we specifically reject it or we end up
with a disk size of 25MB (5 sectors).

  You do realise that DD mode does include a (invalid) MBR partition
  table (now valid, courtesy of a long-needed fix), right?
 
 Yes, of course, one that is basically ignored by everything.  It has
 always been there, back since 386BSD 0.1.  386BSD 0.0 didn't recognize
 fdisk tables at all, but could only live on a disk by its own (as any
 other BSD before used to).

No, it isn't ignored.  BIOSes know that fdisk partitions end on cylinder
boundaries, and can therefore intuit what the expected geometry is for
the disk in question.  They do this in order to allow the CHS int 0x13
calls to work.  The problem is that the int13 code only allowed for 255
heads, and the fake end-of-disk entry that is unconditionally in
/boot/boot1 specified an ending head number of 255 (ie: 256 heads).
When that head count gets put into a byte register it is truncated to
zero and we get divide by zero errors.
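
A tiny C illustration of that truncation (not any particular BIOS's
code): a head count of 256, squeezed into a byte-sized register,
becomes 0, and a later LBA-to-CHS conversion then divides by zero.

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        unsigned int end_head = 255;            /* from the fake fdisk entry */
        unsigned int nheads = end_head + 1;     /* i.e. 256 heads */
        uint8_t dh = (uint8_t)nheads;           /* truncated to a byte: 0 */

        printf("a head count of %u stored in a byte register becomes %u\n",
            nheads, (unsigned int)dh);
        if (dh == 0)
            printf("the LBA-to-CHS conversion would now divide by zero\n");
        return 0;
    }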

  I'd love to never hear those invalid, unuseful, misleading opinions
  from you again.
 
 ETOOMANYATTRIBUTES? :-)
 
 As long as you keep the feature of DD mode intact, i won't argue.  If
 people feel like creating disks that aren't portable to another
 controller, they should do.  I don't like this idea.

We can just as easily have bootable-DD mode with a real MBR and have
freebsd start on sector #2 instead of overlapping boot1 and mbr.   This
costs only one sector instead of 64 sectors (a whopping 32K, I'm sure that
is going to break the bank on today's disks).

I'd rather that we be specific about this.  If somebody wants ad2e or da2e
then they should not be using *any* fdisk tables at all.  Ie: block 0
should be empty.  The problem is that if you put /boot/boot1 in there, then
suddenly it looks like an fdisk disk and we have to have ugly magic to
detect it and prevent the fake table from being used.  I would prefer that
the fdisk table come out of /boot/boot1 so that we don't have to have it by
default, and we use fdisk to install the DD magic table if somebody wants
to make it bootable.

 But to be honest, see my other article: i never argued to make this
 the default or a recommended strategy in any form.  It should only
 remain intact at all.  Back to the subject, the current warning
 however, is pointless, and has the major drawback to potentially hide
 important console messages.

The console buffer is 32K these days.  You'd have to have around 300
disks to have any real effect on the kernel.

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Mike Smith

 On Sunday,  9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote:
 
  personal opinion
  Still, it's my opinion that these BIOSes are simply broken:

Joerg's personal opinion can go take a hike.  The reality of the 
situation is that this table is required, and we're going to put it there.

End of story.

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
   V I C T O R Y   N O T   V E N G E A N C E



To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously Decidated yet again (was : cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Greg Lehey

On Sunday,  9 December 2001 at 18:32:38 -0800, Mike Smith wrote:
 On Sunday,  9 December 2001 at 19:46:06 +0100, Joerg Wunsch wrote:

 personal opinion
 Still, it's my opinion that these BIOSes are simply broken:

 Joerg's personal opinion can go take a hike.  The reality of the
 situation is that this table is required, and we're going to put it there.

The reality of the situation is far from being clear.  The only thing
I can see is that you're trying to dictate things without adequate
justification.  You should reconsider that attitude.

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Terry Lambert

Greg Lehey wrote:

[ ... IBM DTLA drives ... ]

IBM DTLA drives are known to rotate fast enough near the spindle
that the sustained write speed exceeds the ability of the controller
electronics to keep up, and results in crap being written to disk.

This is not often a problem with windows, the FS of which fills
sectors in towards the spindle, so you only hit the problem when you
near the disk-full state.

Do a Google/Tom's Hardware search to reassure yourself that I am
not smoking anything.

  I don't understand the need some people have for using something that is
  labelled as DANGEROUS.
 
 I don't understand the need some people have for labelling something
 as DANGEROUS when it works nearly all the time.

It's because you have to reinstall, should you want to add a second
OS at a later date (e.g. Linux, or Windows).

 We don't have many disks which are shared between different platforms,
 but that will change.  As you know, I have the ability to hot swap
 disks between an RS/6000 platform and an ia32 platform.  The RS/6000
 disks will never have a Microsoft partition table on them.  They will
 have BSD partition tables on them.  Why call this dangerous?

Your use is orthogonal to the most common expected usage, which is
disks shared between OSs on a single platform, rather than disks
shared between a single OS on multiple platforms.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread Terry Lambert

 Joerg Wunsch wrote:
  Mike Smith [EMAIL PROTECTED] wrote:
- The MBR partition table is not obsolete, it's a part of the PC
  architecture specification.
 
  Its design is antique.  Or rather: it's missing a design.  See other
  mail for the reasons.  For FreeBSD, it's obsolete since we don't need
  to rely on fdisk slices.  (Or rather: it's optional.  We can make good
  use of it when it's there, but we don't need to insist on it being
  there.)

FWIW: The MBR layout is documented in great gory detail in chapter 6
of the PReP specififcation, which I believe is now available on line
from the PowerPC folks, Apple, and Motorolla, and also as an IBM
redbook.  It discusses everything, including the LBA fields, and
sharing disks between PPC (running in Motorolla byte order) and x86
machines (running a DOS-derived OS).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Greg Lehey

On Sunday,  9 December 2001 at 18:46:24 -0800, Terry Lambert wrote:
 Greg Lehey wrote:

 [ ... IBM DTLA drives ... ]

No, that wasn't me.

 IBM DTLA drives are known to rotate fast enough near the spindle
 that the sustained write speed exceeds the ability of the controller
 electronics to keep up, and results in crap being written to disk.

What about the cache?

 This is not often a problem with windows, the FS of which fills
 sectors in towards the spindle, so you only hit the problem when you
 near the disk full state.

This sounds very unlikely.

 Do a Google/Tom's Hardware search to reassure yourself that I am not
 smoking anything.

I think I'd rather put the shoe on the other foot.  This looks like
high-grade crack.  Who was smoking it?

 I don't understand the need some people have for using something that is
 labelled as DANGEROUS.

 I don't understand the need some people have for labelling something
 as DANGEROUS when it works nearly all the time.

I *did* write this.

 It's because you have to reinstall, should you want to add a second
 OS at a later date (e.g. Linux, or Windows).

So all dedicated installations are dangerous?   I would have to do
that whether I had a Microsoft partition table or not if I had already
used the entire disk for FreeBSD.

 We don't have many disks which are shared between different platforms,
 but that will change.  As you know, I have the ability to hot swap
 disks between an RS/6000 platform and an ia32 platform.  The RS/6000
 disks will never have a Microsoft partition table on them.  They will
 have BSD partition tables on them.  Why call this dangerous?

 Your use is orthogonal to the most common expected usage, which is
 disks shared between OSs on a single platform, rather than disks
 shared between a single OS on multiple platforms.

Expected usage is to install once and then never change it.

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Terry Lambert

Greg Lehey wrote:
  [ ... IBM DTLA drives ... ]
 
 No, that wasn't me.

I didn't quote the full thing; that's what the brackets and ellipsis
were for.


  IBM DTLA drives are known to rotate fast enough near the spindle
  that the sustained write speed exceeds the ability of the controller
  electronics to keep up, and results in crap being written to disk.
 
 What about the cache?

Good point.  The cache is known to not actually flush to disk when
ordered to do so.  See the EXT3FS article on www.ibm.com/developerworks
for more details.


  This is not often a problem with windows, the FS of which fills
  sectors in towards the spindle, so you only hit the problem when you
  near the disk full state.
 
 This sounds very unlikely.

I know, doesn't it?  Good thing Tom's Hardware is so thorough, or we
might never have known this, with everyone on the verge of discovering
it simply dismissing it as very unlikely.  8^).

  Do a Google/Tom's Hardware search to reassure yourself that I am not
  smoking anything.
 
 I think I'd rather put the shoe on the other foot.  This looks like
 high-grade crack.  Who was smoking it?

Tom's Hardware, IBM, CNET, Storage Review, etc.

http://www6.tomshardware.com/storage/00q3/000821/ibmdtla-07.html
http://www.storage.ibm.com/hdd/prod/deskstar.htm
http://computers.cnet.com/hardware/0-1092-418-1664463.html?pn=3lb=2ob=0tag=st\.co.1092.bottom.1664463-3
http://www.storagereview.com/welcome.pl?/http://www.storagereview.com/jive/sr/thread.jsp?forum=2thread=12485

I suggest the search:

http://google.yahoo.com/bin/query?p=DTLA+drive+problemhc=0hs=0


  It's because you have to reinstall, should you want to add a second
  OS at a later date (e.g. Linux, or Windows).
 
 So all dedicated installations are dangerous?   I would have to do
 that whether I had a Microsoft partition table or not if I had already
 used the entire disk for FreeBSD.

Yes.  I don't understand your point.


  Your use is orthogonal to the most common expected usage, which is
  disks shared between OSs on a single platform, rather than disks
  shared between a single OS on multiple platforms.
 
 Expected usage is to install once and then never change it.

No, expected usage is to purchase a machine with an OS preinstalled,
and then install FreeBSD/Linux/BeOS/other third party OS as an also
ran, rather than the primary OS.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Terry Lambert

Greg Lehey wrote:
[ ... DTLA drives ... ]

  Do a Google/Tom's Hardware search to reassure yourself that I am not
  smoking anything.
 
 I think I'd rather put the shoe on the other foot.  This looks like
 high-grade crack.  Who was smoking it?


For your further amusement, here is a pointer to the class action
lawsuit against IBM on the 75GXP DTLA drives:

http://www.tech-report.com/news_reply.x/3035/3/

It includes a pointer to the PDF of the complaint form.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Matthew Dillon

On google, search for:

deskstar 75gxp class action

http://www.theregister.co.uk/content/54/22412.html
http://www.pcworld.com/news/article/0,aid,67608,00.asp

etc...  So apparently my warning about these drives in 'man tuning' is
still appropriate :-)

-Matt

:  IBM DTLA drives are known to rotate fast enough near the spindle
:  that the sustained write speed exceeds the ability of the controller
:  electronics to keep up, and results in crap being written to disk.
: 
: What about the cache?
:
:Good point.  The cache is known to not actually flush to disk when
:ordered to do so.  See the EXT3FS article on www.ibm.com/developerworks
:for more details.
:
:  This is not often a problem with windows, the FS of which fills
:  sectors in towards the spindle, so you only hit the problem when you
:  near the disk full state.
: 
: This sounds very unlikely.
:
:I know, doesn't it?  Good thing Tom's Hardware is so thorough, or we
:might never have known this, with everyone on the verge of discovering
:it simply dismissing it as very unlikely.  8^).
:...


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-09 Thread David O'Brien

On Sun, Dec 09, 2001 at 11:00:19PM +0100, Joerg Wunsch wrote:
 Mike Smith [EMAIL PROTECTED] wrote:
   - The MBR partition table is not obsolete, it's a part of the PC 
 architecture specification.
 
 Its design is antique.  Or rather: it's missing a design.  See other
 mail for the reasons.  For FreeBSD, it's obsolete since we don't need
 to rely on fdisk slices.  (Or rather: it's optional.  We can make good
 use of it when it's there, but we don't need to insist on it being
 there.)

Jorg, why not just buy an Alpha or Sun Blade and run FreeBSD on it??
You will get the traditional Unix workstation partitioning you long
for so much.  It really seems your arguments are nothing more than
MBRs are an M$ and IBM PeeCee thing, and I hate anything PeeCee.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



IBM DTLA drives (was: Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c) )

2001-12-09 Thread Peter Wemm

Matthew Dillon wrote:
 : etc...  So apparently my warning about these drives in 'man tuning' is
 : still appropriate :-)
 : 
 :-Matt
 : 
 : :  IBM DTLA drives are known to rotate fast enough near the spindle
 : :  that the sustained write speed exceeds the ability of the controller
 : :  electronics to keep up, and results in crap being written to disk.
 :
 :
 :I would assume it is actually the tracks FURTHEST from the spindle..
 
 This is the first I've heard of the alleged controller electronics
 performance problem.  My understanding is that the failures are due 
 to manufacturing problems, but people have apparently experienced
 software lockups as well.
 
 What is not in doubt is that there have been some severe problems with
 this model.

Yes there are two problems.  The physical failure problem seems to
be mostly restricted to the 75GXP.  However the electronics/bandwidth/
density/whatever-it-is problem is uniform across the entire DTLA line.
We stopped using 75GXP's at work a while back, but we still regularly
suffer from the electronics/bandwidth/whatever-it-is problem on 30G DTLA
drives on a daily basis.

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread David W. Chapman Jr.

On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote:
 It's because you have to reinstall, should you want to add a second
 OS at a later date (e.g. Linux, or Windows).

I think it has more to do with the drive going on a new motherboard 
that might not boot with dangerously dedicated than the above.

-- 
David W. Chapman Jr.
[EMAIL PROTECTED]   Raintree Network Services, Inc. www.inethouston.net
[EMAIL PROTECTED]   FreeBSD Committer www.FreeBSD.org

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Terry Lambert

David W. Chapman Jr. wrote:
 On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote:
  It's because you have to reinstall, should you want to add a second
  OS at a later date (e.g. Linux, or Windows).
 
 I think it has more to do with the drive going on a new motherboard
 that might not boot with dangerously dedicated than the above.

The concept of dangerously dedicated significantly predates BIOSes
being unable to boot such drives, either because of antivirus
checks, or because of automatic fictitious geometry determination
by Adaptec or NCR (now Symbios) controllers, which end up getting
divide by zero errors when parsing the fictitious partition
table that the FreeBSD dangerously dedicated mode includes in its
boot block.

In fact, I remember installing 386BSD dangerously dedicated on
an ATT WGS 386 ESDI drive, back in 1992.

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread David W. Chapman Jr.

  :  IBM DTLA drives are known to rotate fast enough near the spindle
  :  that the sustained write speed exceeds the ability of the controller
  :  electronics to keep up, and results in crap being written to disk.
 
 
 I would assume it is actually the tracks FURTHEST from the spindle..


Wouldn't the linear speed be faster closer to the spindle at 7200 RPM 
than at the edge?


-- 
David W. Chapman Jr.
[EMAIL PROTECTED]   Raintree Network Services, Inc. www.inethouston.net
[EMAIL PROTECTED]   FreeBSD Committer www.FreeBSD.org

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Peter Wemm

David W. Chapman Jr. wrote:
   :  IBM DTLA drives are known to rotate fast enough near the spindle
   :  that the sustained write speed exceeds the ability of the controller
   :  electronics to keep up, and results in crap being written to disk.
  
  
  I would assume it is actually the tracks FURTHEST from the spindle..
 
 Wouldn't the linear speed be faster closer to the spindle at 7200 RPM 
 than at the edge?

This particular tangent of the disk partitioning thread has gone *way*
off topic. :-)

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Peter Wemm

David W. Chapman Jr. wrote:
 On Sun, Dec 09, 2001 at 06:46:24PM -0800, Terry Lambert wrote:
  It's because you have to reinstall, should you want to add a second
  OS at a later date (e.g. Linux, or Windows).
 
 I think it has more to do with the drive going on a new motherboard 
 that might not boot with dangerously dedicated than the above.

.. And the mere presence of one of the disks that causes the bios
to lock up at boot.  Note that this is a particularly bad thing in
laptops.

There are three classes of behavior:
1) You luck out and it works
2) You get a bios divide-by-zero fault when you *boot* off the disk. This
   shows up as a 'BTX fault'.  If you check the lists, a good number of
   btx faults posted to the lists have int=0 (divide by zero) in them.
   The problem is more widespread than it appears.
3) You get a system lockup when booting the *computer* if *any* DD disk
   is attached anywhere at all.  This is what killed the Thinkpad T20*,
   A20*, 600X etc.  After all the yelling we did at IBM, it turned out
   to be FreeBSD's fault.  It also happens on Dell systems.  It kills
   all IA64 boxes if a FreeBSD/i386 disk is attached anywhere.

An additional problem is that because boot1 has got a fdisk table
embedded in it unconditionally, a freebsd partition *looks* like it has
got a recursive MBR in it.  This is what is really bad and is what is
killing us on newer systems.  What really sucks is that there is 
**NO WAY** to remove it with the tools that we have except a hex editor.

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Andrew Kenneth Milton

+---[ David W. Chapman Jr. ]--
|   :  IBM DTLA drives are known to rotate fast enough near the spindle
|   :  that the sustained write speed exceeds the ability of the controller
|   :  electronics to keep up, and results in crap being written to disk.
|  
|  
|  I would assume it is actually the tracks FURTHEST from the spindle..
| 
| 
| Wouldn't the linear speed be faster closer to the spindle at 7200 RPM 
| than at the edge?

er no.

The circumference of a circle is 2 PI r.

So as your distance from the spindle increases, the amount of physical real estate
you're traversing increases. Since you are turning at a constant angular velocity,
your linear velocity increases in proportion to your distance from the spindle
(roughly 6 extra units of path per revolution, i.e. 2 PI, for each extra unit of radius).

Ever been at one of those carnivals where they have a spinning thing?
It's easier to stay near the centre than near the edges, because you are moving
a *lot* quicker at the edges.

And just for the hell of it:

If you have a 3 unit disc doing 1 RPM

If you're 1/2 unit out you're doing  ~3 units/min
If you're one unit out, you're doing ~6 units/min
If you're two units out you're doing ~12 units/min
at three: ~19 units/min

Multiply by 7200 and s/units/inches/
The outside of your disk is really moving

The density of the sectors at the outer edge is lighter than
near the centre, which mitigates the speed somewhat.

See Also: artificial gravity in space stations/ships/objects
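
The same arithmetic in a few lines of C, for anyone who wants to play
with the numbers (linear speed at radius r is 2*pi*r per revolution):

    #include <stdio.h>

    int
    main(void)
    {
        const double pi = 3.14159265358979;
        const double rpm = 7200.0;
        const double radii[] = { 0.5, 1.0, 2.0, 3.0 };  /* in "units" */

        for (int i = 0; i < 4; i++) {
            double per_rev = 2.0 * pi * radii[i];   /* path per revolution */
            printf("r = %.1f: %5.2f units/rev, %8.0f units/min at %.0f rpm\n",
                radii[i], per_rev, per_rev * rpm, rpm);
        }
        return 0;
    }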


-- 
Totally Holistic Enterprises Internet|  | Andrew Milton
The Internet (Aust) Pty Ltd  |  |
ACN: 082 081 472 ABN: 83 082 081 472 |  M:+61 416 022 411   | Carpe Daemon
PO Box 837 Indooroopilly QLD 4068|[EMAIL PROTECTED]| 

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Dangerously dedicated yet again (was: cvs commit: src/sys/kern subr_diskmbr.c)

2001-12-09 Thread Greg Lehey

On Sunday,  9 December 2001 at 22:44:52 -0800, Peter Wemm wrote:
 3) You get a system lockup when booting the *computer* if *any* DD disk
is attached anywhere at all.  This is what killed the Thinkpad T20*,
A20*, 600X etc.  After all the yelling we did at IBM, it turned out
to be FreeBSD's fault.  It also happens on Dell systems.  It kills
all IA64 boxes if a FreeBSD/i386 disk is attached anywhere.

What are you talking about?  The IBM lockup was due to the presence of
a *Microsoft partition* of type 0xn5, for any value of n.  If these
systems also lock up with a dedicated disk, it's due to some other
bug.

Greg
--
See complete headers for address and phone numbers

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-08 Thread Joerg Wunsch

Bernd Walter [EMAIL PROTECTED] wrote:

 32 times for each disk on booting with most of 30 disks.
 Possibly it's triggered by vinums drive scanning.

Yep, same here (and it is triggered by vinum).

 What can I do about these messages?

Remove it.  It should not have been there in the first place, at least
not without an if (bootverbose) ... in front of it.  It isn't
telling you anything new anyway, because you certainly already knew
that your disks are using DD mode, and the last word says (ignored),
which is the intended and expected action anyway.

I do understand Peter's gripe about broken BIOSes that try to interpret
fdisk tables (where the fdisk table is actually in the domain of the
boot block itself).  The comments tell a bit more about it.  But
adding pointless messages that flush the boot log and possibly hide
important boot messages can't be goo.

-- 
cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-08 Thread Matthew Dillon

:boot block itself).  The comments tell a bit more about it.  But
:adding pointless messages that flush the boot log and possibly hide
:important boot messages can't be goo.
:
:-- 
:cheers, Jorg   .-.-.   --... ...--   -.. .  DL8DTL

Yes, Goo in the computer is wery, wery bad!

-Matt

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-08 Thread Peter Wemm

Joerg Wunsch wrote:
 Bernd Walter [EMAIL PROTECTED] wrote:
 
  32 times for each disk on booting with most of 30 disks.
  Possibly it's triggered by vinums drive scanning.
 
 Yep, same here (and it is triggered by vinum).
 
  What can I do about these messages?
 
 Remove it.  It should not have been there in the first place, at least
 not without an if (bootverbose) ... in front of it.  It isn't
 telling any news anyway, because you certainly already knew that your
 disks are using DD mode, and the last word is telling (ignored)
 which is the intented and expected action to happen anyway.

There shouldn't *be* bootblocks on non-boot disks.

dd if=/dev/zero of=/dev/da$n count=1

Dont use disklabel -B -rw da$n auto.  Use disklabel -rw da$n auto.


Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-08 Thread Bernd Walter

On Sat, Dec 08, 2001 at 05:09:11PM -0800, Peter Wemm wrote:
 Joerg Wunsch wrote:
  Bernd Walter [EMAIL PROTECTED] wrote:
  
   32 times for each disk on booting with most of 30 disks.
   Possibly it's triggered by vinums drive scanning.
  
  Yep, same here (and it is triggered by vinum).
  
   What can I do about these messages?
  
  Remove it.  It should not have been there in the first place, at least
  not without an if (bootverbose) ... in front of it.  It isn't
  telling any news anyway, because you certainly already knew that your
  disks are using DD mode, and the last word is telling (ignored)
  which is the intented and expected action to happen anyway.
 
 There shouldn't *be* bootblocks on non-boot disks.

I usually have a /boot/loader.work and a /boot/kernel.work for updating.
What is wrong with having spare bootblocks?
In fact I have already needed them twice, and the disk space is unused anyway.

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: cvs commit: src/sys/kern subr_diskmbr.c

2001-12-07 Thread Bernd Walter

On Wed, Nov 21, 2001 at 12:31:45AM -0800, Peter Wemm wrote:
 peter   2001/11/21 00:31:45 PST
 
   Modified files:
 sys/kern subr_diskmbr.c 
   Log:
   Recognize the fixed geometry in boot1 so that DD disks are not
   interpreted as real fdisk tables (and fail).
   
   Revision  ChangesPath
   1.53  +31 -6 src/sys/kern/subr_diskmbr.c

Maybe I'm a bit late with this subject.
I have updated a machine yesterday and get these messages:
da28: invalid primary partition table: Dangerously Dedicated (ignored)

32 times for each disk on booting, with most of the 30 disks.
Possibly it's triggered by vinum's drive scanning.
OK, it was unnecessary to install bootblocks on dedicated disks other
than the boot device, but it's a lot of noise for that.

What can I do about these messages?

-- 
B.Walter  COSMO-Project http://www.cosmo-project.de
[EMAIL PROTECTED] Usergroup   [EMAIL PROTECTED]


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message