Re: RSDL v0.31

2007-03-17 Thread Mark Hahn

> So in an attempt to summarise the situation, what are the advantages of RSDL
> over mainline?

> Fairness


why do you think fairness is good, especially always good?


> Starvation free


even starvation is sometimes a good thing - there's a place for processes
that only use the CPU if it is otherwise idle.  that is, they are
deliberately starved all the rest of the time.


> Much lower and bounded latencies


in an average sense?  also, under what circumstances does this actually
matter?  (please don't offer something like RT audio on an overloaded machine -
that's operator error, not something to design for.)


> Deterministic


not a bad thing, but how does this make itself apparent and of value
to the user?  I think everyone is extremely comfortable with non-determinism
(stemming from networks, caches, interleaved workloads, etc).


> Better interactivity for the majority of cases.


how is this measured?  is this statement really just a reiteration of 
the latency claim?



> Now concentrating on the very last aspect since that seems to be the sticking
> point.


nah, I think the fairness and latency claims are the real issues.


Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2

2007-01-25 Thread Mark Hahn

> Something is seriously wrong with that OOM killer.


do you know you don't have to operate in OOM-slaughter mode?

"vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you 
into a mode where the kernel tracks your "committed" memory 
needs, and will eventually cause some allocations to fail.

this is often much nicer than the default random OOM slaughter.
(you probably also need to adjust vm.overcommit_ratio with 
some knowledge of your MemTotal and SwapTotal.)
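
for example, a minimal sketch (the ratio value is purely illustrative,
not a recommendation):

    echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
    echo "vm.overcommit_ratio = 80" >> /etc/sysctl.conf
    sysctl -p    # commit limit becomes swap + 80% of RAM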


regards, mark hahn.


Re: Where is the performance bottleneck?

2005-08-29 Thread Mark Hahn
>  8 SCSI U320 (15000 rpm) disks where 4 disks (sdc, sdd, sde, sdf)

figure each is worth, say, 60 MB/s, so you'll peak (theoretically) at 
240 MB/s per channel.

> The U320 SCSI controller has a 64 bit PCI-X bus for itself, there is no other
> device on that bus. Unfortunately I was unable to determine at what speed
> it is running, here the output from lspci -vv:
...
>  Status: Bus=2 Dev=4 Func=0 64bit+ 133MHz+ SCD- USC-, DC=simple,

the "133MHz+" is a good sign.  OTOH the latency (72) seems rather low - my
understanding is that that would noticeably limit the size of burst transfers.

> Anyway, I thought with this system I would get theoretically 640 MB/s using
> both channels.

"theoretically" in the same sense as "according to quantum theory,
Bush and Bin Laden may swap bodies tomorrow morning at 4:59."

> write speeds for this system. But testing shows that the absolute maximum I
> can reach with software raid is only approx. 270 MB/s for writing. Which is
> very disappointing.

it's a bit low, but "very" is unrealistic...

> deadline and distribution is fedora core 4 x86_64 with all updates. Chunksize
> is always the default from mdadm (64k). Filesystem was always created with the
> command mke2fs -j -b4096 -O dir_index /dev/mdx.

bear in mind that a 64k chunksize means that an 8 disk raid5 will really
only work well for writes that are multiples of 7*64 = 448K...
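
a quick way to see the effect (a sketch - the mount point and sizes are
assumptions; 7 data disks * 64K = 448K per full stripe):

    dd if=/dev/zero of=/mnt/md/stripetest bs=448k count=4096    # full-stripe writes
    dd if=/dev/zero of=/mnt/md/stripetest bs=100k count=18000   # unaligned size, for comparison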

> I also have tried with 2.6.13-rc7, but here the speed was much lower, the
> maximum there was approx. 140 MB/s for writting.

hmm, there should not have been any such dramatic slowdown.

> Version  1.03   --Sequential Output-- --Sequential Input- --Random-
>                 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine   Size  K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> Raid0 (8 disk)15744M 54406  96 247419 90 100752 25 60266  98 226651 29 830.2   1
> Raid0s(4 disk)15744M 54915  97 253642 89 73976  18 59445  97 198372 24 659.8   1
> Raid0s(4 disk)15744M 54866  97 268361 95 72852  17 59165  97 187183 22 666.3   1

you're obviously saturating something already with 2 disks.  did you play
with "blockdev --setra" setings?

> Raid5 (8 disk)15744M 55881  98 153735 51 61680  24 56229  95 207348 44 741.2   1
> Raid5s(4 disk)15744M 55238  98 81023  28 36859  14 56358  95 193030 38 605.7   1
> Raid5s(4 disk)15744M 54920  97 83680  29 36551  14 56917  95 185345 35 599.8   1

the block-read shows that even with 3 disks, you're hitting ~190 MB/s,
which is pretty close to your actual disk speed.  the low value for block-out
is probably just due to non-stripe writes needing R/M/W cycles.

> /dev/sdc  15744M 53861  95 102270 35 25718   6 37273  60 76275   8 377.0   0

the block-out is clearly distorted by buffer-cache (too high), but the 
input rate is good and consistent.  obviously, it'll fall off somewhat
towards inner tracks, but will probably still be above 50 MB/s.

> Why do I only get 247 MB/s for writing and 227 MB/s for reading (from the
> bonnie++ results) for a Raid0 over 8 disks? I was expecting to get nearly
> three times those numbers if you take the numbers from the individual disks.

expecting 3x is unreasonable; 2x (480 or so) would be good.

I suspect that some (sw kernel) components are badly tuned for fast IO.
obviously, most machines are in the 50-100 MB/s range, so this is not
surprising.  readahead is certainly one, but there are also magic numbers
in MD as well, not to mention PCI latency, scsi driver tuning, probably
even /proc/sys/vm settings.
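
a few concrete knobs to try - values are illustrative starting points,
not tested on this hardware (the PCI address is from your lspci output):

    blockdev --setra 8192 /dev/md0         # array-level readahead
    setpci -s 02:04.0 latency_timer=c0     # raise PCI latency timer from 0x48 (72)
    sysctl vm.dirty_ratio                  # then experiment with vm writeback knobs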

I've got some 4x2.6G opteron servers (same board, 32G PC3200), but alas,
end-users have found out about them.  not to mention that they only have 
3x160G SATA disks...

regards, mark hahn.



Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-23 Thread Mark Hahn
> > if CKRM is just extensions, I think it should be an external patch.
> > if it provides a path towards unifying the many disparate RM mechanisms
> > already in the kernel, great!
> 
> OK, so if it provides a path towards unifying these, what should happen
> to the old interfaces when they conflict with those offered by CKRM?

I don't think the name matters, as long as the RM code is simplified/unified.
that is, the only difference at first would be a change in name - 
same behavior.

> For instance, I'm considering how a per-class (re)nice setting would
> work. What should happen when the user (re)nices a process to a
> different value than the nice of the process' class? Should CKRM:

it has to behave as it does now, unless the admin has imposed some 
class structure other than the normal POSIX one (ie, nice pertains 
only to a process and is inherited by future children.)

> a) disable the old interface by
>   i) removing it
>   ii) return an error when CKRM is active
>   iii) return an error when CKRM has specified a nice value for the
> process via membership in a class
>   iv) return an error when the (re)nice value is inconsistent with the
> nice value assigned to the class

some interfaces must remain (renice), and if their behavior is implemented
via CKRM, it must, by default, act as before.  other interfaces (say 
overcommit_ratio) probably don't need to remain.

> b) trust the user, ignore the class nice value, and allow the new nice
> value

users can only nice up, and that policy needs to remain, obviously.
you appear to be asking what happens when the scope of the old mechanism
conflicts with the scope determined by admin-set CKRM classes.  I'd 
say that nicing a single process should change the nice of the whole 
class that the process is in, if any.  otherwise, it acts to rip that 
process out of the class, which is probably even less 'least surprise'.

>   This sort of question would probably come up for any other CKRM
> "embraced-and-extended" tunables. Should they use the answer to this
> one, or would it go on a case-by-case basis?

I don't see that CKRM should play by rules different from other 
kernel improvements - preserve standard/former behavior when that 
behavior is documented (certainly nice is).  in the absence of admin-set
classes, nice would behave the same. 

all CKRM is doing here is providing a broader framework to hang the tunables
on.  it should be able to express all existing tunables in scope.



Re: [ckrm-tech] Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Mark Hahn
> > actually, let me also say that CKRM is on a continuum that includes 
> > current (global) /proc tuning for various subsystems, ulimits, and 
> > at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
> > being useful and fast enough to subsume the current global and per-proc
> > tunables.  after all, there are MANY places where the kernel tries to 
> > maintain some sort of context to allow it to tune/throttle/readahead
> > based on some process-linked context.  "embracing and extending"
> > those could make CKRM attractive to people outside the mainframe market.
> 
>   Seems like an excellent suggestion to me! Yeah, it may be possible to
> maintain the context the kernel keeps on a per-class basis instead of
> globally or per-process. 

right, but are the CKRM people ready to take this on?  for instance,
I just grepped 'throttle' in kernel/mm and found a per-task RM in 
page-writeback.c.  it even has a vaguely class-oriented logic, since
it exempts RT tasks.  if CKRM can become a way to make this stuff 
cleaner and more effective (again, for normal tasks), then great.
but bolting on a big new different, intrusive mechanism that slows
down all normal jobs by 3% just so someone can run 10K mostly-idle
guests on a giant Power box, well, that's gross.

> The real question is what constitutes a useful
> "extension" :).

if CKRM is just extensions, I think it should be an external patch.
if it provides a path towards unifying the many disparate RM mechanisms
already in the kernel, great!

>   I was thinking that per-class nice values might be a good place to
> start as well. One advantage of per-class as opposed to per-process nice
> is the class is less transient than the process since its lifetime is
> determined solely by the system administrator.

but the Linux RM needs to subsume traditional Unix process groups,
and inherited nice/sched class, and even CAP_ stuff.  I think CKRM
could start to do this, since classes are very general.
but merely adding a new, incompatible feature is just Not A Good Idea.

regards, mark hahn.



Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-22 Thread Mark Hahn
> > > the fast path slower and less maintainable.  if you are really concerned
> > > about isolating many competing servers on a single piece of hardware, then
> > > run separate virtualized environments, each with its own user-space.
> > 
> > And the virtualisation layer has to do the same job with less
> > information. That to me implies that the virtualisation case is likely
> > to be materially less efficient, it's just the inefficiency you are
> > worried about is hidden in a different pieces of code.

I imagine you, like me, are currently sitting in the Xen talk,
and I don't believe they are or will do anything so dumb as to throw away
or lose information.  yes, in principle, the logic will need to be 
somewhere, and I'm suggesting that the virtualization logic should
be in VMM-only code so it has literally zero effect on host-native 
processes *or* the host-native fast-path.

> > Secondly a lot of this doesn't matter if CKRM=n compiles to no code
> > anyway
> 
> I'm actually trying to keep the impact of CKRM=y to near-zero, ergo
> only an impact if you create classes.  And even then, the goal is to
> keep that impact pretty small as well.

but to really do CKRM, you are going to want quite extensive interaction with
the scheduler, VM page replacement policies, etc. - all incredibly
performance-sensitive areas.

actually, let me also say that CKRM is on a continuum that includes 
current (global) /proc tuning for various subsystems, ulimits, and 
at the other end, Xen/VMM's.  it's conceivable that CKRM could wind up
being useful and fast enough to subsume the current global and per-proc
tunables.  after all, there are MANY places where the kernel tries to 
maintain some sort of context to allow it to tune/throttle/readahead
based on some process-linked context.  "embracing and extending"
those could make CKRM attractive to people outside the mainframe market.


> Plus you won't have to manage each operating system instance which
> can grow into a pain under virtualization.  But I still maintain that
> both have their place.

CKRM may have its place in an externally-maintained patch ;)

regards, mark hahn.



Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
> of the various environments.  I don't think you are one of those end
> users, though.  I don't think I'm required to make everyone happy all
> the time.  ;)

the issue is whether CKRM (in its real form, not this thin edge)
will noticeably hurt Linux's fast-path.



Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-21 Thread Mark Hahn
> > > yes, that's the crux.  CKRM is all about resolving conflicting resource 
> > > demands in a multi-user, multi-server, multi-purpose machine.  this is a 
> > > huge undertaking, and I'd argue that it's completely inappropriate for 
> > > *most* servers.  that is, computers are generally so damn cheap that 
> > > the clear trend is towards dedicating a machine to a specific purpose, 
> > > rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  
>  
> This is a big NAK - if computers are so damn cheap, why is virtualization
> and consolidation such a big deal?  Well, the answer is actually that

yes, you did miss my point.  I'm actually arguing that it's bad design
to attempt to arbitrate within a single shared user-space.  you make 
the fast path slower and less maintainable.  if you are really concerned
about isolating many competing servers on a single piece of hardware, then
run separate virtualized environments, each with its own user-space.



Re: 2.6.13-rc3-mm1 (ckrm)

2005-07-17 Thread Mark Hahn
> I suspect that the main problem is that this patch is not a mainstream
> kernel feature that will gain multiple uses, but rather provides
> support for a specific vendor middleware product used by that
> vendor and a few closely allied vendors.  If it were smaller or
> less intrusive, such as a driver, this would not be a big problem.
> That's not the case.

yes, that's the crux.  CKRM is all about resolving conflicting resource 
demands in a multi-user, multi-server, multi-purpose machine.  this is a 
huge undertaking, and I'd argue that it's completely inappropriate for 
*most* servers.  that is, computers are generally so damn cheap that 
the clear trend is towards dedicating a machine to a specific purpose, 
rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.  

this is *directly* in conflict with certain prominent products, such as 
the Altix and various less-prominent Linux-based mainframes.  they're all
about partitioning/virtualization - the big-iron aesthetic of splitting up 
a single machine.  note that it's not just about "big", since cluster-based 
approaches can clearly scale far past big-iron, and are in effect statically
partitioned.  yes, buying a hideously expensive single box, and then chopping 
it into little pieces is more than a little bizarre, and is mainly based
on a couple of assumptions:

- that clusters are hard.  really, they aren't.  they are not 
necessarily higher-maintenance, can be far more robust, usually
do cost less.  just about the only bad thing about clusters is 
that they tend to be somewhat larger in size.

- that partitioning actually makes sense.  the appeal is that if 
you have a partition to yourself, you can only hurt yourself.
but it also follows that burstiness in resource demand cannot be 
overlapped without either constantly tuning the partitions or 
infringing on the guarantee.

CKRM is one of those things that could be done to Linux, and will benefit a
few, but which will almost certainly hurt *most* of the community.

let me say that the CKRM design is actually quite good.  the issue is whether 
the extensive hooks it requires can be done (at all) in a way which does 
not disproportionately hurt maintainability or efficiency.

CKRM requires hooks into every resource-allocation decision fastpath:
- if CKRM is not CONFIG, the only overhead is software maintenance.
- if CKRM is CONFIG but not loaded, the overhead is a pointer check.
- if CKRM is CONFIG and loaded, the overhead is a pointer check
and a nontrivial callback.

but really, this is only for CKRM-enforced limits.  CKRM really wants to
change behavior in a more "weighted" way, not just causing an
allocation/fork/packet to fail.  a really meaningful CKRM needs to 
be tightly integrated into each resource manager - affecting each scheduler
(process, memory, IO, net).  I don't really see how full-on CKRM can be 
compiled out, unless these schedulers are made fully pluggable.

finally, I observe that pluggable, class-based resource _limits_ could 
probably be done without callbacks and potentially with low overhead.
but mere limits don't meet CKRM's goal of flexible, wide-spread resource
partitioning within a large, shared machine.

regards, mark hahn.



Re: 2.6.11, IDE: Strange scheduling behaviour: high-pri RT process not scheduled?

2005-03-30 Thread Mark Hahn
> I've written a small test program which enables periodic RTC interrupts 
> at 8192 Hz and then goes into a loop reading /dev/rtc and collecting 
> timing statistics (using the rdtscl macro).

straightforward test, used for many years in the linux community
(I claim to have been the first to publish it on lkml ;)

> The test system runs a 2.6.11 kernel (no SMP) on a Pentium3 500 MHz 
> embedded hardware.

which probably has memory bandwidth of at most a couple hundred MB/s,
which is really horrible by modern standards.

> However, things break seriously when exercising the CF card in parallel 
> (e.g. with a dd if=/dev/hda of=/dev/null):
> 
> * The rtc *interrupt handler* is delayed for up to 250 *micro*seconds. 
> This is very bad for my purpose, but easy to explain: It is roughly the 
> time needed to transfer 512 Bytes from a CF card which can transfer 2 
> Mbyte/sec, and obviously, the CPU blocks all interrupts while making pio
> transfers. (Why? Is this really necessary?)

even with -u1, isn't there still a softirq queue that will delay the wakeup
of your user-space tester?

> * The *test program* is regularly blocked for 63 *milli*seconds, 
> sometimes for up to 300 *milli*seconds, which is absolutely
> unacceptable.

guessing that's VM housekeeping.

> Now the big question:
> *** Why doesn't my rt test program get *any* CPU for 63 jiffies? ***
> (the system ticks at 1000 HZ)

because it's user-space.  the 'rt' is a bit of a misnomer - it's merely
a higher priority, less preemptable job.

> * The dd program obviously gets some CPU regularly (because it copies 2 
> MB/s, and because no other program could cause the 1 % user CPU load). 
> The dd runs at normal shell scheduling priority, so it should be 
> preempted immediately by my test program!

out of curiosity, does running it with "nice -n 19" change anything?

> 2.) Using a realtime-preempt-2.6.12-rc1-V0.7.41-11 kernel with
> PREEMPT_RT:
> If my test program runs at rtpri 99, the problem is gone: It is 
> scheduled within 30 microseconds after the rtc interrupt.
> If my test program runs at rtpri 1, it still suffers from delays 
> in the 30 to 300 millisecond range.

so your problem is solved, no?

also, did you try a (plain) preemptable kernel?



Re: 2.4.6-pre2, pre3 VM Behavior

2001-06-14 Thread Mark Hahn

> > Would it be possible to maintain a dirty-rate count
> > for the dirty buffers?
> > 
> > For example, it is possible to figure an approximate
> > disk subsystem speed from most of the given information.
> 
> Disk speed is difficult.  I may enable and disable swap on any number of
...
> You may be able to get some useful approximations, but you
> will probably not be able to get good numbers in all cases.

a useful approximation would be simply an idle flag.
for instance, if the disk is idle, then cleaning a few 
inactive-dirty pages would make perfect sense, even in 
the absence of memory pressure.




Re: VM Report was:Re: Break 2.4 VM in five easy steps

2001-06-09 Thread Mark Hahn

> reads the RTC device.  The patched RTC driver can then
> measure the elapsed time between the interrupt and the
> read from userspace.  Voila: latency.

interesting, but I'm not sure there's much advantage over
doing it entirely in user-space with the normal /dev/rtc:

http://brain.mcmaster.ca/~hahn/realfeel.c

it just prints out the raw time difference from when
rtc should have woken up the program.  you can do your own histogram;
for summary purposes, something like stdev is probably best.
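
e.g., assuming one latency sample per line on stdout (an assumption about
the output format), a quick mean/stdev:

    ./realfeel | awk '{ n++; s += $1; ss += $1*$1 }
        END { m = s/n; printf "mean %g  stdev %g\n", m, sqrt(ss/n - m*m) }'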




Re: XMM: monitor Linux MM inactive/active lists graphically

2001-06-04 Thread Mark Hahn

> XMM is a heavily modified XMEM utility that graphically shows the size of
> different Linux page lists: active, inactive_dirty, inactive_clean,
> code, free and swap usage. It is better suited for the monitoring of
> Linux 2.4 MM implementation than original (XMEM) utility.
> 
> Find it here: <URL:http://linux.inet.hr/>

interesting.  I prefer to collect data separately from viewing it,
and use the following simple perl script to do so; obviously, 
it generates a bunch of separate files, one for each metric,
suitable for traditional filtering, gnuplot, etc.

#!/bin/perl
use IO::Handle;

require 'sys/syscall.ph';
sub gettimeofday {
    $timeval = pack("LL", ());
    syscall(&SYS_gettimeofday, $timeval, 0) != -1
        or die "gettimeofday: $!";
    ($sec,$usec) = unpack("LL", $timeval);
    return $sec + 1e-6 * $usec;
}

open(S,"</proc/stat")     || die("failed to open /proc/stat");
open(M,"</proc/meminfo")  || die("failed to open /proc/meminfo");
open(B,"</proc/slabinfo") || die("failed to open /proc/slabinfo");

open(PI,">pi.st");   PI->autoflush(1);
open(PO,">po.st");   PO->autoflush(1);
open(SI,">si.st");   SI->autoflush(1);
open(SO,">so.st");   SO->autoflush(1);
open(CX,">ctx.st");  CX->autoflush(1);
open(MF,">free.st"); MF->autoflush(1);
open(BF,">buf.st");  BF->autoflush(1);
open(AC,">act.st");  AC->autoflush(1);
open(ID,">id.st");   ID->autoflush(1);
open(IC,">ic.st");   IC->autoflush(1);
open(IT,">it.st");   IT->autoflush(1);
open(SW,">swap.st"); SW->autoflush(1);
open(BH,">bh.st");   BH->autoflush(1);
open(IN,">inode.st");  IN->autoflush(1);
open(DE,">dentry.st"); DE->autoflush(1);

$c = 0;
$first = gettimeofday();
while (1) {
    sleep(1);
    $now = gettimeofday() - $first;

    seek(S,0,SEEK_SET);
    while (<S>) {
        if (/^page\s+(\d+)\s+(\d+)$/) {
            if ($c) { print PI "$now ",4*($1 - $pi),"\n"; }
            if ($c) { print PO "$now ",4*($2 - $po),"\n"; }
            $pi = $1;
            $po = $2;
            next;
        }
        if (/^swap\s+(\d+)\s+(\d+)$/) {
            if ($c) { print SI "$now ",4*($1 - $si),"\n"; }
            if ($c) { print SO "$now ",4*($2 - $so),"\n"; }
            $si = $1;
            $so = $2;
            next;
        }
        if (/^ctxt\s+(\d+)$/) {
            if ($c) { print CX "$now ",$1 - $cx,"\n"; }
            $cx = $1;
            next;
        }
    }
    seek(M,0,SEEK_SET);
    while (<M>) {
        if (/^MemFree:\s+(\d+) kB$/) {     print MF "$now ",$1,"\n"; next; }
        if (/^Buffers:\s+(\d+) kB$/) {     print BF "$now ",$1,"\n"; next; }
        if (/^Active:\s+(\d+) kB$/) {      print AC "$now ",$1,"\n"; next; }
        if (/^Inact_dirty:\s+(\d+) kB$/) { print ID "$now ",$1,"\n"; next; }
        if (/^Inact_clean:\s+(\d+) kB$/) { print IC "$now ",$1,"\n"; next; }
        if (/^Inact_target:\s+(\d+) kB$/) {print IT "$now ",$1,"\n"; next; }
        if (/^Swap:\s+\d+\s+(\d+)/)   {    print SW "$now ",$1,"\n"; next; }
    }
    seek(B,0,SEEK_SET);
    while (<B>) {
        if (/^buffer_head\s+(\d+)\s+(\d+)\s+(\d+)/)  { print BH "$now ",$1*$3/1024,"\n"; next; }
        if (/^inode_cache\s+(\d+)\s+(\d+)\s+(\d+)/)  { print IN "$now ",$1*$3/1024,"\n"; next; }
        if (/^dentry_cache\s+(\d+)\s+(\d+)\s+(\d+)/) { print DE "$now ",$1*$3/1024,"\n"; next; }
    }
    $c++;
}
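
to eyeball one of the resulting files, e.g. in an interactive gnuplot session:

    gnuplot> plot 'free.st' using 1:2 with lines title 'MemFree (kB)'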




Re: 2.4 freezes on VIA KT133

2001-05-25 Thread Mark Hahn

> > contrary to the implication here, I don't believe there is any *general*
> > problem with Linux/VIA/AMD stability.  there are well-known issues
...
> VIA hardware is not suitable for anything until we _know_ the
> truth about what is wrong. VIA is hiding something big.

this is INCORRECT: we know there are specific problems with certain
VIA hardware, but there is most definitely *NO* problem with other 
VIA hardware, which is eminently suitable for servers, workstations
and cabbage dicing controllers.

afaik, there are absolutely zero problems reported with kt133-no-a
machines, for instance.  mine has certainly worked flawlessly for a 
long time, on most every 2.3/2.4 kernel over the past year+.




Re: 2.4 freezes on VIA KT133

2001-05-24 Thread Mark Hahn

> This report is probably not very helpful, but it may be useful for those who
> planned to purchase AMD / VIA solution for a server.

contrary to the implication here, I don't believe there is any *general*
problem with Linux/VIA/AMD stability.  there are well-known issues
with specific items (VIA 686b, for instance), but VIA/AMD hardware
is quite suitable for servers.




Re: FW: I think I've found a serious bug in AMD Athlon page_alloc.c routines, where do I mail the developer(s) ?

2001-05-15 Thread Mark Hahn

> I think I've found a serious bug in AMD Athlon page_alloc.c routines in

there's nothing athlon-specific there.

> correct on the DFI AK75-EC motherboard, if I set the CPU kernel type to 586
> everything is 100%, if I use "Athlon" kernel type I get:
> kernel BUG at page_alloc.c:73

when you select athlon at compile time, you're mainly 
getting Arjan's athlon-specific page-clear and -copy functions
(along with some relatively trivial alignment changes).
these functions are ~3x as fast as the generic ones,
and seem to cause dram/cpu-related oopses on some machines.

in short: faster code pushes the hardware past stability.
there's no reason, so far, to think that there's anything 
wrong with the code - Alan had a possible issue with prefetching
and very old Athlons, but the people reporting problems like this
are actually running kt133a and new fsb133 Athlons.

> I've changed RAM, Motherboard etc... still the same.

changed to a non-kt133a board?  how about running fsb and/or dram
at 100, rather than 133?

> Also the same system runs linux-2.2.16 100%

2.2 doesn't have the fast page-clear and -copy code afaik.

afaik, there are *no* problems on kt133 machines, and I haven't heard
any complaints from people who might have Ali Magic1, AMD 760 or KT266
boards, but those are still rare.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: PROBLEM: IDE dma_intr error on VIA chipset

2001-05-13 Thread Mark Hahn

> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

read the fine faq.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: LVM 1.0 release decision

2001-05-11 Thread Mark Hahn

On Fri, 11 May 2001, Jeff Garzik wrote:
...
> Subsystems are often maintained outside the Linus tree, with code
> getting pushed (hopefully regularly) to Linus.  For such scenarios, it

"maintained" *means* that the fixes/development get fed to Linus.
afaict, the LVM/ISDN/etc situations were problems because the developers
merely hacked on code, and failed to do the maintenance (feed Linus) part.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] arp_filter patch for 2.4.4 kernel.

2001-05-06 Thread Mark Hahn

> > also -- isn't it kind of wrong for arp to respond with addresses from
> > other interfaces?
> 
> Usually it makes sense, because it increases your chances of successful
> communication. IP addresses are owned by the complete host on Linux, not by 
> different interfaces.

this is one of those things that is still hurting Linux's credibility in the
real world.  people see this kind of obviously broken behavior, and install 
*BSD or Solaris instead.

isn't this clearly a case of the kernel being too smart: making it impossible
for a clueful admin to do what he needs?  multi-nic machines are now quite
common, but this "feature" makes them far less useful, since the stack is 
violating the admin's intention.

> For some weirder setups (most of them just caused by incorrect routing
> tables, but also a few legitimate ones; including incoming load balancing
> via multipath routes) it causes problems, so arpfilter was invented to 
> sync ARP replies with the routing tables as needed.

there's NOTHING weird about a machine having two nics and two IPs,
wanting to behave like two hosts.

is there any positive/beneficial reason for the current behavior?
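
(fwiw, assuming the patch lands with the sysctl knobs as posted - the
names below are illustrative, not a promise of the final ones - opting
in per interface would be a couple of lines in /etc/sysctl.conf:

net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.eth1.arp_filter = 1

with that set, a nic only answers arp for addresses the routing table
would actually route out that nic.)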

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Athlon and fast_page_copy: What's it worth ? :)

2001-05-05 Thread Mark Hahn

On Fri, 4 May 2001, Seth Goldberg wrote:

> Hi,
> 
>   Before I go any further with this investigation, I'd like to get an
> idea
> of how much of a performance improvement the K7 fast_page_copy will give
> me.
> Can someone suggest the best benchmark to test the speed of this
> routine?

Arjan van de Ven did the code, and he wrote a little test harness.
I've hacked it a bit (http://brain.mcmaster.ca/~hahn/athlon.c);
on my duron/600, kt133, pc133 cas2, it looks like this:

clear_page by 'normal_clear_page'  took 7221 cycles (324.6 MB/s)
clear_page by 'slow_zero_page'     took 7232 cycles (324.1 MB/s)
clear_page by 'fast_clear_page'    took 6110 cycles (383.6 MB/s)
clear_page by 'faster_clear_page'  took 2574 cycles (910.6 MB/s)

copy_page by 'normal_copy_page'    took 7224 cycles (324.4 MB/s)
copy_page by 'slow_copy_page'      took 7223 cycles (324.5 MB/s)
copy_page by 'fast_copy_page'      took 4662 cycles (502.7 MB/s)
copy_page by 'faster_copy'         took 2746 cycles (853.5 MB/s)
copy_page by 'even_faster'         took 2802 cycles (836.5 MB/s)

70% faster than the current fast_copy_page (853.5 vs 502.7 MB/s)!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: DISCOVERED! Cause of Athlon/VIA KX133 Instability

2001-05-01 Thread Mark Hahn

> > this has nothing to do with the very specific disk corruption
> > being discussed (which has to do with the ide controller, according
> > to the most recent rumors.).
> 
>   Actually, I think there are 2 problems that have been discussed -- the
> disk corruption and a general instability resulting in oops'es at
> various points shortly after boot up.

I don't see this.  specifically, there were scattered reports
of a via-ide problem a few months ago; this is the issue that's 
gotten some press, and for which Alan has a fix.  and there are reports 
of via-smp problems at boot (which go away with noapic).  I see no reports 
of the kind of general instability you're talking about.  and all the 
via-users I've heard of have no such stability problems - 
me included (kt133/duron).

the only general issue is that kx133 systems seem to be difficult
to configure for stability.  ugly things like tweaking Vio.
there's no indication that any of that has anything to do with Linux, though.
> 
>   My memory system has been set up very conservatively and has been
> rock solid in my other board (ka7), so I doubt it's that, but I
> sure am happy to try a few more combinations of bios settings.  Anything
> I should look for in particular?

how many dimms do you have?  interleave settings?  Vio jumper?
already checked on cooling issues?  and that you're not overclocking...

> > why resort to silly windows tools, when lspci under Linux does it for you?
> 
>   Because lspci does not display all 256 bytes of pci configuration
> information.

sure it does: (from my kt133 hostbridge)

[root@crema /root]# lspci -s 00:00.0 -xxx
00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
00: 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00
10: 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 27 a4 0b b4 46 02 08 08 08 00 00 00 04 08 08 08
60: 0c 00 00 00 d5 d6 d5 00 50 5d 86 0d 08 01 00 00
70: c9 88 cc 0c 0e a0 d2 00 01 b4 01 02 00 00 00 00
80: 0f 40 00 00 f0 00 00 00 02 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 02 c0 20 00 07 02 00 1f 00 00 00 00 2b 02 04 00
b0: 7f 63 2a 65 31 33 c0 0c 00 00 00 00 00 00 00 00
c0: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 00 00

[root@crema /root]# od -Ax -txC /proc/bus/pci/00/00.0 
00 06 11 05 03 06 00 10 22 02 00 00 06 00 00 00 00
10 08 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00
40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50 27 a4 0b b4 46 02 08 08 08 00 00 00 04 08 08 08
60 0c 00 00 00 d5 d6 d5 00 50 5d 86 0d 08 01 00 00
70 c9 88 cc 0c 0e a0 d2 00 01 b4 01 02 00 00 00 00
80 0f 40 00 00 f0 00 00 00 02 00 00 00 00 00 00 00
90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0 02 c0 20 00 07 02 00 1f 00 00 00 00 2b 02 04 00
b0 7f 63 2a 65 31 33 c0 0c 00 00 00 00 00 00 00 00
c0 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
f0 00 00 00 00 00 00 00 0e 22 00 00 00 00 00 00 00
000100


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: DISCOVERED! Cause of Athlon/VIA KX133 Instability

2001-05-01 Thread Mark Hahn

>   And that's exactly what I did :)...  I found that ONLY the combination
> of USE_3DNOW and forcing the athlon mmx stuff in (by doing #if 1 in
> results in this wackiness.  I should also repeat that I *DO* see that

I doubt that USE_3DNOW is causing the problem, but rather when you
USE_3DNOW, the kernel streams through your northbridge at roughly
twice the bandwidth.  if your dram settings are flaky, this could
easily trigger a problem.  

this has nothing to do with the very specific disk corruption
being discussed (which has to do with the ide controller, according
to the most recent rumors).

>   The other thing i was gunna try is to dump my chipset registers using 
> WPCREDIT and WPCRSET and compare them with other people on this list

why resort to silly windows tools, when lspci under Linux does it for you?

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] Re: Linux 2.4.4-ac2

2001-05-01 Thread Mark Hahn

> +  * Make sure the child gets the SCHED_YIELD flag cleared, even if
> +  * it inherited it, to avoid deadlocks.

can anyone think of a reason that SCHED_YIELD *should* be inherited?
I think it's just oversight that fork doesn't clear it.
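
(presumably the fix amounts to a one-liner in do_fork() after the
child's fields are copied from the parent - a sketch against 2.4
naming, where p is the new child task, not the exact patch:

	p->policy &= ~SCHED_YIELD;	/* don't inherit a pending yield */

which is just what the quoted comment describes.)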

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: #define HZ 1024 -- negative effects

2001-04-25 Thread Mark Hahn

> > Are there any negative effects of editing include/asm/param.h to change 
> > HZ from 100 to 1024? Or any other number? This has been suggested as a 
> > way to improve the responsiveness of the GUI on a Linux system. Does it 
...
> Why not just run the X server at a realtime priority?  Then it will get
> to respond to existing events, such as keyboard and mouse input,
> promptly without creating lots of superfluous extra clock interrupts.
> I think you will find this is a better solution.

it's surprisingly ineffective; usually, if someone thinks responsiveness
is bad, there's a problem with the system.  for instance, if the system
is swapping, setting X (and wm, and clients) to RT makes little difference,
since the kernel is stealing pages from them, regardless of their scheduling
priority.

if you're curious, you might be interested in two toy programs
I've attached.  one is "setrealtime", which will make a pid RT, or else act
as a wrapper (ala /bin/time).  I have it installed suid root on my system,
though this is rather dangerous if you have lusers around.  the second is a
simple memory-hog: mmaps a bunch of ram, and keeps it active (printing out a
handy measure of how long it took to touch its pages...)

regards, mark hahn.


#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/mman.h>

volatile unsigned sink;

double second() {
struct timeval tv;
gettimeofday(&tv,0);
return tv.tv_sec + 1e-6 * tv.tv_usec;
}

int
main(int argc, char *argv[]) {
int doWrite = 1;
unsigned size = 80 * 1024 * 1024;

int letter;
while ((letter = getopt(argc, argv, "s:wrvh?" )) != -1) {
switch(letter) {
case 's': size = atoi(optarg) * 1024 * 1024; break;
case 'w': doWrite = 1; break;
default:
fprintf(stderr,"useup [-s mb][-w]\n");
exit(1);
}
}
int *base = (int*) mmap(0, size, 
  PROT_READ|PROT_WRITE, 
  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
if (base == MAP_FAILED) {
perror("mmap failed");
exit(1);
}

int *end = base + size/4;

while (1) {
double start = second();
if (doWrite)
for (int *p = base; p < end; p += 1024)
*p = 0;
else {
unsigned sum = 0;
for (int *p = base; p < end; p += 1024)
sum += *p;
sink = sum;
}
printf("%f\n",1000*(second() - start));
}
}


#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sched.h>

int
main(int argc, char *argv[]) {
int uid = getuid();
int pid = atoi(argv[1]);
int sched_fifo_min, sched_fifo_max;
static struct sched_param sched_parms;

if (!pid)
pid = getpid();

sched_fifo_min = sched_get_priority_min(SCHED_FIFO);
sched_fifo_max = sched_get_priority_max(SCHED_FIFO);
sched_parms.sched_priority = sched_fifo_min + 1;

if (sched_setscheduler(pid, SCHED_FIFO, &sched_parms) == -1)
perror("cannot set realtime scheduling policy");

if (uid)
setuid(uid);

if (pid == getpid())
execvp(argv[1],&argv[1]);
return 0;
}
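
(usage, assuming the second program is built as "setrealtime": either
"setrealtime <pid>" to promote an already-running process, or
"setrealtime <command> [args...]" to exec something under SCHED_FIFO -
the atoi() on argv[1] is what picks between the two modes.)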



Re: /proc format (was Device Registry (DevReg) Patch 0.2.0)

2001-04-25 Thread Mark Hahn

> > Question: it is possible to redirect the same fs call (say read) to different
> > implementations, based on the open mode of the file descriptor ? So, if
> > you open the entry in binary, you just get the number chunk, if you open
> > it in ascii you get a pretty printed version, or a format description like
> 
> There is no distinction between "text" and "binary" modes on a file
> descriptor.  The distinction exists in the C stdio layer, but is a
> no-op on Unix systems.

of course.  but we could trivially define O_PROC_BINARY,
or an ioctl/fcntl, or even do something fancy like use lseek().

pardon my stream of consciousness here, but:

I think it's well-established that proc exists for humans,
and that there's no real sympathy for the eternal whines of 
how terribly hard it is to parse.  it's NOT hard to parse,
but it would be even easier if it were more consistent.

the main goal at this point is to make kernel proc-related 
code more efficient, easy-to-use, etc.  a purely secondary goal
is to make user-space tools more robust, efficient, and simpler.

there are three things that need to be communicated through the proc
interface, for each chunk of data: its type, its name, and its value.
it's critical that data be tagged in some way, since that's the only
way to permit back-compatibility.  that is, a tool looking for a particular
tag will naturally ignore new data with other tags.

/proc/sys is an attempt to provide tagged data; it works well, is 
easy to comprehend, but requires an open for each datum, and provides
no hints about type.

/proc/cpuinfo is another attempt: "tag : data", with no attempt to
provide types.  the tags have also mutated somewhat over time.

/proc/partitions is an example of a record-oriented file:
one line per record, and tags for the record members at the top.
still no typing information.

I have a sense that all of these could be collapsed into a single
api where kernel systems would register hierarchies of tuples of
<tag, type, callback>, where the callback would be passed the tag,
and proc code would take care of "rendering" the data into 
human readable text (default), binary, or even xml.  the latter
would require some signalling mechanism like O_PROC_XML or the like.
further, programs could perform a meta-query, where they ask for
the types and tags of a datum (or hierarchy), so that on subsequent
queries, they'd know how to handle binary data.

if only one piece of code handled the rendering of /proc stuff,
it could do more, without burdening all the disparate /proc producers.
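
to make that concrete, here's a sketch of the registration idea - every
name in it is invented, this is not a real kernel api:

enum ptype { PT_U32, PT_U64, PT_STR };

struct ptuple {
	const char *tag;	/* e.g. "cpu.mhz" */
	enum ptype type;	/* answers the meta-query */
	int (*get)(const struct ptuple *t, void *buf, unsigned len);
};

/* one central renderer walks the registered tuples and emits text
   (default), binary, or xml; producers never format anything */
int proc_register_tuples(const char *dir, const struct ptuple *tab, int n);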

regards, mark hahn.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP in 2.4

2001-04-18 Thread Mark Hahn


Dennis is like a pie in the face: messy, unexpected, but trivial.


On Wed, 18 Apr 2001, Dennis wrote:

> Does 2.4 have something similar to spl levels or does it still require the 
> ridiculous MS-DOSish spin-locks to protect every bit of code?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

2001-04-17 Thread Mark Hahn

> > isn't this a solution in search of a problem?
> > does it make sense to redesign parts of the kernel for the sole
> > purpose of making a completely unrealistic benchmark run faster?
> 
> Irrespective of the usefulness of the "chat" benchmark, it seems
> that there is a problem of scalability as long as CLONE_FILES is
> supported. John Hawkes (SGI) posted some nasty numbers on a
> 32 CPU mips machine in the lse-tech list some time ago.

that's not the point.  the point is that this has every sign of 
being premature optimization.  the "chat" benchmark does no work,
it only generates load.  and yes, indeed, you can cause contention
if you apply enough load in the right places.  this does NOT indicate
that any real apps apply the same load in the same places.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [RFC][PATCH] Scalable FD Management using Read-Copy-Update

2001-04-16 Thread Mark Hahn

> The improvement in performance while runnig "chat" benchmark 
> (from http://lbs.sourceforge.net/) is about 30% in average throughput.

isn't this a solution in search of a problem?
does it make sense to redesign parts of the kernel for the sole
purpose of making a completely unrealistic benchmark run faster?

(the chat "benchmark" is a simple pingpong load-generator; it is
not in the same category as, say, specweb, since it does not do *any*
realistic (nonlocal) IO.  the numbers "chat" returns are interesting,
but not indicative of any problem; perhaps even less than lmbench
components.)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: memory allocation problems

2001-04-06 Thread Mark Hahn

> > note, though, that you *CAN* actually malloc a lot more than 1G: you
> > just have to avoid causing mmaps that chop your VM at
> > TASK_UNMAPPED_BASE:
> 
> Neat trick.  I didn't realize that you could avoid allocating the mmap()
> buffers for stdin and stdout.

no one ever said you had to use stdio.  or even use libc, for that matter!

> As was pointed out to me in January, another solution for i386 would be to
> fix a maximum stack size and have the mmap() allocations grow downward
> from the "top" of the stack (3GB - max stack size).  I'm not sure why that
> is not currently done.

problems get fixed when there's some pain involved: people bumping 
into a limit, or painfully bad code, etc.  not enough people are 
feeling any pain about the current design.

this (and the "move TASK_UNMAPPED_BASE" workaround) have been known
for years; I think someone even coded up a "grow vmareas down" patch
the last time we all discussed this.

> I once wrote a tiny patch to do this, and ran it successfully for a couple
> days, but knowing so little about the kernel I probably did it in a
> completely wrong, inefficient way.  For example, some of the vma
> structures are sorted in increasing address order, and so perhaps to do
> this properly one should change them to decreasing address order.

oh, I guess you did the patch ;)
seriously, resubmit it when 2.5 opens up.  the fact is that we currently
have two things that grow up, and one that grows down.  so obviously,
one up-grower must have an arbitrary limit.  switching vma's to down-growing
is a good solution, since it's actually *good* to limit stack growth.  
I wonder whether fortraners still put all their data on the stack;
they wouldn't be happy ;)

a simple workaround would be to turn TASK_UNMAPPED_BASE into a variable,
either system-wide or thread-specific (like ia64 already has!).  that's 
compatible with the improved vmas-down approach, too.
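
(to illustrate the thread-specific flavor - the field name is invented,
this is not in any real tree:

	/* fall back to the old constant if the task never set one */
	#define task_unmapped_base(tsk) \
		((tsk)->map_base ? (tsk)->map_base : TASK_UNMAPPED_BASE)

get_unmapped_area() would consult that instead of the bare constant.)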

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: memory allocation problems

2001-04-06 Thread Mark Hahn

> can get at most 2GB.  Newer glibc's allow you to tune the definition
> of "small" via an environment variable.

eventually, perhaps libc will be smart enough to create 
more arenas in mmaped space once sbrk fails.  note, though,
that you *CAN* actually malloc a lot more than 1G: you just
have to avoid causing mmaps that chop your VM at TASK_UNMAPPED_BASE:

#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>

void printnumber(unsigned n) {
char number[20];
int i;
for (i=sizeof(number)-1; i>=0 && n; i--) {
number[i] = '0' + (n % 10);
n /= 10;
}
i++;
write(1,number+i, sizeof(number)-i);
}
int main() {
unsigned total = 0;
const unsigned size = 32*1024;

while (malloc(size)) {
total += size;
printnumber(total>>20);
write(1,"\n",1);
}
return 0;
}

compile -static, of course; printnumber is to avoid stdio, which seems
to use mmap for a small scratch buffer.  I allocated 2942 MB on my 128M 
machine (had to add a swapfile temporarily, since so many tiny mallocs 
do touch nontrivial numbers of pages for arena bookkeeping).

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bugreport: Kernel 2.4.x crash

2001-04-03 Thread Mark Hahn

> 2. A Fileserver with an ABIT Hotrod 66 (htp366) controller will crash within
> 5-60 minutes after boot with a 2.4.x kernel. 2.2.x works fine. No other

no problem with ext2 on hpt366 here.

> Gnu C  2.95.3

hmm.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP on assym. x86

2001-03-22 Thread Mark Hahn

> > > > handle the situation with 2 different CPUs (AMP = Assymmetric
> > > > multiprocessing ;-) correctly.
> > > 
> > > "correctly".  Intel doesn't support this (mis)configuration:
> > > especially with different steppings, not to mention models.
> 
> I wouldn't call it misconfiguration, just because it's a bit more difficult
> to handle.

again, I *would* call it misconfiguration.  intel says explicitly that 
they don't support mixing model/family parts.  and they only test
same-clock combinations (but do support mixed steppings.)  just so people
don't get the impression that random, different CPUs are a sure thing...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: SMP on assym. x86

2001-03-21 Thread Mark Hahn

> recently upgrading one of my two CPUs, I found kernel-2.4.2 to be unable to
> handle the situation with 2 different CPUs (AMP = Assymmetric
> multiprocessing ;-) correctly.

"correctly".  Intel doesn't support this (mis)configuration:
especially with different steppings, not to mention models.

Alan has, or is working on, a workaround to handle differing 
multipliers by turning off the use of RDTSC.  this is the right approach 
to take in the kernel: disable features not shared by both processors, 
so correctly-configured machines are not penalized. 
and the kernel should LOUDLY WARN ABOUT this stuff on boot.

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: UDMA 100 / PIIX4 question

2001-03-20 Thread Mark Hahn


>    Device Boot    Start       End    Blocks   Id  System
> /dev/hda1   * 1   932   7486258+   b  Win95 FAT32
> /dev/hda2   933  3737  22531162+   5  Extended
> /dev/hda5   933   935 24066   83  Linux
> /dev/hda6   936   952136521   82  Linux swap
> /dev/hda7   953  3737  22370481   83  Linux
> 
> 
> I also ran hdparm -tT /dev/hda1:
>  
> Timing buffer-cache reads:   128 MB in  1.28 seconds =100.00 MB/sec
>  Timing buffered disk reads:  64 MB in  4.35 seconds = 14.71 MB/sec
> 
> Which obviously gives much the same result as my usual hdparm -tT /dev/hda
> 
> I then tried hdparm -tT /dev/hda7:
> 
>  Timing buffer-cache reads:   128 MB in  1.28 seconds =100.00 MB/sec
>  Timing buffered disk reads:  64 MB in  2.12 seconds = 30.19 MB/sec
> 
> As you would expect, I get almost identical results with several repetitions.
> 
> Does this solve the mystery ?

no, it's quite odd.  hdparm -t cannot be affected by the filesystem
that lives in the partition, since hdparm is doing reads that don't
go through the filesystem.  hmm, I wonder if that's it: if you mount
the FS that's in hda1, it might change the block driver configuration
(changing the blocksize, for instance).  that would affect hdparm,
even though its reads don't go through the FS.

prediction: if you comment out the hda1 line in fstab, and reboot,
so that vfat never gets mounted on that partition, I predict that 
hdparm will show >30.19 MB/s on it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: UDMA 100 / PIIX4 question

2001-03-19 Thread Mark Hahn

> > > I have an IBM DTLA 307030 (ATA 100 / UDMA 5) on an 815e board (Asus CUSL2), which has a PIIX4 controller.
> > > ...
> > > My problem is that (according to hdparm -t), I never get a better transfer rate than approximately 15.8 Mb/sec
> >
> > 15MB/s for hdparm is about right.

it's definitely not right: this disk sustains around 35 MB/s.

> Yes, since hdparm -t measures *SUSTAINED* transfers... the actual "head rate" of data reads from
> disk surface.  Only if you read *only* data that is already in the harddrive's cache will you get a speed
> close to the UDMA mode of the drive/controller.  The cache is around 1Mbyte, so for a split-second
> re-read of some data

non sequitur: the controller and disk are both quite capable of 
sustaining 20-35 MB/s (depending on zone.)  here's hdparm output
for a disk of the same rpm and density as the DTLA's:

 Timing buffer-cache reads:   128 MB in  1.07 seconds =119.63 MB/sec
 Timing buffered disk reads:  64 MB in  2.02 seconds = 31.68 MB/sec

(maxtor dm+45, hpt366 controller)
regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: scsi vs ide performance on fsync's

2001-03-06 Thread Mark Hahn

> itself is a bad thing, particularly given the amount of CPU overhead that
> IDE drives demand while attached to the controller (orders of magnitude
> higher than a good SCSI controller) - the more overhead we can hand off to

I know this is just a troll by a scsi-believer, but I'm biting anyway.

on current machines and disks, ide costs a few % CPU, depending on 
which CPU, disk, kernel, the sustained bandwidth, etc.  I've measured
this using the now-trendy method of noticing how much the IO costs
a separate, CPU-bound benchmark: load = 1 - (loadedPerf / unloadedPerf).
my cheesy duron/600 desktop typically shows ~2% actual cost when running
bonnie's block IO tests.
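
(as a trivial C helper, assuming both rates come from the same CPU-bound
benchmark in work-units/sec:

	/* e.g. 1.0 - 98.0/100.0 = 0.02, i.e. the IO eats ~2% of a CPU */
	double io_overhead(double unloadedPerf, double loadedPerf)
	{
		return 1.0 - loadedPerf / unloadedPerf;
	}

nothing deeper than that.)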

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.2ac8 lost char devices

2001-03-01 Thread Mark Hahn

> > > > > Well, something has broken in ac8, because I lost my PS/2 mouse and
> > > > me too </aol>.
> No luck.

it seems to be the mdelay(2) added to pc_keyb.c in -ac6.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: ide / usb problem

2001-02-26 Thread Mark Hahn

> the cable length in mind.  Anybody out there know if there's a max cable 
> length for the ATA/100 spec??

18", like *all* ide/ata cables.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: doing RAID 0 with HPT370

2001-02-14 Thread Mark Hahn

> do know I get the feeling they don't care to support Linux in any way
> shape or form. Feels like a pawn off job.

afaik, there's no hardware raid support in the chip - it's just 
another dual-channel controller, with some raid0 (perhaps raid1)
software in bios.  I think Andre has said that he has hopes of 
getting docs on HPT's on-disk raid layout - but this is a software
thing, and all it would give us is interoperability with that other OS.
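
(for reference, plain linux software raid0 needs nothing from the chip
at all - a hypothetical /etc/raidtab for the old raidtools, with device
names assumed for the hpt370's two channels:)

raiddev /dev/md0
        raid-level              0
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1

then "mkraid /dev/md0" and mkfs as usual.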

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH/REQ] Increase kmsg buffer from 16K to 32K, kernel/printk.c

2001-01-31 Thread Mark Hahn

> > Would it be possible to grow and shrink that buffer on demand?
> > Let's say we have a default size and let it grow to a maximum
> > value. After some timeout, buffer size can be shrunk to
> > the default value if it's enough at that moment. Or something
> > similar.
> 
> And when you can't allocate memory for expanding the
> printk() ringbuffer?  Print a message? ;)

;)
but seriously, we normally need a big printk buffer only because 
of boot messages.  no reason I know of that we couldn't shrink it down
to something quite modest (4k?  plenty for a few oopses) after boot.
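
something like this hypothetical sketch - it assumes log_buf were
converted from the static array it actually is in kernel/printk.c into a
pointer, and it ignores locking against concurrent printk, so it's an
illustration of the idea, not a patch:

#include <linux/init.h>
#include <linux/slab.h>
#include <linux/string.h>

#define SMALL_LOG_LEN 4096

extern char *log_buf;           /* assumed: a pointer, not an array */
extern unsigned long log_size;  /* assumed: bytes currently buffered */

static int __init shrink_log_buf(void)
{
        unsigned long keep;
        char *small = kmalloc(SMALL_LOG_LEN, GFP_KERNEL);

        if (!small)
                return 0;       /* harmless: keep the big boot buffer */
        /* keep only the tail of the boot messages */
        keep = log_size < SMALL_LOG_LEN ? log_size : SMALL_LOG_LEN;
        memcpy(small, log_buf + log_size - keep, keep);
        log_buf = small;        /* old buffer would be freed/reclaimed here */
        return 0;
}
__initcall(shrink_log_buf);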

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: VT82C686A corruption with 2.4.x

2001-01-31 Thread Mark Hahn

> From what I gather this chipset on 2.4.x is only stable if you cripple
> just about everything that makes it worth having (udma, 2nd ide channel
> etc etc)?  does it even work when all that's done now or is it fully
> functional?

it seems to be fully functional for some or most people, with
apparently only two reporting major problems.

my via (kt133) is flawless in 2.4.1 (a drive on each channel,
udma enabled and in use) and has been for all the 2.3's since I got it.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: *massive* slowdowns on 2.4.1-pre1[1|2]

2001-01-29 Thread Mark Hahn

> Kernel 2.4.1-pre11 and pre12 are both massively slower than 2.4.0 on the
> same machine, compiled with the same options.  The machine is an Athlon
> 900 on a KT133 chipset.  The slowdown is noticeable in all areas...

this is known: Linus decreed that, since two people reported 
disk corruption on VIA, any machine with a VIA southbridge
must boot in stupid 1992 mode (PIO).  (yes, it might be possible
to boot with ide=autodma or something, but who would guess?)
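
(if the ide=autodma flag speculated above actually exists - I haven't
checked - the workaround would be one line in lilo.conf; image path,
label and root here are placeholders:)

image=/boot/vmlinuz-2.4.1-pre12
        label=linux
        root=/dev/hda1
        append="ide=autodma"

followed by rerunning lilo, of course.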

Linus: I hope you don't consider this a releasable state!
VIA now owns 40% of the chipset market...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: More on the VIA KT133 chipset misbehaving in Linux

2001-01-29 Thread Mark Hahn

> I am not a guru, but AOpen AK73PRO which uses VIA KT133 does not
> show any of these symptoms that you describe (I cannot be sure
> about #3 since I run ntp).  You may want to make your hardware

my ga-7zm shows none of them either (I also run ntp, and the board 
has a perfectly normal drift history.)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux Post codes during runtime, possibly OT

2001-01-26 Thread Mark Hahn

>  #ifdef SLOW_IO_BY_JUMPING
>  #define __SLOW_DOWN_IO "\njmp 1f\n1:\tjmp 1f\n1:"
>  #else
> -#define __SLOW_DOWN_IO "\noutb %%al,$0x80"
> +#define __SLOW_DOWN_IO "\noutb %%al,$0x19"

this is nutty: why can't udelay be used here?  empirical measurements
in the thread show the delay is O(2us).
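
note __SLOW_DOWN_IO is spliced into inline asm as a string, so udelay()
can't simply be dropped in; the _p wrappers would have to be restructured,
roughly like this sketch (illustrative only, hypothetical name, not a patch):

#include <asm/io.h>
#include <linux/delay.h>

/* hypothetical: do the port access, then a calibrated ~2us delay
 * instead of the dummy write to port 0x80 */
static inline void my_outb_p(unsigned char value, unsigned short port)
{
        outb(value, port);
        udelay(2);
}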

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: multi-queue scheduler update

2001-01-18 Thread Mark Hahn

> > microseconds/yield
> > # threads   2.2.16-22   2.4       2.4-multi-queue
> > ---------   ---------   -------   ---------------
> >    16        18.740      4.603         1.455
> 
> I remember the O(1) scheduler from Davide Libenzi was beating the mainline O(N)

isn't the normal case (as in "The Right Case to optimize") 
where there are close to zero runnable tasks?  what realistic/sane
scenarios have very large numbers of spinning threads?  all server
situations I can think of do not.  not volanomark -loopback, surely!
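
(for anyone who wants to reproduce that kind of number: a minimal
userspace sketch, not the benchmark quoted above - compile with
-lpthread and pass the thread count:)

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define ITERS 100000L

static void *spinner(void *arg)
{
        long i;

        for (i = 0; i < ITERS; i++)
                sched_yield();
        return NULL;
}

int main(int argc, char **argv)
{
        int i, n = argc > 1 ? atoi(argv[1]) : 16;
        pthread_t *t = malloc(n * sizeof(*t));
        struct timeval t0, t1;
        double us;

        if (!t)
                return 1;
        gettimeofday(&t0, NULL);
        for (i = 0; i < n; i++)
                pthread_create(&t[i], NULL, spinner, NULL);
        for (i = 0; i < n; i++)
                pthread_join(t[i], NULL);
        gettimeofday(&t1, NULL);

        us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%d threads: %.3f us/yield\n", n, us / ((double)n * ITERS));
        return 0;
}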

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4.1-pre1 breaks XFree 4.0.2 and "w"

2001-01-12 Thread Mark Hahn

> This way we are 100% consistent and we don't lose the "cpu_has" information.

but /dev/cpu/*/{msr|cpuid} are "cpu has".
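
(they really are raw "cpu has": as I understand the cpuid driver, the
seek offset selects the leaf and each read returns 16 bytes of registers -
a minimal sketch, assuming the char devices are configured in and the
/dev nodes exist:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        unsigned int regs[4];   /* eax, ebx, ecx, edx */
        char vendor[13];
        int fd = open("/dev/cpu/0/cpuid", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* offset 0 = cpuid leaf 0: vendor string in ebx, edx, ecx */
        if (pread(fd, regs, 16, 0) != 16) {
                perror("pread");
                return 1;
        }
        memcpy(vendor, &regs[1], 4);
        memcpy(vendor + 4, &regs[3], 4);
        memcpy(vendor + 8, &regs[2], 4);
        vendor[12] = '\0';
        printf("cpu0 vendor: %s\n", vendor);
        close(fd);
        return 0;
}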

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: APIC ERRor on CPU0: 00(02) ...

2001-01-12 Thread Mark Hahn

> I have a Motherboard BP6 with two Celeron 500 (Not overclocked) and
...
> APIC error on CPU1: 00(08)
...
> What's wrong?

Abit designed the board wrong.  there are things you can do to reduce 
the incidence of this error: upgrading the bios, better cooling, more
powerful power supply, replacing an out-of-spec capacitor (if v1.1).
jeez, it's almost like a 12-step program for recovering from BP6ing ;)

>  This message doesn't appear in Kernel-2.2.17, only in Kernel-2.4

indeed: the error still happens in 2.2, but is simply not reported.
note also that this message is a *warning* - an inter-apic message 
was corrupted, and automatically retried.

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: IDE DMA problems on 2.4.0 with vt82c686a driver

2001-01-11 Thread Mark Hahn

> Since this looks like either a chipset, drive, or driver problem, I am 

no: the only entities involved with udma crc's are the drive,
the controller (and the cable).  the kernel is not involved in any way
(except to configure udma, of course.)

> occasionally (not often/constant, but sometimes) get CRC errors:
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x84 { DriveStatusError BadCRC }

nothing wrong here.  occasional crc retries cause no measurable performance impact.

> After reading some archives in linux-kernel, I tried changing some 
> options. Then I changed out the 40 pin, 80 wire cable with a new one. 

great, since without the 80c cable, udma > 33 is illegal.
is it safe to assume your cable is also 18" or less, with both ends
plugged in (no stub)?  you might be able to minimize CRC retries
by changing where the cable runs.  it's also conceivable that CRC
errors would be caused by marginal power, bad trace layout on the 
motherboard, and definitely by overclocking (PCI other than 33 MHz).

> My main concern, that I haven't been able to find an answer for in any 
> archives or documentation: can this cause file system corruption in any way?

absolutely not.  udma checksums each transfer.  when checksums don't match,
the *hardware* retries the transfer (and incidentally reports the event,
which Linux obligingly passes on to you.)
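
(to see whether dma is actually on - essentially what "hdparm -d" prints -
a minimal sketch using the HDIO_GET_DMA ioctl:)

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/hdreg.h>

int main(int argc, char **argv)
{
        long dma = 0;
        int fd = open(argc > 1 ? argv[1] : "/dev/hda", O_RDONLY | O_NONBLOCK);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (ioctl(fd, HDIO_GET_DMA, &dma) < 0) {
                perror("HDIO_GET_DMA");
                return 1;
        }
        printf("using_dma = %ld\n", dma);
        close(fd);
        return 0;
}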

regards, mark hahn.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Fw: Change of policy for future 2.2 driver submissions

2001-01-05 Thread Mark Hahn

> since Mark posted his views to the list, I figured I could safely post the
> conversation I've been having with him in email

which is universally considered rude, if not illegal.  

in any case, please don't respond to this thread, which is quite off-topic.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Change of policy for future 2.2 driver submissions

2001-01-04 Thread Mark Hahn

> I personally do not trust the 2.4.x kernel entirely yet, and would prefer to
...
> afraid that this may partially cripple 2.2 driver development.

egads!  how can there be "development" on a *stable* kernel line?

maybe this is the time to reconsider terminology/policy:
does "stable" mean "bugfixes only"?  
or does it mean "development kernel for conservatives"?

me, I've run the "progressive" kernel line on production boxes since ~2.3.36.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Monitoring filesystems / blockdevice for errors

2000-12-17 Thread Mark Hahn

> currently, there is no way for an external application to monitor whether a
> filesystem or underlying block device has hit an error condition - internal
> inconsistency, read or write error, whatever.
> 
> Short of parsing syslog messages, which isn't particularly great.

what's wrong with it?  reinventing /proc/kmsg and klogd would be très gross.

> I don't have a real idea how this could be added, short of adding a field to
> /proc/partitions (error count) or something similar.

for reporting errors, that might be OK, but it's not a particularly nice
_notification_ mechanism...
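
(the parsing itself is trivial - a sketch; a real daemon would tail
klogd's syslog output rather than steal /proc/kmsg from it, and the
substrings matched here are just examples of typical block-layer messages:)

#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[1024];
        FILE *f = fopen("/proc/kmsg", "r");     /* needs root */

        if (!f) {
                perror("fopen");
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                if (strstr(line, "I/O error") || strstr(line, "BadCRC"))
                        fprintf(stderr, "disk trouble: %s", line);
        return 0;
}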

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux 2.2.18 release notes

2000-12-12 Thread Mark Hahn

> > - metrics -- L1 cacheline size is the important one: you align array
...
> Can anyone give me some pointers on how this is done at runtime? (name of
> the .c file is fine).

kernel/sched.c:aligned_data.  as mentioned elsewhere, 
the correct alignment is not necessarily L1 linesize.
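
(the trick itself, stripped down - illustrative names, and the right pad
size is whatever SMP_CACHE_BYTES works out to on your arch:)

/* pad each per-cpu slot to a full cache line so two cpus
 * never write to the same line (false sharing) */
#define CACHE_BYTES 32                  /* assumed line size */

union percpu_slot {
        struct {
                unsigned long curr;
                unsigned long nr_running;
        } data;
        char pad[CACHE_BYTES];
};

static union percpu_slot slots[8]       /* one per cpu */
        __attribute__((aligned(CACHE_BYTES)));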

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/


