from:"Roger Larsson"

Re: SMP spin-locks

2001-06-14 Thread Roger Larsson


On Thursday 14 June 2001 23:05, you wrote:
> On Thu, 14 Jun 2001, Roger Larsson wrote:
> > Hi,
> >
> > Wait a minute...
> >
> > Spinlocks on an embedded system? Is it _really_ SMP?
>
> The embedded system is not SMP. However, there is definite
> advantage to using an unmodified kernel that may/may-not
> have been compiled for SMP. Of course spin-locks are used
> to prevent interrupts from screwing up buffer pointers, etc.
>

Not really - a plain spin lock prevents another processor from entering
the same code segment (spin_lock_irqsave keeps out both other processors
and local interrupts).

An interrupt handler on UP cannot wait on a spin lock: the lock would never
be released, since no code other than the spinning interrupt handler would
be able to execute.
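
The UP deadlock described above can be modelled in user space. What follows
is only a sketch, not kernel code: model_lock, model_trylock and
model_irq_handler_spin are made-up names, and the spin is bounded so that the
example terminates, which a real spin lock would not do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spin lock. */
typedef struct {
    atomic_flag held;
} model_lock;

#define MODEL_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One acquisition attempt; returns true if the lock was taken. */
static bool model_trylock(model_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void model_unlock(model_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Models an interrupt handler arriving on the only CPU while the
 * interrupted code holds the lock. The handler tries at most max_spins
 * times; a real spin lock has no such bound.
 * Returns true if the lock was obtained.
 */
static bool model_irq_handler_spin(model_lock *l, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (model_trylock(l))
            return true;
        /* Nothing else can run here, so nobody can call model_unlock(). */
    }
    return false;
}
```

If the interrupted code already holds the lock, model_irq_handler_spin() gives
up however large max_spins is; once the holder has called model_unlock() the
first attempt succeeds. That is why the holder has to keep local interrupts
off for as long as it holds a lock that the handler also takes.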


> > What kind of performance problem do you have?
>
> The problem is that a data acquisition board across the PCI bus
> gives a data transfer rate of 10 to 11 megabytes per second
> with a UP kernel, and the transfer drops to 5-6 megabytes per
> second with a SMP kernel. The ISR is really simple and copies
> data, that's all.
>
> The 'read()' routine uses a spinlock when it modifies pointers.
>
> I started to look into where all the CPU clocks were going. The
> SMP spinlock code is where it's going. There is often contention
> for the lock because interrupts normally occur at 50 to 60 kHz.
>

SMP compiled kernel, but running on UP hardware - right?
Then this _should not_ happen!

see linux/Documentation/spinlocks.txt

Is it your spinlocks that are causing this, or?

> When there is contention, a very long jump occurs into
> the .text.lock segment. I think this is flushing queues.
>

It does not matter: if there is contention, let it take time. Waiting is what
spinlocking is about anyway...

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: SMP spin-locks

2001-06-14 Thread Roger Larsson


Hi,

Wait a minute...

Spinlocks on an embedded system? Is it _really_ SMP?

What kind of performance problem do you have?
My guess, since you are mentioning spin locks, is that you are
having a latency problem - RT process does not execute/start
quickly enough?

If that is the case you should look at Andrew Morton's low latency
patches.
 http://www.uow.edu.au/~andrewm/linux/schedlat.html

/RogerL

On Thursday 14 June 2001 19:26, Richard B. Johnson wrote:
> I __finally__ got back on "the list". They finally fixed the
> company firewall!
>
> During my absence, I had the chance to look at some SMP code
> because of a performance problem (a few microseconds out of
> spec on a 130 MHz embedded system) and I have a question about
> the current spin-locks.
>
> Spin-locks now transfer control to the .text.lock segment.
> This is a separate segment that can be at an offset that
> is far away from the currently executing code. That may
> cause the cache to be reloaded. Further, each spin-lock
> invocation generates separate code within that segment.
>
> Question 1: Why?
>
> Question 2: What is the purpose of the code sequence, "repz nop"
> generated by the spin-lock code? Is this a processor BUG work-around?
> `as` doesn't "like" this sequence and, Intel doesn't seem to
> document it.
>
>
> Cheers,
> Dick Johnson
>
> Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).
>
> "Memory is like gasoline. You use it up when you are running. Of
> course you get it all back when you reboot..."; Actual explanation
> obtained from the Micro$oft help desk.
>
>
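
On Question 2: as far as I know, "repz nop" (bytes F3 90) is the encoding
Intel assigned to the PAUSE spin-wait hint on the Pentium 4; older processors
execute it as a plain NOP, so it is a hint rather than a bug work-around.
A minimal user-space sketch of a polite spin loop, assuming GCC or Clang
inline asm; relax_cpu and spin_until_set are hypothetical names:

```c
#include <stdatomic.h>

/* Spin-wait hint: "rep; nop" on x86 with GCC/Clang, nothing elsewhere. */
static inline void relax_cpu(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/*
 * Spin until *flag becomes nonzero or the budget runs out.
 * Returns the number of spins actually used.
 */
static unsigned spin_until_set(atomic_int *flag, unsigned budget)
{
    unsigned spins = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0 &&
           spins < budget) {
        relax_cpu();
        spins++;
    }
    return spins;
}
```

The hint changes nothing about correctness; it only tells the processor that
the loop is a busy-wait.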

-- 
Roger Larsson
Skellefteå
Sweden

Re: 2.4.6-pre2, pre3 VM Behavior

2001-06-14 Thread Roger Larsson


On Thursday 14 June 2001 10:47, Daniel Phillips wrote:
> On Thursday 14 June 2001 05:16, Rik van Riel wrote:
> > On Wed, 13 Jun 2001, Tom Sightler wrote:
> > > Quoting Rik van Riel <[EMAIL PROTECTED]>:
> > > > After the initial burst, the system should stabilise,
> > > > starting the writeout of pages before we run low on
> > > > memory. How to handle the initial burst is something
> > > > I haven't figured out yet ... ;)
> > >
> > > Well, at least I know that this is expected with the VM, although I do
> > > still think this is bad behavior.  If my disk is idle why would I wait
> > > until I have greater than 100MB of data to write before I finally
> > > start actually moving some data to disk?
> >
> > The file _could_ be a temporary file, which gets removed
> > before we'd get around to writing it to disk. Sure, the
> > chances of this happening with a single file are close to
> > zero, but having 100MB from 200 different temp files on a
> > shell server isn't unreasonable to expect.
>
> This still doesn't make sense if the disk bandwidth isn't being used.
>

It does if you are running on a laptop. Then you do not want pages going
out all the time: the disk has gone to sleep, has to spin up to write a few
pages, stays idle for a while, goes back to sleep, a few more pages, ...

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden


Re: kswapd and MM overload fix

2001-06-05 Thread Roger Larsson


On Wednesday 6 June 2001 01:16, Linus Torvalds wrote:
> On Wed, 6 Jun 2001, Andrea Arcangeli wrote:
> > Anybody running on a machine with some zone empty (<16Mbyte boxes (PDAs),
> > <1G x86 with highmem enabled kernel or an arch with an iommu like alpha)
> > probably noticed that the MM was unusable on those hardware
> > configurations (as I also incidentally mentioned a few times on l-k in
> > the last months).
> >
> > Yesterday I checked and here it is the fix (included in 2.4.5aa3).
>
> I think the real problem is that zone->pages_{min,low,high} aren't
> initialized at all for empty zones. If they were initialized (to zero, of
> course), the shortage calculations would have worked automatically.
>
> Using uninitialized variables is always bad. Your patch is certainly fine,
> but I think we should also make the init code clear the zone data for
> empty zones so that these kinds of "use uninitialized stuff" things cannot
> happen. No?
>  

Let's see - that zone will have neither free nor inactive pages.

In page_alloc.c:254 (function __alloc_pages_limit),
where water_mark will be zero too...
    if (z->free_pages + z->inactive_clean_pages >= water_mark) {
we will attempt a lot of interesting/unnecessary stuff.
But it should be caught by the test a few lines up...
    if (!z->size)
        BUG();

In page_alloc.c:331 (function __alloc_pages)
    if (z->free_pages >= z->pages_low) {
        page = rmqueue(z, order);
        if (page)
            return page;

Hmm... a lot more than first meets the eye.
Note: >= here matches < in another place; removing the = would leave the mm stuck...
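
To make the zero-watermark case concrete, here is a small user-space model of
the two checks quoted above. It is a sketch only: struct zone_model and both
helper names are invented for illustration, and the model returns false where
the kernel code calls BUG().

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the kernel's zone structure. */
struct zone_model {
    unsigned long size;
    unsigned long free_pages;
    unsigned long inactive_clean_pages;
    unsigned long pages_low;
};

/* The comparison as quoted from __alloc_pages_limit(). */
static bool above_water_mark(const struct zone_model *z,
                             unsigned long water_mark)
{
    return z->free_pages + z->inactive_clean_pages >= water_mark;
}

/* The same comparison with the size test applied first. */
static bool may_allocate_from(const struct zone_model *z,
                              unsigned long water_mark)
{
    if (!z->size)
        return false;   /* the quoted kernel code calls BUG() here */
    return above_water_mark(z, water_mark);
}
```

With every field zero, above_water_mark() is true (0 >= 0), so an empty zone
with zeroed limits looks like it has pages to hand out; only the size test
keeps the allocator away from it.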

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden


Re: 2.4 and 2GB swap partition limit

2001-05-01 Thread Roger Larsson


On Wednesday 02 May 2001 02:43, Rik van Riel wrote:
> On Tue, 1 May 2001, David S. Miller wrote:
> > Rik van Riel writes:
> >  > Then we will be scanning through memory looking for something to
> >  > swap out (otherwise we'd not be in need of swap space, right?).
> >  > At this point we can simply free up swap entries while scanning
> >  > through memory looking for stuff to swap out.
> >
> > Sounds a lot like my patch I posted a few weeks ago:
>
> Not really. Your patch only reclaims swap cache pages that
> hang around after a process exit()s. What I want to do is
> reclaim swap space of pages which have been swapped in so
> we can use that swap space to swap something else out.
>

We could reclaim swap space for dirty pages. They have to be
rewritten anyway...

Or would the fragmentation risk be too high?

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

Re: Linux-Kernel Archive: No 100 HZ timer !

2001-04-14 Thread Roger Larsson


On Thursday 12 April 2001 23:52, Andre Hedrick wrote:
> Okay but what will be used for a base for hardware that has critical
> timing issues due to the rules of the hardware?
>
> I do not care but your drives/floppy/tapes/cdroms/cdrws do:
>
> /*
>  * Timeouts for various operations:
>  */
> #define WAIT_DRQ        (5*HZ/100)  /* 50msec - spec allows up to 20ms */
> #ifdef CONFIG_APM
> #define WAIT_READY      (5*HZ)      /* 5sec - some laptops are very slow */
> #else
> #define WAIT_READY      (3*HZ/100)  /* 30msec - should be instantaneous */
> #endif /* CONFIG_APM */
> #define WAIT_PIDENTIFY  (10*HZ)     /* 10sec  - should be less than 3ms (?), if all ATAPI CD is closed at boot */
> #define WAIT_WORSTCASE  (30*HZ)     /* 30sec  - worst case when spinning up */
> #define WAIT_CMD        (10*HZ)     /* 10sec  - maximum wait for an IRQ to happen */
> #define WAIT_MIN_SLEEP  (2*HZ/100)  /* 20msec - minimum sleep time */
>
> Give me something for HZ or a rule for getting a known base so I can have
> your storage work and not corrupt.
>

Wouldn't it make sense to define these in real world units?
And to use that to determine requested accuracy...

Those who wait for seconds will probably not have a problem with up to (half) 
a second longer wait - or...?
Those in range of the current jiffy should be able to handle up to one
jiffy longer...

Requesting a wait in ms gives you ms accuracy...
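
A rough sketch of what "real world units" could look like: the caller states
the wait in milliseconds and a helper turns it into ticks for whatever tick
rate is in use. ms_to_ticks and tick_granularity_ms are hypothetical names,
not existing kernel functions, and hz stands in for the kernel's HZ.

```c
/*
 * Convert milliseconds to timer ticks at the given tick rate, rounding up
 * so that the wait is never shorter than requested.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999UL) / 1000UL;
}

/* Length of one tick in ms, rounded up: the coarsest step a wait can take. */
static unsigned long tick_granularity_ms(unsigned long hz)
{
    return (1000UL + hz - 1UL) / hz;
}
```

At hz = 100 a 50 ms timeout is 5 ticks, matching the quoted 5*HZ/100, and a
3 ms request is rounded up to one whole 10 ms tick. With a different tick rate
only the conversion changes; the timeout the driver asks for stays the same.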

/RogerL

-- 
Roger Larsson
Skellefteå
Sweden

Re: [speculation] Partitioning the kernel

2001-03-31 Thread Roger Larsson


On Saturday 31 March 2001 22:55, Sandy Harris wrote:
> I'm wondering whether we have or need a formalisation of how work might be
> divided in future kernels.
>
> The question I'm interested in is how the work gets split up among various
> components at different levels within a single box (not SMP with many at
> the same level, or various multi-box techniques), in particular how you
> separate computation and I/O given some intelligence in devices other than
> the main CPU (or SMP set).
>
> There are a bunch of examples to look at:
>
>   IBM mainframe
> "channel processors" do all the I/O
> main CPU sets up a control block, does an EXCP instruction
> there's an interrupt when operation completes or fails
>
>   VAX 782: basically two 780s with a big cable between busses
> one has disk controllers, most of the (VMS) kernel
> other has serial I/O, runs all user processes
>
>   various smart network or disk controllers
>   and really smart ones that do RAID or Crypto
>
>   I2O stuff on newer PCs
>
>   Larry McVoy's suggestion that the right way to run, say, a 32-CPU
> box is with something like 8 separate kernels, each using 4 CPUs
>   If one of those runs the file system for everyone, this somewhat
> overlaps the techniques listed above.
>
> All of these demonstrably work, but each partitions the work between
> processors in a somewhat different way.
>
> What I'm wondering is whether, given that many drivers have a top-half
> vs. bottom-half split as a fairly basic part of their design, it would
> make sense to make it a design goal to have a clean partition at that
> boundary.
>
> On well-endowed systems, you then have the main CPUs running the top half
> of everything, while I2O processors handle all the bottom halves and the
> I/O interrupts. On lesser boxes, the CPU does both halves.
>
> It seems to me this might give a cleaner design than one where the work
> is partitioned between devices at some other boundary.
>
> If the locks you need between top and bottom halves of the driver are also
> controlling most or all CPU-to-I2O communication, it might go some way
> toward avoiding the "locking cliff" McVoy talks of.

A small, cheap processor to do this with would be the ETRAX 100LX (LX = Linux).
Put an ETRAX 100LX (integrated IDE, ethernet, and ...) on an IDE controller.
Telnet / SSH to your PCI boards :-)

Cheapest possible system might be one without a main CPU...
It would be possible to rebalance where to create the interface over time.

/RogerL


-- 
Roger Larsson
Skellefteå
Sweden

Re: Linux connectivity trashed.

2001-03-29 Thread Roger Larsson


Hi,

I assume that it is ok to sue any company that forwards viruses too...
(not only the author...)

Are Raytheon suing the company where you work, or some
unknown/unnamed company made up by Microsoft?
(you were not specific about this)

/RogerL

On Thursday 29 March 2001 15:34, Richard B. Johnson wrote:
> This is for information only.
>
> Last week a standard RH distribution of Linux was rooted from what looks
> like a Russian invasion. The penetration used the method taught in the CERT
> Advisory CA-2000-17.
>
> The intruder(s) then attempted to perform additional penetrations from
> this site. One of the sites attacked was alleged to be Raytheon. Raytheon
> makes products for national security such as guided missiles.
>
> I was told that Raytheon is now suing this company. Therefore all Linux
> machines are being denied access to the Internet.
>
> The penetration occurred because somebody changed our firewall
> configuration so that all of the non-DHCP addresses, i.e., all the real
> IP addresses, had complete connectivity to the outside world. This meant
> that every Linux and Sun Workstation in this facility was exposed to
> tampering from anywhere in the world. This appears to be part of a plan
> to remove all non-DHCP machines by getting them trashed. In other words,
> we were set up to take a hard fall because no machine that allows NFS
> mounts can be safely exposed to the outside world without blocking
> portmap.
>
> There is a concerted effort to eliminate both Sun Workstations and Linux
> machines as tools in this facility. This happens as the "yuppies", who
> have never, ever, contributed to product development are Peter-Principled
> into positions of authority.
>
> The email addresses of those who have declared that only Windows machines
> will be allowed access to the outside world are:
>
> Thor T. Wallace   [EMAIL PROTECTED]
> David Pothier   [EMAIL PROTECTED]
>
> David Pothier was a beta tester for Windows/NT. Of course he wants all
> machines to be Windows and, naturally, under his control.
>
> Thor Wallace is our new "security" administrator so I am told.
>
> The only Linux advocate in a position of authority is:
>
> Alex Shekhel   [EMAIL PROTECTED]
>
> So, now I hooked up my lap-top, installed Windows and here I am.
> Only Windows machines are allowed to access the outside world.
>
>
> Cheers,
>
> Richard B. Johnson
> Formally [EMAIL PROTECTED]
>
>
>

-- 
Roger Larsson
Skellefteå
Sweden

Re: Linux connectivity trashed.

2001-03-29 Thread Roger Larsson


Hi,

I assume that it is ok to sue any company that forwards viruses too...
(not only the author...)

Are Raytheon suing the company were you work, or some
unknown/unnamed company made up by Microsoft?
(you were not specific about this)

/RogerL

On Thursday 29 March 2001 15:34, Richard B. Johnson wrote:
> This is for information only.
>
> Last week a standard RH distribution of Linux was rooted from what looks
> like a Russian invasion. The penetration used the method taught in the CERT
> Advisory CA-2000-17.
>
> The intruder(s) then attempted to perform additional penetrations from
> this site. One of the sites attacked was alleged to be Raytheon. Raytheon
> makes products for national security such as guided missiles.
>
> I was told that Raytheon is now suing this company. Therefore all Linux
> machines are being denied access to the Internet.
>
> The penetration occurred because somebody changed our firewall
> configuration so that all of the non-DHCP addresses, i.e., all the real IP
> addresses had complete connectivity to the outside world. This meant that
> every Linux and Sun Workstation in this facility was exposed to tampering
> from anywhere in the world. This appears to be part of a plan to remove
> all non-DHCP machines by getting them trashed. In other words, we were set
> up to take a hard fall because no machine that allows NFS mounts can be
> safely exposed to the outside world without blocking portmap.
>
> There is a concerted effort to eliminate both Sun Workstations and Linux
> machines as tools in this facility. This happens as the "yuppies", who
> have never, ever, contributed to product development are Peter-Principled
> into positions of authority.
>
> The email addresses of those who have declared that only Windows machines
> will be allowed access to the outside world are:
>
> Thor T. Wallace   [EMAIL PROTECTED]
> David Pothier   [EMAIL PROTECTED]
>
> David Pothier was a beta tester for Windows/NT. Of course he wants all
> machines to be Windows and, naturally, under his control.
>
> Thor Wallace is our new "security" administrator so I am told.
>
> The only Linux advocate in a position of authority is:
>
> Alex Shekhel   [EMAIL PROTECTED]
>
> So, now I hooked up my lap-top, installed Windows and here I am.
> Only windows machines are allowed to access the outside world.
>
> Cheers,
>
> Richard B. Johnson
> Formally [EMAIL PROTECTED]
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Roger Larsson
Skellefteå
Sweden
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH for 2.5] preemptible kernel

2001-03-20 Thread Roger Larsson


Hi,

One little readability thing I found.
The prev->state TASK_ value is mostly used as a plain value,
but the new TASK_PREEMPTED is OR-ed together with whatever was there.
Later, when we switch on the state, it is checked against TASK_PREEMPTED
only. Since TASK_RUNNING is 0 it works OK, but...

--- sched.c.nigel   Tue Mar 20 18:52:43 2001
+++ sched.c.roger   Tue Mar 20 19:03:28 2001
@@ -553,7 +553,7 @@
 #endif
 			del_from_runqueue(prev);
 #ifdef CONFIG_PREEMPT
-		case TASK_PREEMPTED:
+		case TASK_RUNNING | TASK_PREEMPTED:
 #endif
 		case TASK_RUNNING:
 	}


We could add all (or the other common) combinations as cases:

	switch (prev->state) {
		case TASK_INTERRUPTIBLE:
			if (signal_pending(prev)) {
				prev->state = TASK_RUNNING;
				break;
			}
		default:
#ifdef CONFIG_PREEMPT
			if (prev->state & TASK_PREEMPTED)
				break;
#endif
			del_from_runqueue(prev);
#ifdef CONFIG_PREEMPT
		case TASK_RUNNING         | TASK_PREEMPTED:
		case TASK_INTERRUPTIBLE   | TASK_PREEMPTED:
		case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED:
#endif
		case TASK_RUNNING:
	}


Then the break in the default case could almost be replaced with a BUG()...
(I have not checked the generated code)
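
A minimal user-space sketch of the idea above, to show which states stay on
the run queue. The numeric value of TASK_PREEMPTED and the helper name
leaves_runqueue() are assumptions made for illustration; only TASK_RUNNING
being 0 comes from the text. With every OR-ed combination listed as its own
case, the default branch no longer needs the TASK_PREEMPTED test:

```c
#include <stdbool.h>

/* Illustrative values only: TASK_RUNNING is 0 as stated above; the
 * TASK_PREEMPTED bit is an assumed value, not taken from the patch. */
#define TASK_RUNNING         0
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_PREEMPTED       64

/* Hypothetical stand-in for the switch in schedule(): returns true when
 * the task would be removed from the run queue. */
static bool leaves_runqueue(long *state, bool signal_pending)
{
	switch (*state) {
	case TASK_INTERRUPTIBLE:
		if (signal_pending) {
			*state = TASK_RUNNING;
			break;
		}
		/* fall through */
	default:
		return true;              /* del_from_runqueue(prev) */
	case TASK_RUNNING         | TASK_PREEMPTED:
	case TASK_INTERRUPTIBLE   | TASK_PREEMPTED:
	case TASK_UNINTERRUPTIBLE | TASK_PREEMPTED:
	case TASK_RUNNING:
		break;                    /* stays runnable */
	}
	return false;
}
```

A preempted task keeps its place on the run queue whatever its underlying
state was, while a plain TASK_INTERRUPTIBLE task without a pending signal is
removed.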

/RogerL

Re: Kernel stress testing coverage

2001-03-08 Thread Roger Larsson


Hi,

Here is a link to some memory usage related test programs:

  http://carpanta.dc.fi.udc.es/~quintela/memtest/

They have proven their value many times...


/RogerL

On Thursday 08 March 2001 21:57, Paul Larson wrote:
> Alan Cox <[EMAIL PROTECTED]> on 03/08/2001 02:06:06 PM
>
> To:   Paul Larson/Austin/IBM@ibmus
> cc:
> Subject:  Re: Kernel stress testing coverage
>
> >One thing I've been using for coverage (at least some coverage) is the
> >posix test suite
>
> --
>
> Are you talking about the same posix test suite that LSB is using?  I've
> looked into that a little, but here are the two problems I'm wanting to
> address:
>
> 1. How much of the kernel is getting hit on a run of any given test?  Even
> an approximate percentage is fine as long as I can prove it.
>
> 2. I could run many many copies simultaneously I suppose and get some
> stress, but I'd prefer to stress individual pieces one at a time.  Those
> pieces could then be mixed together in later runs for mixed load stress.
> Additional mixed load tests will be performed with general applications
> (web servers, databases, etc) for more of a "real world" environment, but I
> want to have focused tests as well.
>
> I'm betting that there are probably a LOT of quick and dirty test programs
> that kernel hackers have written to expose a problem or thoroughly test a
> piece of the kernel that they modified.  These type of things would be
> FYI this project will be going on sourceforge very soon.  I want to have a
> little more to start out with though and finish putting together a good
> project description, testplans, etc. to post as soon as we put it on there.
> I hate it when people start projects and you don't see any good information
> about it for weeks.
>
> Thanks,
> Paul Larson

-- 
Home page:
  none currently

Re: Where is my memory

2001-03-01 Thread Roger Larsson


Hi,

This is interesting.

I have found out that my freezes most often happen on cold boot.
At cold boot the locatedb job is run...

I have added IKD...

/RogerL


On Thursday 01 March 2001 09:39, Uwe Bonnes wrote:
> Hallo,
>
> on two systems with 2.4.2. (actually the suse tree from Hubert mantel at
> ftp://ftp.suse.com/people/mantel/next) on a single/dual celeron machine
> with 256/384 MByte Memory all show increased memory consumption after the
> daily locatedb run.
>
> Here is the output in that situation after a shutdown to single user mode
> ("init S"):
>
> (cat /proc/meminfo)
>
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  261672960 217587712 44085248        0  7565312 87621632
> Swap: 537214976        0 537214976
> MemTotal:       255540 kB
> MemFree:         43052 kB
> MemShared:           0 kB
> Buffers:          7388 kB
> Cached:          85568 kB
> Active:           6204 kB
> Inact_dirty:     75768 kB
> Inact_clean:     10984 kB
> Inact_target:        8 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:       255540 kB
> LowFree:         43052 kB
> SwapTotal:      524624 kB
> SwapFree:       524624 kB
>
> (ps axl)
>
>   F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
> 100     0     1     0   9   0   404  220 do_sel S    ?          0:04 init
> 040     0     2     1   9   0     0    0 contex SW   ?          0:00 [keventd]
> 040     0     3     1   9   0     0    0 kswapd SW   ?          0:00 [kswapd]
> 040     0     4     1   9   0     0    0 krecla SW   ?          0:00 [kreclaimd]
> 040     0     5     1   9   0     0    0 bdflus SW   ?          0:00 [bdflush]
> 040     0     6     1   9   0     0    0 kupdat SW   ?          0:01 [kupdate]
> 040     0    37     1   9   0     0    0 reiser SW   ?          0:03 [kreiserfsd]
> 000     0 13559     1  14   0  2404 1392 wait4  S    tty2       0:00 bash
> 000     0 13565 13559  18   0  2980 1232 -      R    tty2       0:00 ps axl
>
> Any idea what's going on here?
>
> Thanks

-- 
Home page:
  none currently

Re: [Fwd: [pre PATCH] freezes]

2001-02-28 Thread Roger Larsson


On the numlock issue...

There was a week when I got no freezes; that was after 2.4.2-pre1,
so I looked around and tried to find the reason (2.4.1 froze).
The only thing I could find was this...

But now they have reappeared!!!

I have started to keep xosview on top to get a feel for what is happening.
The situation after a freeze has been similar each time (2 captures with
xosview so far, one with KTimeMon).

_Programs running_
term (konsole) running
xosview
kppp
kmail
konqueror
(started quickly one after another)

Load above 1 (the latest CPU0 readings have differed: 100% and 9%)
MEM  91M (both times!)
SWAP 0  (could be a rounding problem in xosview)
PAGE 0
INTS 0,4 (one time 15, the other time 12)

The MEM & SWAP readings are interesting.
I have 96M RAM and lots and lots of swap space:
"Adding Swap: 530136k swap-space (priority -1)
 Adding Swap: 133048k swap-space (priority -2)"
Cache was about as big as USED+SHARE; BUFF was only
a few percent, about the size of FREE.

When the freeze happened MEM was rising; I wonder if
I got the freeze when hitting swap...

I will retry with a wider xosview...

/RogerL

On Wednesday 28 February 2001 13:04, Andrew Morton wrote:
> Alan, this 2.2 patch seems very sane to me.
>
> If we take a page fault in (say) the get_user() in write_chan()
> then handle_mm_fault() will run in state TASK_INTERRUPTIBLE.  If
> a filesystem calls schedule() without setting task->state it's
> lights out.  reiserfs_get_block() is one such.
>
> It can't hurt.
>
> But I don't see why this should cause numlock to
> stop working...
>
>
>
>  Original Message 
> Subject: [pre PATCH] freezes
> Date: Thu, 15 Feb 2001 15:29:12 +0100
> From: Roger Larsson <[EMAIL PROTECTED]>
> To: Linux Kernel Mailing List <[EMAIL PROTECTED]>
>
> Hi,
>
> I have had occasional freezes (complete NumLock won't work) for some time.
> I blamed HW, irq conflicts, temperature problems, ...
>
> But suddenly with 2.4.2-pre1 the problems disappeared!
>
> Since 2.4.2-pre1 was rather short I took the time to try to find out what
> could be the fix.
>
> I found one candidate, the setting of  TASK_RUNNING in handle_mm_fault.
>
> Since the problem had appeared on both 2.4 and 2.2.18 I started to try to
> reproduce the problem in an unpatched 2.2 - it took some time, got the
> freeze today.
>
> During this time I have tried to collect information of the freezes on KDE
> mailing lists - I do now have three additional reports (one running 2.2.17)
> Hardware has varied.
>
> I have now compiled and installed this patch but since it can't be proven
> to fix the problem I submit it now.
>
> /RogerL

-- 
Home page:
  none currently

Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

2001-02-20 Thread Roger Larsson


On Tuesday 20 February 2001 22:21, Colonel wrote:
>From: "Tom Sightler" <[EMAIL PROTECTED]>
>Cc: <[EMAIL PROTECTED]>
>Date: Tue, 20 Feb 2001 14:43:07 -0500
>Content-Type: text/plain;
>  charset="iso-8859-1"
>
>>> >I'm building a firewall on a P133 with 48 MB of memory using RH
>>> >7.0, latest updates, etc. and kernel 2.4.1.
>>> >I've built a customized install of RH (~200MB) which I untar onto the
>>> >system after building my raid arrays, etc. via a Rescue CD which
>>> >I created using Timo's Rescue CD project.  The booting kernel
>>> >is 2.4.1-ac10, no networking, raid compiled in but raid1 as a
>>> >module
>>>
>>> Hmm, raid as a module was always a Bad Idea(tm) in the 2.2
>>> "alpha" raid (which was misnamed and is 2.4 raid).  I suggest you
>>> change that and update, as I had no problems with 2.4.2-pre2/3,
>>> nor have any been posted to the raid list.
>>
>>I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did
>>the same thing.  I'm going to try to compile reiserfs in (if I have
>>enough room to still fit the kernel on the floppy with it's initial
>>ramdisk, etc.) and see what that does.
>
>There seem to be several reports of reiserfs falling over when memory is
>low.  It seems to be undetermined if this problem is actually reiserfs
>or MM related, but there are other threads on this list regarding similar
>issues. This would explain why the same disk would work on a different
>machine with more memory.  Any chance you could add memory to the box
>temporarily just to see if it helps, this may help prove if this is the
>problem or not.
>
>
> Well, I didn't happen to start the thread, but your comments may
> explain some "gee I wonder if it died" problems I just had with my
> 2.4.1-pre2+reiser test box.  It only has 16M, so it's always low
> memory (never been a real problem in the past however).  The test
> situation is easily repeatable for me [1].  It's a 486 wall mount, so
> it's easier to convert the fs than add memory, and it showed about
> 200k free at the time of the sluggishness.  Previous 2.4.1 testing
> with ext2 fs didn't show any sluggishness, but I also didn't happen to
> run the test above either.  When I come back to the office later, I'll
> convert the fs, repeat the test and pass on the results.
>
>
> [1]  Since I decided to try to catch up on kernels, I had just grabbed
> -ac18, cd to ~linux and run "rm -r *" via an ssh connection.  In a
> second connection, I tried a simple "dmesg" and waited over a minute
> for results (long enough to log in directly on the box and bring up
> top) followed by loading emacs for ftp transfers from kernel.org,
> which again 'went to sleep'.

If these are freezes, I had them too in 2.4.1; 2.4.2-pre1 fixed them for me.
I think the actual fix was the patch in handle_mm_fault that sets TASK_RUNNING.

/RogerL

-- 
Home page:
  none currently

[pre PATCH] freezes

2001-02-15 Thread Roger Larsson


Hi,

I have had occasional freezes (complete: not even NumLock works) for some time.
I blamed HW, irq conflicts, temperature problems, ...

But suddenly with 2.4.2-pre1 the problems disappeared!

Since 2.4.2-pre1 was rather short I took the time to try to find out what 
could be the fix.

I found one candidate, the setting of  TASK_RUNNING in handle_mm_fault.

Since the problem had appeared on both 2.4 and 2.2.18 I started to try to 
reproduce the problem in an unpatched 2.2 - it took some time, got the freeze
today.

During this time I have tried to collect information about the freezes on KDE
mailing lists; I now have three additional reports (one running 2.2.17).
Hardware has varied.

I have now compiled and installed this patch, but since it cannot be proven
to fix the problem, I am submitting it now.

/RogerL

-- 
Home page:
  none currently


--- linux/mm/memory.c.orig  Wed Feb 14 00:58:59 2001
+++ linux/mm/memory.c   Wed Feb 14 00:59:16 2001
@@ -935,6 +935,7 @@
 	pte_t * pte;
 	int ret;
 
+	current->state = TASK_RUNNING;
 	pgd = pgd_offset(vma->vm_mm, address);
 	pmd = pmd_alloc(pgd, address);
 	if (!pmd)
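
A toy user-space model of why this one-line change might matter, following
the explanation given in the follow-up to this patch: a task that has set
TASK_INTERRUPTIBLE takes a page fault, and something in the fault path calls
schedule() without setting the state itself. All names here (toy_task,
toy_schedule, toy_handle_mm_fault) are hypothetical and heavily simplified;
this is a sketch of the suspected mechanism, not kernel code:

```c
#include <stdbool.h>

#define TASK_RUNNING       0
#define TASK_INTERRUPTIBLE 1

/* Toy task: just the two fields this sketch needs. */
struct toy_task {
	long state;
	bool on_runqueue;
};

/* Simplified schedule(): a task that is not TASK_RUNNING is taken off
 * the run queue and stays off until something wakes it. */
static void toy_schedule(struct toy_task *t)
{
	if (t->state != TASK_RUNNING)
		t->on_runqueue = false;
}

/* Fault path; 'patched' selects whether the state is reset first.
 * The inner toy_schedule() stands for a filesystem that yields the CPU
 * without setting the task state itself. */
static void toy_handle_mm_fault(struct toy_task *t, bool patched)
{
	if (patched)
		t->state = TASK_RUNNING;
	toy_schedule(t);
}

/* A caller that has prepared to sleep and then touches user memory.
 * Returns true if the task is still runnable afterwards. */
static bool fault_while_preparing_to_sleep(bool patched)
{
	struct toy_task t = { TASK_RUNNING, true };

	t.state = TASK_INTERRUPTIBLE;      /* about to wait for an event */
	toy_handle_mm_fault(&t, patched);  /* page fault before sleeping */
	return t.on_runqueue;
}
```

In the unpatched case the task drops off the run queue with nobody scheduled
to wake it; in the patched case it stays runnable.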

test

2001-02-13 Thread Roger Larsson


test

test

2001-02-10 Thread Roger Larsson


OK, you had to...

I have not seen any emails from linux-kernel for some days.
Even tried to resubscribe - Majordomo succeeded in sending me the Confirmation

But nothing...

So I have to try this...

/RogerL

(I am subscribed as [EMAIL PROTECTED])

-- 
Home page:
  none currently

Re: Latency: allowing resheduling while holding spin_locks

2001-01-15 Thread Roger Larsson


On Sunday 14 January 2001 01:06, george anzinger wrote:
> Nigel Gamble wrote:
> > On Sat, 13 Jan 2001, Roger Larsson wrote:
> > > A rethinking of the rescheduling strategy...
> >
> > Actually, I think you have more-or-less described how successful
> > preemptible kernels have already been developed, given that your
> > "sleeping spin locks" are really just sleeping mutexes (or binary
> > semaphores).
> >
> > 1.  Short critical regions are protected by spin_lock_irq().  The maximum
> > value of "short" is therefore bounded by the maximum time we are happy
> > to disable (local) interrupts - ideally ~100us.
> >
> > 2.  Longer regions are protected by sleeping mutexes.
> >
> > 3.  Algorithms are rearchitected until all of the highly contended locks
> > are of type 1, and only low contention locks are of type 2.
> >
> > This approach has the advantage that we don't need to use a no-preempt
> > count, and test it on exit from every spinlock to see if a preempting
> > interrupt that has caused a need_resched has occurred, since we won't
> > see the interrupt until it's safe to do the preemptive resched.
>
> I agree that this was true in days of yore.  But these days the irq
> instructions introduce serialization points and, me thinks, may be much
> more time consuming than the "++, --, if (false)" that a preemption
> count implemtation introduces.  Could some one with a knowledge of the
> hardware comment on this?
>
> I am not suggesting that the "++, --, if (false)" is faster than an
> interrupt, but that it is faster than cli, sti.  Of course we are
> assuming that there is  between the cli and the sti as there is
> between the ++ and the -- if (false).
>

The problem with the counting scheme is that you cannot schedule inside any
spinlock; you have to split them up. Maybe you will have to do that anyway.
But if your RT process never needs more memory, it should be quite safe.
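
For reference, the "++, --, if (false)" counting scheme quoted above can be
sketched in user space roughly as follows. The variable and function names
are hypothetical, and a single CPU is assumed; this only illustrates the
bookkeeping, not any actual patch in this thread:

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the sketch; none of these names come
 * from the patches discussed in this thread. */
static int  preempt_count;   /* >0 means "do not preempt now" */
static bool need_resched;    /* set by an interrupt that woke a task */
static int  reschedules;     /* counts calls to the toy scheduler */

static void toy_schedule(void)
{
	need_resched = false;
	reschedules++;
}

/* The "++" on entry to a critical region. */
static void toy_preempt_disable(void)
{
	preempt_count++;
}

/* The "--, if (false)" on exit: the test is normally not taken. */
static void toy_preempt_enable(void)
{
	if (--preempt_count == 0 && need_resched)
		toy_schedule();
}

/* An interrupt arriving at any time: it may only request a reschedule,
 * and preempts immediately only when no region is active. */
static void toy_interrupt_wakes_task(void)
{
	need_resched = true;
	if (preempt_count == 0)
		toy_schedule();
}
```

An interrupt inside the region only sets the flag; the reschedule itself is
deferred to the matching enable, which is why nothing inside the region may
call the scheduler.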

The difference from a sleeping mutex is that this can be made lazier: keep
the waiting task in the run list; there should be very few of them...
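
A rough user-space sketch of that lazy approach, assuming waiters are simply
flagged and skipped by the scheduler instead of being moved to a wait queue.
SCHED_SPINLOCK mirrors the flag name in the patch attempt; every other name
(toy_task, toy_lock, toy_pick, ...) is made up for the illustration, and no
SMP safety is attempted:

```c
#include <stddef.h>

#define SCHED_SPINLOCK 0x20u  /* same flag name as in the patch attempt */

struct toy_task {
	int prio;             /* higher value runs first */
	unsigned int policy;  /* SCHED_SPINLOCK set while waiting for a lock */
};

struct toy_lock {
	int locked;
	int spin;             /* someone is waiting on this lock */
};

/* Pick the highest-priority task that is not parked on a lock.
 * Waiting tasks stay in the array (the "run list"); they are only skipped. */
static struct toy_task *toy_pick(struct toy_task *tasks, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].policy & SCHED_SPINLOCK)
			continue;
		if (!best || tasks[i].prio > best->prio)
			best = &tasks[i];
	}
	return best;
}

/* Try to take the lock; on failure mark the task as waiting.
 * Returns 1 when the lock was acquired. */
static int toy_lock_or_wait(struct toy_lock *l, struct toy_task *t)
{
	if (!l->locked) {
		l->locked = 1;
		return 1;
	}
	l->spin = 1;
	t->policy |= SCHED_SPINLOCK;
	return 0;
}

/* Release the lock and let every waiter compete again. */
static void toy_unlock(struct toy_lock *l, struct toy_task *tasks, size_t n)
{
	l->locked = 0;
	if (!l->spin)
		return;
	l->spin = 0;
	for (size_t i = 0; i < n; i++)
		tasks[i].policy &= ~SCHED_SPINLOCK;
}
```

While a high-priority task is flagged, the lower-priority lock owner gets to
run; once the lock is released the flags are cleared and the normal priority
order decides again.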

See first patch attempt.

(George, Nigel told me about your idea before I sent the previous mail. So 
major influence comes from you. But I do not think that it is equivalent)

/RogerL

Note: changed email...

--- ./linux/kernel/sched.c.orig	Sat Jan 13 19:19:20 2001
+++ ./linux/kernel/sched.c	Sat Jan 13 23:27:13 2001
@@ -144,7 +144,7 @@
 	 * Also, dont trigger a counter recalculation.
 	 */
 	weight = -1;
-	if (p->policy & SCHED_YIELD)
+	if (p->policy & (SCHED_YIELD | SCHED_SPINLOCK))
 		goto out;
 
 	/*
@@ -978,7 +978,7 @@
 	read_lock(&tasklist_lock);
 	p = find_process_by_pid(pid);
 	if (p)
-		retval = p->policy & ~SCHED_YIELD;
+		retval = p->policy & ~(SCHED_YIELD | SCHED_SPINLOCK);
 	read_unlock(&tasklist_lock);
 
 out_nounlock:
@@ -1267,3 +1267,54 @@
 	atomic_inc(&init_mm.mm_count);
 	enter_lazy_tlb(&init_mm, current, cpu);
 }
+
+void wakeup_spinlock_yielder(spinlock_t *lock)
+{
+	int need_resched = 0;
+	struct list_head *tmp;
+	struct task_struct *p;
+	
+	/* I do not like this part...
+	*   not SMP safe, the runqueue might change under us...
+	*   can not use spinlocks...
+	*   runlist might be long...
+	*/
+	local_irqsave(&flags);
+	if (lock->spin) {
+		/* someone is "spinning" on it
+		 * it has to have higher prio than this
+		 * let go of ALL :-( spinning processes
+		 */
+		lock->spin = 0;
+
+		list_for_each(tmp, &runqueue_head) {
+			p = list_entry(tmp, struct task_struct, run_list);
+			if (p->policy & SCHED_SPINLOCK) {
+				p->policy &= ~SCHED_SPINLOCK;
+			}
+		}
+
+		need_resched = 1;
+	}
+	local_irqrestore(&flags);
+
+	/* all higher prio will get a chance to run... */
+	if (need_resched)
+		schedule_running();
+}
+
+void schedule_spinlock(spinlock_t *lock)
+{
+	while (test_and_set(lock->lock)) {
+		/* note: owner can not race here, it has lower prio */
+
+		lock->spinon = 1;
+		p->policy |= SCHED_SPINLOCK;
+		schedule_running();
+		/* will be released in priority order */
+	}
+}
+
+
+
+
--- ./linux/include/linux/sched.h.orig	Sat Jan 13 19:25:53 2001
+++ ./linux/include/linux/sched.h	Sat Jan 13 19:26:31 2001
@@ -119,6 +119,7 @@
  * yield the CPU for one re-schedule..
  */
 #define SCHED_YIELD		0x10
+#define SCHED_SPINLOCK  0x20
 
 struct sched_param {
 	int sched_priority;
--- ./linux/include/linux/spinlock.h.orig	Sat Jan 13 19:40:30 2001
+++ ./linux/include/linux/spinlock.h	Sat Jan 13 21:51:14 2001
@@ -66,16 +66,37 @@
 
 typedef struct {
 	volatile unsigned long lock;
+	??? queue;
 } spinlock_t;
 #define SPIN_LOCK_UNLOCKED (spinlock_t) { 0 }
 
+void wakeup_spinlock_yielder(spinlock_t *lock);
+void schedule_spinlock(spinlock_t *lock);
+
 #define spin_lock_init(x)	do { (x)->lock = 0; } while (0)
 #define spin_is_locked(lock)	(test_bit(0,(lock)))
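-#define spin_trylock(lock)	(!test_and_set_bit(0,(lock)))
+#define spin_trylock(lock)	(!test_and_set_bit(0,(lock))) /* fail handled */
+
+#define spin_lock(x)		do { 
+if (test_and_set(lock->lock)) \
+	   schedule_spinlock(); /* kind of yield, giving low goo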

Re: [Marcel Weber ] re:Adaptec AIC7xxx version 6.08BETA release

2001-01-15 Thread Roger Larsson


On Friday 12 January 2001 10:33, Marcel Weber wrote:
> SuSE Linux 7.0, Kernel 2.4.0
>
> Adaptec 3950U2
> Adaptec 2940
>
>
> Although the kernel is complaining about the following things:
>
> kernel: scsi0: PCI error Interrupt at seqaddr= 0x4e
> kernel: scsi0: Data Parity Error Detected during address or write
> data phase
> ...
>
> This is compared to the original drivers already a incredible
> change: Those freezed my system after some time (something that did
> not happen before I upgraded from a K6-2 to a K6-2+: Apparently the
> old driver is working with loops or something)
>

Hmm.. I am starting to wonder if this is what I see too..
Both 2.2.18 and 2.4.0 hang for some reason that I have not been able to
track down - on Saturday I tried removing all PCI cards except my 3dfx and AIC7xxx

  00:0f.0 SCSI storage controller: Adaptec AHA-7850 (rev 01) 

Has some driver been ported between the 2.2 and 2.4 series recently?
I have not seen this problem before...

I will try the new driver too...

/RogerL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Latency: allowing resheduling while holding spin_locks

2001-01-13 Thread Roger Larsson


Hi,

A rethinking of the rescheduling strategy...

I have come to this conclusion.

A spinlock prevents other processes from entering that specific region.
But interrupts are still allowed, and they might delay execution of a
spin-locked region for an undefined (small, but still) time.

Code with critical maximum times should use spin_lock_irq!

=> spin_locks are not about disallowing reschedules.


Prior to the introduction of spin locks it did not make sense to
allow reschedules in the kernel since the big kernel lock was so big...
Any code that wanted to do anything other than pure computation would
hit it very quickly.

Now with spin locks the situation is quite different...

[First assume UP kernel for simplicity]

Suppose you have two processes, one normal (P) and one high priority (RTP).

P runs user code, makes a system call, enters a spin lock region.

Interrupt!

The interrupt service routine wakes up RTP, which marks P as need_resched,
and returns. On return from interrupt the kernel detects that P needs a
reschedule - and does it, even though P is executing in the kernel and
holding a spin_lock.

RTP starts, and if it does not hit the same spin_lock nothing special
happens until it goes to sleep again. But suppose it does!

RTP tries to get the spin_lock but fails. Since RTP is currently the highest
prio process and P holds the lock, RTP wants to reschedule to P so that it can
eventually get its own stuff done.

P runs the final part of its spin_locked region; upon spin_unlock it needs to
get RTP running.

Something like this:

spin_lock(lock)
{
	while (test_and_set(lock->lock)) {
		schedule_spinlock(lock); /* kind of yield, giving low goodness, sticky */
	}
}

spin_unlock(lock)
{
	clear(lock->lock);

	/* note: someone with higher prio than me,
	   might steal the lock from even higher prio waiters here */

	if (lock->queue)
		wakeup_spinlock_yielder(lock);
}


schedule_spinlock(lock)
{
	/* note: owner can not run here, it has lower prio */

	addqueue(lock->queue, current);

	current->policy |= SCHED_SPINLOCK;
	schedule();
}

wakeup_spinlock_yielder(lock)
{
	int need_resched = 0;

	int my_goodness = goodness(current);

	forall p in lock->queue {
		p->policy &= ~SCHED_SPINLOCK;
		if (goodness(p) > my_goodness)
			need_resched = 1;
	}

	if (need_resched)
		schedule();
}
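
The P/RTP scenario can be played through in a small userspace model. This is
only a sketch under stated assumptions, not kernel code: struct task,
struct yield_lock, pick_next(), lock_or_yield() and unlock_and_wake() are
hypothetical names, the "scheduler" is a plain scan over an array, and a
waiter counter stands in for lock->queue. The one idea taken from the
pseudocode is that a task flagged SCHED_SPINLOCK gets the lowest goodness
until the lock holder clears the flag on unlock.

```c
#include <stddef.h>

#define SCHED_SPINLOCK 0x20     /* same flag value as in the patch */

/* Hypothetical, heavily reduced task: only what the model needs. */
struct task {
    int prio;                   /* higher value runs first */
    unsigned policy;            /* SCHED_SPINLOCK set while yielding on a lock */
    int runnable;
};

/* Hypothetical lock: 'waiters' stands in for lock->queue. */
struct yield_lock {
    int locked;
    int waiters;
};

/* A yielding task gets -1 so that any other runnable task beats it. */
static int goodness(const struct task *t)
{
    if (t->policy & SCHED_SPINLOCK)
        return -1;
    return t->prio;
}

/* Pick the runnable task with the highest goodness (NULL if none). */
static struct task *pick_next(struct task *tasks, size_t n)
{
    struct task *best = NULL;
    int best_weight = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int w;

        if (!tasks[i].runnable)
            continue;
        w = goodness(&tasks[i]);
        if (best == NULL || w > best_weight) {
            best = &tasks[i];
            best_weight = w;
        }
    }
    return best;
}

/* Returns 1 if the lock was taken, 0 if 'self' was marked as yielding. */
static int lock_or_yield(struct yield_lock *l, struct task *self)
{
    if (!l->locked) {
        l->locked = 1;
        return 1;
    }
    l->waiters++;
    self->policy |= SCHED_SPINLOCK;
    return 0;
}

/* Release the lock, un-flag all yielders; returns 1 if a resched is needed. */
static int unlock_and_wake(struct yield_lock *l, const struct task *self,
                           struct task *tasks, size_t n)
{
    int need_resched = 0;
    size_t i;

    l->locked = 0;
    if (l->waiters == 0)
        return 0;
    l->waiters = 0;

    for (i = 0; i < n; i++) {
        if (tasks[i].policy & SCHED_SPINLOCK) {
            tasks[i].policy &= ~SCHED_SPINLOCK;
            if (tasks[i].runnable && tasks[i].prio > self->prio)
                need_resched = 1;
        }
    }
    return need_resched;
}
```

Walking through it with P (prio 1) and RTP (prio 10): P takes the lock, RTP is
picked as soon as it is runnable, RTP fails to take the lock and is flagged,
so the next pick is P again; when P unlocks, the flag is cleared and RTP wins
the following pick. As in the patch, all flagged tasks are released at once,
not only the ones waiting on this particular lock.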


A final note on spin_lock_irq: since it disables IRQs, there will be no
requests to wake up any process while the lock is held => no problems.

-- 
Home page:
  no currently

SIGSEGV: Linux 2.4.0 and Konqueror 1.9.8 (from KDE 2.1 Beta 1)

2001-01-09 Thread Roger Larsson


Hi,

Konqueror behaves really strangely with the new kernel (compiled with gcc 2.95.2,
as all my kernels have been for a very long time).

I have not seen this behaviour (to this extent) with earlier 2.4 kernels.

I have included an strace... strange use of brk - or? [/proc/pid/maps included too]
It bugs out like this in other cases as well (not zdnet specific).

Note: read on, it is not the icon stuff that is strange...

Computer: PPro 180, 96MB RAM
Swap [from dmesg] :
 Adding Swap: 530136k swap-space (priority -1)
 Adding Swap: 133048k swap-space (priority -2)
 
Linux version 2.4.0 (root@dox) (gcc version 2.95.2 19991024 (release)) #2 Fri Jan 5 23:49:17 CET 2001

access("/opt/kde2/share/apps/konqueror/icons/hicolor/22x22/actions/favicons/www.zdnet.com.xpm", R_OK) = -1 ENOENT (No such file or directory)
access("/opt/kde2/share/icons/hicolor/22x22/actions/favicons/www.zdnet.com.xpm", R_OK) = -1 ENOENT (No such file or directory)
[...]
R_OK) = -1 ENOENT (No such file or directory)
access("/opt/kde2/share/icons/locolor/22x22/actions/favicons/www.zdnet.com.xpm", R_OK) = -1 ENOENT (No such file or directory)
time([979085700])   = 979085700
gettimeofday({979085700, 853318}, NULL) = 0
write(7, "\2\1\2\0.\1\0\0\1\0\0\0", 12) = 12
write(7, "\0\0\0\17konqueror-3415\0\0\0\0\vkonqueror"..., 111) = 111
write(7, "\0\0\0\210\0h\0t\0t\0p\0:\0/\0/\0w\0w\0w\0.\0z\0d\0n"..., 191) = 191
gettimeofday({979085700, 857154}, NULL) = 0
gettimeofday({979085700, 857408}, NULL) = 0
gettimeofday({979085700, 858925}, NULL) = 0
write(13, "ae_50_\0\0\0\3\0\0\0\20\0r\0e\0f\0e\0r\0r\0e"..., 184) = 184
write(13, "   105_43_\0\0\0\10\0h\0t\0t\0p\377\377\377\377\377\377"..., 271) = 271
gettimeofday({979085700, 863577}, NULL) = 0
gettimeofday({979085700, 863811}, NULL) = 0
gettimeofday({979085700, 864752}, NULL) = 0
gettimeofday({979085700, 865038}, NULL) = 0
gettimeofday({979085700, 865229}, NULL) = 0
gettimeofday({979085700, 865774}, NULL) = 0
gettimeofday({979085700, 866013}, NULL) = 0
gettimeofday({979085700, 866320}, NULL) = 0
gettimeofday({979085700, 866512}, NULL) = 0
gettimeofday({979085700, 866799}, NULL) = 0
gettimeofday({979085700, 870601}, NULL) = 0
write(3, "F\20\5\0\3\6\0\4\v\2\0\4\0\0\0\0Q\2\20\0J\5\n\0\3\6\0\4"..., 2048) = 2048
write(3, "B\20\5\0\2\2\0\4\v\2\0\4\1\0\2\0\1\0\23\0@\0\4\0\2\2\0"..., 500) = 500
ioctl(3, FIONREAD, [256])   = 0
read(3, "\26\0$L\f\2\0\4\f\2\0\4\0\0\0\0\3\0\3\0Q\2\20\0\0\0\0\0"..., 256) = 256
ioctl(3, FIONREAD, [0]) = 0
ioctl(3, FIONREAD, [0]) = 0
ioctl(3, FIONREAD, [0]) = 0
write(3, "=\0\4\0\f\2\0\4\0\0\0\0=\2\20\0005\20\4\0\t\6\0\4&\0\0"..., 268) = 268
write(3, ";\3\5\0\7\2\0\4\0\0\0\0\3\0\3\0\24\0\20\0;\3\5\0\10\2\0"..., 676) = 676
ioctl(3, FIONREAD, [0]) = 0
ioctl(3, FIONREAD, [0]) = 0
gettimeofday({979085700, 876529}, NULL) = 0
select(16, [3 4 6 7 9 10 12 13 14 15], NULL, NULL, {0, 0}) = 2 (in [7 13], left {0, 0})
read(13, " 4_ a_", 10)  = 10
read(13, "\0\0\0\0", 4) = 4
read(7, "\2\1\0\2.\1\0\0", 8)   = 8
read(7, "\1\0\0\0", 4)  = 4
read(7, "\0\0\0\17konqueror-3415\0\0\0\0\vkonqueror"..., 302) = 302
brk(0x84f8000)  = 0x84f8000
brk(0x84fd000)  = 0x84fd000
brk(0x8502000)  = 0x8502000
brk(0x8507000)  = 0x8507000
brk(0x850c000)  = 0x850c000
brk(0x8511000)  = 0x8511000
brk(0x8516000)  = 0x8516000
brk(0x851b000)  = 0x851b000
brk(0x8520000)  = 0x8520000
[...]
brk(0xd02d000)  = 0xd02d000
brk(0xd02f000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
brk(0xd031000)  = 0xd02f000
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++  

roger@dox:~ > cat /proc/3507/maps
08048000-0804d000 r-xp 00000000 03:45 512886     /opt/kde2/bin/kdeinit
0804d000-0804f000 rw-p 00004000 03:45 512886     /opt/kde2/bin/kdeinit
0804f000-0848c000 rwxp 00000000 00:00 0
40000000-40013000 r-xp 00000000 16:04 73442      /lib/ld-2.1.3.so
40013000-40014000 rw-p 00012000 16:04 73442      /lib/ld-2.1.3.so
40014000-40015000 rw-p 00000000 00:00 0
40015000-40035000 r-xp 00000000 03:45 1121794    /opt/kde2/lib/libDCOP.so.1.0.0
40035000-40037000 rw-p 0001f000 03:45 1121794    /opt/kde2/lib/libDCOP.so.1.0.0
40037000-40063000 r-xp 00000000 03:45 1121834    /opt/kde2/lib/libkparts.so.1.0.0
40063000-40065000 rw-p 0002b000 03:45 1121834    /opt/kde2/lib/libkparts.so.1.0.0

Re: Benchmarking 2.2 and 2.4 using hdparm and dbench 1.1

2001-01-09 Thread Roger Larsson

On Tuesday 09 January 2001 12:08, Anton Blanchard wrote:
> > Where is the size defined, and is it easy to modify?
>
> Look in fs/buffer.c:buffer_init()
>
> > I noticed that /proc/sys/vm/freepages is not writable any more.  Is there
> > any reason for this?
>
> I am not sure why.
>

It can probably be made writable, within limits (imposed by zones...)

But the interesting part is that 2.4 tries to estimate how much memory it
will need shortly (inactive_target) and tries to keep that amount inactive and
clean (inactive_clean) - clean inactive memory can be freed and reused very
quickly.

cat /proc/meminfo

My feeling is that, for now, keeping it untunable can help us find
fixable cases...
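
For watching those numbers from a program instead of with cat, a minimal
sketch of pulling one value out of meminfo-style text. meminfo_lookup_kb() is
a hypothetical helper, not an existing API, and the field name used in the
note below is an assumption about how a given kernel labels the value; check
the real /proc/meminfo output first.

```c
#include <stdio.h>
#include <string.h>

/*
 * Search meminfo-style text ("Name:   value kB" per line) for a line that
 * starts with "key:". On success store the number in *kb_out and return 1,
 * otherwise return 0 and leave *kb_out untouched.
 */
static int meminfo_lookup_kb(const char *text, const char *key, long *kb_out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%ld", kb_out) == 1;

        line = strchr(line, '\n');
        if (line)
            line++;             /* step past the newline to the next line */
    }
    return 0;
}
```

To use it, read /proc/meminfo into a buffer with fopen()/fread() and call
meminfo_lookup_kb(buf, "Inact_clean", &kb), where "Inact_clean" is the
assumed field name. Sampling it once a second during a benchmark shows how
close the kernel stays to its target.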

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

Re: [PATCH] 2.4.0-prerelease: preemptive kernel.

2001-01-04 Thread Roger Larsson

On Thursday 04 January 2001 09:43, ludovic fernandez wrote:
> Daniel Phillips wrote:
> > The key idea here is to disable preemption on spin lock and reenable on
> > spin unlock.  That's a practical idea, highly compatible with the
> > current way of doing things.  Its a fairly heavy hit on spinlock
> > performance, but maybe the overall performance hit is small.  Benchmarks
> > are needed.
>
> I'm not sure the hit on spinlock is this heavy (one increment for lock
> and one dec + test on unlock), but I completely agree (and volonteer)
> for benchmarking.

And the conditional jump is usually predicted correctly :-)

+static inline void enable_preempt(void)
+{
+	if (atomic_read(&current->preemptable) <= 0) {
+		BUG();
+	}
+	if (atomic_read(&current->preemptable) == 1) {

This part can probably be put in a proper non-inline function.
Cache issues...

+		/*
+		 * At that point a scheduling is healthy iff:
+		 * - a scheduling request is pending.
+		 * - the task is in running state.
+		 * - this is not an interrupt context.
+		 * - local interrupts are enabled.
+		 */
+		if (current->need_resched == 1 &&
+		    current->state == TASK_RUNNING &&
+		    !in_interrupt() &&
+		    local_irq_are_enabled())
+		{
+			schedule();
+		}
+	}
+	atomic_dec(&current->preemptable);

What if something happens during the schedule() that would require
another thread...?

+}

I have been discussing a different layout with George at MontaVista, who is
also doing this kind of work... (different var and value range)

static inline void enable_preempt(void) {
    if (--current->preempt_count) {
        smp_mb(); /* not sure if needed... */
        preempt_schedule();
    }
}

in sched.c (some smp_mb might be needed here too...)
void preempt_schedule() {
    while (current->need_resched) {
        current->preempt_count++; /* prevent competition with IRQ code */
        if (current->need_resched)
            schedule();
        current->preempt_count--;
    }
}
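
The counting idea can be tried out in userspace. What follows is a sketch
under stated assumptions, not the layout discussed above: it assumes
preempt_count is a nesting depth where 0 means "preemption allowed" (the
value range mentioned above may well be different), and struct task_ctx,
fake_schedule() and disable_preempt() are hypothetical stand-ins. It only
shows the "++, --, if (false)" cost on the fast path and the re-check loop
around the reschedule.

```c
/* Userspace model only: one task, no real interrupts or context switches. */
struct task_ctx {
    int preempt_count;  /* assumed: nesting depth, 0 = preemption allowed */
    int need_resched;   /* set by a (simulated) interrupt that woke someone */
    int switches;       /* how many times the stand-in scheduler ran */
};

/* Stand-in for schedule(): records the switch and clears the request. */
static void fake_schedule(struct task_ctx *t)
{
    t->need_resched = 0;
    t->switches++;
}

/* Entering a region that must not be preempted: one increment. */
static void disable_preempt(struct task_ctx *t)
{
    t->preempt_count++;
}

/* Leaving the region: one decrement and, normally, one untaken branch. */
static void enable_preempt(struct task_ctx *t)
{
    if (--t->preempt_count != 0)
        return;                 /* still inside an outer region */

    /* Loop because a new request may arrive while we are rescheduling. */
    while (t->need_resched) {
        t->preempt_count++;     /* keep interrupt code from preempting us here */
        fake_schedule(t);
        t->preempt_count--;
    }
}
```

With two nested regions and a reschedule request arriving inside them, the
inner enable_preempt() does nothing and the outer one performs exactly one
switch.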

> I'm not convinced a full preemptive kernel is something
> interesting mainly due to the context switch cost (actually mmu contex
> switch). 

It will NOT be fully preemptive, it will be mostly preemptive.
You will only context switch when a higher prio thread becomes runnable, in
two ways:
1) An external interrupt wakes a higher prio process; same context switches as
when running in user code. We won't get more interrupts.
2) A wakeup due to something we do. Not that many places, mostly due to
releasing synchronization objects (spinlocks do not count).

If this is still a problem, we can choose to only preempt in favour of
processes running RT stuff (SCHED_FIFO and SCHED_RR), by letting them set
need_resched to 2...

> Benchmarking is a good way to get a global overview on this.

Remember to benchmark with stuff that will make the positive aspects visible
too: playing audio (with smaller buffers), more reliably burning CD-ROMs,
fewer hiccups while playing video [if run with higher prio...]
Plain throughput tests won't tell the whole story!

see
 http://www.gardena.net/benno/linux/audio
 http://www.linuxdj.com/latency-graph/

> What about only preemptable kernel threads ?

No, it won't help enough.

-- 
--
Home page:
  http://www.norran.net/nra02596/

Re: scheduling problem?

2001-01-02 Thread Roger Larsson

Hi,

I have played around with this code previously.
This is my current understanding.
[yield problem?]

On Tuesday 02 January 2001 09:27, Mike Galbraith wrote:
> Hi,
>
> I am seeing (what I believe is;) severe process CPU starvation in
> 2.4.0-prerelease.  At first, I attributed it to semaphore troubles
> as when I enable semaphore deadlock detection in IKD and set it to
> 5 seconds, it triggers 100% of the time on nscd when I do sequential
> I/O (iozone eg).  In the meantime, I've done a slew of tracing, and
> I think the holder of the semaphore I'm timing out on just flat isn't
> being scheduled so it can release it.  In the usual case of nscd, I
> _think_ it's another nscd holding the semaphore.  In no trace can I
> go back far enough to catch the taker of the semaphore or any user
> task other than iozone running between __down() time and timeout 5
> seconds later.  (trace buffer covers ~8 seconds of kernel time)
>
> I think the snippet below captures the gist of the problem.
>
> c012f32e  nr_free_pages +<e/4c> (0.16) pid(256)
> c012f37a  nr_inactive_clean_pages +<e/44> (0.22) pid(256)

wakeup_bdflush (from beginning of __alloc_pages; page_alloc.c:324 ) 
> c01377f2  wakeup_bdflush +<12/a0> (0.14) pid(256)
> c011620a  wake_up_process +<e/58> (0.29) pid(256)

> c012eea4  __alloc_pages_limit +<10/b8> (0.28) pid(256)
> c012eea4  __alloc_pages_limit +<10/b8> (0.30) pid(256)
Two __alloc_pages_limit

wakeup_kswapd(0) (from page_alloc.c:392 )
> c012e3fa  wakeup_kswapd +<12/d4> (0.25) pid(256)
> c0115613  __wake_up +<13/130> (0.41) pid(256)

schedule() (from page_alloc.c:396 )
> c011527b  schedule +<13/398> (0.66) pid(256->6)
> c01077db  __switch_to +<13/d0> (0.70) pid(6)

bdflush is running!!!
> c01893c6  generic_unplug_device + (0.25) pid(6)

bdflush is done. (But how likely is it that it runs for long enough to get
hit by a tick, i.e. current->counter--? Unless it is, it will continue to be
preferred over kswapd, since only one process is yielded...)
> c011527b  schedule +<13/398> (0.50) pid(6->256)
> c01077db  __switch_to +<13/d0> (0.29) pid(256)

Back to the client, not the additionally runnable kswapd...
Why not? Nothing remaining of its timeslice.
Note that the yield only yields one process, not all
in the runqueue - IMHO. [is this intended?]

3rd __alloc_pages_limit; this time the direct_reclaim
tests are fulfilled
> c012eea4  __alloc_pages_limit +<10/b8> (0.22) pid(256)
> c012d267  reclaim_page +<13/408> (0.54) pid(256)

Possible approaches (against -prerelease, untested):

* Be tougher when yielding.

	wakeup_kswapd(0);
	if (gfp_mask & __GFP_WAIT) {
		__set_current_state(TASK_RUNNING);
		current->policy |= SCHED_YIELD;
+		current->counter--; /* be faster to let kswapd run */
or
+		current->counter = 0; /* too fast? [not tested] */
		schedule();
	}

Might be too tough on a client that is not doing any actual work... think dbench...

* Be tougher on bdflush, decrement its counter now and then...
  [naive, not tested]

* Move the wakeup of bdflush to kswapd. Somewhere after 'do_try_to_free_pages(..)'
  has been run. Before going to sleep...
  [a variant tested with mixed results - this is likely a better one]

		/* 
		 * We go to sleep if either the free page shortage
		 * or the inactive page shortage is gone. We do this
		 * because:
		 * 1) we need no more free pages   or
		 * 2) the inactive pages need to be flushed to disk,
		 *    it wouldn't help to eat CPU time now ...
		 *
		 * We go to sleep for one second, but if it's needed
		 * we'll be woken up earlier...
		 */
		if (!free_shortage() || !inactive_shortage()) {
			/*
			 * If we are about to get low on free pages and cleaning
			 * the inactive_dirty pages would fix the situation,
			 * wake up bdflush.
			 */
			if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
					&& nr_inactive_dirty_pages >= freepages.high)
				wakeup_bdflush(0);

			interruptible_sleep_on_timeout(&kswapd_wait, HZ);
		}
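
To make the condition above easier to play with outside the kernel, here is a
small userspace sketch of the same decision. It is only a model: the function
name should_wake_bdflush and its plain integer arguments are made up for
illustration, standing in for free_shortage(), inactive_shortage(),
nr_inactive_dirty_pages and freepages.high.

```c
/*
 * Toy model of the "wake bdflush before kswapd sleeps" test.
 * All arguments are page counts; the names are illustrative only.
 * Returns 1 if bdflush would be woken, 0 otherwise.
 */
static int should_wake_bdflush(int free_short, int inactive_short,
			       int nr_inactive_dirty, int freepages_high)
{
	/* Both shortages still present: kswapd does not go to sleep here. */
	if (free_short && inactive_short)
		return 0;

	/* About to sleep: wake bdflush only if cleaning dirty pages helps. */
	return free_short
		&& nr_inactive_dirty > free_short
		&& nr_inactive_dirty >= freepages_high;
}
```

With a free shortage of 50 pages, no inactive shortage, 200 inactive dirty
pages and freepages.high at 100, the model says bdflush gets woken; with no
free shortage at all it does not.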

--
Home page:
  http://www.norran.net/nra02596/

Re: 2.4.0-test12-pre4 + cs46xx + KDE 2.0 = frozen system

2000-12-04 Thread Roger Larsson


Hi,

I am seeing something strange too, trying to reliably reproduce it
for a while - it is rare but irritating.
Most likely to happen on cold power on (first@evening)

--- X ---

XFree86 Version 3.3.6 / X Window System
(protocol Version 11, revision 0, vendor release 6300)
Release Date: January 8 1999
If the server is older than 6-12 months, or if your card is newer
than the above date, look for a newer version before reporting
problems.  (see http://www.XFree86.Org/FAQ)
Operating System: Linux 2.2.14 i686 [ELF] SuSE
  
Voodoo 3 2000 (PCI)
now running unaccelerated

--- KDE 2 ---

--- Audio, with or without audio ---

--- Kernel ---
2.2.16 or 2.4.0-testXX


I have more or less come to the conclusion that it is some KDE 2 interaction
with the X server that triggers this...
And that the OS - keyboard and video still runs.
I have seen data IO on modem after freeze...
(Thinking about trying another window manager for a while...)

But every time I have had an opportunity to hook up another computer, like
tonight - it does not happen... :-(

/RogerL

On Monday 04 December 2000 22:27, Steven Cole wrote:
> If I have the cs46xx driver compiled either as a module or into
> the kernel, then 2.4.0-test12-pre4 locks up when KDE 2.0
> is started.
>
> The problem with dummy.o in 2.4.0-test12-pre4 allowed me
> to find the possible source of this lock-up which I have been
> seeing recently (since test11-ac2) while starting up KDE 2.0.
>
> This morning, I tried out 2.4.0-test12-pre4, and KDE 2.0
> started up (and there was much rejoicing). Of course, I
> saw the error when I tried to make modules, but I thought
> could live without sound for one bootup.
>
> Then I applied Mohammad A. Haque's small patch to
> linux/include/linux/module.h, recompiled , and the system
> froze again at the same spot ("Loading the panel")
> while starting up KDE 2.0.
>
> I found that if I said N for the cs46xx sound driver, then I
> get a 2.4.0-test12-pre4 kernel that will run KDE 2.0,
> sans sound :(.
>
> I can run GNOME with 2.4.0-test12-pre4 with
> cs46xx compiled as a module or compiled into the kernel,
> and everything works just fine.
>
> Here is some additional information from /var/log/messages:
> 2.4.0-test10 works OK with KDE 2.0 and sound.
>
> For 2.4.0-test12-pre4:
>
> Crystal 4280/461x + AC97 Audio, version 0.14, 13:39:25 Dec  4 2000
> cs461x: Card found at 0xf8ffe000 and 0xf8e0, IRQ 18
> cs461x: Unknown card (:) at 0xf8ffe000/0xf8e0, IRQ 18
> ac97_codec: AC97 Audio codec, id: 0x4352:0x5914 (Unknown)
>
> For 2.4.0-test10:
>
> Crystal 4280/461x + AC97 Audio, version 0.09, 15:31:37 Nov  1 2000
> cs461x: Card found at 0xf8ffe000 and 0xf8e0, IRQ 18
> cs461x: Unknown card (1028:0096) at 0xf8ffe000/0xf8e0, IRQ 18
> ac97_codec: AC97 Audio codec, id: 0x4352:0x5914 (Unknown)
>
> The hardware is a DELL 420 dual P-III.
> The base linux distro is Linux-Mandrake 7.2.
> Filesystems are ReiserFS, running reiserfs-3.6.19 for test12
> and reiserfs-3.6.18 for test10.
>
> Note: The ReiserFS folks looked at this, but could
> not reproduce this on another smp machine. That
> was before I noticed the connection with cs46xx.
>
> When I say the system freezes, I mean it completely locks up, and
> ALT-SYSRQ-<whatevercommand> does not do a thing.  The magic
> key combo gives the expected result before freeze-up.
>
> Thanks in advance for any help,
>
> Steven

-- 
--
Home page:
  http://www.norran.net/nra02596/

Re: *_trylock return on success?

2000-11-27 Thread Roger Larsson


On Saturday 25 November 2000 22:05, Roger Larsson wrote:
> On Saturday 25 November 2000 20:22, Philipp Rumpf wrote:
> > On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote:
> > > > _trylock functions return 0 for success.
> > >
> > > Not   spin_trylock
> >
> > Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean
> > unlocked.  You're correct, and obviously this should be fixed.

Have looked more into this now...
tasklet_trylock is also wrong (but there are only four of them).
Is this 2.4 only, or were there spin locks earlier too?

My suggestion now is a few steps:
1) to release a kernel version that has corrected _trylocks; 
spin2_trylock and tasklet2_trylock.
[with corresponding updates in as many places as possible:
  s/!spin_trylock/spin2_trylock/g
  s/spin_trylock/!spin2_trylock/g 
  . . .]
(ready for spin trylock, not done for tasklet yet..., attached,
 hope it got included OK - not fully used to kmail)

2) This will break only in-house drivers or compilations that in some
strange way use these calls...

3a) (DANGEROUS) global rename spin2_trylock to spin_trylock
 [no logic change this time - only name]
3b) (dangerous) add compatibility interface
 #define spin_trylock(L) (!spin2_trylock(L))
 Probably not necessary since it can not be linked against.
 Binary modules will contain their own compatibility code :-) 
 Probably preferred by those who maintain drivers for several
 releases; 2.2, 2.4, ...
3c) do not do anything more...


Alternative:
1b) do nothing at all - suffer later
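
To show what steps 1 and 3b amount to, here is a userspace sketch using a C11
atomic_flag in place of the real spinlock. Everything prefixed toy_ is made up
for illustration; only the names spin2_trylock and spin_trylock and the
compatibility macro come from the text above. It assumes the corrected
trylock returns 0 on success, like down_trylock.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's spinlock_t. */
typedef struct { atomic_flag locked; } toy_spinlock_t;

#define TOY_SPIN_LOCK_UNLOCKED { ATOMIC_FLAG_INIT }

/*
 * Corrected convention: 0 means the lock was taken,
 * non-zero means somebody else already holds it.
 */
static int spin2_trylock(toy_spinlock_t *l)
{
	/* atomic_flag_test_and_set() returns the previous flag state. */
	return atomic_flag_test_and_set(&l->locked) ? 1 : 0;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_flag_clear(&l->locked);
}

/* Step 3b: old callers keep the old sense (non-zero on success). */
#define spin_trylock(L) (!spin2_trylock(L))
```

Code written against the old interface, such as "if (spin_trylock(&l)) ...",
keeps working through the macro, while new code tests
"if (!spin2_trylock(&l)) ..." directly.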

/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/


diff -Naur v2.4.0-test11-linux/arch/alpha/kernel/alpha_ksyms.c 
linux/arch/alpha/kernel/alpha_ksyms.c
--- v2.4.0-test11-linux/arch/alpha/kernel/alpha_ksyms.c Mon Nov 27 21:06:14 2000
+++ linux/arch/alpha/kernel/alpha_ksyms.c   Tue Nov 28 00:35:13 2000
@@ -198,7 +198,7 @@
 #if DEBUG_SPINLOCK
 EXPORT_SYMBOL(spin_unlock);
 EXPORT_SYMBOL(debug_spin_lock);
-EXPORT_SYMBOL(debug_spin_trylock);
+EXPORT_SYMBOL(debug_spin2_trylock);
 #endif
 #if DEBUG_RWLOCK
 EXPORT_SYMBOL(write_lock);
diff -Naur v2.4.0-test11-linux/arch/alpha/kernel/irq_smp.c 
linux/arch/alpha/kernel/irq_smp.c
--- v2.4.0-test11-linux/arch/alpha/kernel/irq_smp.c Thu Sep 21 19:53:32 2000
+++ linux/arch/alpha/kernel/irq_smp.c   Tue Nov 28 00:35:31 2000
@@ -96,7 +96,7 @@
 			if (!local_bh_count(cpu)
 			    && spin_is_locked(&global_bh_lock))
 				continue;
-			if (spin_trylock(&global_irq_lock))
+			if (!spin2_trylock(&global_irq_lock))
 				break;
 		}
 	}
@@ -105,7 +105,7 @@
 static inline void
 get_irqlock(int cpu, void* where)
 {
-	if (!spin_trylock(&global_irq_lock)) {
+	if (spin2_trylock(&global_irq_lock)) {
 		/* Do we already hold the lock?  */
 		if (cpu == global_irq_holder)
 			return;
diff -Naur v2.4.0-test11-linux/arch/alpha/kernel/smp.c linux/arch/alpha/kernel/smp.c
--- v2.4.0-test11-linux/arch/alpha/kernel/smp.c Fri Oct 13 00:57:30 2000
+++ linux/arch/alpha/kernel/smp.c   Tue Nov 28 00:34:59 2000
@@ -1078,10 +1078,10 @@
 }
 
 int
-debug_spin_trylock(spinlock_t * lock, const char *base_file, int line_no)
+debug_spin2_trylock(spinlock_t * lock, const char *base_file, int line_no)
 {
 	int ret;
-	if ((ret = !test_and_set_bit(0, lock))) {
+	if ((ret = test_and_set_bit(0, lock))) {
 		lock->on_cpu = smp_processor_id();
 		lock->previous = __builtin_return_address(0);
 		lock->task = current;
diff -Naur v2.4.0-test11-linux/arch/mips64/sgi-ip27/ip27-irq.c 
linux/arch/mips64/sgi-ip27/ip27-irq.c
--- v2.4.0-test11-linux/arch/mips64/sgi-ip27/ip27-irq.c Thu Sep 21 19:53:32 2000
+++ linux/arch/mips64/sgi-ip27/ip27-irq.c   Tue Nov 28 00:44:48 2000
@@ -480,7 +480,7 @@
 			continue;
 		if (!local_bh_count(cpu) && spin_is_locked(&global_bh_lock))
 			continue;
-		if (spin_trylock(&global_irq_lock))
+		if (!spin2_trylock(&global_irq_lock))
 			break;
 	}
 }
@@ -497,7 +497,7 @@
 
 static inline void get_irqlock(int cpu)
 {
-	if (!spin_trylock(&global_irq_lock)) {
+	if (spin2_trylock(&global_irq_lock)) {
 		/* do we already hold the lock? */
 		if ((unsigned char) cpu == global_irq_holder)
 			return;
diff -Naur v2.4.0-test11-linux/arch/ppc/kernel/misc.S linux/arch/ppc/kernel/misc.S
--- v2.4.0-test11-linux/arch/ppc/kernel/misc.S  Mon Nov 27 21:06:14 2000
+++ linux/arch/ppc/kernel/misc.S	Tue Nov 28 00:40:02 2000
@@ -434,6 +434,7 @@
  * Environments Manual suggests not doing unnecessary stcwx.'s
  * since they may inhibit forward progress by other CPUs in getting

Re: readonly /proc/sys/vm/freepages (was: Re: PROBLEM: crashing kernels)

2000-11-27 Thread Roger Larsson


On Sunday 26 November 2000 19:36, Rik van Riel wrote:
> On Sun, 26 Nov 2000, Ingo Oeser wrote:
> > On Sun, Nov 26, 2000 at 10:49:50AM +1100, Andrew Morton wrote:
> > > You may also get some benefit from running:
> > >
> > > # echo "512 1024 1536" > /proc/sys/vm/freepages
> > >
> > > after booting.
> >
> > ... which is a NOOP on recent 2.4.0-testX-kernels. So please
> > complain at Rik for this (CC'ed him) ;-)
>
> I wasn't aware I studied at tu-chemnitz ;)
>
> > It's simply not that easy to set in the new VM since we count
> > the inactive_clean and/or inactive_dirty pages like free pages
> > in some cases.
>
> And also, because HIGHMEM pages are not at all usable for kernel
> things, so simply reserving 20MB for network bursts isn't going
> to help you when it's all in highmem pages ...

Should the
echo "512 1024 1536" > /proc/sys/vm/freepages
apply only to DMA pages?
(It would work correctly with <16 M machines, and probably ok with others)
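
For scale: assuming 4 KiB pages (the usual x86 page size), the three values in
that echo come to 2, 4 and 6 MiB. A trivial sketch of the arithmetic;
pages_to_kib and ASSUMED_PAGE_SIZE are made-up names, not kernel symbols.

```c
/* Page size assumed for the arithmetic; 4 KiB is the usual x86 value. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Convert a page count, as written to freepages, to KiB. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages * (ASSUMED_PAGE_SIZE / 1024UL);
}
```

So pages_to_kib(512) gives 2048 KiB, which is small next to the 16 MB that the
DMA zone covers.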

Sidenote:
 Can you build a clean x86 computer that does not especially care about
 DMA-able pages? (no ISA cards, no memory limited PCI cards, etc...)
 Would it then be nice to be able to remove that zone completely?
 (like we can with HIGHMEM today)

>
> > > The default values are a little too low for
> > > applications which are very network intensive.
> >
> > Especially for low memory machines, which are dedicated only for
> > this purpose. Many people use (embedded) Linux and a (embedded)
> > PC to cheaply fill functionality gaps in industrial
> > environments.
>
> Indeed, I agree that we want this tunable back...
>

/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/


Re: RTlinux & Linux Question

2000-11-27 Thread Roger Larsson


On Monday 27 November 2000 02:36, Mastoras wrote:
> Hello,
>
> I'm trying to use RTlinux to make a unix process wakeup
> periodicaly, in terms of "real time".

Have I understood correctly - you are trying to use an RTLinux process to get a
finer grained periodic wakeup than the Linux standard 10 ms?

>
> 1) the unix process uses 2 system calls, one to make it self periodic, and
> one to suspend its self until the next period.
>
> 2) The system call that makes the unix process periodic, creates a Rtlinux
> thread, which is periodic with the same period.
>
> 3) The periodic RT linux thread, sets a flag & sends fakes IRQ0 to linux,
> in order to force its scheduling as soon as possible and then suspends it
> self. (i know that this advances time, but this is not the question right
> now).
>
> 4) The unix process wakeups perfectely when there is no disk activity, but
> when there is some disk activity ("find /" and/or "updatedb") or the
> period is too small (300us) i noticed that sometimes it loses one or two
> periods. This is very rare, i mean 14 loses in 5000 executions at 5ms
> period.
>
> 5) The unix process isn't scheduled the appropriate time although that
> every IRQ is received by linux correctly, the myprocess->counter is
> initialized to a very high value (in each period) and
> current->need_resched is set to 1.
>

You have been hit by the kernel latency, see
http://www.gardena.net/benno/linux/audio

(There are patches)

> 6) I don't want to use PSC.

All attempts to guarantee wake up of a linux process within any
time frame will fail.
Applying a low latency kernel patch will help - good enough for many
applications, but no guarantees...

To get guarantees you need to do your stuff in a RTLinux thread.
(and why not, you are already using it?)
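
If you stay with a plain Linux process, it helps at least to count how many
periods each late wakeup actually cost. A sketch, assuming you record a
monotonic nanosecond timestamp at every wakeup; the function name lost_periods
is made up for illustration.

```c
#include <stdint.h>

/*
 * Number of whole periods skipped between two consecutive wakeups.
 * prev_ns and now_ns are monotonic timestamps in nanoseconds.
 */
static uint64_t lost_periods(uint64_t prev_ns, uint64_t now_ns,
			     uint64_t period_ns)
{
	uint64_t elapsed, n;

	if (period_ns == 0 || now_ns <= prev_ns)
		return 0;

	elapsed = now_ns - prev_ns;
	/* Round to the nearest period so small jitter is not counted. */
	n = (elapsed + period_ns / 2) / period_ns;

	return n > 1 ? n - 1 : 0;
}
```

With a 5 ms period, a wakeup 5.2 ms after the previous one counts as on time,
while one arriving after 10 ms counts as one lost period.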


/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/

Re: *_trylock return on success?

2000-11-25 Thread Roger Larsson

On Saturday 25 November 2000 20:22, Philipp Rumpf wrote:
> On Sat, Nov 25, 2000 at 08:03:49PM +0100, Roger Larsson wrote:
> > > _trylock functions return 0 for success.
> >
> > Not   spin_trylock
>
> Argh, I missed the (recent ?) change to make x86 spinlocks use 1 to mean
> unlocked.  You're correct, and obviously this should be fixed.

If this is to change in 2.4, I would suggest
renaming it to mutex_lock (as in Nigel's preemptive kernel patch).

Why?

A) The name spin_lock describes how the function is implemented and not
the intended purpose.
B) With a preemptive kernel we will have at least four intended purposes:
1) SMP - spin_lock, prevents two processors from running the code concurrently.
2) UP - not used, the code can only be executed by one thread.
3) PREEMPTIVE - locks a region against preemption to avoid concurrent execution.
4) debug - addition of debug checks.

With Nigel's patch most are changed, with some additional stuff...

My suggestion: change the name to mutex_lock and negate the return value, letting
mutex_trylock follow the example of the other _trylocks (returning 0 for success).

Ok?

If it is ok, I can prepare a patch (earliest monday)

/RogerL
-- 
Home page:
  http://www.norran.net/nra02596/


Re: *_trylock return on success?

2000-11-25 Thread Roger Larsson


On Saturday 25 November 2000 19:30, Philipp Rumpf wrote:
> On Sat, Nov 25, 2000 at 03:49:25PM -0200, Rik van Riel wrote:
> > On Sat, 25 Nov 2000, Roger Larsson wrote:
> > > Questions:
> > >   What are _trylocks supposed to return?
> >
> > It depends on the type of _trylock  ;(
> >
> > >   Does spin_trylock and down_trylock behave differently?
> > >   Why isn't the expected return value documented?
> >
> > The whole trylock stuff is, IMHO, a big mess. When you
> > change from one type of trylock to another, you may be
> > forced to invert the logic of your code since the return
> > code from the different locks is different.
> >
> > For bitflags, for example, the trylock returns the state
> > the bit had before the lock (ie. 1 if the thing was already
> > locked).
>
> I assume you're talking about test_and_{set,clear}_bit here.  Their return
> value isn't consistent with the other _trylock functions since they're not
> _trylock functions.
>
> I think the real problem is that people use test_and_set_bit for locks,
> which is almost never[1] a good idea.  The overhead for a semaphore
> shouldn't be too much in most cases, and that way it is obvious what you
> want to do - and, hopefully, even more obvious if you end up with a
> semaphore that can be turned into a spinlock without further changes.
>
> > For spinlocks, it'll probably return something else ;/
>
> _trylock functions return 0 for success.

Not spin_trylock.

Simple example code from include/asm-mips/spinlock.h:65
#define spin_trylock(lock) (!test_and_set_bit(0,(lock)))
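
To make the difference concrete, here is a small user-space sketch. The toy_*
names are made up for illustration and nothing here is atomic; it only mirrors
the return conventions described in this thread.

```c
/* Toy, non-atomic stand-in for test_and_set_bit():
 * sets bit nr and returns the value it had before. */
static int toy_test_and_set_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;

    *addr |= mask;
    return old;
}

/* Same shape as the mips macro quoted above:
 * nonzero when the lock was free and is now ours. */
#define toy_spin_trylock(lock) (!toy_test_and_set_bit(0, (lock)))

/* Toy semaphore following the down_trylock convention:
 * 0 when acquired, 1 when it could not be taken. */
struct toy_sem { int count; };

static int toy_down_trylock(struct toy_sem *s)
{
    if (s->count <= 0)
        return 1;
    s->count--;
    return 0;
}
```

So "if (!toy_spin_trylock(...)) return;" and "if (toy_down_trylock(...)) return;"
express the same intent, which is why swapping one for the other without
touching the test inverts the logic.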

/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/

Re: *_trylock return on success?

2000-11-25 Thread Roger Larsson


On Saturday 25 November 2000 18:49, Rik van Riel wrote:
> On Sat, 25 Nov 2000, Roger Larsson wrote:
> > Questions:
> >   What are _trylocks supposed to return?
>
> It depends on the type of _trylock  ;(
>
> >   Does spin_trylock and down_trylock behave differently?
> >   Why isn't the expected return value documented?
>
> The whole trylock stuff is, IMHO, a big mess. When you
> change from one type of trylock to another, you may be
> forced to invert the logic of your code since the return
> code from the different locks is different.
>
> For bitflags, for example, the trylock returns the state
> the bit had before the lock (ie. 1 if the thing was already
> locked).
>

This holds for down_trylocks too.
It looks like it is the spinlocks that are wrong... :-(

Since most return values tend to be error returns, that also
matches other code in functionality.

>
> For spinlocks, it'll probably return something else ;/
It does...

I guess fixing this is too much too late?


It looks like ppc mixes the ways... from arch/ppc/lib/locks.c:46

int spin_trylock(spinlock_t *lock)
{
	if (__spin_trylock(&lock->lock))  /* one on failure */
		return 0; /* zero on failure */
	lock->owner_cpu = smp_processor_id();
	lock->owner_pc = (unsigned long)__builtin_return_address(0);
	return 1;
}


BUT anyway...
 The thing I hit is not a bug in the kernel proper - it is in the preemptive 
stuff.

/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/

*_trylock return on success?

2000-11-25 Thread Roger Larsson


Hi,

Background information:
 compiled and tested a test11 with the Montavista preemptive patch.
 After pressing Magic-SysRq-M all processes that tried to do IO hung in 'D'
 Last message "Buffer memory ..."
 Pressing Magic-SysRq-M again, all hung processes continued...

Checking the patch it looks like this

 	printk("Buffer memory:   %6dkB\n",
 		atomic_read(&buffermem_pages) << (PAGE_SHIFT-10));

-#ifdef CONFIG_SMP /* trylock does nothing on UP and so we could deadlock */
-   if (!spin_trylock(&lru_list_lock))
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
+   if (!mutex_trylock(&lru_list_mtx))
 		return;
 	for(nlist = 0; nlist < NR_LIST; nlist++) {

Ok, so I run some more code now than before (UP system with PREEMPT).
mutex_trylock is defined as:

+#define mutex_trylock(x) down_trylock(x)

Noticed that if mutex_trylock (i.e. down_trylock) returns 0 on success, I will
get the behavior I see:
  Not printing buffer info the first time.
  Holding the lock - stopping other fs processes.
  Failing the mutex_trylock on the next attempt, interpreted as success
  - continuing and printing the buffer info.
  - finally releasing the mutex

I removed the not (!) and now it works as expected.
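
The sequence above can be replayed in a small user-space model. The sketch_*
names and the show_buffers_* functions are hypothetical, and the mutex is a
plain counter; the only thing assumed from the real code is that
mutex_trylock maps to down_trylock, which returns 0 when the lock was taken.

```c
/* Hypothetical stand-in for the semaphore behind lru_list_mtx. */
struct sketch_mtx { int count; };       /* 1 = free, 0 = held */

/* down_trylock convention: 0 = acquired, 1 = busy. */
static int sketch_down_trylock(struct sketch_mtx *m)
{
    if (m->count <= 0)
        return 1;
    m->count--;
    return 0;
}

static void sketch_up(struct sketch_mtx *m)
{
    m->count++;
}

/* Mirrors "#define mutex_trylock(x) down_trylock(x)" from the patch. */
#define sketch_mutex_trylock(x) sketch_down_trylock(x)

/* As patched: the '!' was carried over from the spin_trylock test.
 * Returns 1 if the buffer info would have been printed. */
static int show_buffers_buggy(struct sketch_mtx *m)
{
    if (!sketch_mutex_trylock(m))
        return 0;       /* lock was acquired, yet we leave still holding it */
    /* ... the lists would be walked and printed here ... */
    sketch_up(m);       /* releases a lock this call never took */
    return 1;
}

/* With the '!' removed. */
static int show_buffers_fixed(struct sketch_mtx *m)
{
    if (sketch_mutex_trylock(m))
        return 0;       /* busy: skip the report */
    /* ... the lists would be walked and printed here ... */
    sketch_up(m);
    return 1;
}
```

Calling the buggy variant twice on a free mutex gives exactly the pattern I
saw: nothing printed and the lock left held, then a printout and the lock
released.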

Questions:
  What are _trylocks supposed to return?
  Does spin_trylock and down_trylock behave differently?
  Why isn't the expected return value documented?
  
/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/

[PATCH] Re: [PATCH] Latest preemptible kernel (low latency) patch available

2000-11-24 Thread Roger Larsson


Hi,

I got compilation errors due to use of START / STOP 
definitions (zlib.c, ppp?)

This little additional patch should fix it. They were not
used in any other place of the patch...

I am still compiling...

/RogerL

--- spinlock.h.preemt   Sat Nov 25 00:31:38 2000
+++ spinlock.h  Sat Nov 25 00:30:50 2000
@@ -47,21 +47,21 @@
 /*
  * Here are the basic preemption lock macros.
  */
-#define START 0
-#define STOP 1
-#define BKL pree)current)->lock_depth) != -1)
+#define PREEMPT_START 0
+#define PREEMPT_STOP 1
+#define PREEMPT_BKL pree)current)->lock_depth) != -1)
 
 #ifdef DEBUG_PREEMPT
 #define debug_lock(t) do {  \
-   if ((in_ctx_sw_off() - (BKL?1:0)) < t) \
+   if ((in_ctx_sw_off() - (PREEMPT_BKL?1:0)) < t) \
   SPIN_BREAKPOINT; \
  } while (0) 
 #else
 #define debug_lock(t) do {   } while (0) 
 #endif
 
-#define preempt_lock_start(c) debug_lock(START)
-#define preempt_lock_stop()   debug_lock(STOP)
+#define preempt_lock_start(c) debug_lock(PREEMPT_START)
+#define preempt_lock_stop()   debug_lock(PREEMPT_STOP)
 
 #ifdef CONFIG_PREEMPT
 #include <asm/current.h>
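
For anyone wondering why such short macro names break other files, here is a
minimal illustration. The enum and the helper are hypothetical stand-ins for
whatever zlib.c or the ppp code define; I have not checked their actual
declarations.

```c
/* Prefixed names, as in the fix above. */
#define PREEMPT_START 0
#define PREEMPT_STOP  1

/* Stand-in for unrelated code that uses the same short names.
 * With "#define START 0" and "#define STOP 1" in scope, this enum
 * would be preprocessed into "enum sketch_state { 0, 1 };" and
 * fail to compile. */
enum sketch_state { START, STOP };

static enum sketch_state sketch_next(enum sketch_state s)
{
    return s == START ? STOP : START;
}
```

A macro in a widely included header is visible in every file that includes it,
so a prefix is the cheap way to stay out of the way.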

-- 
Home page:
  http://www.norran.net/nra02596/

Confusing comment in reschedule_idle - unlock of runqueue.

2000-11-16 Thread Roger Larsson


Hi,

This comment is written at the head of reschedule_idle. Is it really
correct?

--
/*
 * This is ugly, but reschedule_idle() is very timing-critical.
 * We enter with the runqueue spinlock held, but we might end
 * up unlocking it early, so the caller must not unlock the
 * runqueue, it's always done by reschedule_idle().
 *
 * This function must be inline as anything that saves and restores
 * flags has to do so within the same register window on sparc (Anton)
 */
static FASTCALL(void reschedule_idle(struct task_struct * p));

static void reschedule_idle(struct task_struct * p)
--


If it is, then wake_up_process and schedule_tail are wrong.
But I think not...

--
	reschedule_idle(p);
out:
	spin_unlock_irqrestore(&runqueue_lock, flags);
--
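
The two contracts cannot both hold. A toy model, with hypothetical sketch_*
names and a lock that simply asserts on misuse, shows the one the callers
seem to follow: the callee leaves the lock held and the caller unlocks.

```c
#include <assert.h>

/* Toy lock that traps on double lock and double unlock. */
struct sketch_lock { int held; };

static void sketch_lock_take(struct sketch_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void sketch_lock_release(struct sketch_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Contract the callers rely on: entered with the lock held,
 * returns with the lock still held. */
static void sketch_resched_keeps_lock(struct sketch_lock *rq)
{
    assert(rq->held);
    /* ... pick a CPU to reschedule ... */
}

/* Shape of the quoted caller: lock, call, unlock. */
static void sketch_wake_up(struct sketch_lock *rq)
{
    sketch_lock_take(rq);
    sketch_resched_keeps_lock(rq);
    sketch_lock_release(rq);    /* would trap if the callee had unlocked */
}
```

If the callee really did unlock as the comment says, the release in the caller
would be a second unlock of the same lock.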

/RogerL

-- 
Home page:
  http://www.norran.net/nra02596/

Re: Linux 2.4 Status/TODO page (test11-pre3)

2000-11-13 Thread Roger Larsson


On Sunday 12 November 2000 23:31, Erik Mouw wrote:
> On Sun, Nov 12, 2000 at 02:39:09PM -0500, [EMAIL PROTECTED] wrote:
> >  * USB: system hang with USB audio driver {CRITICAL} (David
> >Woodhouse, Randy Dunlap, Narayan Desai) (Fixed with usb-uhci;
> >uhci-alt is unknown -- randy dunlap)
>
> I can still hang the system with XMMS (1.0.1) using real-time priority.
> The system doesn't die, but it is completely unresponsive. There is no
> sound, but after the MP3 file is played, the system works again. I can
> reproduce this behaviour with usb-uhci and uhci-alt on 2.4.0-test10.
> I haven't test test11-pre3 yet, but the changes don't look too big that
> the "bug" has been fixed.
>

Does it use non-blocking IO?
In that case you might have created an infinite loop at high priority,
busy-waiting and locking out all other processes.


> BTW, I don't think this is a critical bug.
>
>
> Erik

-- 
Home page:
  http://www.norran.net/nra02596/

Re: 2.2.18Pre Lan Performance Rocks!

2000-10-31 Thread Roger Larsson


"Jeff V. Merkey" wrote:
> 
> David/Alan,
> 
> Andre Hedrick is now the CTO of TRG and Chief Scientist over Linux
> Development.  After talking
> to him, we are going to do our own ring 0 2.4 and 2.2.x code bases for
> the MANOS merge.
> the uClinux is interesting, but I agree is limited.
> 

Jeff,

What would be missed out in this approach:
* Use Montavista "fully" preemptible kernel.
* Using Kernel threads for all services (File, Print, Web, etc.).

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

Re: test10-pre4: deadlock in VM?

2000-10-25 Thread Roger Larsson


Not.

It does not lock anything else...
This was not a problem.

/RogerL

Roger Larsson wrote:
> 
> Hi again,
> 
> Please ignore my patch suggestion from getblk -
> it will give problems later - in alloc...
> 
> It is grow_buffers that might need to lock the
> other ones too...
> 
> /RogerL
> 
> --
> Home page:
>   http://www.norran.net/nra02596/

--
Home page:
  http://www.norran.net/nra02596/

Re: test10-pre4: deadlock in VM?

2000-10-25 Thread Roger Larsson


Hi again,

Please ignore my patch suggestion from getblk -
it will give problems later - in alloc...

It is grow_buffers that might need to lock the
other ones too...

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

Re: test10-pre4: deadlock in VM?

2000-10-25 Thread Roger Larsson


Found a strange one.

getblk releases hash_table_lock and lru_list_lock
before calling refill_freelist, which calls grow_buffers,
which locks free_list[].lock
- with lru_list_lock and hash_table_lock not held, violating the
deadlock prevention rules at the beginning of the file.

Patch:
  in getblk, move the call to refill_freelist before
  releasing the locks - ok?
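
A lock hierarchy like this can be checked mechanically. The sketch below is a
user-space toy with hypothetical sketch_* names; the ordering it encodes
(lru_list_lock, then hash_table_lock, then the free list lock) is my reading
of the comment in the file and should be treated as an assumption.

```c
/* Assumed ordering: a lock with a lower rank must be taken first. */
enum sketch_rank {
    RANK_NONE = 0,
    RANK_LRU_LIST = 1,
    RANK_HASH_TABLE = 2,
    RANK_FREE_LIST = 3
};

#define SKETCH_MAX_HELD 4

/* Stack of the ranks currently held by one thread of execution. */
struct sketch_ctx {
    enum sketch_rank held[SKETCH_MAX_HELD];
    int depth;
};

/* Returns 0 if taking 'r' respects the ordering, -1 if it does not. */
static int sketch_acquire(struct sketch_ctx *c, enum sketch_rank r)
{
    enum sketch_rank top = c->depth ? c->held[c->depth - 1] : RANK_NONE;

    if (r <= top || c->depth == SKETCH_MAX_HELD)
        return -1;
    c->held[c->depth++] = r;
    return 0;
}

/* Drops the most recently taken lock. */
static void sketch_release(struct sketch_ctx *c)
{
    if (c->depth > 0)
        c->depth--;
}
```

Note that under such a rule, taking the innermost lock with nothing else held
is allowed; only taking an outer lock while already holding an inner one is
flagged.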

/RogerL


Roger Larsson wrote:
> 
> Hi,
> 
> I noted that even try_to_free_buffers locks lru_list_lock.
> Then it tries to lock some others - maybe one of the other threads
> got one of those (hash_table_lock, free_list[index].lock).
> That fits with proc 4: it is executing in the beginning of
> try_to_free_buffers. Does it move?
> Or is it stuck at a spin lock there - which one? Can you get a
> disassembly of try_to_free_buffers?
> 
> /RogerL
> 
> Rajagopal Ananthanarayanan wrote:
> >
> > Tigran Aivazian wrote:
> > >
> > > Hi guys,
> > >
> > > When running SPEC SFS tests against 2.4.0-test10-pre4 on a 4-way SMP
> > > machine with 6G RAM (highmem+PAE enabled) I got
> > >
> > > __alloc_pages: 0-order allocation failed.
> > >
> > > (probably coming from nfsd, why don't we print eip of the caller there?)
> > >
> > > and the machine locked up (but pingable). So I entered kdb and got stack
> > > traces of all running processes:
> >
> > Hmm. It appears that some of the processes are stuck on this
> > part of page_launder:
> >
> > /*
> >  * Re-take the spinlock. Note that we cannot
> >  * unlock the page yet since we're still
> >  * accessing the page_struct here...
> >  */
> > spin_lock(&pagemap_lru_lock);
> >
> > It will be interesting to see what's going on in each of the cpus.
> > Use "cpu x" x=0,1,2,3 on your 4 cpu system to switch to cpu x,
> > and just type "bt" on each cpu. Also, it will be good to see what
> > kswapd (pid 2) is up to ...
> >
> > --
> > Rajagopal Ananthanarayanan ("ananth")
> > Member Technical Staff, SGI.
> > --
> 
> --
> Home page:
>   http://www.norran.net/nra02596/

--
Home page:
  http://www.norran.net/nra02596/

Re: test10-pre4: deadlock in VM?

2000-10-25 Thread Roger Larsson


Hi,

I noted that even try_to_free_buffers locks lru_list_lock.
Then it tries to lock some others - maybe one of the other threads
got one of those (hash_table_lock, free_list[index].lock).
That fits with proc 4: it is executing in the beginning of
try_to_free_buffers. Does it move?
Or is it stuck at a spin lock there - which one? Can you get a
disassembly of try_to_free_buffers?

/RogerL

Rajagopal Ananthanarayanan wrote:
> 
> Tigran Aivazian wrote:
> >
> > Hi guys,
> >
> > When running SPEC SFS tests against 2.4.0-test10-pre4 on a 4-way SMP
> > machine with 6G RAM (highmem+PAE enabled) I got
> >
> > __alloc_pages: 0-order allocation failed.
> >
> > (probably coming from nfsd, why don't we print eip of the caller there?)
> >
> > and the machine locked up (but pingable). So I entered kdb and got stack
> > traces of all running processes:
> 
> Hmm. It appears that some of the processes are stuck on this
> part of page_launder:
> 
> /*
>  * Re-take the spinlock. Note that we cannot
>  * unlock the page yet since we're still
>  * accessing the page_struct here...
>  */
> spin_lock(&pagemap_lru_lock);
> 
> It will be interesting to see what's going on in each of the cpus.
> Use "cpu x" x=0,1,2,3 on your 4 cpu system to switch to cpu x,
> and just type "bt" on each cpu. Also, it will be good to see what
> kswapd (pid 2) is up to ...
> 
> --
> Rajagopal Ananthanarayanan ("ananth")
> Member Technical Staff, SGI.
> --

--
Home page:
  http://www.norran.net/nra02596/

Re: test10-pre5: hangs in boot

2000-10-24 Thread Roger Larsson


False alarm.

Rechecked my .config - it was strange
And remembered that I did a clean start...
Wrong config file - sorry...

/RogerL

Brian Gerst wrote:
> 
> Roger Larsson wrote:
> >
> > Hi,
> >
> > This is the first test kernel that won't boot
> > for me.
> > Last message "Ok, booting the kernel"
> > Then nothing...
> >
> > PPro 180
> > 96MB
> > 440FX chip set
> >
> > Saw something about PCI initializations earlier
> > on the list...
> 
> Could you send me your .config file please?
> 
> --
> 
> Brian Gerst

--
Home page:
  http://www.norran.net/nra02596/

test10-pre5: hangs in boot

2000-10-24 Thread Roger Larsson


Hi,

This is the first test kernel that won't boot
for me.
Last message "Ok, booting the kernel"
Then nothing...

PPro 180
96MB
440FX chip set

Saw something about PCI initializations earlier
on the list...


/RogerL

--
Home page:
  http://www.norran.net/nra02596/

Re: 2.4.0-testx fr0kedness?

2000-10-21 Thread Roger Larsson


Jason Slagle wrote:
> 
> Howdy!  2.4.0 is looking almost ready except 1 HY00GE problem I'm having.
> 
> I'm SMP here 2 Celeron 300A's at 450 in an Abit BP6.  256M of RAM, all
> SCSI.
> 
> System will run for a week no problems.
> 
> Then I compile mozilla and all hell breaks loose.
> 
> Compile will go for a bit then it'll hang and need SAK'd.  w and ps and
> top will then hang.  loadavg is over 4 and I can't paticuraly see whats
> causing it.  Meminfo looks fine but it's acting like it's outta RAM.  I'm
> like 37 megs into swap when it happened with over 100 megs of buffer
> cache.
> 
> Pretty normal setup here except these:
> 
> echo "1024 2048 4096" > /proc/sys/vm/freepages
> echo "5 10 60" > /proc/sys/vm/buffermem
> echo "16384" >/proc/sys/fs/file-max
> echo "0" >/proc/sys/net/ipv4/tcp_ecn
> 
> These bad?  They worked well under 2.2 but who knows under 2.4
> 
> Please advise, will provide any info I can if needed.
> 
> Jason

Hmm... Possibly VM related.

Are you using 2.4.0-test9? It has some not so nice problems that
are fixed in the later "test10-preX" kernels (in the "testing"
subdirectory; pre3 might be the best choice).

Even if you are already using test10, you could run vmstat 1 and include
a typical part of its output. An Alt-SysRq-M dump (needs magic SysRq
compiled in and enabled) could be useful too.

/RogerL

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Roger Larsson

Russell King wrote:
> 
> Petr Vandrovec writes:
> > ... or from sys_exit() if you forget to unmap. Or from anywhere if
> > swapping code decides to swap such page. I'm trying to hunt it down
> > for more than month, but I have no idea what's wrong. In my case
> > way to trigger bug is:
> 
> I actually think its as simple as:
> 
> 0. open file
> 1. shared map file
> 2. close file
> 3. unlink file
> 4. unmap shared mapping

Will it work correctly if 4. is done before 3. (even before 2?)
Is it legal/good practice to unmap the file after closing it?
(Since the sharing needs the fd to mmap it)

I would have expected this order:
  A. open file
  B. shared map file
  C. unmap shared mapping
  D. close file
  E. unlink file

But in your case the unlink should be deferred to unmap time...
(if it is legal)
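
For reference, a minimal userspace sketch of steps 0-4 as listed above. It assumes POSIX semantics, where a shared mapping stays valid after the descriptor is closed and after the name is unlinked. The function name and the scratch path are hypothetical, and this is not the reproducer used in the thread.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: runs steps 0-4 on a scratch file at `path`.
 * Returns 0 if every step succeeds, -1 otherwise. */
int map_close_unlink_unmap(const char *path)
{
    const size_t len = 4096;
    char *p;
    int fd;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);   /* 0. open file */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {                 /* give the file 4096 bytes */
        close(fd);
        unlink(path);
        return -1;
    }

    /* 1. shared map file */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    if (close(fd) < 0) {                                 /* 2. close file */
        munmap(p, len);
        unlink(path);
        return -1;
    }
    if (unlink(path) < 0) {                              /* 3. unlink file */
        munmap(p, len);
        return -1;
    }

    /* Dirty the shared mapping while the file no longer has a name. */
    memset(p, 0x5a, len);

    return munmap(p, len);                               /* 4. unmap shared mapping */
}
```

Calling it with a path on a writable filesystem should return 0 on a correct kernel, and the path should no longer exist afterwards.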

Successfully unlinking a file should probably free its pages directly to
the free list - might be worth optimizing for.

/RogerL

Disclaimer:
 I do not really understand fs code...

> 
> In my case, I was running newaliases, and there wasn't any other
> processes using that mapping.

--
Home page:
  http://www.norran.net/nra02596/

Re: [BUG]: Ext2 Corruption in test10pre3 (incl. Oops)

2000-10-17 Thread Roger Larsson


Linus Torvalds wrote:
> 
> On Tue, 17 Oct 2000, Alexander Viro wrote:
> >
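> > > Trace; c014efde <read_inode_bitmap+3e/90>
> > > Trace; c014f240 <load_inode_bitmap+210/230>
> > > Trace; c014f6af <ext2_new_inode+29f/700>
> > > Trace; c021e87e <unix_write_space+2e/50>
> > Huh?
> > > Trace; c01523af <ext2_create+1f/c0>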
> >
> > The rest of trace is OK, but WTF is net/unix/*.c code is doing here?
> 
> The traces always (or almost always) have crud in them - it's not a real
> stack-trace, it's just a printout of the stack contents that match
> addresses in the text region. So the unix_write_space thing was probably
> from the previous system call and just hadn't been overwritten.
> 
> Linus
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/
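
The kind of trace described above can be sketched as a plain scan over stack words. This is a simplified illustration, not the kernel's actual trace code; the function name and the text-region bounds passed in are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies every word in stack[0..n) that lies inside [text_start, text_end)
 * into out[], up to max entries, and returns how many were copied.
 * No frame pointers are followed, so stale values left behind by earlier
 * calls are reported alongside genuine return addresses. */
size_t scan_stack_for_text(const uintptr_t *stack, size_t n,
                           uintptr_t text_start, uintptr_t text_end,
                           uintptr_t *out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < max; i++) {
        if (stack[i] >= text_start && stack[i] < text_end)
            out[found++] = stack[i];
    }
    return found;
}
```

Any leftover word that happens to point into the text region shows up in the output, which is how an unrelated entry can appear in the middle of an otherwise sensible trace.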


Hmm..
Might this problem be related to...

Tigrans:
>  Subject: test10-pre1 BUG at page_alloc.c:221!


Quintelas:
> Subject: I've got the BAD_RANGE BUG in rmqueue!!! (Pre9-4)



Richard Guenther
> Subject:  [OOPS][BUG] with 2.4.0-test9

[Fwd: failure to burn CDs under 2.4.0-test9]

2000-10-06 Thread Roger Larsson


To the right linux-kernel list this time.

/RogerL

Roger Larsson wrote:
> 
> Christoph Lameter wrote:
> >
> > Comparing CD contents with the original after burning showed mismatches 4
> > times in a row. Booted into linux 2.2.18 and everything is fine.
> >
> > Together with the events of freezing in pine I would suggest that there is
> > something in the kernel scribbling memory.
> >
> > I am back to 2.2 for good for now.
> >
> 
> Two issues:
> * __alloc_pages can get into a "dead lock" situation.
>   Please see my: "[PATCH] test9: another vm lockup bug - squashed"
> * it refers to Riel's patch of freepages
> With these additions you get more pages free - less likely to block
> on read (really: cp disk CD)
> 
> If this does not help, consider trying Andrew Morton's
> "lowish-latency patch for 2.4.0-test9"
> * It avoids getting stuck in kernel loops when the CPU is better needed
>   for some other process.
> 
> /RogerL
> 
> --
> Home page:
>   http://www.norran.net/nra02596/

--
Home page:
  http://www.norran.net/nra02596/

[PATCH] test9: another vm lockup bug - squashed

2000-10-04 Thread Roger Larsson


Hi,

This applies on top of Riel's latest addition
(freepages v. zone->"limit").
That addition is probably not needed, and with this patch
you should be able to change your limits.

This patch adds an equality check to several comparisons.

Strictly, only the one in __alloc_pages_limit
is needed; it interacts with the test in
free_shortage. Without this patch you get stuck on
exactly zone->pages_min. Too few pages to alloc and
too many to free...
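
The gap can be shown with a small model of the two tests. This is a simplified sketch, not the kernel functions: the helper names are hypothetical, and each zone is reduced to a free-page count and a single mark.

```c
#include <stdbool.h>

/* Allocation test before the patch: strictly above the mark. */
bool can_alloc_strict(long free, long mark)
{
    return free > mark;
}

/* Allocation test after the patch: at or above the mark. */
bool can_alloc_inclusive(long free, long mark)
{
    return free >= mark;
}

/* Shortage test: reports a shortage only strictly below the mark. */
bool has_shortage(long free, long mark)
{
    return free < mark;
}

/* True when the zone can neither allocate nor report a shortage. */
bool stuck_before_patch(long free, long mark)
{
    return !can_alloc_strict(free, mark) && !has_shortage(free, mark);
}

bool stuck_after_patch(long free, long mark)
{
    return !can_alloc_inclusive(free, mark) && !has_shortage(free, mark);
}
```

With the strict test, free == mark is the only value for which both checks fail; with the inclusive test no such value remains.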


Ying Chen has reported that this patch cures his problem.

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

--- linux/mm/page_alloc.c.orig  Wed Oct  4 21:27:41 2000
+++ linux/mm/page_alloc.c   Wed Oct  4 21:32:17 2000
@@ -268,7 +268,7 @@ static struct page * __alloc_pages_limit
 			water_mark = z->pages_high;
 		}
 
-		if (z->free_pages + z->inactive_clean_pages > water_mark) {
+		if (z->free_pages + z->inactive_clean_pages >= water_mark) {
 			struct page *page = NULL;
 			/* If possible, reclaim a page directly. */
 			if (direct_reclaim && z->free_pages < z->pages_min + 8)
@@ -329,7 +329,7 @@ struct page * __alloc_pages(zonelist_t *
 	 * wake up bdflush.
 	 */
 	else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
-			&& nr_inactive_dirty_pages > freepages.high)
+			&& nr_inactive_dirty_pages >= freepages.high)
 		wakeup_bdflush(0);
 
 try_again:
@@ -347,7 +347,7 @@ try_again:
 		if (!z->size)
 			BUG();
 
-		if (z->free_pages > z->pages_low) {
+		if (z->free_pages >= z->pages_low) {
 			page = rmqueue(z, order);
 			if (page)
 				return page;

Re: VM in v2.4.0test9

2000-10-04 Thread Roger Larsson


Rik van Riel wrote:
> 
> On Wed, 4 Oct 2000, Roger Larsson wrote:
> > Rik van Riel wrote:
> > > On Wed, 4 Oct 2000, Rik van Riel wrote:
> > >
> > > > > > First, you have MORE free memory than freepages.high. In this
> > > > > > case I really don't see why __alloc_pages() wouldn't give the
> > > > > > memory to your processes 
> > > > >
> > > > > Hmm...
> > > > > Can't it be a zone problem?
> > > > > Free pages is the total free - all zones.
> > > > > But suppose you want a page from a specific zone - DMA, the more
> > > > > memory you have the less likely that you have a DMA page free...
> > > > > Does all test take this into consideration?
> > > >
> > > > Free_shortage() /should/ take this into consideration, and unless
> > > > I'm wrong, it does ;)
> > >
> > > Also, a zone problem CANNOT cause the problem in
> > > David's 16MB test ...
> > >
> > > (this is getting stranger and stranger)
> >
> > Now I know from where the 125 pages limit comes from.
> > static int zone_balance_ratio[MAX_NR_ZONES] = { 32, 128, 128, };
> > 16M/4k/32 = 125
> >
> > Probably there is a mismatch between zone->free_pages and
> > free_pages.{min,low,high}
> 
> Argh
> 
> The potential for this bug has been around since 2.3.51, when
> different balance_ratios for different zones became possible.
> 
> The bug hasn't bitten us yet since then because 1) the balance
> ratio for ZONE_DMA wasn't changed until some time later and
> 2) we didn't use freepages.{min,low,high}.
> 
> Now that we /are/ using the values in the freepages array again,
> we're running into the very big problem that freepages.high has
> a value LOWER than zone->pages_min for the DMA zone, on 16MB
> machines. This caused David's system to behave the way it did.
> 


Hi again,

I wonder if something like this is needed too...

When checking if we could take a page, the test was:
 free > limit
When checking for free_shortage, the test is:
 free < min

And sometimes limit == min...
Can't we then be stuck on exactly limit?

Patch attached (more places/files?)

/RogerL
--
Home page:
  http://www.norran.net/nra02596/

--- linux/mm/page_alloc.c.orig  Wed Oct  4 21:27:41 2000
+++ linux/mm/page_alloc.c   Wed Oct  4 21:32:17 2000
@@ -268,7 +268,7 @@ static struct page * __alloc_pages_limit
 			water_mark = z->pages_high;
 		}
 
-		if (z->free_pages + z->inactive_clean_pages > water_mark) {
+		if (z->free_pages + z->inactive_clean_pages >= water_mark) {
 			struct page *page = NULL;
 			/* If possible, reclaim a page directly. */
 			if (direct_reclaim && z->free_pages < z->pages_min + 8)
@@ -329,7 +329,7 @@ struct page * __alloc_pages(zonelist_t *
 	 * wake up bdflush.
 	 */
 	else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
-			&& nr_inactive_dirty_pages > freepages.high)
+			&& nr_inactive_dirty_pages >= freepages.high)
 		wakeup_bdflush(0);
 
 try_again:
@@ -347,7 +347,7 @@ try_again:
 		if (!z->size)
 			BUG();
 
-		if (z->free_pages > z->pages_low) {
+		if (z->free_pages >= z->pages_low) {
 			page = rmqueue(z, order);
 			if (page)
 				return page;

Re: VM in v2.4.0test9

2000-10-04 Thread Roger Larsson


Rik van Riel wrote:
> 
> On Wed, 4 Oct 2000, Rik van Riel wrote:
> 
> > > > First, you have MORE free memory than freepages.high. In this
> > > > case I really don't see why __alloc_pages() wouldn't give the
> > > > memory to your processes 
> > >
> > > Hmm...
> > > Can't it be a zone problem?
> > > Free pages is the total free - all zones.
> > > But suppose you want a page from a specific zone - DMA, the more
> > > memory you have the less likely that you have a DMA page free...
> > > Does all test take this into consideration?
> >
> > Free_shortage() /should/ take this into consideration, and unless
> > I'm wrong, it does ;)
> 
> Also, a zone problem CANNOT cause the problem in
> David's 16MB test ...
> 
> (this is getting stranger and stranger)


Now I know where the 125 pages limit comes from.
static int zone_balance_ratio[MAX_NR_ZONES] = { 32, 128, 128, };
16M/4k/32 = 125

Probably there is a mismatch between zone->free_pages and
free_pages.{min,low,high}
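
The arithmetic can be checked with a short sketch. It assumes the per-zone minimum is simply the zone size in pages divided by the balance ratio; the helper name is hypothetical and the real kernel calculation may apply further limits.

```c
/* Hypothetical model: per-zone minimum = pages in the zone / balance ratio. */
unsigned long zone_pages_min(unsigned long zone_bytes,
                             unsigned long page_bytes,
                             unsigned long ratio)
{
    return (zone_bytes / page_bytes) / ratio;
}
```

For a full 16 MiB zone of 4 KiB pages and a ratio of 32 this gives 128. A result of 125 would correspond to a zone of about 4000 usable pages, which is an assumption here and not a figure stated in the thread.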

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

Re: VM in v2.4.0test9

2000-10-04 Thread Roger Larsson


Rik van Riel wrote:
> 
> On Wed, 4 Oct 2000, David Weinehall wrote:
> 
> > Running the included program on a clean v2.4.0test9 kernel I can
> > hang the computer practically in no time.
> 
> > What seems most strange is that the doesn't even get depleated.
> > The machine still answers to SysRq and ping, but nothing else.
> 
> Looking again at this report in more detail, something
> very strange is going on ...
> 
> > This is what I got from SysRq+M (manual copy):
> >
> > Free pages: 500 kB (0 highmem)
> > Active: 8 inactive dirty: 1009, inactive clean:0
> > free: 125 (31 62 93)
> 
> First, you have MORE free memory than freepages.high. In this
> case I really don't see why __alloc_pages() wouldn't give the
> memory to your processes 
> 

Hmm...
Can't it be a zone problem?
Free pages is the total free - all zones.
But suppose you want a page from a specific zone - DMA; the more
memory you have, the less likely it is that you have a DMA page free...
Do all tests take this into consideration?

> > Free swap: 64772
> 
> And there is tons of swap free...
> 
> Are you absolutely sure this is VM related?  This almost looks
> like the system puts a in a read request but the request queue
> doesn't get unplugged, or something strange like that ...
> 
> There is more than enough memory to satisfy all VM requests and
> the loop in __alloc_pages() is straightforward enough to give
> your processes their memory without strange bugs ...
> 
> regards,
> 
> Rik
> --
> "What you're running that piece of shit Gnome?!?!"
>-- Miguel de Icaza, UKUUG 2000
> 
> http://www.conectiva.com/   http://www.surriel.com/
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> Please read the FAQ at http://www.tux.org/lkml/

--
Home page:
  http://www.norran.net/nra02596/

test9-pre9 keyboard and mouse stopped working - deadlock?

2000-10-03 Thread Roger Larsson


Hi,


I started a compile of kernel test9-final on a virtual console.
(make bzImage modules modules_install)

Then I started X on another one. Initial windows showed up fine.
But mouse was stuck. Tried magic - nothing. (early in compile,
should not be at modules_install for a long time)

I noticed that disk led flashed now and then - decided to wait.

After a while the xlock (really the kde one) started - did not
react on keys (I had pressed magic-r earlier).

Left it over night.

Still no reaction to keyboard and mouse - RESET.

Conclusion:
- Probably not vm related since compilation probably continued
  in the background.

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

2.4.0-t9p7 and mmap002 - freeze

2000-09-27 Thread Roger Larsson


Hi,

Tried latest patch with the same result - freeze...

No extra patches added.

running from console as root
mmap002 from memtest-0.0.3
with RAMSIZE defined as 90 MB (I have 96MB)
after a while with heavy disk access (thrashing?) the drive
becomes silent - no more progress...
[if you can not repeat this - try with less memory 32 MB...]

Magic works!

Magic memory
 Constantly LOW on inactive_clean (0 is the most common)
 lots of shared memory (almost equals active)
 [can be normal condition since mmap002 produces dirty
  mmaped pages]

Magic process:
  Manual samples gave the following locations.
  (NOTE: not a call trace)
  We are trying to clean pages, but do we make any
  progress since disk is silent?

Trace; c0127d85 page_launder+3d/724
Trace; c0126dad deactivate_page_nolock+13d/248
Trace; c0127e00 page_launder+b8/724
Trace; c0128035 page_launder+2ed/724
Trace; c0127dcc page_launder+84/724
Trace; c0127dd0 page_launder+88/724
Trace; c0127e00 page_launder+b8/724
Trace; c012fd38 try_to_free_buffers+4/138

Magic Sigterm (Alt+SysRq+E)
 Gives you a running system again.


Notes:
 Entry into this state is probably timing critical,
 since adding a few printk:s makes it happen less often.
 I have even had complete mmap002 runs succeed - but the
 disk is running too much and for too long...
 a lot more than 10 min - a normal run on previous testX
 usually took less than 3 minutes.

/RogerL

--
Home page:
  http://www.norran.net/nra02596/

test9-pre6 and GFP_BUFFER allocations

2000-09-23 Thread Roger Larsson


Hi,

What will happen in this scenario:
a process
* grabs a fs semaphore
* needs some buffers to do IO, calls __alloc_pages(GFP_BUFFER)
Suppose the system is MIN on free mem, has no inactive_clean pages.
We will end up around line 446 in page_alloc.c and issue a
try_to_free_pages(...). Then goto try_again.
* In our case this is unlikely to work - not allowed to do IO.
* Will we sleep? Probably not, not even in refill_inactive (no GFP_IO)
  BTW, Why can't we schedule if GFP_IO is not set???
* Will we free any page, to get above MIN - only if there are enough
  clean pages in active list.
* Won't we end up in an infinite loop?
Suppose it does sleep. Will kswapd then be able to free any page
assuming we are holding a critical fs semaphore...

Or am I missing something, again?


One approach could be: only goto try_again if GFP_IO is set.
And alloc one page from the critical memory pool.
I will try this.
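
A toy model of that approach, to make the control flow concrete. Everything here is hypothetical (struct, function names, the one-page-per-call reclaim); it only mirrors the idea of retrying when I/O is allowed and otherwise taking a page from the reserve instead of looping.

```c
#include <stdbool.h>

struct toy_zone {
    long free;               /* pages currently free */
    long min;                /* allocations normally need free > min */
    long clean_reclaimable;  /* clean pages that can be freed without I/O */
};

/* Stand-in for try_to_free_pages(): frees at most one page per call.
 * Without permission to do I/O it can only reclaim clean pages. */
static long toy_try_to_free(struct toy_zone *z, bool may_io)
{
    if (z->clean_reclaimable > 0) {
        z->clean_reclaimable--;
        z->free++;
        return 1;
    }
    if (may_io) {
        z->free++;           /* pretend a dirty page was written out */
        return 1;
    }
    return 0;
}

/* Returns true if a page was allocated. */
bool toy_alloc(struct toy_zone *z, bool may_io)
{
try_again:
    if (z->free > z->min) {
        z->free--;
        return true;
    }
    toy_try_to_free(z, may_io);
    if (may_io)
        goto try_again;      /* reclaim makes progress, so this terminates */

    /* No I/O allowed: do not loop; take one page from the reserve. */
    if (z->free > 0) {
        z->free--;
        return true;
    }
    return false;
}
```

In this model a caller without I/O permission makes one reclaim attempt and then either gets a reserve page or fails, so it cannot spin forever at the minimum.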

/RogerL

--
Home page:
  http://www.norran.net/nra02596/