RE: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Kenn Humborg
> On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
> >
> >  static inline void wait_for_init_deassert(atomic_t *deassert)
> >  {
> > -   while (!atomic_read(deassert));
> > +   while (!atomic_read(deassert))
> > +           cpu_relax();
> >     return;
> >  }
> 
> For less-than-brilliant people like me, it's totally non-obvious that
> cpu_relax() is needed for correctness here, not just to make P4 happy.
> 
> IOW: "atomic_read" name quite unambiguously means "I will read
> this variable from main memory". Which is not true and creates
> potential for confusion and bugs.

To me, "atomic_read" means a read which is synchronized with other 
changes to the variable (using the atomic_XXX functions) in such 
a way that I will always only see the "before" or "after"
state of the variable - never an intermediate state while a 
modification is happening.  It doesn't imply that I have to 
see the "after" state immediately after another thread modifies
it.

Perhaps the Linux atomic_XXX functions work like that, or used
to work like that, but it's counter-intuitive to me that "atomic"
should imply a memory read.
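
A minimal user-space sketch of the distinction drawn above, assuming C11 atomics rather than the kernel's atomic_t. The names relax_hint() and spin_until_set() are made up for illustration, and relax_hint() is only a compiler barrier standing in for cpu_relax(); it is not the kernel's implementation. Each load is atomic, so a torn value is never observed, but the relaxed ordering promises nothing about how soon another thread's store becomes visible:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for cpu_relax(): a compiler barrier only. */
static inline void relax_hint(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Poll *flag until it reads nonzero.
 * Returns the number of reads that saw zero, or -1 once more than
 * max_polls reads have seen zero (so the caller never spins forever).
 */
static long spin_until_set(atomic_int *flag, long max_polls)
{
    long zero_reads = 0;

    assert(max_polls >= 0);
    while (atomic_load_explicit(flag, memory_order_relaxed) == 0) {
        if (zero_reads == max_polls)
            return -1;
        zero_reads++;
        relax_hint();
    }
    return zero_reads;
}
```

The poll limit is an addition for the sketch; the loop in the patch spins without one.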

Later,
Kenn

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [LV] start_thread question...

2001-05-20 Thread Kenn Humborg

On Sun, May 20, 2001 at 05:24:48PM +0100, Dave Airlie wrote:
> 
> I'm implementing start_thread for the VAX port and am wondering does
> start_thread have to return to load_elf_binary? I'm working on the init
> thread and what is happening is it is returning the whole way back to the
> execve caller .. which I know shouldn't happen.
> 
> so I suppose what I'm looking for is the point where the user space code
> gets control... is it when the registers are set in the start_thread? if
> so how does start_thread return
> 
> On the VAX we have to call a return from interrupt to get to user space
> and I'm trying to figure out where this should happen...

I haven't got time to look at this in detail, but you could
probably do it by frobbing the saved registers that will be
restored by the ret_from_syscall in entry.S.  Do you have
a pt_regs *regs function argument at the right point?  If
so, it should point to these saved registers.

Later,
Kenn




IP autoconfig via DHCP?

2001-03-06 Thread Kenn Humborg


Quick question...

Back in 2.2, we could use DHCP to auto-config the IP setup.  In fact,
the choice was DHCP, BOOTP or RARP.

Now there is only BOOTP or RARP.  What happened to DHCP support?

Later,
Kenn




Re: kmalloc() alignment

2001-03-05 Thread Kenn Humborg

On Mon, Mar 05, 2001 at 04:15:36PM -0800, H. Peter Anvin wrote:
> > So, to summarise (for 32-bit CPUs):
> > 
> > o  Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.
> > 
> > o  If you need larger alignment, you need to alloc a larger space,
> >round as necessary, and keep the original pointer for kfree()
> > 
> > Maybe I'll just use get_free_pages, since it's a 64KB chunk that
> > I need (and it's only a once-off).
> > 
> 
> It might be worth asking the question if larger blocks are more
> aligned?

OK, I'll bite...

Are larger blocks more aligned?

Later,
Kenn




Re: kmalloc() alignment

2001-03-05 Thread Kenn Humborg

On Sun, Mar 04, 2001 at 11:41:12PM +0100, Manfred Spraul wrote:
> >
> > Does kmalloc() make any guarantees of the alignment of allocated 
> > blocks? Will the returned block always be 4-, 8- or 16-byte 
> > aligned, for example? 
> >
> 
> 4-byte alignment is guaranteed on 32-bit cpus, 8-byte alignment on
> 64-bit cpus.

So, to summarise (for 32-bit CPUs):

o  Alan Cox & Manfred Spraul say 4-byte alignment is guaranteed.

o  If you need larger alignment, you need to alloc a larger space,
   round as necessary, and keep the original pointer for kfree()
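
The over-allocate-and-round approach in the second point can be sketched in user space as follows. This is an illustration only: malloc()/free() stand in for kmalloc()/kfree(), and struct aligned_block and alloc_aligned() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct aligned_block {
    void *raw;      /* original pointer: this is what must be freed */
    void *aligned;  /* rounded-up pointer: this is what gets used */
};

/*
 * Allocate size bytes aligned to align (a power of two) by asking for
 * align - 1 spare bytes and rounding the returned address up.
 * On failure both members are NULL.
 */
static struct aligned_block alloc_aligned(size_t size, size_t align)
{
    struct aligned_block b = { NULL, NULL };

    assert(align != 0 && (align & (align - 1)) == 0);
    b.raw = malloc(size + align - 1);
    if (b.raw != NULL) {
        uintptr_t p = (uintptr_t)b.raw;
        b.aligned = (void *)((p + align - 1) & ~(uintptr_t)(align - 1));
    }
    return b;
}
```

A caller uses b.aligned for the data and passes b.raw to free(), which is the "keep the original pointer" part of the summary.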

Maybe I'll just use get_free_pages, since it's a 64KB chunk that
I need (and it's only a once-off).

Thanks for your advice.

Later,
Kenn




kmalloc() alignment

2001-03-04 Thread Kenn Humborg


Does kmalloc() make any guarantees of the alignment of allocated
blocks?  Will the returned block always be 4-, 8- or 16-byte
aligned, for example?

Later,
Kenn




Re: kernel_thread() & thread starting

2001-02-20 Thread Kenn Humborg

On Sun, Feb 18, 2001 at 10:53:16PM +0000, Russell King wrote:
> Kenn Humborg writes:
> > When starting bdflush and kupdated, bdflush_init() uses a semaphore to
> > make sure that the threads have run before continuing.  Shouldn't
> > start_context_thread() do something similar?
> 
> I think this would be a good idea.  Here is a patch to try.  Please report
> back if it works so that it can be forwarded to Linus.  Thanks.

Works perfectly for me.  

I'll leave it up to you guys to decide what's the right way to deal with 
this and pass a patch to Linus/Alan.  Meanwhile, I'll keep Russell's 
patch below in our CVS tree.

Thanks,
Kenn

> --- orig/kernel/context.c Tue Jan 30 13:31:11 2001
> +++ linux/kernel/context.c Sun Feb 18 22:51:56 2001
> @@ -63,7 +63,7 @@
>   return ret;
>  }
>  
> -static int context_thread(void *dummy)
> +static int context_thread(void *sem)
>  {
>   struct task_struct *curtask = current;
>   DECLARE_WAITQUEUE(wait, curtask);
> @@ -79,6 +79,8 @@
>   recalc_sigpending(curtask);
>   spin_unlock_irq(&curtask->sigmask_lock);
>  
> + up((struct semaphore *)sem);
> +
>   /* Install a handler so SIGCLD is delivered */
>   sa.sa.sa_handler = SIG_IGN;
>   sa.sa.sa_flags = 0;
> @@ -148,7 +150,9 @@
>   
>  int start_context_thread(void)
>  {
> - kernel_thread(context_thread, NULL, CLONE_FS | CLONE_FILES);
> + DECLARE_MUTEX_LOCKED(sem);
> + kernel_thread(context_thread, &sem, CLONE_FS | CLONE_FILES);
> + down(&sem);
>   return 0;
>  }
>  
> 
> 
> --
> Russell King ([EMAIL PROTECTED])        The developer of ARM Linux
>  http://www.arm.linux.org.uk/personal/aboutme.html



kernel_thread() & thread starting

2001-02-18 Thread Kenn Humborg


In init/main.c, do_basic_setup() we have:

start_context_thread();
do_initcalls();

start_context_thread() calls kernel_thread() to start the keventd
thread.  Then do_initcalls() calls all the init functions and
finishes by calling flush_scheduled_tasks().  This function ends
up calling schedule_task() which checks if keventd is running.

With a very stripped down kernel, it seems possible that do_initcalls()
can complete without context_thread() having had a chance to run (and
set the flag that keventd is running).

Right now, in the Linux/VAX project, I'm working with a very stripped
down kernel and I'm seeing this behaviour.  Depending on what I enable
in the .config, I can get schedule_task() to fail with:

   schedule_task(): keventd has not started

When starting bdflush and kupdated, bdflush_init() uses a semaphore to
make sure that the threads have run before continuing.  Shouldn't
start_context_thread() do something similar?

Or am I missing something?

Thanks,
Kenn




Re: Third arg to switch_to()

2000-10-30 Thread Kenn Humborg

On Mon, Oct 30, 2000 at 07:15:58PM +0000, I wrote:
> 
> Can anyone point me to an explanation of the third arg to 
> switch_to(prev, next, last)?
> 
> It appeared in 2.2.8.
> 
> What exactly is supposed to be written to it?

Mea culpa...

Further digging revealed that it's for returning prev in the
new task, to deal with the fact that the stack has changed
so local variables in schedule() don't exist anymore.
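
A toy model of why the third argument is needed, offered as a sketch only. struct toy_task and toy_switch_to() are invented for illustration and perform no real context switch; the return value plays the role of 'last', and frame_prev models the 'prev' local that stays frozen on a sleeping task's stack.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task {
    const char *name;
    /* Models the 'prev' local saved in this task's schedule() frame. */
    const struct toy_task *frame_prev;
};

/*
 * Toy model of switch_to(prev, next, last). The return value plays the
 * role of 'last': the task that was running just before 'next' resumed.
 */
static const struct toy_task *toy_switch_to(struct toy_task *prev,
                                            struct toy_task *next)
{
    assert(prev != NULL && next != NULL);
    prev->frame_prev = prev;  /* frozen on prev's stack until it runs again */
    /* From here on a real kernel is running on next's stack. */
    return prev;
}
```

If A switches to B, B to C, and C back to A, then A resumes with a stale frame that still says prev is A, while the value handed back as 'last' correctly names C.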

Later,
Kenn





Third arg to switch_to()

2000-10-30 Thread Kenn Humborg


Can anyone point me to an explanation of the third arg to 
switch_to(prev, next, last)?

It appeared in 2.2.8.

What exactly is supposed to be written to it?

Thanks,
Kenn




RE: 2.4 MM overview?

2000-10-16 Thread Kenn Humborg

> > That's not the worst!  Considering the 4-byte PTE and the 40-byte
> > mem_map_t, our memory management overhead is at least 44 bytes/page
> > or 8.5%!
>
> use a logical page size of 4kb.
>
> > We are formulating cunning plans of aggregating 2, 4 or 8 pages together
> > into "bigpages", telling the arch-independent code that we've got
> > larger pages than we really have and manipulating multiple PTEs in the
> > set_pte() primitive and friends.
> >
> > We don't know how feasible this is yet..
>
> why wouldn't it be feasible ?

Because I don't know this part of the kernel well enough yet :-)
Maybe there are concurrency issues with modifying multiple PTEs
when the kernel thinks it's only modifying one.  There may be
hardware-mandated limitations on this too.  I'll have to check
_very_ closely with the VAX Architecture Reference Manual.

> > > OTOH, I think mapping all physical memory makes sense with the
> > > three page table setup.
> >
> > It might and it might not.  Expanding the system page table is pretty
> > much out of the question because it needs to be physically contiguous.
>
> agreed.
>
> > So we need to allocate system PTEs for the following at boot time:
> >
> >    1. Map all physical memory pages
> >    2. Spare PTEs for mapping I/O space via ioremap().
> >    3. Spare PTEs for vmalloc()
>      4. Spare PTEs for making user process page tables virtually
>         contiguous.

Couldn't we use vmalloc() for this?

> Note that this effectively gives you a two-level page table.
> (Actually, a 3-level page table, with 2 pmds per pgd, 4K PTEs per
> 3rd-level page table, and 512 bytes per page.)
>
> So, here's what I'm proposing:

I'll need to examine this more closely when I get home later.
Too busy right now :-(

> > It seems a bit wasteful that process pages will have two PTEs, one in
> > the relevant process page table and one in the system page table.
>
> why ?  You lose 0.78 % of your physical memory compared to the more
> complicated design, which shouldn't hurt too much.

The 'scarce resource' I'm thinking about here is not memory, it's
system PTEs.

> It might make sense
> if you have tons of physical memory though so you can use all of it
> (where tons I'd guess to be about 1.8 GB, not knowing too much about
> the architecture).

Memory from 0xc0000000 to 0xffffffff is not usable in VAX, so
map-all-memory will give a maximum of just under 1GB.  I have
a feeling that there is an architectural limit of 1GB anyway
(21-bit page frame number + 9 bit PAGE_SHIFT = 30 bits = 1GB).
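
The arithmetic in the parenthesis can be checked with a small sketch. max_physical_bytes() is a hypothetical helper; the 21-bit PFN and 9-bit page shift are the figures quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of physical memory addressable with pfn_bits of page frame
 * number and pages of 2^page_shift bytes each.
 */
static uint64_t max_physical_bytes(unsigned pfn_bits, unsigned page_shift)
{
    assert(pfn_bits + page_shift < 64);
    return (uint64_t)1 << (pfn_bits + page_shift);
}
```

With 21 and 9 this gives 2^30 bytes, i.e. 1GB.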

> > How much space tends to be vmalloc()-ed in a running system?
>
> See the discussion for alpha a week or so ago.  It tends to not be
> very much but for some applications (TUX, for example), it's expected
> to be most of physical memory.

Dammit!  Must have been just before I subscribed...  I'll do
some archive archeology later.

Later,
Kenn




RE: 2.4 MM overview?

2000-10-16 Thread Kenn Humborg

> > We've kind of got 1.5-level page tables.  There are actually 3 page tables.
> > The system page table maps memory starting at 0x80000000.  The P0 process
> > page table maps from 0x0 up and the P1 process page table maps from
> > 0x7fffffff down.
>
> And they have to be physically contiguous I guess ?

The system page table must be physically contiguous.  The process tables
are actually referred to via virtual addresses, so they only have to
be virtually contiguous in system space.

> > This means that sparse address spaces are going to be _really_ expensive
> > on PTEs.  I don't know how much of a problem this is going to be yet,
> > but I'm sure it's going to be fun :-)
>
> 512 byte pages, 4 bytes per pte ?  Ouch.  Can you fill the TLB manually ?

That's not the worst!  Considering the 4-byte PTE and the 40-byte mem_map_t,
our memory management overhead is at least 44 bytes/page or 8.5%!
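
As a worked check of that figure, and of what a larger logical page would buy, here is a small sketch. overhead_percent() is a hypothetical helper; the second case assumes one 40-byte mem_map_t plus eight 4-byte hardware PTEs per 4KB logical page.

```c
#include <assert.h>

/* Per-page bookkeeping overhead as a percentage of the page size. */
static double overhead_percent(unsigned overhead_bytes, unsigned page_size)
{
    assert(page_size != 0);
    return 100.0 * overhead_bytes / page_size;
}
```

For a 512-byte page, 4 + 40 = 44 bytes works out to 8.59375%. Under the stated assumption, a 4KB logical page carries 8 * 4 + 40 = 72 bytes, or about 1.76%.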

We are formulating cunning plans of aggregating 2, 4 or 8 pages together
into "bigpages", telling the arch-independent code that we've got
larger pages than we really have and manipulating multiple PTEs in the
set_pte() primitive and friends.

We don't know how feasible this is yet..

> OTOH, I think mapping all physical memory makes sense with the three page
> table setup.

It might and it might not.  Expanding the system page table is pretty
much out of the question because it needs to be physically contiguous.
So we need to allocate system PTEs for the following at boot time:

   1. Map all physical memory pages
   2. Spare PTEs for mapping I/O space via ioremap().
   3. Spare PTEs for vmalloc()

It seems a bit wasteful that process pages will have two PTEs, one in
the relevant process page table and one in the system page table.
If we could get away without needing the system PTE, then this would
either provide more space for #2 and #3 above, or reduce the size
of the system page table.

How much space tends to be vmalloc()-ed in a running system?

Later,
Kenn




Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 09:45:11PM +0100, Alan Cox wrote:
> > Well, we ain't got these luxuries/complications in VAXland...  Hell, 
> > we don't even have two-level page tables :-(
> 
> Really. Ugh. I always assumed Vax had at least two levels because mmap on
> 4.2 BSD used to panic on 128K+ blocks. I guess there was a different reason
> for that then

We've kind of got 1.5-level page tables.  There are actually 3 page tables.
The system page table maps memory starting at 0x80000000.  The P0 process
page table maps from 0x0 up and the P1 process page table maps from
0x7fffffff down.
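
For illustration, the three tables correspond to regions selected by the top two bits of a 32-bit virtual address on the VAX: P0 from 0x00000000, P1 up to 0x7fffffff, system space from 0x80000000, and a reserved top quarter. The enum and vax_region_of() below are hypothetical names for a sketch of that split:

```c
#include <assert.h>
#include <stdint.h>

enum vax_region {
    VAX_P0 = 0,        /* 0x00000000 - 0x3fffffff, grows upward */
    VAX_P1 = 1,        /* 0x40000000 - 0x7fffffff, grows downward */
    VAX_S0 = 2,        /* 0x80000000 - 0xbfffffff, system space */
    VAX_RESERVED = 3   /* 0xc0000000 - 0xffffffff */
};

/* Classify a virtual address by its top two bits. */
static enum vax_region vax_region_of(uint32_t vaddr)
{
    return (enum vax_region)(vaddr >> 30);
}
```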

This means that sparse address spaces are going to be _really_ expensive
on PTEs.  I don't know how much of a problem this is going to be yet,
but I'm sure it's going to be fun :-)

You can be sure that the first valid page won't be at 0x08048000...
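
To put a number on that, here is a sketch assuming a linear P0 table with 512-byte pages and 4-byte PTEs, so the table has to span one slot for every page below the address in question. The helper names are made up, and whether every slot needs backing memory depends on how the table itself is mapped.

```c
#include <assert.h>
#include <stdint.h>

#define VAX_PAGE_SHIFT 9u   /* 512-byte pages */
#define VAX_PTE_BYTES  4u   /* 4 bytes per PTE */

/* PTE slots a linear P0 table must span before reaching vaddr. */
static uint32_t p0_pte_slots_below(uint32_t vaddr)
{
    return vaddr >> VAX_PAGE_SHIFT;
}

/* Size in bytes of that leading part of the table. */
static uint32_t p0_table_bytes_below(uint32_t vaddr)
{
    return p0_pte_slots_below(vaddr) * VAX_PTE_BYTES;
}
```

A first valid page at 0x08048000 would sit behind 262720 PTE slots, roughly 1MB of page table, before anything useful is mapped.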

Later,
Kenn




Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 09:22:58PM +0100, Alan Cox wrote:
> > > or you have a sane memory management model with tags/spaces then it's a non-issue
> > 
> > You've lost me here.  Tags/spaces?
> 
> A lot of memory management hardware allows you to build page tables that contain
> more than just the addresses. Instead a tag register or the processor state
> or both are combined in the lookup. This is particularly important for a 
> virtually tagged cache to avoid flushing the cache on task switches

[Consults VMS/Alpha Internals & Data Structure manual...]

You mean like the way the Alpha has a PTE bit that says 'this page is
valid at the same address in every process', and the address space
number (ASN) that can be used to 'uniquefy' cache entries for the
same virtual addresses in different processes?

Well, we ain't got these luxuries/complications in VAXland...  Hell, 
we don't even have two-level page tables :-(

Later,
Kenn




Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 08:35:46PM +0100, Alan Cox wrote:
> > I understand that 2.4 no longer maps all physical memory as 2.2
> > and earlier used to do.
> 
> It's really up to you if you choose to do that or not. If you have enough 
> address space to create all your virtual and physical mappings without problems,

OK...

> or you have a sane memory management model with tags/spaces then it's a non-issue

You've lost me here.  Tags/spaces?

Later,
Kenn




Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 08:07:06PM +0200, Andi Kleen wrote:
> On Sun, Oct 15, 2000 at 05:29:46PM +0100, Kenn Humborg wrote:
> > Surprise!
> > 
> > __pa() and __va() are still defined as addr -/+ PAGE_OFFSET.  So
> > where did I hear about 2.4 not mapping all memory?  Could it be
> > that this applies only to "high memory" in x86?
> 
> It only applies to high memory. To access it you have to use kmap(). 
> To use __pa you need to create a bounce buffer in lowmem first as of 2.4.

Excellent!  Thanks for clearing that up.

Later,
Kenn





Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 06:03:40PM +0200, Erik Mouw wrote:
> On Sun, Oct 15, 2000 at 04:24:45PM +0100, Kenn Humborg wrote:
> > I understand that 2.4 no longer maps all physical memory as 2.2
> > and earlier used to do.
> > 
> > Is there any documentation on this change and how it affects
> > arch-specific code?
> > 
> > Specifically, we've been basing the VAX port on 2.2 while waiting
> > for 2.4 to stabilize.  Now we're looking at moving to 2.4.
> 
> Have a look at the Linux-MM pages at:
> 
>   http://www.linux.eu.org/Linux-MM/

The stuff linked to from there seems to cover the higher-level VM
aspects like balancing the VM.  Basically arch-independent stuff.

I'm looking for info on the impact the 2.4 changes will have on the
"API" between the arch-indep and arch-dep code.  For example, 2.2
assumes that you can access a physical address by referencing
that address + PAGE_OFFSET.  AFAIK, this is no longer true in 2.4.

So what's the new mechanism for accessing physical memory?

OK, this particular question is easily answered by reading the 
source...

Surprise!

__pa() and __va() are still defined as addr -/+ PAGE_OFFSET.  So
where did I hear about 2.4 not mapping all memory?  Could it be
that this applies only to "high memory" in x86?
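
The "addr -/+ PAGE_OFFSET" relationship can be sketched as below. This is a user-space model on 32-bit integers, not the kernel macros: PAGE_OFFSET_SKETCH is an assumed value (0xC0000000, the usual i386 setting), and pa_sketch()/va_sketch() are stand-in names for __pa()/__va().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_SKETCH ((uint32_t)0xC0000000u)  /* assumed value */

/* Kernel virtual address -> physical address (models __pa()). */
static inline uint32_t pa_sketch(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_SKETCH;
}

/* Physical address -> kernel virtual address (models __va()). */
static inline uint32_t va_sketch(uint32_t paddr)
{
    return paddr + PAGE_OFFSET_SKETCH;
}
```

The mapping only works for physical addresses small enough that paddr + PAGE_OFFSET still fits in the address space, which is one way to see why memory above that limit needs a different mechanism.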

Dazed and confused,
Kenn




2.4 MM overview?

2000-10-15 Thread Kenn Humborg


I understand that 2.4 no longer maps all physical memory as 2.2
and earlier used to do.

Is there any documentation on this change and how it affects
arch-specific code?

Specifically, we've been basing the VAX port on 2.2 while waiting
for 2.4 to stabilize.  Now we're looking at moving to 2.4.

Thanks,
Kenn


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



2.4 MM overview?

2000-10-15 Thread Kenn Humborg


I understand that 2.4 no longer maps all physical memory as 2.2
and earlier used to do.

Is there any documentation on this change and how it affects
arch-specific code?

Specifically, we've been basing the VAX port on 2.2 while waiting
for 2.4 to stabilize.  Now we're looking at moving to 2.4.

Thanks,
Kenn


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 08:07:06PM +0200, Andi Kleen wrote:
> On Sun, Oct 15, 2000 at 05:29:46PM +0100, Kenn Humborg wrote:
> > Surprise!
> > 
> > __pa() and __va() are still defined as addr -/+ PAGE_OFFSET.  So
> > where did I hear about 2.4 not mapping all memory?  Could it be
> > that this applies only to "high memory" in x86?
> 
> It only applies to high memory. To access it you have to use kmap(). 
> To use __pa you need to create a bounce buffer in lowmem first as of 2.4.

Excellent!  Thanks for clearing that up.

Later,
Kenn





Re: 2.4 MM overview?

2000-10-15 Thread Kenn Humborg

On Sun, Oct 15, 2000 at 09:22:58PM +0100, Alan Cox wrote:
> > > or you have a sane memory management model with tags/spaces then its a non issue
> > 
> > You've lost me here.  Tags/spaces?
> 
> A lot of memory management hardware allows you to build page tables that contain
> more than just the addresses. Instead a tag register or the processor state
> or both are combined in the lookup. This is particularly important for a 
> virtually tagged cache to avoid flushing the cache on task switches

[Consults VMS/Alpha Internals & Data Structure manual...]

You mean like the way the Alpha has a PTE bit that says 'this page is
valid at the same address in every process', and the address space
number (ASN) that can be used to 'uniquefy' cache entries for the
same virtual addresses in different processes?

Well, we ain't got these luxuries/complications in VAXland...  Hell, 
we don't even have two-level page tables :-(

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-09 Thread Kenn Humborg

On Tue, Oct 10, 2000 at 12:55:33AM +0200, Andi Kleen wrote:
> On Mon, Oct 09, 2000 at 11:45:18PM +0100, Kenn Humborg wrote:
> > Simple.  Each interrupt stack is, say, 8 pages.  You have an array
> > of N interrupt stacks.  Then you calculate
> > 
> >cpu_id = (sp & ~(INT_STACK_SIZE-1)) >> (PAGE_SHIFT + 3);
> > 
> > Actually, I'd put the interrupt stack and any other per-cpu data
> > structures together in this region.
> 
> 
> So your smp_processor_id() looks like:
> 
> #define smp_processor_id() \
>   (in_interrupt() ? (sp & ~(INT_STACK_SIZE-1)) >> (PAGE_SHIFT + 3) : 
>  (struct task_struct *)(sp & -8192)->current_cpu) 
> 
> 
> ? 

Nope.

> There is just an ugly problem: in_interrupt already requires the CPU id
> to look up the table of interrupt counters.

The PSL (processor status longword) has a bit that tells you whether you're
currently on the interrupt stack or not.  You can test this in two 
instructions:

   movpsl r0              # get PSL
   bbs $26, r0, dst       # branch if IS (interrupt stack) bit set (bit 26)

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-09 Thread Kenn Humborg

On Tue, Oct 10, 2000 at 12:36:35AM +0200, Andi Kleen wrote:
> On Mon, Oct 09, 2000 at 11:30:50PM +0100, Alan Cox wrote:
> > > I think I'll go for the 'current is in a well-known register'
> > > approach and see how this goes...
> > 
> > Failing that the 2.0 approach will work, current is a global in uniprocessor
> > and a #define to an array indexed by cpu id in smp
> 
> The problem is where to get the cpuid from (see how smp_processor_id
> is currently defined ;) When you don't have a hidden register in the 
> CPU you're screwed. 
> [x86-64 has one btw] 

Simple.  Each interrupt stack is, say, 8 pages.  You have an array
of N interrupt stacks.  Then you calculate

   cpu_id = (sp & ~(INT_STACK_SIZE-1)) >> (PAGE_SHIFT + 3);

Actually, I'd put the interrupt stack and any other per-cpu data
structures together in this region.
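
A rough sketch of that calculation in C.  The name sketch_cpu_id(), the
stacks_base parameter and the PAGE_SHIFT value are assumptions made for
illustration.  The sketch also subtracts the base of the stack array,
which the one-line formula leaves implicit, and it assumes the array is
aligned to INT_STACK_SIZE and that sp points strictly inside one of the
stacks.

```c
#define PAGE_SHIFT       12                      /* assumed page size, illustrative only */
#define INT_STACK_SHIFT  (PAGE_SHIFT + 3)        /* 8 pages per interrupt stack */
#define INT_STACK_SIZE   (1UL << INT_STACK_SHIFT)

/* Derive a CPU index from an interrupt-stack pointer.
 * stacks_base is the address of the first of N contiguous,
 * INT_STACK_SIZE-aligned interrupt stacks (hypothetical layout). */
static inline unsigned long sketch_cpu_id(unsigned long sp,
                                          unsigned long stacks_base)
{
    /* Round sp down to the start of its stack, then convert the
     * offset from the array base into an index. */
    unsigned long stack_start = sp & ~(INT_STACK_SIZE - 1);

    return (stack_start - stacks_base) >> INT_STACK_SHIFT;
}
```

Because each stack is a power-of-two size, the index costs one mask, one
subtract and one shift, with no memory access.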

I don't know yet how you decide which secondary processor is which
at boot time.  Maybe it doesn't matter, so you can just let them
fight over the per-cpu data structures by trying to claim spinlocks
on each one in turn.

Anyway, this SMP stuff will be quite academic for a while unless
someone wants to donate a workstation-sized SMP VAX (if such a
beast exists at all :-)

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-09 Thread Kenn Humborg

On Tue, Oct 10, 2000 at 09:04:30AM +1100, Keith Owens wrote:
> On 9 Oct 2000 11:08:36 -0700, 
> [EMAIL PROTECTED] (Linus Torvalds) wrote:
> >Note that there are alternative approaches. For example, you could make
> >the interrupt stack be in the same multi-page as the regular stack, and
> >switch them both at task-switch time - just allocate four pages instead
> >of two, and use "current = esp & ~16383" instead or something like that.
> 
> Ouch.  Too many places in the source have hard coded 8191 or 8192.
> Would you take a patch to replace all those hard coded numbers with
> #defines or is that best left for 2.4.1?

It wouldn't work anyway.  There has to be _one_ interrupt stack per
CPU.  When an interrupt happens (or a certain bit is set in an 
exception vector), the CPU saves SP in the USP or KSP register
(user or kernel SP) and loads SP from the ISP register.  USP and 
KSP are considered part of process context and are saved and restored
across context switches.  ISP is not.  

Thinks out loud...

Or maybe we could play tricks with ISP during the context switch...
It would certainly be made simpler by Linux's all-or-nothing approach
to enabling/disabling interrupts.  (In contrast to VMS which makes
extensive use of the VAX's 31 interrupt priority levels.  Process
re-scheduling happens at priority 3, devices interrupt at priority 16-23,
power failure interrupts at 30, for example.)

Or maybe not...  What if a device interrupted at IPL 20.  Another
device could interrupt at IPL 21 in the window before we block 
all interrupts in the first interrupt handler.  Then this second
handler triggers a resched.  If we switch to a different interrupt
stack then we'll destroy the stack context of the first handler.
Unless we either copy around the stack context (ugh) or map the
same physical stack into each task_struct.  Seems a bit wasteful.

I think I'll go for the 'current is in a well-known register'
approach and see how this goes...

Later,
Kenn





Re: Calling current() from interrupt context

2000-10-09 Thread Kenn Humborg

On Mon, Oct 09, 2000 at 03:54:21AM +0200, Andi Kleen wrote:
> On Mon, Oct 09, 2000 at 02:45:54AM +0100, Kenn Humborg wrote:
> > On Mon, Oct 09, 2000 at 02:21:09AM +0100, Kenn Humborg wrote:
> > > On Mon, Oct 09, 2000 at 02:20:27AM +0200, Andi Kleen wrote:
> > > > 2.4 TCP code relies on current being valid in a softirq.
> > > 
> > > And what the hell does TCP need current for anyway?
> > 
> > I think the only reference is in tcp_input.c, tcp_data_queue().
> > This does:
> 
> [...]
> 
> It is actually used in two places, in the fast path and there. It isn't
> as bad as it looks because it is only used in user context and could
> be fixed by putting a special flag into the sock for the execute
> in user context case (or just supply an argument that is passed around) 
> 
> The point was just that there are probably other users of current
> in interrupt context and AFAIK it works currently in all ports
> so you would need to fix these (mostly buggy) occurrences.

OK.  I'm convinced.  current will be valid in interrupt context.

> If you ever wanted to do a SMP VAX port you would also need to fix
> smp_processor_id().

No problem.  I've already come up with a couple of ways of doing that.

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-08 Thread Kenn Humborg

On Mon, Oct 09, 2000 at 02:21:09AM +0100, Kenn Humborg wrote:
> On Mon, Oct 09, 2000 at 02:20:27AM +0200, Andi Kleen wrote:
> > 2.4 TCP code relies on current being valid in a softirq.
> 
> And what the hell does TCP need current for anyway?

I think the only reference is in tcp_input.c, tcp_data_queue().
This does:

    /*  Queue data for delivery to the user.
     *  Packets in sequence go to the receive queue.
     *  Out of sequence packets to the out_of_order_queue.
     */
    if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
        /* Ok. In sequence. */
        if (tp->ucopy.task == current &&
            tp->copied_seq == tp->rcv_nxt &&
            tp->ucopy.len &&
            sk->lock.users &&
            !tp->urg_data) {
            int chunk = min(skb->len, tp->ucopy.len);

            __set_current_state(TASK_RUNNING);

Hmmm...  I think I like the idea of having a different current for
interrupt context code.  It looks like it's either that, slowing
down get_current() by checking for interrupt or kernel stack, or
devoting a register to current.

Must sleep on this...

Later,
Kenn





Re: Calling current() from interrupt context

2000-10-08 Thread Kenn Humborg

On Mon, Oct 09, 2000 at 02:20:27AM +0200, Andi Kleen wrote:
> 2.4 TCP code relies on current being valid in a softirq.

Well, then as long as Linux guarantees that there is always a 
valid 'current task' on a CPU, then I can special-case the 
called-from-interrupt case.  The previous kernel stack pointer
is accessible from another processor register, so I can go in
there and pull it out and use it to calculate current.
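
As a sketch of that special case, assuming the task_struct sits at the
bottom of an 8 KB kernel stack block: the struct, its field names and
sketch_get_current() are all hypothetical, and the flag stands in for
whatever test the hardware offers for "running on the interrupt stack".

```c
#define THREAD_SIZE 8192UL   /* assumed: task_struct + kernel stack share 8 KB */

/* Hypothetical snapshot of the registers the lookup needs. */
struct sketch_cpu_state {
    unsigned long sp;          /* stack pointer currently in use */
    unsigned long ksp;         /* saved kernel stack pointer */
    int on_interrupt_stack;    /* nonzero while on the interrupt stack */
};

/* Return the address of the current task's 8 KB block.
 * On the interrupt stack, sp is useless for this purpose,
 * so fall back to the saved kernel stack pointer. */
static inline unsigned long
sketch_get_current(const struct sketch_cpu_state *cpu)
{
    unsigned long stack = cpu->on_interrupt_stack ? cpu->ksp : cpu->sp;

    return stack & ~(THREAD_SIZE - 1);
}
```

The cost is the extra test on every lookup, which is the slowdown that
makes a dedicated register attractive as an alternative.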

Is it possible to get an interrupt during context switching,
for example?  Or any other window during which there isn't
a valid current?

And what the hell does TCP need current for anyway?

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-08 Thread Kenn Humborg

On Mon, Oct 09, 2000 at 01:02:21AM +0200, Jamie Lokier wrote:
> [EMAIL PROTECTED] wrote:
> > BTW: there is an implicit reference to "current"  in smp_processor_id. 
> 
> Yes I forgot about that.  (Self-flagellate).  However that is
> architecture specific.  If it's not an SMP Vax port, no big deal.  If it
> is, there's a way to arrange that smp_processor_id returns the correct
> processor id even from the interrupt stack.

Yes, that's easily done.  Interrupt stacks are per-processor, so they are
part of the per-cpu data structures.  So we can use a similar trick to the
task_struct/kernel stack hack.  (And still get a crash if current is used
from interrupt context.)

Later,
Kenn




Re: Calling current() from interrupt context

2000-10-08 Thread Kenn Humborg

On Mon, Oct 09, 2000 at 02:20:27AM +0200, Andi Kleen wrote:
> On Mon, Oct 09, 2000 at 12:30:17AM +0200, Jamie Lokier wrote:
> > Kenn Humborg wrote:
> > > My feeling is that interrupt code has no business calling current(),
> > > but I don't know the kernel well enough to be sure.  Is there any
> > > interrupt-level code that calls current() or is it a design
> > > principle that it cannot be called?
...
> > So if you can make the machine crash utterly when calling "current" in
> > irq context, or when dereferencing the result, that would probably be a
> > good thing :-)

Easily done.  Because I don't really know how big we need to make the
stacks yet, I've put a non-accessible guard page just below the 
interrupt stack.  I can arrange for (SP & ~8191) to hit this page.

> 
> 2.4 TCP code relies on current being valid in a softirq.
> 
> The m68k port which has a interrupt stack solves the problem by 
> loading current into a global register variable on all kernel entries.
> x86-64 will likely do the same.

How do you tell GCC to stay away from that register when compiling
the kernel without also making it unusable in userland?

Later,
Kenn




Calling current() from interrupt context

2000-10-08 Thread Kenn Humborg


I'd just like to confirm that it's illegal to call current()
from interrupt-handling code.

I'm working on the VAX port and the reason I ask is that the
VAX has separate stack pointers for user, kernel and interrupt
contexts.  Therefore, the current = (SP & ~8191) hack will give
completely bogus results when handling an interrupt.
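
To illustrate why, here is a small sketch of the hack itself: rounding
the stack pointer down to an 8 KB boundary (a mask of ~8191) to find the
block that holds the task_struct.  The function name and the example
addresses in the note below are made up for illustration.

```c
#define THREAD_SIZE 8192UL   /* assumed: task_struct + kernel stack share 8 KB */

/* Round a stack pointer down to the start of its 8 KB block.
 * This yields the task_struct address only if sp really points
 * into a task's kernel stack. */
static inline unsigned long sketch_current_from_sp(unsigned long sp)
{
    return sp & ~(THREAD_SIZE - 1);
}
```

Given a kernel-stack pointer such as 0x80125abc, this returns
0x80124000, the start of that task's block.  Given a pointer into a
separate interrupt stack, say 0x7fff0010, it returns 0x7fff0000, which
is just the bottom of some 8 KB chunk of the interrupt stack and not a
task_struct at all.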

My feeling is that interrupt code has no business calling current(),
but I don't know the kernel well enough to be sure.  Is there any
interrupt-level code that calls current() or is it a design
principle that it cannot be called?

Thanks,
Kenn




