date:20070502

On Thu, 3 May 2007 13:34:29 +0800 WANG Cong <[EMAIL PROTECTED]> wrote:

> 
> Fix this warning:
> drivers/usb/misc/sisusbvga/sisusb_con.c:1436: warning: initialization from incompatible pointer type
> 
> 
> Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
> ---
> 
> Compile-tested. ;)
> 

Thanks.

> --- linux-2.6.21-rc7-mm2/drivers/usb/misc/sisusbvga/sisusb_con.c.orig 2007-05-03 02:51:06.0 +0800
> +++ linux-2.6.21-rc7-mm2/drivers/usb/misc/sisusbvga/sisusb_con.c  2007-05-03 02:57:08.0 +0800
> @@ -321,9 +321,10 @@ sisusbcon_deinit(struct vc_data *c)
>  /* interface routine */
>  static u8
>  sisusbcon_build_attr(struct vc_data *c, u8 color, u8 intensity,
> - u8 blink, u8 underline, u8 reverse)
> + u8 blink, u8 underline, u8 reverse, u8 unused)
>  {
>   u8 attr = color;
> + (void) unused;

This part isn't needed and we don't usually do it - the compiler will not warn
about the unused arg.
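
A minimal user-space sketch of the point above, assuming stand-in types: `struct vc_stub` and `struct con_ops_stub` are hypothetical substitutes for `struct vc_data` and the console operations table, and the hook's seventh `u8` argument is simply called `unused` as in the patch. Matching the new prototype is what silences the incompatible-pointer-type warning; the extra parameter needs no `(void)` cast, because `-Wall` alone does not enable `-Wunused-parameter`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Hypothetical stand-in for struct vc_data: only the field used below. */
struct vc_stub {
	u8 vc_ulcolor;
};

/* Hypothetical ops table whose hook now takes a seventh u8 argument. */
struct con_ops_stub {
	u8 (*build_attr)(struct vc_stub *c, u8 color, u8 intensity,
			 u8 blink, u8 underline, u8 reverse, u8 unused);
};

/* Same prototype as the hook, so the initializer below is type-correct.
 * Unused parameters are simply left alone: no (void) casts needed. */
static u8 stub_build_attr(struct vc_stub *c, u8 color, u8 intensity,
			  u8 blink, u8 underline, u8 reverse, u8 unused)
{
	u8 attr = color;

	assert(c);
	if (underline)
		attr = (attr & 0xf0) | c->vc_ulcolor;
	return attr;
}

static const struct con_ops_stub stub_ops = {
	.build_attr = stub_build_attr,
};
```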

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] i386: fix suspend/resume with dynamically allocated irq stacks

On Wed, May 02, 2007 at 06:56:09PM -0700, Jeremy Fitzhardinge wrote:
> This fixes two bugs:
>  - the stack allocation must be marked __cpuinit, since it gets called
>    on resume as well.
>  - presumably the interrupt stack should be freed on unplug if it's
>    going to get reallocated on every plug.
> [ Only non-vmalloced stacks tested. ]
> Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
> ---
>  arch/i386/kernel/irq.c |   42 +-
>  1 file changed, 37 insertions(+), 5 deletions(-)

Updated patch follows. Please add your Signed-off-by: if it meets your
approval; I am operating on the assumption I should never do so myself.
I'm a bit unsure of how to handle cpu 0 vs. potential freeing of per_cpu
areas and error returns from __cpuinit affairs, but anyhow:

This fixes three bugs:
  - the stack allocation must be marked __cpuinit, since it gets called
    on resume as well.
  - presumably the interrupt stack should be freed on unplug if it's
    going to get reallocated on every plug.
  - the vm_struct got leaked by thread info freeing callbacks.
Signed-off-by: William Irwin <[EMAIL PROTECTED]>


Index: stack-paranoia/arch/i386/kernel/irq.c
===
--- stack-paranoia.orig/arch/i386/kernel/irq.c  2007-05-02 19:33:23.937945981 -0700
+++ stack-paranoia/arch/i386/kernel/irq.c   2007-05-02 21:17:41.134523293 -0700
@@ -142,18 +142,19 @@
  * These should really be __section__(".bss.page_aligned") as well, but
  * gcc's 3.0 and earlier don't handle that correctly.
  */
-static DEFINE_PER_CPU(char *, softirq_stack);
-static DEFINE_PER_CPU(char *, hardirq_stack);
-
+struct irq_stack_info {
+   char *stack;
 #ifdef CONFIG_DEBUG_STACK
-static void * __init irq_remap_stack(void *stack)
-{
-   int i;
struct page *pages[THREAD_SIZE/PAGE_SIZE];
+#endif /* CONFIG_DEBUG_STACK */
+};
+static DEFINE_PER_CPU(struct irq_stack_info, softirq_stack_info);
+static DEFINE_PER_CPU(struct irq_stack_info, hardirq_stack_info);
 
-   for (i = 0; i < ARRAY_SIZE(pages); ++i)
-   pages[i] =  virt_to_page(stack + PAGE_SIZE*i);
-   return vmap(pages, THREAD_SIZE/PAGE_SIZE, VM_IOREMAP, PAGE_KERNEL);
+#ifdef CONFIG_DEBUG_STACK
+static void * __init irq_remap_stack(struct irq_stack_info *info)
+{
+   return vmap(info->pages, ARRAY_SIZE(info->pages), VM_IOREMAP, PAGE_KERNEL);
 }
 
 static int __init irq_guard_cpu0(void)
@@ -161,59 +162,110 @@
unsigned long flags;
void *tmp;
 
-   tmp = irq_remap_stack(per_cpu(softirq_stack, 0));
+   tmp = irq_remap_stack(&per_cpu(softirq_stack_info, 0));
if (!tmp)
return -ENOMEM;
else {
local_irq_save(flags);
-   per_cpu(softirq_stack, 0) = tmp;
+   per_cpu(softirq_stack_info, 0).stack = tmp;
local_irq_restore(flags);
}
-   tmp = irq_remap_stack(per_cpu(hardirq_stack, 0));
+   tmp = irq_remap_stack(&per_cpu(hardirq_stack_info, 0));
if (!tmp)
return -ENOMEM;
else {
local_irq_save(flags);
-   per_cpu(hardirq_stack, 0) = tmp;
+   per_cpu(hardirq_stack_info, 0).stack = tmp;
local_irq_restore(flags);
}
return 0;
 }
 core_initcall(irq_guard_cpu0);
 
-static void * __init __alloc_irqstack(int cpu)
+static int __cpuinit __alloc_irqstack(int cpu, struct irq_stack_info *info)
 {
int i;
-   struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
-   struct vm_struct *area;
 
-   if (!slab_is_available())
-   return __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
+   if (!slab_is_available()) {
+   WARN_ON(cpu != 0);
+   info->stack = __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
__pa(MAX_DMA_ADDRESS));
+   info->pages[0] = virt_to_page(info->stack);
+   for (i = 1; i < ARRAY_SIZE(info->pages); ++i)
+   info->pages[i] = info->pages[0] + i;
+   return 0;
+   }
+   for (i = 0; i < ARRAY_SIZE(info->pages); ++i) {
+   if (!cpu)
+   WARN_ON(!info->pages[i]);
+   else {
+   info->pages[i] = alloc_page(GFP_HIGHUSER);
+   if (!info->pages[i])
+   goto out;
+   }
+   }
+   info->stack = irq_remap_stack(info);
+   if (info->stack)
+   return 0;
+out:
+   if (cpu) {
+   for (--i; i >= 0; --i) {
+   __free_page(info->pages[i]);
+   info->pages[i] = NULL;
+   }
+   }
+   return -1;
+}
+
+static void __cpuinit __free_irqstack(int cpu, struct irq_stack_info *info)
+{
+   int i;
 
-   /* failures here are unrecoverable anyway */
-   area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
-
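
The allocate-then-unwind shape of the new `__alloc_irqstack()` can be sketched in user space as below. This is an illustration under stated assumptions, not the patch itself: `malloc()` stands in for `alloc_page()`, `NPAGES` assumes 8K stacks on 4K pages, and `fail_at` is a hypothetical fault-injection knob so the error path can be exercised. As in the patch, only the pages actually obtained are released, newest first, and each slot is reset to NULL so a later free or retry sees a clean state.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SZ 4096   /* assumption: 4K pages */
#define NPAGES  2      /* assumption: THREAD_SIZE / PAGE_SIZE == 2 */

struct stack_pages {
	void *pages[NPAGES];
};

/* Hypothetical fault injection: fail the allocation of page index fail_at. */
static int fail_at = -1;

static void *alloc_page_stub(int idx)
{
	return idx == fail_at ? NULL : malloc(PAGE_SZ);
}

/* Allocate every page or none: on failure, release what was obtained. */
static int alloc_stack_pages(struct stack_pages *info)
{
	int i;

	assert(info);
	for (i = 0; i < NPAGES; ++i) {
		info->pages[i] = alloc_page_stub(i);
		if (!info->pages[i])
			goto out;
	}
	return 0;
out:
	/* pages[i] is the one that failed; unwind i-1 .. 0. */
	for (--i; i >= 0; --i) {
		free(info->pages[i]);
		info->pages[i] = NULL;
	}
	return -1;
}

/* Counterpart for the unplug path: free everything and clear the slots. */
static void free_stack_pages(struct stack_pages *info)
{
	int i;

	for (i = 0; i < NPAGES; ++i) {
		free(info->pages[i]);
		info->pages[i] = NULL;
	}
}
```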

Re: [RELEASE] Lguest for 2.6.21

2007-05-02 Thread WANG Cong

On Thu, May 03, 2007 at 02:20:32PM +1000, Rusty Russell wrote:
>On Thu, 2007-05-03 at 11:57 +0800, WANG Cong wrote:
>> On Thu, May 03, 2007 at 09:00:48AM +1000, Rusty Russell wrote:
>> >Thanks for the patch.  This omission (in several places) was
>> >deliberate.  We can't really do anything sensible if the user unmapped
>> >the page.  I assume you saw a gcc warning from this code?
>> 
>> Yes. In fact, I got two warnings, another one is in 
>> drivers/lguest/hypercalls.c.
>> 
>> If I understand you correctly, you mean we can do nothing useful to fix it?
>
>We can, but we can ignore those warnings for the moment; they're
>harmless.

OK.

>
>> I have sent a mail which described the errors I got when compiling
>> Documentation/lguest/lguest.c. But it seems that you didn't receive it
>> (it didn't appear on lkml.org either!).
>
>Hmm, no, I didn't get it here either 8(
>
>> That is, I have already set up my .config as you suggested, but I
>> still can't compile Documentation/lguest/lguest.c; the errors are:
>> 
>> lguest.c: In function 'add_to_bridge':
>> lguest.c:779: error: 'SIOCBRADDIF' undeclared (first use in this function)
>> lguest.c:779: error: (Each undeclared identifier is reported only once
>> lguest.c:779: error: for each function it appears in.)
>
>Ah, perhaps older libc headers?  Can you try adding this to the top of
>Documentation/lguest/lguest.c after the #define BRIDGE_PFX "bridge:"
>
>#ifndef SIOCBRADDIF
>#define SIOCBRADDIF 0x89a2  /* add interface to bridge */
>#endif
>
>Thanks,
>Rusty.

Yes, it works. Thanks very much for the pointer.

I have also found some missing error checks in this user-space code. I
will fix them soon. ;)

Regards.
WANG Cong

[Patch][SIS USB2VGA] Warning fix

2007-05-02 Thread WANG Cong


Fix this warning:
drivers/usb/misc/sisusbvga/sisusb_con.c:1436: warning: initialization from incompatible pointer type


Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
---

Compile-tested. ;)


--- linux-2.6.21-rc7-mm2/drivers/usb/misc/sisusbvga/sisusb_con.c.orig   2007-05-03 02:51:06.0 +0800
+++ linux-2.6.21-rc7-mm2/drivers/usb/misc/sisusbvga/sisusb_con.c 2007-05-03 02:57:08.0 +0800
@@ -321,9 +321,10 @@ sisusbcon_deinit(struct vc_data *c)
 /* interface routine */
 static u8
 sisusbcon_build_attr(struct vc_data *c, u8 color, u8 intensity,
-   u8 blink, u8 underline, u8 reverse)
+   u8 blink, u8 underline, u8 reverse, u8 unused)
 {
u8 attr = color;
+   (void) unused;
 
if (underline)
attr = (attr & 0xf0) | c->vc_ulcolor;

[PATCH] docbook: librs typo fixes

From: Randy Dunlap <[EMAIL PROTECTED]>

librs docbook typo fixes.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 Documentation/DocBook/librs.tmpl |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

--- linux-2621-g4.orig/Documentation/DocBook/librs.tmpl
+++ linux-2621-g4/Documentation/DocBook/librs.tmpl
@@ -79,12 +79,12 @@
   
Usage

-   This chapter provides examples how to use the library.
+   This chapter provides examples of how to use the library.


Initializing

-   The init function init_rs returns a pointer to a
+   The init function init_rs returns a pointer to an
rs decoder structure, which holds the necessary
information for encoding, decoding and error correction
with the given polynomial. It either uses an existing
@@ -98,10 +98,10 @@
 static struct rs_control *rs_decoder;
 
 /* Symbolsize is 10 (bits)
- * Primitve polynomial is x^10+x^3+1
+ * Primitive polynomial is x^10+x^3+1
  * first consecutive root is 0
- * primitve element to generate roots = 1
- * generator polinomial degree (number of roots) = 6
+ * primitive element to generate roots = 1
+ * generator polynomial degree (number of roots) = 6
  */
 rs_decoder = init_rs (10, 0x409, 0, 1, 6);

@@ -116,12 +116,12 @@ rs_decoder = init_rs (10, 0x409, 0, 1, 6


The expanded data can be inverted on the fly by
-   providing a non zero inversion mask. The expanded data is
+   providing a non-zero inversion mask. The expanded data is
XOR'ed with the mask. This is used e.g. for FLASH
ECC, where the all 0xFF is inverted to an all 0x00.
The Reed-Solomon code for all 0x00 is all 0x00. The
code is inverted before storing to FLASH so it is 0xFF
-   too. This prevent's that reading from an erased FLASH
+   too. This prevents that reading from an erased FLASH
results in ECC errors.


@@ -273,7 +273,7 @@ free_rs(rs_decoder);
May be used under the terms of the GNU General Public License (GPL)


-   The wrapper functions and interfaces are written by Thomas Gleixner
+   The wrapper functions and interfaces are written by Thomas Gleixner.


Many users have provided bugfixes, improvements and helping hands for testing.
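
In the librs example call `init_rs (10, 0x409, 0, 1, 6)`, the value `0x409` is the primitive polynomial x^10+x^3+1 written as a bit mask (bit k set for each x^k term). The sketch below, with hypothetical helper names, builds that mask from the exponents and checks primitivity by counting how many multiplications by x it takes to get back to 1 in GF(2^10); a primitive polynomial gives the full order 2^10 - 1 = 1023.

```c
#include <assert.h>

/* Hypothetical helper: gfpoly bit mask from the exponents of the nonzero
 * terms, e.g. {10, 3, 0} for x^10 + x^3 + 1. */
static unsigned int poly_mask(const int *exps, int n)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < n; ++i) {
		assert(exps[i] >= 0 && exps[i] < 32);
		mask |= 1u << exps[i];
	}
	return mask;
}

/* Multiplicative order of x modulo gfpoly over GF(2), where symsize is the
 * degree of gfpoly. The polynomial is primitive iff the result equals
 * 2^symsize - 1; returns 2^symsize if x never gets back to 1. */
static unsigned int alpha_order(unsigned int gfpoly, int symsize)
{
	unsigned int limit, x = 1, n = 0;

	assert(symsize > 0 && symsize < 31);
	limit = 1u << symsize;
	do {
		x <<= 1;                /* multiply by x */
		if (x & limit)
			x ^= gfpoly;    /* reduce modulo gfpoly */
		n++;
	} while (x != 1 && n < limit);
	return n;
}
```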

Re: [git pull] Input patches for 2.6.20

On Sunday 18 February 2007 02:04, Dmitry Torokhov wrote:
> Hi Linus,
> 
> Please consider pulling from:
> 
>         git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus
> 
> or
>         master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus
> 
> to receive updates for input subsystem.

Linus,

If you have not pulled yet please pull from:

        git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

or
        master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

because the master branch will have extra stuff in the next minute or so.

-- 
Dmitry

Re: [patch] pull in the linux/input.h header in linux/uinput.h

On Wednesday 02 May 2007 18:49, Mike Frysinger wrote:
> uinput.h relies on structures only found in input.h, so pull in the header
> 
> Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
> ---
> diff --git a/include/linux/uinput.h b/include/linux/uinput.h
> index 1fd61ee..a6c1e8e 100644
> --- a/include/linux/uinput.h
> +++ b/include/linux/uinput.h
> @@ -32,6 +32,8 @@
>   *   - first public version
>   */
>  
> +#include <linux/input.h>
> +
>  #define UINPUT_VERSION   3
>  
>  #ifdef __KERNEL__
> 

Applied, thank you.

-- 
Dmitry

[LGUEST] Look in object dir for .config

2007-05-02 Thread Tony Breeds

From: Tony Breeds <[EMAIL PROTECTED]>

[LGUEST] Look in object dir for .config

If you build with make O=, then .config isn't in ../../. This patch
goes partway toward making sure that you don't dirty the source tree.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>

---
 Documentation/lguest/Makefile |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

--- Documentation/lguest/Makefile.orig  2007-05-03 14:23:07.0 +1000
+++ Documentation/lguest/Makefile   2007-05-03 14:25:27.0 +1000
@@ -1,8 +1,15 @@
 # This creates the demonstration utility "lguest" which runs a Linux guest.
 
+# For those people that have a separate object dir, look there for .config
+KBUILD_OUTPUT := ../..
+ifdef O
+  ifeq ("$(origin O)", "command line")
+KBUILD_OUTPUT := $(O)
+  endif
+endif
 # We rely on CONFIG_PAGE_OFFSET to know the highest address we can put
 # the lguest binary.
-include ../../.config
+include $(KBUILD_OUTPUT)/.config
 LGUEST_GUEST_TOP := ($(CONFIG_PAGE_OFFSET) - 0x0800)
 
 CFLAGS:=-Wall -Wmissing-declarations -Wmissing-prototypes -O3 \

Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-02 Thread Vivek Goyal

On Wed, May 02, 2007 at 02:59:11PM -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > So the bzImage structure is currently:
> > 
> >1. old-style boot sector
> >2. old-style boot info, followed by 0xaa55 at the end of the sector
> >3. the HdrS boot param block
> >4. setup.S boot code
> >5. the self-decompressing kernel
> > 
> > If we make 5 actually an ELF file, containing properly formed Ehdr,
> > Phdrs (for all the mappings required), and the actual kernel
> > decompressor, relocator and compressed kernel data, then it would be
> > easy for the Xen domain builder to find that and use it as a basis for
> > loading.  I think it would just require the bzImage boot param block to
> > contain an offset of the start of the ELF file.  The contents of the ELF
> > file would be in a form where the normal boot code could just jump over
> > the ELF headers, directly into the segment data itself.
> > 
> > ie:
> > 
> >1. old-style boot sector
> >2. old-style boot info, followed by 0xaa55 at the end of the sector
> >3. the HdrS boot param block
> >4. setup.S boot code (jumps directly into 5.3)
> >5. 32-bit self-decompressing kernel:
> >  1. Ehdr
> >  2. Phdrs for all necessary mappings
> >  3. decompressor/relocator .text
> >  4. compressed kernel data
> > 
> > Does that sound reasonable?
> > 
> 
> I don't know if that would break any programs that are currently
> bypassing the setup.

I think kexec bzImage loader will break. It bypasses the setup code and
directly jumps to the code present after the setup sectors (the decompressor).
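
For reference, a loader that bypasses setup.S has to locate that code in the bzImage file itself. A minimal sketch, assuming the documented x86 boot-protocol layout (setup_sects at offset 0x1F1, the 0xAA55 boot flag at 0x1FE, 512-byte sectors, and setup_sects == 0 meaning 4 for very old images): the protected-mode part starts after the boot sector plus setup_sects setup sectors. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SETUP_SECTS_OFF 0x1f1   /* number of setup sectors, 1 byte */
#define BOOT_FLAG_OFF   0x1fe   /* 0xAA55, little-endian */

/* File offset of the protected-mode part (the decompressor) in a bzImage,
 * or 0 if the image looks malformed or is too short. */
static size_t pm_code_offset(const uint8_t *img, size_t len)
{
	size_t off;
	unsigned int sects;

	assert(img);
	if (len < 0x200)
		return 0;
	if (img[BOOT_FLAG_OFF] != 0x55 || img[BOOT_FLAG_OFF + 1] != 0xaa)
		return 0;
	sects = img[SETUP_SECTS_OFF];
	if (sects == 0)
		sects = 4;      /* very old images: 0 means 4 */
	off = (size_t)(sects + 1) * 512;   /* boot sector + setup sectors */
	return off <= len ? off : 0;
}
```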

> The existing setup protocol definitely allows
> invoking an entry point which isn't 0x10 (rather, the 32-bit
> entrypoint is defined by code32_start); I'm not sure how Eric's
> relocatable kernel patches (2.05 protocol) affect that, mostly because I
> haven't seen any boot loaders which actually use it so I can't comment
> on what their code looks like.

With relocatable patches, if a boot loader decides to load protected mode
component at non-1MB address, then it shall have to modify code32_start to
reflect the new location of protected mode code.
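
That loader-side fix-up can be sketched as follows. It assumes the setup-header offsets from the x86 boot protocol documentation (code32_start at 0x214; kernel_alignment at 0x230 and relocatable_kernel at 0x234, both introduced with protocol 2.05) and a setup buffer already read from the image. The function name and error convention are hypothetical, and a real loader would also have to check that the new location is usable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Setup header offsets (x86 boot protocol). */
#define HDR_MAGIC_OFF   0x202   /* "HdrS" */
#define HDR_VERSION_OFF 0x206   /* protocol version, 2 bytes */
#define HDR_CODE32_OFF  0x214   /* code32_start, 4 bytes */
#define HDR_ALIGN_OFF   0x230   /* kernel_alignment, 4 bytes (2.05+) */
#define HDR_RELOC_OFF   0x234   /* relocatable_kernel, 1 byte (2.05+) */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* After placing the protected-mode code at load_addr instead of 1MB, point
 * code32_start at it. Returns 0 on success, -1 if the image is not a
 * relocatable 2.05+ kernel or load_addr violates kernel_alignment. */
static int set_code32_start(uint8_t *setup, size_t len, uint32_t load_addr)
{
	unsigned int version;
	uint32_t align;

	assert(setup);
	if (len < HDR_RELOC_OFF + 1)
		return -1;
	if (memcmp(setup + HDR_MAGIC_OFF, "HdrS", 4) != 0)
		return -1;
	version = setup[HDR_VERSION_OFF] | setup[HDR_VERSION_OFF + 1] << 8;
	if (version < 0x0205 || !setup[HDR_RELOC_OFF])
		return -1;      /* must stay at the default load address */
	align = get_le32(setup + HDR_ALIGN_OFF);
	if (align && load_addr % align)
		return -1;
	put_le32(setup + HDR_CODE32_OFF, load_addr);
	return 0;
}
```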

Thanks
Vivek

Re: [ck] [REPORT] 2.6.21.1 vs 2.6.21-sd046 vs 2.6.21-cfs-v6

2007-05-02 Thread William Lee Irwin III

William Lee Irwin III wrote:
>> That's odd. The ->load_weight changes should've improved that quite
>> a bit. There may be something slightly off in how lag is computed,
>> or maybe the O(n) lag issue Ying Tang spotted is biting you.

On Thu, May 03, 2007 at 06:51:43AM +0300, Al Boldi wrote:
> Is it not biting you too?

I'm a kernel programmer. I'm not an objective tester.

It also happens to be the case that I personally have never encountered
a performance problem with any of the schedulers, mainline included, on
any system I use interactively. So my "user experience" is not valuable.


William Lee Irwin III wrote:
>> Also, I should say that the nice number affairs don't imply fairness
>> per se. The way that works is that when tasks have "weights" (like
>> nice levels in UNIX) the definition of fairness changes so that each
>> task gets shares of CPU bandwidth proportional to its weight instead
>> of one share for one task.

On Thu, May 03, 2007 at 06:51:43AM +0300, Al Boldi wrote:
> Ok, but you can easily expose scheduler unfairness by using nice levels as 
> relative magnifiers; provided nice levels are implemented correctly.

This doesn't really fit in with anything I'm aware of.
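
The weighted notion of fairness described above can be made concrete: with weights w_i, each runnable task is entitled to w_i divided by the sum of all weights, so expected %cpu numbers for a given mix can be computed before comparing them against measurements. A minimal sketch with hypothetical helper names; the weights are plain inputs here, since the mapping from nice level to weight is scheduler-specific and is not given in this thread. Shares are in per-mille to keep the arithmetic in integers.

```c
#include <assert.h>
#include <stddef.h>

/* Expected share of task i in per-mille: w[i] / sum(w) * 1000. */
static unsigned int expected_permille(const unsigned int *w, size_t n, size_t i)
{
	unsigned long long sum = 0;
	size_t k;

	assert(i < n);
	for (k = 0; k < n; ++k)
		sum += w[k];
	if (sum == 0)
		return 0;
	return (unsigned int)(w[i] * 1000ULL / sum);
}

/* Largest gap, in per-mille, between measured and expected shares. */
static unsigned int max_gap_permille(const unsigned int *w,
				     const unsigned int *measured, size_t n)
{
	unsigned int worst = 0;
	size_t i;

	for (i = 0; i < n; ++i) {
		unsigned int want = expected_permille(w, n, i);
		unsigned int gap = measured[i] > want ? measured[i] - want
						      : want - measured[i];
		if (gap > worst)
			worst = gap;
	}
	return worst;
}
```

Running the same weights through different task mixes is then a direct check of the property that w_i for a given nice level should not depend on the mix.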


William Lee Irwin III wrote:
>> The other thing to do is try a different number of tasks with a
>> different mix of nice levels. The weight w_i for a given nice
>> level n_i should be the same even in a different mix of tasks
>> and nice levels if the nice levels are the same.
>> If this sounds too far out, there's nothing to worry about. You can
>> just run the different numbers of tasks with different mixes of nice
>> levels and post the %cpu numbers. Or if that's still a bit far out
>> for you, a test that does all this is eventually going to get written.

On Thu, May 03, 2007 at 06:51:43AM +0300, Al Boldi wrote:
> chew.c does exactly that, just make sure sched_granularity_ms >= 5,000,000.

Please post the source of chew.c


-- wli

Re: [PATCH 2/2] rename thread_info to stack

On Tue, 1 May 2007 02:10:34 +0200 (CEST) Roman Zippel <[EMAIL PROTECTED]> wrote:

> This finally renames the thread_info field in task structure to stack,
> so that the assumptions about this field are gone and archs have more
> freedom about placing the thread_info structure.

It's been a year or so and I've forgotten what the actual point to these
changes is.  Can we be reminded please?


Re: Mysterious RTC hangs on x86_64 - fixed, sort of

2007-05-02 Thread Zachary Amsden


Chuck Ebbert wrote:
>> CONFIG_HPET_EMULATE_RTC=y
>
> Did you try without that?

Just did.  Still hangs same way; strace shows /sbin/hwclock dying after 
hundreds of RTC_RD_TIME.  And now /proc/interrupts shows no rtc 
interrupts being generated (expected, I guess).  Seems to take longer to
crash, but this is a heisenbug.


Enough crashing for today.  Strangest thing is the NMI watchdog does not 
fire...


Zach

Re: build system: no module target ending with slash?

2007-05-02 Thread Sam Ravnborg

On Thu, May 03, 2007 at 12:43:43AM +0200, Christian Hesse wrote:
> Hi James, hi everybody,
> 
> playing with iwlwifi, I am trying to patch it into the kernel and build it from
> there. But I have a problem with the build system.
> 
> The file drivers/net/wireless/mac80211/Makefile contains one single line:
> 
> obj-$(CONFIG_IWLWIFI)   += iwlwifi/
> 
> When CONFIG_IWLWIFI=m in scripts/Makefile.lib line 29 the target is filtered 
> as it ends with a slash. That results in 
> drivers/net/wireless/mac80211/built-in.o not being built and the build 
> process breaks with an error. What is the correct way to handle this? Why are 
> targets ending with a slash filtered?

Looks buggy. I will take a look tonight.

Sam


Re: old buffer overflow in moxa driver

On Mon, 30 Apr 2007 16:48:29 -0600 dann frazier <[EMAIL PROTECTED]> wrote:

> hey,
>   I noticed that the moxa input checking security bug described by
> CVE-2005-0504 appears to remain unfixed upstream.
>  
> The issue is described here:
>   http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-0504
> 
> Debian has been shipping the following patch from Andres Salomon. I
> tried contacting the listed maintainer a few months ago but received
> no response.
> 
> I've tested that this still applies to and compiles against 2.6.21.
> 
> Signed-off-by: dann frazier <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/char/moxa.c b/drivers/char/moxa.c
> index 7dbaee8..e0d35c2 100644
> --- a/drivers/char/moxa.c
> +++ b/drivers/char/moxa.c
> @@ -1582,7 +1582,7 @@ copy:
>  
>   if(copy_from_user(, argp, sizeof(struct dl_str)))
>   return -EFAULT;
> - if(dltmp.cardno < 0 || dltmp.cardno >= MAX_BOARDS)
> + if(dltmp.cardno < 0 || dltmp.cardno >= MAX_BOARDS || dltmp.len < 0)
>   return -EINVAL;
>  
>   switch(cmd)
> @@ -2529,6 +2529,8 @@ static int moxaloadbios(int cardno, unsigned char 
> __user *tmp, int len)
>   void __iomem *baseAddr;
>   int i;
>  
> + if(len < 0 || len > sizeof(moxaBuff))
> + return -EINVAL;
>   if(copy_from_user(moxaBuff, tmp, len))
>   return -EFAULT;
>   baseAddr = moxa_boards[cardno].basemem;
> @@ -2576,7 +2578,7 @@ static int moxaload320b(int cardno, unsigned char 
> __user *tmp, int len)
>   void __iomem *baseAddr;
>   int i;
>  
> - if(len > sizeof(moxaBuff))
> + if(len < 0 || len > sizeof(moxaBuff))
>   return -EINVAL;
>   if(copy_from_user(moxaBuff, tmp, len))
>   return -EFAULT;
> @@ -2596,6 +2598,8 @@ static int moxaloadcode(int cardno, unsigned char 
> __user *tmp, int len)
>   void __iomem *baseAddr, *ofsAddr;
>   int retval, port, i;
>  
> + if(len < 0 || len > sizeof(moxaBuff))
> + return -EINVAL;
>   if(copy_from_user(moxaBuff, tmp, len))
>   return -EFAULT;
>   baseAddr = moxa_boards[cardno].basemem;
> 

I'm seeing copies of this patch floating around the internet from the 2.6.10
timeframe at least.

Could people please be more diligent about getting bugfixes back into
mainline?
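
The bug class here is a user-supplied signed length reaching a fixed-size copy. A minimal user-space sketch of the check the patch adds, with `memcpy()` standing in for `copy_from_user()` and a hypothetical 64-byte buffer (the real size of moxaBuff is not shown in the patch): reject negative and oversized lengths before copying. Note that in C, `len > sizeof(buf)` with an `int len` already converts a negative value to a large unsigned one, so the explicit `len < 0` test there mainly documents intent; the real gap was in the two functions that had no length check at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DL_BUFF_SZ 64   /* hypothetical size for illustration */

static unsigned char dl_buff[DL_BUFF_SZ];

/* Copy a user-supplied image into the fixed buffer, rejecting lengths
 * that are negative or larger than the buffer before touching memory. */
static int load_image(const unsigned char *src, int len)
{
	assert(src);
	if (len < 0 || (size_t)len > sizeof(dl_buff))
		return -EINVAL;
	memcpy(dl_buff, src, (size_t)len);
	return 0;
}
```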


Re: [RELEASE] Lguest for 2.6.21

2007-05-02 Thread Rusty Russell

On Thu, 2007-05-03 at 11:57 +0800, WANG Cong wrote:
> On Thu, May 03, 2007 at 09:00:48AM +1000, Rusty Russell wrote:
> > Thanks for the patch.  This omission (in several places) was
> >deliberate.  We can't really do anything sensible if the user unmapped
> >the page.  I assume you saw a gcc warning from this code?
> 
> Yes. In fact, I got two warnings, another one is in 
> drivers/lguest/hypercalls.c.
> 
> If I understand you correctly, you mean we can do nothing useful to fix it?

We can, but we can ignore those warnings for the moment; they're
harmless.

> I have sent a mail which described the errors I got when compiling
> Documentation/lguest/lguest.c. But it seems that you didn't receive it
> (it didn't appear on lkml.org either!).

Hmm, no, I didn't get it here either 8(

> That is, I have already set up my .config as you suggested, but I
> still can't compile Documentation/lguest/lguest.c; the errors are:
> 
> lguest.c: In function 'add_to_bridge':
> lguest.c:779: error: 'SIOCBRADDIF' undeclared (first use in this function)
> lguest.c:779: error: (Each undeclared identifier is reported only once
> lguest.c:779: error: for each function it appears in.)

Ah, perhaps older libc headers?  Can you try adding this to the top of
Documentation/lguest/lguest.c after the #define BRIDGE_PFX "bridge:"

#ifndef SIOCBRADDIF
#define SIOCBRADDIF 0x89a2  /* add interface to bridge  */
#endif

Thanks,
Rusty.
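
For context, SIOCBRADDIF takes a `struct ifreq` naming the bridge in `ifr_name` and the interface to enslave in `ifr_ifindex`, and is issued with `ioctl()` on an ordinary socket (which needs CAP_NET_ADMIN). The sketch below, with a hypothetical helper name, only prepares that request so it can be checked without privileges; it is not the code in lguest.c.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <net/if.h>
#include <sys/ioctl.h>

/* Fallback for older libc headers, as suggested above. */
#ifndef SIOCBRADDIF
#define SIOCBRADDIF 0x89a2  /* add interface to bridge */
#endif

/* Fill in the request for SIOCBRADDIF: bridge name plus the index of the
 * interface to add. Returns -1 if the bridge name does not fit. */
static int fill_bridge_req(struct ifreq *ifr, const char *bridge, int ifindex)
{
	assert(ifr && bridge);
	if (strlen(bridge) >= IFNAMSIZ)
		return -1;
	memset(ifr, 0, sizeof(*ifr));
	strcpy(ifr->ifr_name, bridge);
	ifr->ifr_ifindex = ifindex;
	return 0;
}
```

A caller would then issue `ioctl(fd, SIOCBRADDIF, &ifr)` on a socket from `socket(AF_INET, SOCK_STREAM, 0)`, typically getting the interface index from `if_nametoindex()`.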


I don't understand this warning.

2007-05-02 Thread clemens

The last several kernels I have built all have the following warning in them:

WARNING: vmlinux - Section mismatch: reference to .init.text:start_kernel from .text between 'is386' (at offset 0xc0401171) and 'check_x87'
WARNING: vmlinux - Section mismatch: reference to .init.text: from .text between 'rest_init' (at offset 0xc0401406) and 'try_name'
WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'probe_bigsmp' (at offset 0xc0402032) and 'init_apic_ldr'
WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0402244) and 'enable_apic_mode'

and this goes on with different variables for 18 lines.

Is this something I should be worried about?
I frankly don't understand what the message is trying to say.
-- 
Reg.Clemens
[EMAIL PROTECTED]




Re: [PATCH 6/6] firewire: add it all to kbuild

2007-05-02 Thread Sam Ravnborg

On Thu, May 03, 2007 at 01:01:08AM +0200, Stefan Richter wrote:
> Christoph Hellwig wrote:
> >> +fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o \
> >> +  fw-device.o fw-cdev.o
> > 
> > fw-core-y += ..
> > 
> 
> Like such?
Yes - the latter is much more readable.

Sam

> 
> --- linux.orig/drivers/usb/core/Makefile
> +++ linux/drivers/usb/core/Makefile
> @@ -2,17 +2,12 @@
>  # Makefile for USB Core files and filesystem
>  #
> 
> -usbcore-objs := usb.o hub.o hcd.o urb.o message.o driver.o \
> +usbcore-y+= usb.o hub.o hcd.o urb.o message.o driver.o \
>   config.o file.o buffer.o sysfs.o endpoint.o \
>   devio.o notify.o generic.o quirks.o
> 
> -ifeq ($(CONFIG_PCI),y)
> - usbcore-objs+= hcd-pci.o
> -endif
> -
> -ifeq ($(CONFIG_USB_DEVICEFS),y)
> - usbcore-objs+= inode.o devices.o
> -endif
> +usbcore-$(CONFIG_PCI)+= hcd-pci.o
> +usbcore-$(CONFIG_USB_DEVICEFS)   += inode.o devices.o
> 
>  obj-$(CONFIG_USB)+= usbcore.o
> 
> 
> -- 
> Stefan Richter
> -=-=-=== -=-= ---==
> http://arcgraph.de/sr/

[PATCH] x86_64: support poll() on /dev/mcelog (try #3)

2007-05-02 Thread Tim Hockin

From: Tim Hockin <[EMAIL PROTECTED]>

Background:
 /dev/mcelog is typically polled manually.  This is less than optimal for
 situations where accurate accounting of MCEs is important.  Calling
 poll() on /dev/mcelog does not work.

Description:
 This patch adds support for poll() to /dev/mcelog.  This results in
 immediate wakeup of user apps whenever the poller finds MCEs.  Because
 the exception handler cannot take any locks, it cannot call the wakeup
 itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
 caught at the next return from interrupt or exit from idle, calling the
 mce_user_notify() routine.  This patch also disables the "fake panic"
 path of the mce_panic(), because it results in printk()s in the exception
 handler and crashy systems.

 This patch also does some small cleanup for essentially unused variables,
 and moves the user notification into the body of the poller, so it is
 only called once per poll, rather than once per CPU.
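
The deferral described above (the exception handler only sets TIF_MCE_NOTIFY; the wakeup happens later, from a context where it is safe) has a familiar user-space analogue in a signal handler that may not take locks. A minimal sketch under that analogy, with hypothetical names; it illustrates the pattern, not the kernel code. The handler sets a `sig_atomic_t` flag, and the later check clears the flag before doing the work: several events before a check coalesce into one notification, and an event arriving during the work sets the flag again for the next check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t notify_pending;
static int notifications;   /* stand-in for "wake up the pollers" */

/* Async context: may not take locks, so only record that work is pending. */
static void on_event(int sig)
{
	notify_pending = 1;
}

static int install_event_handler(int sig)
{
	struct sigaction sa;

	assert(sig > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_event;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}

/* Safe context, like the return-from-interrupt check: clear the flag
 * first, then do the deferred wakeup once. */
static void check_pending_notify(void)
{
	if (notify_pending) {
		notify_pending = 0;
		notifications++;
	}
}
```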

Result:
 Applications can now poll() on /dev/mcelog.  When an error is logged
 (whether through the poller or through an exception) the applications are
 woken up promptly.  This should not affect any previous behaviors.  If no
 MCEs are being logged, there is no overhead.
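
On the application side this is plain poll(2). A minimal sketch, with hypothetical helper names, of waiting for readability and then reading; any readable descriptor works, e.g. a pipe, which makes it easy to try without /dev/mcelog. For /dev/mcelog itself the read buffer is assumed to be sized for the whole log, since mce_read() rejects shorter reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until fd is readable. Returns 1 if readable, 0 on timeout and -1 on
 * error (including POLLERR/POLLHUP without data). EINTR restarts the wait
 * with the full timeout, which is good enough for a sketch. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc;

	assert(fd >= 0);
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);
	if (rc <= 0)
		return rc;
	return (pfd.revents & POLLIN) ? 1 : -1;
}

/* One iteration of a consumer loop: block for a wakeup, then read. */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	int rc = wait_readable(fd, timeout_ms);

	if (rc <= 0)
		return rc;
	return read(fd, buf, len);
}
```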

Alternatives:
 I considered simply supporting poll() through the poller and not using
 TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
 happening and the user application being notified is *the*most* critical
 window for us.  Many uncorrectable errors can be logged to the network if
 given a chance.

 I also considered doing the MCE poll directly from the idle notifier, but
 decided that was overkill.

Testing:
 I used an error-injecting DIMM to create lots of correctable DRAM errors
 and verified that my user app is woken up in sync with the polling interval.
 I also used the northbridge to inject uncorrectable ECC errors, and
 verified (printk() to the rescue) that the notify routine is called and the
 user app does wake up.  I built with PREEMPT on and off, and verified
 that my machine survives MCEs.

Patch:
 This patch is against 2.6.21-rc7.

Signed-off-by: Tim Hockin <[EMAIL PROTECTED]>

---

This is the third version of this patch.  The TIF_* approach was
suggested by Mike Waychison and Andi did not yell at me when I suggested
it.  Hooking the idle notifier was born of an Andrew Morton suggestion
and, no surprise, seems to work well.


diff -pruN linux-2.6.20+01_poll_interval/arch/x86_64/kernel/entry.S linux-2.6.20+03_poll/arch/x86_64/kernel/entry.S
--- linux-2.6.20+01_poll_interval/arch/x86_64/kernel/entry.S	2007-04-24 22:46:19.0 -0700
+++ linux-2.6.20+03_poll/arch/x86_64/kernel/entry.S	2007-05-02 20:50:38.0 -0700
@@ -282,7 +282,7 @@ sysret_careful:
 sysret_signal:
 	TRACE_IRQS_ON
 	sti
-	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
+	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
 	jz	1f
 
 	/* Really a signal */
@@ -375,7 +375,7 @@ int_very_careful:
 	jmp int_restore_rest
 
 int_signal:
-	testl $(_TIF_NOTIFY_RESUME|_TIF_SIGPENDING|_TIF_SINGLESTEP),%edx
+	testl $(_TIF_NOTIFY_RESUME|_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
 	jz 1f
movq %rsp,%rdi  #  -> arg1
xorl %esi,%esi  # oldset -> arg2
@@ -599,7 +599,7 @@ retint_careful:
 	jmp retint_check
 
 retint_signal:
-	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
+	testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
 	jz	retint_swapgs
 	TRACE_IRQS_ON
 	sti
diff -pruN linux-2.6.20+01_poll_interval/arch/x86_64/kernel/mce.c linux-2.6.20+03_poll/arch/x86_64/kernel/mce.c
--- linux-2.6.20+01_poll_interval/arch/x86_64/kernel/mce.c	2007-04-27 14:19:08.0 -0700
+++ linux-2.6.20+03_poll/arch/x86_64/kernel/mce.c	2007-05-02 21:02:16.0 -0700
@@ -20,12 +20,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include  
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #define MISC_MCELOG_MINOR 227
 #define NR_BANKS 6
@@ -39,8 +42,7 @@ static int mce_dont_init;
 static int tolerant = 1;
 static int banks;
 static unsigned long bank[NR_BANKS] = { [0 ... NR_BANKS-1] = ~0UL };
-static unsigned long console_logged;
-static int notify_user;
+static unsigned long notify_user;
 static int rip_msr;
 static int mce_bootlog = 1;
 static atomic_t mce_events;
@@ -48,6 +50,8 @@ static atomic_t mce_events;
 static char trigger[128];
 static char *trigger_argv[2] = { trigger, NULL };
 
+static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
+
 /*
  * Lockless MCE logging infrastructure.
  * This avoids deadlocks on printk locks without having to break locks. Also
@@ -94,8 +98,7 @@ void mce_log(struct mce *mce)
mcelog.entry[entry].finished = 1;
wmb();
 
-   if

Re: [RELEASE] Lguest for 2.6.21

2007-05-02 Thread WANG Cong

On Thu, May 03, 2007 at 09:00:48AM +1000, Rusty Russell wrote:
>On Thu, 2007-05-03 at 03:33 +0800, WANG Cong wrote:
>> Hi Rusty!
>> 
>> I found you forgot to check the return value of copy_from_user, and
>> here is the fix for drivers/lguest/interrupts_and_traps.c.
>> 
>> Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
>
>Hi Wang!
>
>   Thanks for the patch.  This omission (in several places) was
>deliberate.  We can't really do anything sensible if the user unmapped
>the page.  I assume you saw a gcc warning from this code?

Yes. In fact, I got two warnings; the other one is in drivers/lguest/hypercalls.c.

If I understand you correctly, you mean we can do nothing useful to fix it?

>
>   We could also use lgread() in these places which does this check and
>kills the guest if something goes wrong.  I'll check the benchmarks to
>make sure the (slight) extra overhead doesn't cause a regression...
>
>Thanks!
>Rusty.

I sent a mail describing the errors I got when compiling
Documentation/lguest/lguest.c, but it seems that you didn't receive it (it
didn't appear on lkml.org either!). In short: I have already set up my .config
as you suggested, but I still can't compile Documentation/lguest/lguest.c;
the errors are:

lguest.c: In function 'add_to_bridge':
lguest.c:779: error: 'SIOCBRADDIF' undeclared (first use in this function)
lguest.c:779: error: (Each undeclared identifier is reported only once
lguest.c:779: error: for each function it appears in.)

Can you help me out?

Thanks!


Re: [ck] [REPORT] 2.6.21.1 vs 2.6.21-sd046 vs 2.6.21-cfs-v6

William Lee Irwin III wrote:
> Con Kolivas wrote:
> >> Looks good, thanks. Ingo's been hard at work since then and has v8 out
> >> by now. SD has not changed so you wouldn't need to do the whole lot of
> >> tests on SD again unless you don't trust some of the results.
>
> On Thu, May 03, 2007 at 02:11:39AM +0300, Al Boldi wrote:
> > Well, I tried cfs-v8 and it still shows some nice regressions wrt
> > mainline/sd.  SD's nice-levels look rather solid, implying fairness.
>
> That's odd. The ->load_weight changes should've improved that quite
> a bit. There may be something slightly off in how lag is computed,
> or maybe the O(n) lag issue Ying Tang spotted is biting you.

Is it not biting you too?

> Also, I should say that the nice number affairs don't imply fairness
> per se. The way that works is that when tasks have "weights" (like
> nice levels in UNIX) the definition of fairness changes so that each
> task gets shares of CPU bandwidth proportional to its weight instead
> of one share for one task.

OK, but you can easily expose scheduler unfairness by using nice levels as
relative magnifiers, provided nice levels are implemented correctly.

> It takes a bit closer inspection than feel tests to see if weighted
> fairness is properly implemented. One thing to try is running a number
> of identical CPU hogs at the same time at different nice levels for a
> fixed period of time (e.g. 1 or 2 minutes) so they're in competition
> with each other and seeing what percent of the CPU each gets. From
> there you can figure out how many shares each is getting for its nice
> level. Trying different mixtures of nice levels and different numbers
> of tasks should give consistent results for the shares of CPU bandwidth
> the CPU hogs get for being at a particular nice level. A scheduler gets
> "bonus points" (i.e. is considered better at prioritizing) for the user
> being able to specify how the weightings come out. The finer-grained
> the control, the more bonus points.
>
> Maybe con might want to take a stab at having users be able to specify
> the weights for each nice level individually.
>
> CFS actually has a second set of weights for tasks, namely the
> timeslice for a given task. At the moment, they're all equal. It should
> be the case that the shorter the timeslice a given task has, the less
> latency it gets. So there is a fair amount of room for it to maneuver
> with respect to feel tests. It really needs to be done numerically to
> get results we can be sure mean something.
>
> The way this goes is task t_i gets a percent of the CPU p_i when the
> tasks t_1, t_2, ..., t_n are all competing, and task t_i has nice level
> n_i. The share corresponding to nice level n_i is then
>
>   $w_i = p_i / \sum_{j=1}^{n} p_j$
>
> One thing to check for is that if two tasks have the same nice level
> that their weights come out about equal. So for t_i and t_j, if n_i
> = n_j then you check that at least approximately, w_i = w_j, or even
> p_i = p_j, since we're not starting and stopping tasks in the midst of
> the test. Also, you can't simplify sum p_j to 1, since the set of tasks
> may not be the only things running.
>
> The other thing to do is try a different number of tasks with a
> different mix of nice levels. The weight w_i for a given nice
> level n_i should be the same even in a different mix of tasks
> and nice levels if the nice levels are the same.
>
> If this sounds too far out, there's nothing to worry about. You can
> just run the different numbers of tasks with different mixes of nice
> levels and post the %cpu numbers. Or if that's still a bit far out
> for you, a test that does all this is eventually going to get written.

chew.c does exactly that, just make sure sched_granularity_ms >= 5,000,000.
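The share computation from the quoted description, w_i = p_i / sum_j p_j,
together with the equal-nice check, can be sketched as below. Feeding it the
measured %CPU of each hog is assumed (the values need not add up to 100,
since other things may be running). The relative_share() helper, which
compares shares within one run against a reference task, is an addition of
this sketch for comparing different task mixes, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* w_i = p_i / sum_j p_j, where p[] holds the measured %CPU of each hog. */
static void nice_shares(const double *p, double *w, size_t n)
{
	double sum = 0.0;

	for (size_t j = 0; j < n; j++)
		sum += p[j];
	assert(sum > 0.0);
	for (size_t i = 0; i < n; i++)
		w[i] = p[i] / sum;
}

/* 1 if every pair of tasks at the same nice level has shares within tol. */
static int equal_nice_equal_share(const int *nice, const double *w,
				  size_t n, double tol)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++) {
			double d = w[i] - w[j];

			if (nice[i] == nice[j] && (d > tol || d < -tol))
				return 0;
		}
	return 1;
}

/* Share of task i relative to reference task ref in the same run. */
static double relative_share(const double *w, size_t i, size_t ref)
{
	return w[i] / w[ref];
}
```

With proportional sharing, w_i itself shrinks as tasks are added, so comparing
the ratio against a nice-0 task in each run is one way to check that a given
nice level behaves consistently across mixes.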


Thanks!

--
Al



Re: cache-pipe-buf-page-address-for-non-highmem-arch.patch

2007-05-02 Thread Ken Chen


On 5/1/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> Fair enough, it is a bit of an ugly thing.  And I see no measurements there
> on what the overall speedup was for any workload.
>
> Ken, which memory model was in use?  sparsemem?

Discontigmem, with CONFIG_NUMA on.

Re: Ext3 vs NTFS performance

David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads. The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance. This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> > 
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > it a read and let me know what you think. In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?

It's a Samba problem.  Samba doesn't do async writes, which v3.0 should have 
fixed.  Did you try that?

> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.

XFS somewhat hides the Samba problem, by efficiently syncing to disk.
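For anyone wanting to reproduce the write pattern locally, a rough sketch
follows. It assumes the client's 1-byte writes fall on the last byte of each
128K stride, which is a guess; the exact offsets the CIFS client uses are not
stated above, and cifs_style_prealloc() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical reproduction of the client-side pattern: one 1-byte write
 * per `stride` bytes, each at the last offset of its stride (assumption),
 * so after whole strides the file is `total` bytes long while almost no
 * data has been written. Returns the number of writes, or -1 on error.
 */
static long cifs_style_prealloc(int fd, off_t total, off_t stride)
{
	const char zero = 0;
	long writes = 0;

	if (stride <= 0 || total < 0)
		return -1;
	for (off_t end = stride; end <= total; end += stride) {
		if (pwrite(fd, &zero, 1, end - 1) != 1)
			return -1;
		writes++;
	}
	return writes;
}
```

Running it against a file on ext3 and on XFS and then inspecting the result
(for example with filefrag) would be one way to compare how each filesystem
allocates blocks for this pattern.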


Thanks!

--
Al


Re: dead CONFIG_ variables: net/ipv4/

2007-05-02 Thread Patrick McHardy

Robert P. J. Day wrote:
>   again, this list contains some CONFIG_ variables that aren't
> technically dead, but *really* should be renamed to not be confused
> with Kconfig variables.  there are, however, legitimately dead ones in
> the following in places:

Please post to netdev for networking related things.

> $ ../dead_config.sh net/ipv4
> == IP_NF_NAT ==
> net/ipv4/netfilter/Kconfig:# If they want FTP, set to $CONFIG_IP_NF_NAT (m or y),
> net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:#if !defined(CONFIG_IP_NF_NAT) && !defined(CONFIG_IP_NF_NAT_MODULE)
> net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:#if !defined(CONFIG_IP_NF_NAT) && !defined(CONFIG_IP_NF_NAT_MODULE)

This one is a bug, thanks. Could you post results from the
other net/ subdirectories please?

Re: [patch] CFS scheduler, -v8




> On Wed, May 02, 2007 at 11:06:34PM +0530, Srivatsa Vaddagiri wrote:
>> There is also p->wait_runtime which is taken into account when
>> calculating p->fair_key. So if p3 had waiting in runqueue for long
>> before, it can get to run quicker than 10ms later.
>
> Virtual time is time from the task's point of view, which it has spent
> executing. ->wait_runtime is a device to subtract out time spent on the
> runqueue but not running from what would otherwise be virtual time to
> express lag, whether deliberately or coincidentally. ->wait_runtime
> would not be useful for EEVDF AFAICT, though it may be interesting to
> report.

I just want to point out that ->wait_runtime in fact stores the lag of
each task in CFS, except that it is also used by other things and is
occasionally tweaked (heuristically?). Under normal conditions, the sum of
the lags of all active tasks in such a system should be a constant 0. The
lag information is equally important to EEVDF when some tasks leave the
system (become inactive) carrying a certain amount of lag. The key point
here is that we have to spread that lag (either negative or positive) over
all remaining tasks, so that the fairness of the system is preserved. I
think the CFS implementation does not handle this properly.
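One way to spread a departing task's lag, so that the sum over the remaining
active tasks stays 0, is to hand it out in proportion to weight. The sketch
below assumes that rule (it matches how the EEVDF paper adjusts virtual time
when a client leaves, as far as I understand it); the struct and function
names are hypothetical and this is neither CFS nor EEVDF code.

```c
#include <assert.h>
#include <stddef.h>

struct task {
	double weight;
	double lag;    /* ideal service minus service actually received */
	int active;
};

/*
 * Task `leaving` becomes inactive carrying lag L. Each remaining active
 * task j receives L * w_j / W, where W is the total weight of the
 * remaining active tasks. If all lags summed to 0 before, the remaining
 * ones sum to 0 afterwards. Returns -1 if no other active task exists.
 */
static int spread_lag_on_leave(struct task *t, size_t n, size_t leaving)
{
	double rest = 0.0;
	double lag = t[leaving].lag;

	for (size_t i = 0; i < n; i++)
		if (i != leaving && t[i].active)
			rest += t[i].weight;
	if (rest <= 0.0)
		return -1;

	for (size_t i = 0; i < n; i++)
		if (i != leaving && t[i].active)
			t[i].lag += lag * t[i].weight / rest;

	t[leaving].active = 0;
	t[leaving].lag = 0.0;
	return 0;
}
```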


I am running out of time today :-( I will write an email about CFS -v8
tomorrow, describing 2 issues I found in CFS related to this.


Ting

Re: [patch] CFS scheduler, -v8




Li, Tong N wrote:

> Thanks for the excellent explanation. I think EEVDF and many algs alike
> assume global ordering of all tasks in the system (based on virtual
> time), whereas CFS does so locally on each processor and relies on load
> balancing to achieve fairness across processors. It'd achieve strong
> fairness locally, but I'm not sure about its global fairness properties
> in an MP environment. If ideally the total load weight on each processor
> is always the same, then local fairness would imply global fairness, but
> this is a bin packing problem and is intractable ...

First, I am not assuming a global ordering of all tasks. As in the current
implementation, EEVDF should maintain virtual time locally for each CPU.
EEVDF is a proportional time-share scheduler, so the relative weight and
actual CPU share of each task vary as tasks join and leave. There is no
bin-packing problem for such systems.


I understand that the bin-packing problem does exist in the real-time world.
Suppose a system has 2 CPUs and 3 tasks, each of which needs to finish
30ms of work within a window of 50ms. Any 2 of them together would exceed
the bandwidth of one CPU. That is a bin-packing problem, unless the system
is clever enough to break one of them down into 2 requests of 15ms/25ms
and execute them on different CPUs at different times without overlap,
which is quite difficult :-)
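The arithmetic behind the real-time example, taking each task's utilization
u_i as its work divided by its window:

```latex
u_i = \frac{30\ \text{ms}}{50\ \text{ms}} = 0.6, \qquad
u_i + u_j = 1.2 > 1, \qquad
\sum_{i=1}^{3} u_i = 1.8 \le 2
```

So the total demand fits within two CPUs, but no single CPU can host two
whole tasks.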


In the proportional world, weights and CPU shares are scaled to fit the
bandwidth of a CPU. Therefore putting 2 of them on one CPU is fine, and
the fairness on each CPU is preserved. On the other hand, moving one task
back and forth between 2 CPUs does give better throughput and better
global fairness. I have not dug into the SMP load-balancing algorithms
yet, so I will leave that aside for now; first things first :-)


Thanks !

Ting

drivers/rtc/hctosys.c: unable to open rtc device (rtc)

2007-05-02 Thread Mark Lord


Okay, what's this message all about,
and how do I get rid of it?

During early boot:

   drivers/rtc/hctosys.c: unable to open rtc device (rtc)

My kernel .config file is attached.

Thanks
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21.1-cfs-v8
# Wed May  2 22:41:22 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=m
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MCORE2=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_NR_CPUS=2
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
CONFIG_I8K=m
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m

#
# Firmware Drivers
#
CONFIG_EDD=m
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_VMSPLIT_3G=y
# CONFIG_VMSPLIT_3G_OPT is not set
# CONFIG_VMSPLIT_2G is not set
# CONFIG_VMSPLIT_1G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y

Re: [PATCH] [RFC] Added USB_DEVICE_INTERFACE_PROTOCOL

On Wednesday 02 May 2007 18:04, Greg KH wrote:
> On Wed, May 02, 2007 at 05:03:05PM +0200, Jan Kratochvil wrote:
> > The USB_DEVICE_INTERFACE_PROTOCOL will allow to match one interface
> > protocol of vendor specific device.
> > This macro is used in patch adding support for xbox360 to xpad.c
> > 
> > Signed-off-by: Jan Kratochvil <[EMAIL PROTECTED]>
> 
> I have no objection to this, other than you need an additional newline
> after the #define :)
> 
> Dmitry, I can take this through my tree, or you can take it through
> yours, as I think the other patches in this series depend on this.
> 
> If you want to take it through yours, feel free to add:
>   Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> 

I will grab it once some issues with the patch set are resolved.
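For readers unfamiliar with the id-table macros, here is a sketch of what
such a macro might look like, with a blank line after the #define as Greg
asks. The struct layout and flag values are simplified stand-ins modelled on
include/linux/mod_devicetable.h so the example compiles on its own; the
vendor/product/protocol numbers in the table are illustrative only, and
id_matches() is a hypothetical, much-reduced version of the USB core's
matching.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel definitions (assumptions). */
#define USB_DEVICE_ID_MATCH_VENDOR       0x0001
#define USB_DEVICE_ID_MATCH_PRODUCT      0x0002
#define USB_DEVICE_ID_MATCH_DEVICE \
	(USB_DEVICE_ID_MATCH_VENDOR | USB_DEVICE_ID_MATCH_PRODUCT)
#define USB_DEVICE_ID_MATCH_INT_PROTOCOL 0x0200

struct usb_device_id {
	uint16_t match_flags;
	uint16_t idVendor;
	uint16_t idProduct;
	uint8_t  bInterfaceProtocol;
};

/* Match one vendor/product pair, but only the interface with protocol pr. */
#define USB_DEVICE_INTERFACE_PROTOCOL(vend, prod, pr) \
	.match_flags = USB_DEVICE_ID_MATCH_DEVICE | \
		       USB_DEVICE_ID_MATCH_INT_PROTOCOL, \
	.idVendor = (vend), \
	.idProduct = (prod), \
	.bInterfaceProtocol = (pr)

/* Reduced matcher: checks only the fields this macro sets. */
static int id_matches(const struct usb_device_id *id, uint16_t vend,
		      uint16_t prod, uint8_t proto)
{
	if ((id->match_flags & USB_DEVICE_ID_MATCH_VENDOR) && id->idVendor != vend)
		return 0;
	if ((id->match_flags & USB_DEVICE_ID_MATCH_PRODUCT) && id->idProduct != prod)
		return 0;
	if ((id->match_flags & USB_DEVICE_ID_MATCH_INT_PROTOCOL) &&
	    id->bInterfaceProtocol != proto)
		return 0;
	return 1;
}

/* Illustrative table: one entry plus the all-zero terminator. */
static const struct usb_device_id example_ids[] = {
	{ USB_DEVICE_INTERFACE_PROTOCOL(0x045e, 0x028e, 1) },
	{ 0 }
};
```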

Thanks,

-- 
Dmitry

Re: [PATCH 3/3] xpad.c: Added Xbox360 gamepad rumble support.

On Wednesday 02 May 2007 11:05, Jan Kratochvil wrote:
> 
> +config XPAD_FF
> + default n

Please don't default to anything.

>  
> +#ifdef CONFIG_XPAD_FF
> +/**
> + * xpad_irq_out
> + */

Comments are welcome when they say something...

> +static void xpad_irq_out(struct urb *urb)
> +{
> + int retval;
> +
> + switch (urb->status) {
> + case 0:
> + /* success */
> + break;
> + case -ECONNRESET:
> + case -ENOENT:
> + case -ESHUTDOWN:
> + /* this urb is terminated, clean up */
> + dbg("%s - urb shutting down with status: %d", __FUNCTION__, urb->status);
> + return;
> + default:
> + dbg("%s - nonzero urb status received: %d", __FUNCTION__, urb->status);
> + goto exit;
> + }
> +
> +exit:
> + retval = usb_submit_urb(urb, GFP_ATOMIC);
> + if (retval)
> + err("%s - usb_submit_urb failed with result %d",
> +__FUNCTION__, retval);
> +} 
> +
> +int xpad_play_effect(struct input_dev *dev, void *data, struct ff_effect *effect)
> +{
> + struct usb_xpad *xpad = dev->private;
> + if (effect->type == FF_RUMBLE) {
> + __u16 strong = effect->u.rumble.strong_magnitude;
> + __u16 weak = effect->u.rumble.weak_magnitude;
> + xpad->odata[0] = 0x00; 
> + xpad->odata[1] = 0x08; 
> + xpad->odata[2] = 0x00; 
> + xpad->odata[3] = strong / 256;
> + xpad->odata[4] = weak / 256; 
> + xpad->odata[5] = 0x00;
> + xpad->odata[6] = 0x00;
> + xpad->odata[7] = 0x00;
> + usb_submit_urb(xpad->irq_out, GFP_KERNEL);
> + }
> +
> + return 0;
> +}
> +
> +static int xpad_init_ff(struct usb_interface *intf, struct usb_xpad *xpad)
> +{
> + if (xpad->flags & XPAD_FLAGS_XBOX360) {
> + struct usb_endpoint_descriptor *ep_irq_out;
> + int rv;
> +
> + xpad->odata = usb_buffer_alloc(xpad->udev, XPAD_PKT_LEN,
> +GFP_ATOMIC, &xpad->odata_dma);
> + if (!xpad->odata)
> + goto fail1;
> +
> + xpad->irq_out = usb_alloc_urb(0, GFP_KERNEL);
> + if (!xpad->irq_out)
> + goto fail2;
> +
> +
> + ep_irq_out = &intf->cur_altsetting->endpoint[1].desc;
> + usb_fill_int_urb(xpad->irq_out, xpad->udev,
> +  usb_sndintpipe(xpad->udev, ep_irq_out->bEndpointAddress),
> +  xpad->odata, XPAD_PKT_LEN,
> +  xpad_irq_out, xpad, ep_irq_out->bInterval);
> + xpad->irq_out->transfer_dma = xpad->odata_dma;
> + xpad->irq_out->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
> +
> + set_bit( FF_RUMBLE, xpad->dev->ffbit );
> + rv = input_ff_create_memless(xpad->dev, NULL, xpad_play_effect);
> +

Error handling seems to be missing.
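A userspace sketch of the usual goto-based unwinding: each step checks the
resource it just acquired, propagates the error code (including the return
value of the force-feedback registration), and releases earlier resources in
reverse order. malloc() stands in for usb_buffer_alloc()/usb_alloc_urb(), and
fake_ff_create() stands in for input_ff_create_memless(); all names here are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the resources xpad_init_ff() acquires. */
struct ff_state {
	void *odata;      /* output buffer */
	void *irq_out;    /* output URB */
	int ff_created;   /* force-feedback device registered */
};

/* Pretend FF registration; fail_ff lets a caller force an error. */
static int fake_ff_create(struct ff_state *s, int fail_ff)
{
	if (fail_ff)
		return -ENOMEM;
	s->ff_created = 1;
	return 0;
}

/* Acquire in order; on failure release what was acquired, in reverse. */
static int init_ff(struct ff_state *s, size_t pkt_len, int fail_ff)
{
	int rv;

	s->odata = malloc(pkt_len);
	if (!s->odata)            /* check the buffer just allocated */
		return -ENOMEM;

	s->irq_out = malloc(1);
	if (!s->irq_out) {
		rv = -ENOMEM;
		goto fail_buf;
	}

	rv = fake_ff_create(s, fail_ff);
	if (rv)                   /* do not ignore the registration result */
		goto fail_urb;

	return 0;

fail_urb:
	free(s->irq_out);
	s->irq_out = NULL;
fail_buf:
	free(s->odata);
	s->odata = NULL;
	return rv;
}
```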

> + return 0;
> +
> +fail2:   usb_buffer_free(xpad->udev, XPAD_PKT_LEN, xpad->odata, xpad->odata_dma);
> +fail1:   
> + return -ENOMEM;
> + }
> + return 0;
> +}
> +
> +static void xpad_deinit_ff(struct usb_interface *intf, struct usb_xpad *xpad)
> +{
> + if (xpad->flags & XPAD_FLAGS_XBOX360) {
> + usb_kill_urb(xpad->irq_out);

You may want to do that in xpad_close().

> + usb_free_urb(xpad->irq_out);
> + usb_buffer_free(interface_to_usbdev(intf), XPAD_PKT_LEN,
> + xpad->odata, xpad->odata_dma);
> + }
> +}
> +#endif
> +
>  static int xpad_open (struct input_dev *dev)
>  {
>   struct usb_xpad *xpad = dev->private;
> @@ -432,6 +535,11 @@ static int xpad_probe(struct usb_interface *intf, const struct usb_device_id *id
>  
>   input_dev->evbit[0] = BIT(EV_KEY) | BIT(EV_ABS);
>  
> +#ifdef CONFIG_XPAD_FF
> + if (xpad->flags & XPAD_FLAGS_XBOX360)
> + input_dev->evbit[0] |= BIT(EV_FF);
> +#endif

Can this be moved into xpad_init_ff?

> +
>   /* set up buttons */
>   for (i = 0; xpad_btn[i] >= 0; i++)
>   set_bit(xpad_btn[i], input_dev->keybit);
> @@ -449,6 +557,11 @@ static int xpad_probe(struct usb_interface *intf, const struct usb_device_id *id
>   for (i = 0; xpad_abs_pad[i] >= 0; i++)
>   xpad_set_up_abs(input_dev, xpad_abs_pad[i]);
>  
> +#ifdef CONFIG_XPAD_FF
> + if (xpad_init_ff(intf, xpad))
> + goto fail2;
> +#endif
> +

Normally we define dummy functions when the corresponding config option is
disabled, to avoid littering the main code with #ifdefs.
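A minimal sketch of that convention, with hypothetical stand-in types so it
compiles on its own: when CONFIG_XPAD_FF is not defined, xpad_init_ff()
becomes an empty inline stub, and the caller needs no #ifdef.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's own types. */
struct usb_interface;
struct usb_xpad { int ff_enabled; };

#ifdef CONFIG_XPAD_FF
static int xpad_init_ff(struct usb_interface *intf, struct usb_xpad *xpad)
{
	xpad->ff_enabled = 1;   /* real URB/buffer setup would go here */
	return 0;
}
#else
/* Force feedback compiled out: nothing to set up, report success. */
static inline int xpad_init_ff(struct usb_interface *intf, struct usb_xpad *xpad)
{
	return 0;
}
#endif

/* The probe path calls the function unconditionally. */
static int xpad_probe_ff_part(struct usb_interface *intf, struct usb_xpad *xpad)
{
	return xpad_init_ff(intf, xpad);
}
```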

>   ep_irq_in = &intf->cur_altsetting->endpoint[0].desc;
>   usb_fill_int_urb(xpad->irq_in, udev,
>usb_rcvintpipe(udev, ep_irq_in->bEndpointAddress),
> @@ -476,6 +589,9 @@ static void

Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support

On Tue, 01 May 2007 11:59:46 -0700 Chandramouli Narayanan wrote:

> EFI x86_64 build option is added to the kernel configuration.


Hi Mouli,

Can you share EFI code as much as possible among ia64, i386,
and x86_64 instead of duplicating it?


A diffstat patch summary would be Good.
(see Documentation/SubmittingPatches)


> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/Kconfig	2007-04-19 12:39:39.0 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/Kconfig	2007-04-19 13:01:02.0 -0700
> @@ -254,6 +254,20 @@ config X86_HT
>   depends on SMP && !MK8
>   default y
>  
> +config EFI
> + bool "Boot from EFI support (EXPERIMENTAL)"
> + default n
> + ---help---
> +

No blank line above.
Indent following lines by 2 spaces:  i.e., 
as in Documentation/CodingStyle.

> + This enables the kernel to boot on EFI platforms using
> + system configuration information passed to it from the firmware.
> + This also enables the kernel to use any EFI runtime services that are
> + available (such as the EFI variable services).
> + This option is only useful on systems that have EFI firmware
> + and will result in a kernel image that is ~8k larger. However,
> + even with this option, the resultant kernel should continue to
> + boot on existing non-EFI platforms.
> +
>  config MATH_EMULATION
>   bool


> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h
> --- linux-2.6.21rc7-git2-orig/include/asm-x86_64/bootsetup.h	2007-04-19 12:39:40.0 -0700
> +++ linux-2.6.21rc7-git2-uefi-finaltest/include/asm-x86_64/bootsetup.h	2007-04-19 13:01:02.0 -0700
> @@ -17,6 +17,12 @@ extern char x86_boot_params[BOOT_PARAM_S
>  #define APM_BIOS_INFO (*(struct apm_bios_info *) (PARAM+0x40))
>  #define DRIVE_INFO (*(struct drive_info_struct *) (PARAM+0x80))
>  #define SYS_DESC_TABLE (*(struct sys_desc_table_struct*)(PARAM+0xa0))
> +#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
> +#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
> +#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
> +#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
> +#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
> +#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
>  #define MOUNT_ROOT_RDONLY (*(unsigned short *) (PARAM+0x1F2))
>  #define RAMDISK_FLAGS (*(unsigned short *) (PARAM+0x1F8))
>  #define SAVED_VIDEO_MODE (*(unsigned short *) (PARAM+0x1FA))
> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c
> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/efi.c	1969-12-31 16:00:00.0 -0800
> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/efi.c	2007-04-19 13:01:02.0 -0700
> @@ -0,0 +1,824 @@

> +extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
> +struct efi efi;
> +EXPORT_SYMBOL(efi);
> +struct efi efi_phys __initdata;
> +struct efi_memory_map memmap ;

no space before ;

> +static efi_system_table_t efi_systab __initdata;
> +
> +static unsigned long efi_rt_eflags;
> +static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
> +static pgd_t save_pgd;
> +
> +/* Convert SysV calling convention to EFI x86_64 calling convention */
> +
> +static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
> +{
> + va_list ap;
> + int i;
> + unsigned long args[EFI_ARG_NUM_MAX];
> + unsigned int arg_size,stack_adjust_size;

space after comma.

> + efi_status_t status;
> +
> + if (va_num > EFI_ARG_NUM_MAX || va_num<0)   {

va_num < 0) {

> + return EFI_LOAD_ERROR;
> + }
> + if (va_num==0)

if (va_num == 0)

> + /* There is no need to convert arguments for void argument. */
> + __asm__ __volatile__("call *%0;ret;"::"r"(fp));
> +
> + /* The EFI arguments is stored in an array. Then later on it will be 
> +  * pushed into stack or passed to registers according to MS ABI.

passed _to_ registers?  passed via or thru registers?

> +  */
> + va_start(ap, va_num);
> + for (i = 0; i < va_num; i++) {
> + args[i] = va_arg(ap, unsigned long);
> + }
> + va_end(ap);
> + arg_size = va_num*8;

arg_size = va_num * 8;

> + stack_adjust_size = (va_num > EFI_REG_ARG_NUM? EFI_REG_ARG_NUM : va_num)*8;

Please re-read Documentation/CodingStyle.
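Since UEFI on x86_64 uses the Microsoft x64 calling convention, it may help
to spell out where each argument has to end up. The sketch below only
computes the layout (argument registers, stack-slot offsets, and how much
stack the caller reserves); it does not perform the call, and the helper
names are hypothetical. It assumes RSP is 16-byte aligned before the
reservation is made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MS_REG_ARGS 4   /* RCX, RDX, R8, R9 */
#define MS_SLOT     8   /* each argument occupies one 8-byte slot */

static const char *const ms_arg_regs[MS_REG_ARGS] = { "rcx", "rdx", "r8", "r9" };

/* Register carrying argument i (0-based), or NULL if it goes on the stack. */
static const char *ms_arg_register(size_t i)
{
	return i < MS_REG_ARGS ? ms_arg_regs[i] : NULL;
}

/*
 * Offset from RSP, just before the CALL, of argument i's stack slot.
 * Slots 0-3 are the caller-reserved 32-byte "shadow space" for the
 * register arguments, so argument 4 (the fifth) lands at RSP + 0x20.
 */
static size_t ms_arg_stack_offset(size_t i)
{
	return i * MS_SLOT;
}

/*
 * Bytes the caller reserves: at least the shadow space, plus one slot per
 * extra argument, rounded up so RSP stays 16-byte aligned at the CALL.
 */
static size_t ms_stack_reserve(size_t nargs)
{
	size_t slots = nargs > MS_REG_ARGS ? nargs : MS_REG_ARGS;

	return (slots * MS_SLOT + 15) & ~(size_t)15;
}
```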
> +
> + /* Starting from here, assembly code makes sure all registers used are
> +  * under controlled by our

Re: [patch] CFS scheduler, -v8


Hi,

  As encouraged by some of you, I have started implementing EEVDF.
However, I am quite new to this area and may not be experienced enough
to get it through quickly. The main problem I am facing now is how to
treat the semantics of yield() and yield_to(). I will probably throw
out a lot of questions along the way.


  I also found that my previous email was not clear enough in describing
the properties of CFS and EEVDF, which caused some confusion, and it also
contained some mistakes. In this email, I will try to make up for that.


*** Let's start with CFS:
   For simplicity, let's assume that CFS preempts the current task p1 with
another task p2 when p1->key - p2->key > 1, and that the virtual time
rq->fair_clock is initialized to 0. Suppose that at time t = 0 we start
n+1 tasks that run long enough. Task 1 has weight n and all other tasks
have weight 1, so the total weight is 2n. Clearly, at time t = 0,
p_1->key = p_2->key = ... = p_(n+1)->key = rq->fair_clock = 0.


  Since all tasks have the same key, CFS breaks the ties arbitrarily,
which leads to many possibilities. Let's consider 2 of them:

_Case One:_ p1, which has weight n, executes first:
 t = 1: rq->fair_clock = 1/2n,  p1->key = 1/n   // others are not changed
 t = 2: rq->fair_clock = 2/2n,  p1->key = 2/n
  ...
 t = n: rq->fair_clock = n/2n,  p1->key = n/n = 1
   Only after p1 has executed for n ticks does the scheduler pick another 
task. Over the interval [0, n) the actual work done by p1 is n, while the 
work it would have received in an ideal fluid-flow system is 
n * n/2n = n/2. Therefore the lag is n/2 - n = -n/2; a negative lag means 
p1 runs ahead of the ideal case. As we can see, this lag is O(n).
_Case Two:_ the scheduler executes the tasks in the order p2, p3, ..., 
p_(n+1), p1:
 t = 1: rq->fair_clock = 1/2n,  p2->key = 1     // others are not changed
 t = 2: rq->fair_clock = 2/2n,  p3->key = 1
  ...
 t = n: rq->fair_clock = n/2n,  p_(n+1)->key = 1
   Then the scheduler picks p1 (weight n) for execution. Over [0, n) the 
actual work done by p1 is 0, while the ideal amount is n/2. Therefore the 
lag is n/2 - 0 = n/2; a positive lag means p1 falls behind the ideal case. 
Here, too, p1's lag is O(n).
   As I said in the previous email, p->fair_key only carries information 
about a task's past execution and reflects a fair starting point; it 
carries no information about the task's weight.
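In both cases the lag is the ideal fluid-flow share minus the work actually received, lag_i(t) = t * w_i / W - work_i, where W = 2n is the total weight. A trivial helper (the name and signature are ours, not CFS's) makes the arithmetic explicit:

```c
#include <assert.h>

/*
 * Lag of task i after t ticks: the ideal fluid-flow share t * w_i / W
 * minus the work it actually received.  Positive: behind; negative: ahead.
 */
static double lag(double t, double w_i, double W, double work_i)
{
	assert(W > 0);
	return t * w_i / W - work_i;
}
```

With n = 8, Case One gives lag(8, 8, 16, 8) = -4 and Case Two gives lag(8, 8, 16, 0) = 4, i.e. -n/2 and n/2.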


*** Now, let's look at EEVDF.
I have to say that I missed a very important concept in EEVDF, which led 
to the confusion here. EEVDF stands for _Eligible_ Earliest Virtual 
Deadline First, and I did not explain what _eligible_ means.


EEVDF maintains a virtual start time ve_i and a virtual deadline vd_i for 
each task p_i, as well as a virtual time vt. A newly started or woken task 
has its ve_i initialized to the current virtual time. Once a timeslice of 
l_i units of work is done, the new virtual start time is set to the 
previous virtual deadline, and the virtual deadline is then recalculated 
as vd_i = ve_i + l_i/w_i.
A task is eligible if and only if ve_i <= the current virtual time vt.
At every tick, EEVDF picks the eligible task with the earliest virtual 
deadline for execution.


Let's see how it works using a similar example as for CFS above.
Suppose that at time t = 0 we start n+1 tasks. p1 has weight n, and all 
others have weight 1. For simplicity, we assume all tasks use timeslice 
l_i = 1, and the virtual time vt is initialized to 0.

  - at time t = 0, we have
vt = 0;
ve_1 = 0, vd_1 = ve_1 + l_1/w_1 = 1/n
ve_2 = 0, vd_2 = ve_2 + l_2/w_2 = 1
  ...
ve_(n+1) = 0, vd_(n+1) = ve_(n+1) + l_(n+1)/w_(n+1) = 1;
   Since p1 is eligible and has the earliest deadline, 1/n, the 
scheduler executes it first. (Here, the weight encoded in the deadline 
plays an important role: it allows higher-weight tasks to be executed 
first.)
- at time t = 1:
 vt = 1/2n
 ve_1 = 1/n (previous vd_1), vd_1 = ve_1 + 1/n = 2/n
 Since ve_1 > vt, p1 is _not_ eligible. EEVDF breaks the tie among the 
 remaining tasks and picks another one for execution; say it executes p2.
- at time t = 2:
 vt = 2/2n = 1/n,  ve_1 = 1/n, vd_1 = 2/n
 ve_2 = 1, vd_2 = ve_2 + 1/1 = 2   // this makes p2 not eligible
 Since vt = ve_1, p1 is eligible again and has the earliest deadline, 2/n, 
 so it is scheduled for execution. As EEVDF repeats, it gives a schedule 
 like p1, p2, p1, p3, p1, p4, p1, ... (one entry per tick). As you can 
 see, p1 now never falls behind or runs ahead of the ideal case by as much 
 as 1.
   
 Now, let's check how the timeslice l_i affects the system. Suppose we 
change the timeslice of p1 from 1 to 2 and keep the others unchanged. 
EEVDF gives a schedule like:
   p1, p1, p2, p3, p1, p1, p4, p5, p1, p1, ...
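The selection rule can be checked with a small simulation. This is a minimal sketch under stated assumptions: a fixed set of always-runnable tasks (so vt advances by 1/W per tick, with W the sum of weights), ties broken by the lowest task index, and weights that divide l_i * W so that integer arithmetic stays exact. The struct and function names are ours. Task index 0 is p1.

```c
#include <assert.h>

struct eevdf_task {
	long w;		/* weight w_i */
	long slice;	/* timeslice l_i, in ticks */
	long ve, vd;	/* virtual eligible time / deadline, scaled by W */
	long done;	/* ticks run in the current request */
};

/* Eligible (ve <= vt) task with the earliest vd; ties go to the lowest index. */
static int eevdf_pick(const struct eevdf_task *t, int nr, long vt)
{
	int i, best = -1;

	for (i = 0; i < nr; i++) {
		if (t[i].ve > vt)
			continue;
		if (best < 0 || t[i].vd < t[best].vd)
			best = i;
	}
	return best;
}

/*
 * Run for 'ticks' ticks and record the chosen task index per tick in out[].
 * All virtual times are multiplied by W, so vt advances by exactly 1 per
 * tick and l_i / w_i becomes the integer l_i * W / w_i.
 */
static void eevdf_run(struct eevdf_task *t, int nr, int ticks, int *out)
{
	long W = 0, vt = 0;
	int i, tick;

	for (i = 0; i < nr; i++)
		W += t[i].w;
	for (i = 0; i < nr; i++) {
		assert((t[i].slice * W) % t[i].w == 0);	/* exact scaling */
		t[i].ve = 0;
		t[i].vd = t[i].slice * W / t[i].w;
		t[i].done = 0;
	}
	for (tick = 0; tick < ticks; tick++) {
		int p = eevdf_pick(t, nr, vt);

		out[tick] = p;
		vt++;
		if (p < 0)
			continue;
		if (++t[p].done == t[p].slice) {
			/* Request finished: the new ve is the old vd. */
			t[p].done = 0;
			t[p].ve = t[p].vd;
			t[p].vd = t[p].ve + t[p].slice * W / t[p].w;
		}
	}
}
```

With n = 5 and all slices equal to 1, the first ten picks are p1, p2, p1, p3, p1, p4, p1, p5, p1, p6. Giving p1 a slice of 2 (with n = 10) yields p1, p1, p2, p3, p1, p1, p4, p5, p1, p1, matching the schedules described above.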

Re: [PATCH 1/3] xpad.c: Added flags into xpad_device structure and removed dpad_mapping.

Hi Jan,

On Wednesday 02 May 2007 11:01, Jan Kratochvil wrote:
> These changes are expected to simplify further improvements to this driver.
> We will need to record whether the device is an Xbox 360 device or not.
> 
> The other option was to simply add a u8 is_360, but what if we later need
> to know whether the device is a wheel, or whether it can have a keyboard
> (or headset) attached?
> 

...

> -#define MAP_DPAD_TO_BUTTONS	0
> -#define MAP_DPAD_TO_AXES	1
> -#define MAP_DPAD_UNKNOWN	-1
> +#define XPAD_FLAGS_DPAD_TO_BUTTONS	(1 << 0)
> +#define XPAD_FLAGS_DPAD_TO_AXES	(1 << 1)
> +#define XPAD_FLAGS_DPAD_UNKNOWN	(1 << 2)
> 

Turning this into bitmaps suggests that all of these could be set at once,
which is not the case. Since there are 3 spare bytes in the xpad_device
structure to use for additional flags/bitmaps, I'd leave dpad_mapping alone.

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] 2.6.21: Kernel won't boot with either/both of CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS

2007-05-02 Thread Mark Lord


Thomas Gleixner wrote:
> ..
> I'll try to come up with something, but I'm travelling tomorrow, so it
> might not be before the end of the week.


Thanks, Thomas.

I believe we definitely want to nail this down before 2.6.22-final,
but there's a good workaround in the interim (CONFIG_DETECT_SOFTLOCKUP=y)
and we've got at least a couple of months 'till then.

I think I may have fiddled with the RTC config, so I'll compare with my
old config and see what changed.

Cheers

Re: [PATCH] i386: fix suspend/resume with dynamically allocated irq stacks

On Wed, May 02, 2007 at 06:56:09PM -0700, Jeremy Fitzhardinge wrote:
>> +static void __cpuinit __free_irqstack(int cpu, void *stk)
>> +{
>> +int i;
>> +
>> +if (!cpu)
>> +return;
>> +
>> +unmap_vm_area(per_cpu(irqstack_area, cpu));
>> +
>> +for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
>> +__free_page(per_cpu(irqstack_pages, cpu)[i]);
>> +}

On Wed, May 02, 2007 at 07:25:34PM -0700, Bill Irwin wrote:
[...]

Not sure if cpu 0 can ever be offlined, but it is remapped and so on.
So its virtual mapping should be undone if it can be offlined, but we
probably shouldn't attempt to free its underlying bootmem allocation.


-- wli

Re: Why ssse3?

Ulrich Drepper wrote:
> Andi Kleen wrote:
>> Nope. SSE3 != SSSE3. The additional S means Supplemental.
>>
>> It's probably because the few changes didn't justify an SSE4
> 
> OK, the problem is that the actual sse3 bit is misnamed.  According to
> Intel's docs bit 0 of ECX is "sse3", the kernel uses "pni".  Too bad.

Intel has a nasty habit of renaming things after they are already
deployed in Linux.

-hpa
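The two features are separate bits in CPUID leaf 1, ECX: bit 0 is SSE3 (which Linux reports as "pni") and bit 9 is SSSE3 (reported as "ssse3"). A small decoder over an ECX value makes the distinction concrete. It takes the register value as input rather than executing CPUID, so it stays portable; the macro and function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits; /proc/cpuinfo names in brackets. */
#define CPUID1_ECX_SSE3		(1u << 0)	/* SSE3  ["pni"]   */
#define CPUID1_ECX_SSSE3	(1u << 9)	/* SSSE3 ["ssse3"] */

static int has_sse3(uint32_t ecx)
{
	return (ecx & CPUID1_ECX_SSE3) != 0;
}

static int has_ssse3(uint32_t ecx)
{
	return (ecx & CPUID1_ECX_SSSE3) != 0;
}
```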

Re: [PATCH] i386: fix suspend/resume with dynamically allocated irq stacks

On Wed, May 02, 2007 at 06:56:09PM -0700, Jeremy Fitzhardinge wrote:
> +static void __cpuinit __free_irqstack(int cpu, void *stk)
> +{
> + int i;
> +
> + if (!cpu)
> + return;
> +
> + unmap_vm_area(per_cpu(irqstack_area, cpu));
> +
> + for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
> + __free_page(per_cpu(irqstack_pages, cpu)[i]);
> +}

This will leak the vm_struct and also leave it in the vmlist.
work_free_thread_info() needs to be redone too. It should be

remove_vm_area( /* unmap and remove the vm_struct from vmlist */ );
kfree( /* free the vm_struct */ );
for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
__free_page( /* dredge up the appropriate page to free */ );


-- wli

Re: [patch 19/33] NuBus header update

2007-05-02 Thread Brad Boyer

On Wed, May 02, 2007 at 08:47:13PM +0100, James Simmons wrote:
> Will we see nubus ported over to the driver model soon :-)

I think it will happen eventually. However, my first priority
personally is to get the macio code working on non-PCI macs
to get the onboard stuff onto the driver model. I did look at
it, and NuBus does lend itself to this conversion due to the
fact that it does in general have devices that can be detected
at runtime and a lot of generic infrastructure. It's just a
matter of priorities and a lack of time all around.

Brad Boyer
[EMAIL PROTECTED]


[PATCH 4/5] sysfs: printk format warning

From: Randy Dunlap <[EMAIL PROTECTED]>

Fix sysfs printk format warning:
fs/sysfs/bin.c:62: warning: format '%d' expects type 'int', but argument 4 has type 'size_t'

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 fs/sysfs/bin.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index 8ea2a51..d3b9f5f 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -59,7 +59,7 @@ read(struct file * file, char __user * userbuf, size_t count, loff_t * off)
if (copy_to_user(userbuf, buffer, count))
return -EFAULT;
 
-   pr_debug("offs = %lld, *off = %lld, count = %d\n", offs, *off, count);
+   pr_debug("offs = %lld, *off = %lld, count = %zd\n", offs, *off, count);
 
*off = offs + count;
 
-- 
1.5.1.2
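In userspace the same rule applies: a size_t argument needs the z length modifier. Below is a minimal sketch, with a function name of our own. Strictly, %zu is the conversion that matches an unsigned size_t, and %zd is meant for the corresponding signed type, though both print the same value for counts that fit in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a debug line like the fixed pr_debug(), in userspace. */
static int format_read_debug(char *buf, size_t len, long long offs,
			     long long off, size_t count)
{
	/* %lld for long long, %zu for size_t. */
	return snprintf(buf, len, "offs = %lld, *off = %lld, count = %zu",
			offs, off, count);
}
```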


Re: arch/i386/boot rewrite, and all the hard-coded video cards

Andi Kleen wrote:
> 
> I agree; that code can all go.
> 
> What also seems to miss are the early CPUID checks I recently added
> and which x86-64 has for some time.
> 

I probably need to rebase against your tree.  It makes more sense, anyway.

Either way, I just added a pretty decent framework for testing the CPU
features and barfing if they're missing.

> Also if you ever add x86-64 support it does an additional BIOS
> call to tell the BIOS it is 64bit.

Added.

-hpa



[PATCH 3/5] DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt

From: Rolf Eike Beer <[EMAIL PROTECTED]>

Above and below we talk about my_midlayer_create_something, so I assume
that is also meant here.

Signed-off-by: Rolf Eike Beer <[EMAIL PROTECTED]>
Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 Documentation/driver-model/devres.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index 5163b85..6c8d8f2 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -182,7 +182,7 @@ For example, you can do something like the following.
 
...
 
-   devres_close_group(dev, my_midlayer_something);
+   devres_close_group(dev, my_midlayer_create_something);
return 0;
   }
 
-- 
1.5.1.2


[PATCH 1/5] Driver core: fix show_uevent from taking up way too much stack

Declaring an array of PAGE_SIZE does bad things for people running with
4k stacks...

Thanks to Tilman Schmidt for tracking this down.

Cc: Tilman Schmidt <[EMAIL PROTECTED]>
Cc: Kay Sievers <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 drivers/base/core.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8aa090d..59d9816 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -252,7 +252,7 @@ static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
struct kobject *top_kobj;
struct kset *kset;
char *envp[32];
-   char data[PAGE_SIZE];
+   char *data = NULL;
char *pos;
int i;
size_t count = 0;
@@ -276,6 +276,10 @@ static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
if (!kset->uevent_ops->filter(kset, &dev->kobj))
goto out;
 
+   data = (char *)get_zeroed_page(GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
/* let the kset specific function add its keys */
pos = data;
retval = kset->uevent_ops->uevent(kset, &dev->kobj,
@@ -290,6 +294,7 @@ static ssize_t show_uevent(struct device *dev, struct device_attribute *attr,
count += sprintf(pos, "%s\n", envp[i]);
}
 out:
+   free_page((unsigned long)data);
return count;
 }
 
-- 
1.5.1.2
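The same pattern in userspace: keep the large scratch buffer off the stack and free it on every exit path. This sketch rests on stated assumptions: calloc() stands in for get_zeroed_page(), SCRATCH_SIZE stands in for PAGE_SIZE, and show_vars() is a hypothetical stand-in for show_uevent(). It packs "KEY=value" strings into the scratch buffer and then copies them out one per line.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE	4096	/* stands in for PAGE_SIZE */
#define MAX_VARS	32	/* same size as envp[] in show_uevent() */

/* Returns the number of bytes written to buf, or -1 on error. */
static long show_vars(char *buf, size_t buflen, const char *const keys[],
		      const char *const vals[], int nr)
{
	char *envp[MAX_VARS];
	char *data, *pos;
	long count = 0;
	int i;

	assert(nr >= 0 && nr <= MAX_VARS);

	/* Heap instead of "char data[SCRATCH_SIZE]" on a small stack. */
	data = calloc(1, SCRATCH_SIZE);
	if (!data)
		return -1;

	pos = data;
	for (i = 0; i < nr; i++) {
		size_t room = SCRATCH_SIZE - (size_t)(pos - data);
		int n = snprintf(pos, room, "%s=%s", keys[i], vals[i]);

		if (n < 0 || (size_t)n >= room) {
			count = -1;
			goto out;
		}
		envp[i] = pos;
		pos += n + 1;	/* keep the terminating NUL */
	}

	for (i = 0; i < nr; i++) {
		size_t room = buflen - (size_t)count;
		int n = snprintf(buf + count, room, "%s\n", envp[i]);

		if (n < 0 || (size_t)n >= room) {
			count = -1;
			goto out;
		}
		count += n;
	}
out:
	free(data);	/* every exit path after the allocation frees it */
	return count;
}
```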


[PATCH 2/5] platform: reorder platform_device_del

From: Jean Delvare <[EMAIL PROTECTED]>

In platform_device_del(), we currently delete the device resources
first, then we delete the device itself. This causes a (minor) bug to
occur when one unregisters a platform device before unregistering its
platform driver, and the driver is requesting (in .probe()) and
releasing (in .remove()) a resource of the device. The device
resources are already gone by the time the driver gets the chance to
release the resources it had been requesting, causing an error like:
Trying to free nonexistent resource <0295-0296>

If the platform driver is unregistered first, the problem doesn't
occur, as the driver will have the opportunity to release the
resources it had requested before the device resources themselves are
released. It's a bit odd that unregistering the driver first or the
device first doesn't lead to the same result.

So I believe that we should delete the device first in
platform_device_del(). I've searched the git history and found that it
used to be the case before 2.6.8, but was changed here:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=96ef7b3689936ee1e64b711511342026a8ce459c

> 2004/07/14 16:09:44-07:00 dtor_core
> [PATCH] Driver core: Fix OOPS in device_platform_unregister
>
> Driver core: platform_device_unregister should release resources first
>  and only then call device_unregister, otherwise if there
>  are no more references to the device it will be freed and
>  the function will try to access freed memory.

However we now have an explicit call to put_device() at the end of
platform_device_unregister() so I guess the original problem no longer
exists and it is safe to revert that change.

Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
---
 drivers/base/platform.c |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 30480f6..17b5ece 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -292,20 +292,22 @@ EXPORT_SYMBOL_GPL(platform_device_add);
  * @pdev:  platform device we're removing
  *
  * Note that this function will also release all memory- and port-based
- * resources owned by the device (@dev->resource).
+ * resources owned by the device (@dev->resource).  This function
+ * must _only_ be externally called in error cases.  All other usage
+ * is a bug.
  */
 void platform_device_del(struct platform_device *pdev)
 {
int i;
 
if (pdev) {
+   device_del(&pdev->dev);
+
for (i = 0; i < pdev->num_resources; i++) {
struct resource *r = &pdev->resource[i];
if (r->flags & (IORESOURCE_MEM|IORESOURCE_IO))
release_resource(r);
}
-
-   device_del(&pdev->dev);
}
 }
 EXPORT_SYMBOL_GPL(platform_device_del);
-- 
1.5.1.2


[GIT PATCH] PCI patches for 2.6.21

2007-05-02 Thread Greg KH

Here are a bunch of PCI patches against your 2.6.21 git tree.

They contain:
- pci hotplug driver updates
- MSI reworks and cleanups to try to get the PPC MSI code merged
  eventually
- removal of unneeded pci.h inclusion (the majority of all of
  the different files this series touches).
- other bugfixes and minor features.

All of these have been in the -mm tree for a month or so.

Please pull from:
master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6.git/

The full patches will be sent to the linux-pci mailing list, if anyone
wants to see them.

thanks,

greg k-h

 Documentation/feature-removal-schedule.txt   |7 -
 Documentation/pci.txt|   12 +-
 Documentation/power/pci.txt  |2 +-
 arch/alpha/kernel/err_common.c   |1 -
 arch/alpha/kernel/err_ev6.c  |1 -
 arch/alpha/kernel/err_ev7.c  |1 -
 arch/arm/Kconfig |1 +
 arch/i386/Kconfig|1 +
 arch/i386/kernel/cpu/cpufreq/speedstep-lib.c |1 -
 arch/i386/kernel/cpu/cpufreq/speedstep-smi.c |2 +-
 arch/i386/kernel/io_apic.c   |4 +-
 arch/i386/pci/fixup.c|2 +-
 arch/i386/pci/i386.c |4 +-
 arch/ia64/Kconfig|1 +
 arch/ia64/sn/kernel/huberror.c   |1 -
 arch/ia64/sn/kernel/msi_sn.c |4 +-
 arch/ia64/sn/kernel/xpnet.c  |1 -
 arch/m68knommu/kernel/dma.c  |1 -
 arch/mips/lib/iomap.c|1 -
 arch/powerpc/kernel/pci_64.c |2 +-
 arch/powerpc/platforms/pseries/ras.c |1 -
 arch/ppc/8260_io/enet.c  |1 -
 arch/ppc/8260_io/fcc_enet.c  |1 -
 arch/ppc/8xx_io/enet.c   |1 -
 arch/ppc/syslib/ppc4xx_sgdma.c   |1 -
 arch/sh64/mach-cayman/iomap.c|1 -
 arch/sparc64/Kconfig |1 +
 arch/sparc64/kernel/pci.c|4 +-
 arch/sparc64/kernel/pci_sun4v.c  |4 +-
 arch/x86_64/Kconfig  |1 +
 arch/x86_64/kernel/io_apic.c |4 +-
 arch/xtensa/kernel/xtensa_ksyms.c|1 -
 arch/xtensa/platform-iss/setup.c |1 -
 drivers/atm/adummy.c |1 -
 drivers/base/dd.c|   41 +--
 drivers/char/agp/alpha-agp.c |2 +-
 drivers/char/agp/parisc-agp.c|2 +-
 drivers/char/hw_random/via-rng.c |1 -
 drivers/char/pcmcia/synclink_cs.c|1 -
 drivers/char/tpm/tpm.h   |1 -
 drivers/char/watchdog/sc1200wdt.c|1 -
 drivers/char/watchdog/scx200_wdt.c   |2 +-
 drivers/i2c/busses/i2c-at91.c|1 -
 drivers/i2c/busses/i2c-mpc.c |1 -
 drivers/i2c/busses/i2c-pca-isa.c |1 -
 drivers/ieee1394/hosts.c |1 -
 drivers/infiniband/core/cm.c |1 -
 drivers/infiniband/core/iwcm.c   |1 -
 drivers/infiniband/core/mad_priv.h   |1 -
 drivers/infiniband/core/multicast.c  |1 -
 drivers/infiniband/core/sa_query.c   |1 -
 drivers/infiniband/core/user_mad.c   |1 -
 drivers/infiniband/hw/ipath/ipath_fs.c   |1 -
 drivers/infiniband/hw/ipath/ipath_layer.c|1 -
 drivers/infiniband/hw/ipath/ipath_stats.c|2 -
 drivers/infiniband/hw/ipath/ipath_sysfs.c|1 -
 drivers/infiniband/hw/mthca/mthca_memfree.h  |1 -
 drivers/infiniband/ulp/ipoib/ipoib.h |1 -
 drivers/isdn/hisax/netjet.c  |1 -
 drivers/isdn/hysdn/hysdn_proclog.c   |1 -
 drivers/media/dvb/cinergyT2/cinergyT2.c  |2 +-
 drivers/media/video/adv7170.c|1 -
 drivers/media/video/adv7175.c|1 -
 drivers/media/video/bt819.c  |1 -
 drivers/media/video/bt856.c  |1 -
 drivers/media/video/bt866.c  |1 -
 drivers/media/video/cx88/cx88-tvaudio.c  |1 -
 drivers/media/video/em28xx/em28xx-cards.c|1 -
 drivers/media/video/saa7111.c|1 -
 drivers/media/video/saa7114.c|1 -
 drivers/media/video/saa711x.c|1 -
 drivers/media/video/saa7185.c|1 -
 drivers/misc/hdpuftrs/hdpu_cpustate.c|1 -
 drivers/misc/hdpuftrs/hdpu_nexus.c   |1 -
 drivers/mtd/devices/doc2000.c|1 -
 drivers/mtd/devices/doc2001.c|1 -
 drivers/mtd/devices/doc2001plus.c|1 -
 drivers/mtd/devices/docecc.c |1 -
 drivers/mtd/inftlmount.c |1 -

[GIT PATCH] More core patches for 2.6.21

2007-05-02 Thread Greg KH

Here are some more driver core patches for 2.6.21.

They contain:
- fix a problem with blowing up the stack in ugly ways.
- compile warning fix
- documentation update
- remove 'struct subsystem' from the tree as it's pointless.  It
  only contained one field, a struct kset, and was confusing to
  everyone who ever had to use it.  This sets the stage for
  further cleanups to the driver and kobject core to be much
  saner and easier to understand.

The 'struct subsystem' removal patch has been in -mm for quite some
time, the others are all "obviously correct" :)

Please pull from:
master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6.git/

Patches will be sent as a follow-on to this message to lkml for people
to see.

thanks,

greg k-h


 Documentation/driver-model/devres.txt  |2 +-
 arch/arm/mach-omap1/pm.c   |6 +-
 arch/powerpc/kernel/vio.c  |4 +-
 arch/powerpc/platforms/pseries/power.c |8 ++--
 arch/s390/kernel/ipl.c |   32 ++--
 block/genhd.c  |   12 ++--
 drivers/base/base.h|2 +
 drivers/base/bus.c |   16 +++---
 drivers/base/class.c   |   18 +++
 drivers/base/core.c|   29 ++
 drivers/base/firmware.c|6 +-
 drivers/base/platform.c|8 ++-
 drivers/base/power/shutdown.c  |4 +-
 drivers/base/sys.c |   14 +++---
 drivers/firmware/efivars.c |   12 ++--
 drivers/input/evdev.c  |4 +-
 drivers/input/joydev.c |4 +-
 drivers/input/mousedev.c   |4 +-
 drivers/input/tsdev.c  |4 +-
 drivers/parisc/pdc_stable.c|   94 
 drivers/pci/hotplug/acpiphp_ibm.c  |4 +-
 drivers/pci/hotplug/pci_hotplug_core.c |4 +-
 fs/configfs/mount.c|2 +-
 fs/debugfs/inode.c |2 +-
 fs/dlm/lockspace.c |2 +-
 fs/ecryptfs/main.c |   12 ++--
 fs/fuse/inode.c|4 +-
 fs/gfs2/locking/dlm/sysfs.c|2 +-
 fs/gfs2/sys.c  |2 +-
 fs/ocfs2/cluster/masklog.c |4 +-
 fs/ocfs2/cluster/masklog.h |2 +-
 fs/ocfs2/cluster/sys.c |7 +--
 fs/partitions/check.c  |6 +-
 fs/sysfs/bin.c |2 +-
 fs/sysfs/file.c|   11 ++--
 include/acpi/acpi_bus.h|2 +-
 include/linux/device.h |8 ++--
 include/linux/fs.h |2 +-
 include/linux/kobject.h|   58 +---
 include/linux/module.h |2 +-
 include/linux/pci_hotplug.h|2 +-
 kernel/ksysfs.c|   12 ++--
 kernel/module.c|8 ++-
 kernel/params.c|2 +
 kernel/power/disk.c|   14 +++---
 kernel/power/main.c|   10 ++--
 kernel/power/power.h   |2 +-
 lib/kobject.c  |   69 +++-
 security/inode.c   |2 +-
 49 files changed, 244 insertions(+), 298 deletions(-)

---

Greg Kroah-Hartman (2):
  Driver core: fix show_uevent from taking up way too much stack
  remove "struct subsystem" as it is no longer needed

Jean Delvare (1):
  platform: reorder platform_device_del

Randy Dunlap (1):
  sysfs: printk format warning

Rolf Eike Beer (1):
  DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt


Re: arch/i386/boot rewrite, and all the hard-coded video cards

Rene Herman wrote:
> 
> It also provides them as VESA modes yes.

OK, so no work needed.

-hpa

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-02 Thread Rusty Russell

On Wed, 2007-05-02 at 14:09 -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> > 
> > Hm, that's unfortunate.  How about an ELF file wrapped in some other
> > container, so that we can easily extract a properly formed ELF file?
> > 
> 
> Effectively the same thing as changing the magic number.  Note that the
> format for bzImage is pretty rigid, and it would be *highly* undesirable
> to muck that up.

To add some code to the debate, here's how lguest loads a bzImage (from
my draft documentation).  Almost anything would be an improvement:

/* A bzImage, unlike an ELF file, is not meant to be loaded.  You're
 * supposed to jump into it and it will unpack itself.  We can't do that
 * because the Guest can't run the unpacking code, and adding features to
 * lguest kills puppies, so we don't want to.
 *
 * The bzImage is formed by putting the decompressing code in front of the
 * compressed kernel code.  So we can simply scan through it looking for the
 * first "gzip" header, and start decompressing from there. */
static unsigned long load_bzimage(int fd, unsigned long *page_offset)
{
	unsigned char c;
	int state = 0;

	/* GZIP header is 0x1F 0x8B <method> <flags>... <compressed-by>. */
	while (read(fd, &c, 1) == 1) {
		switch (state) {
		case 0:
			if (c == 0x1F)
				state++;
			break;
		case 1:
			if (c == 0x8B)
				state++;
			else
				state = 0;
			break;
		case 2 ... 8:
			state++;
			break;
		case 9:
			/* Seek back to the start of the gzip header. */
			lseek(fd, -10, SEEK_CUR);
			/* One final check: "compressed under UNIX". */
			if (c != 0x03)
				state = -1;
			else
				return unpack_bzimage(fd, page_offset);
		}
	}
	errx(1, "Could not find kernel in bzImage");
}
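The state machine in load_bzimage() above is looking for the fixed 10-byte gzip member header (RFC 1952: ID1 0x1F, ID2 0x8B, CM, FLG, 4-byte MTIME, XFL, OS). Over an in-memory buffer the same test becomes a plain scan. This sketch (our function name, not lguest's) applies the same two checks as the code above, the magic bytes and OS == 0x03, but simply moves on after a rejected candidate instead of giving up.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the first plausible gzip header in buf, or -1.
 * Header layout: 0x1F 0x8B, method, flags, 4-byte mtime, extra flags, OS.
 * OS 0x03 means "compressed under Unix".
 */
static long find_gzip_header(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 10 <= len; i++) {
		if (buf[i] == 0x1F && buf[i + 1] == 0x8B && buf[i + 9] == 0x03)
			return (long)i;
	}
	return -1;
}
```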

/* Unfortunately the entire ELF image isn't compressed: the segments
 * which need loading are extracted and compressed raw.  This denies us the
 * information we need to make a fully-general loader. */
static unsigned long unpack_bzimage(int fd, unsigned long *page_offset)
{
	gzFile f;
	int ret, len = 0;
	/* A bzImage always gets loaded at physical address 1M.  This is
	 * actually configurable as CONFIG_PHYSICAL_START, but as the comment
	 * there says, "Don't change this unless you know what you are doing".
	 * Indeed. */
	void *img = (void *)0x100000;

	/* gzdopen takes our file descriptor (carefully placed at the start of
	 * the GZIP header we found) and returns a gzFile. */
	f = gzdopen(fd, "rb");
	/* Unfortunately, if we made a mistake and it wasn't really a gzip
	 * header, it will still read the file, but directly without
	 * decompressing it.  For us, that's a misfeature. */
	if (gzdirect(f))
		errx(1, "did not find correct gzip header");
	/* We read it into memory in 64k chunks until we hit the end. */
	while ((ret = gzread(f, img + len, 65536)) > 0)
		len += ret;
	if (ret < 0)
		err(1, "reading image from bzImage");

	verbose("Unpacked size %i addr %p\n", len, img);

	/* Without the ELF header, we can't tell virtual-physical gap.  This is
	 * CONFIG_PAGE_OFFSET, and people do actually change it.  Fortunately,
	 * I have a clever way of figuring it out from the code itself.  */
	*page_offset = intuit_page_offset(img, len);

	/* Entry is physical address: convert to virtual */
	return (unsigned long)img + *page_offset;
}

/* Prepare to be SHOCKED and AMAZED.  And possibly a trifle nauseated.
 *
 * We know that CONFIG_PAGE_OFFSET sets what virtual address the kernel expects
 * to be.  We don't know what that option was, but we can figure it out
 * approximately by looking at the addresses in the code.  I chose the common
 * case of reading a memory location into the %eax register:
 *
 *	movl <address>, %eax
 *
 * This gets encoded as five bytes: "0xA1 <4-byte-address>".  For example,
 * "0xA1 0x18 0x60 0x47 0xC0" reads the address 0xC0476018 into %eax.
 *
 * In this example we can guess that the kernel was compiled with
 * CONFIG_PAGE_OFFSET set to 0xC0000000 (it's always a round number).  If the
 * kernel were larger than 16MB, we might see 0xC1 addresses show up, but our
 * kernel isn't that bloated yet.
 *
 * Unfortunately, x86 has variable-length instructions, so finding this
 * particular instruction properly involves writing a disassembler.  Instead,
 * we rely on statistics.  We look for "0xA1" and tally the different bytes
 * which occur 4

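Rusty's comment is cut off above, so the exact tallying rule is not visible here. Assuming it counts the byte four positions after each 0xA1 opcode and takes the most frequent value, the idea can be sketched as follows. That byte is the top byte of the little-endian address, e.g. 0xC0 in 0xC0476018. guess_page_offset() is our name, not lguest's intuit_page_offset().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Guess CONFIG_PAGE_OFFSET by voting: for every 0xA1 byte (movl addr, %eax),
 * take the byte 4 positions later, i.e. the top byte of the little-endian
 * 32-bit address, and return the most common one shifted into place.
 */
static unsigned long guess_page_offset(const unsigned char *img, size_t len)
{
	unsigned long count[256] = { 0 };
	size_t i;
	int b, best = 0;

	assert(img != NULL || len == 0);
	for (i = 0; i + 4 < len; i++)
		if (img[i] == 0xA1)
			count[img[i + 4]]++;

	for (b = 1; b < 256; b++)
		if (count[b] > count[best])
			best = b;

	return (unsigned long)best << 24;
}
```

Being a statistical guess, it can be fooled by stray 0xA1 bytes that are not really opcodes; the vote only needs the genuine instructions to outnumber them.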
[PATCH] i386: fix suspend/resume with dynamically allocated irq stacks

2007-05-02 Thread Jeremy Fitzhardinge

This fixes two bugs:
 - the stack allocation must be marked __cpuinit, since it gets called
   on resume as well.
 - presumably the interrupt stack should be freed on unplug if it's
   going to get reallocated on every plug.

[ Only non-vmalloced stacks tested. ]

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 arch/i386/kernel/irq.c |   42 +-
 1 file changed, 37 insertions(+), 5 deletions(-)

===
--- a/arch/i386/kernel/irq.c
+++ b/arch/i386/kernel/irq.c
@@ -195,10 +195,13 @@ static int __init irq_guard_cpu0(void)
 }
 core_initcall(irq_guard_cpu0);
 
-static void * __init __alloc_irqstack(int cpu)
+static DEFINE_PER_CPU(struct page *, irqstack_pages[THREAD_SIZE/PAGE_SIZE]);
+static DEFINE_PER_CPU(struct vm_struct *, irqstack_area);
+
+static void * __cpuinit __alloc_irqstack(int cpu)
 {
int i;
-   struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+   struct page **pages = per_cpu(irqstack_pages, cpu), **tmp = pages;
struct vm_struct *area;
 
if (!cpu)
@@ -207,13 +210,27 @@ static void * __init __alloc_irqstack(in
 
/* failures here are unrecoverable anyway */
area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
-   for (i = 0; i < ARRAY_SIZE(pages); ++i)
+   for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
pages[i] = alloc_page(GFP_HIGHUSER);
map_vm_area(area, PAGE_KERNEL, &tmp);
+   per_cpu(irqstack_area, cpu) = area;
return area->addr;
 }
+
+static void __cpuinit __free_irqstack(int cpu, void *stk)
+{
+   int i;
+
+   if (!cpu)
+   return;
+
+   unmap_vm_area(per_cpu(irqstack_area, cpu));
+
+   for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
+   __free_page(per_cpu(irqstack_pages, cpu)[i]);
+}
 #else /* !CONFIG_VMALLOC_STACK */
-static void * __init __alloc_irqstack(int cpu)
+static void * __cpuinit __alloc_irqstack(int cpu)
 {
if (!cpu)
return __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
@@ -222,12 +239,26 @@ static void * __init __alloc_irqstack(in
return (void *)__get_free_pages(GFP_KERNEL,
ilog2(THREAD_SIZE/PAGE_SIZE));
 }
+
+static void __cpuinit __free_irqstack(int cpu, void *stk)
+{
+   if (!cpu)
+   return;
+
+   free_pages((unsigned long)stk, ilog2(THREAD_SIZE/PAGE_SIZE));
+}
 #endif /* !CONFIG_VMALLOC_STACK */
 
-static void __init alloc_irqstacks(int cpu)
+static void __cpuinit alloc_irqstacks(int cpu)
 {
per_cpu(softirq_stack, cpu) = __alloc_irqstack(cpu);
per_cpu(hardirq_stack, cpu) = __alloc_irqstack(cpu);
+}
+
+static void __cpuinit free_irqstacks(int cpu)
+{
+   __free_irqstack(cpu, per_cpu(softirq_stack, cpu));
+   __free_irqstack(cpu, per_cpu(hardirq_stack, cpu));
 }
 
 /*
@@ -266,6 +297,7 @@ void irq_ctx_init(int cpu)
 
 void irq_ctx_exit(int cpu)
 {
+   free_irqstacks(cpu);
per_cpu(hardirq_ctx, cpu) = NULL;
 }
 


It's a bug of printk?

2007-05-02 Thread gshan


Hi,

I need two consoles for two separate serial ports, both registered with 
register_console(). The console for serial port #1 is registered first, 
and I can see output on serial port #1 before the console for serial 
port #2 is registered. However, after the console for serial port #2 is 
registered, I see duplicated output on serial port #1 again.


The root cause is that there is only one con_start for all consoles. I 
think con_start needs to be moved into struct console so that each 
console can have its own start?


Thanks,
Gavin
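Gavin's proposal, sketched in userspace: give each console its own start offset into the log buffer, so that replaying the buffer for a newly registered console does not re-print it on consoles that have already shown it. Everything here (struct con, log_msg(), add_console(), the flat non-wrapping buffer) is a hypothetical model, not the printk implementation.

```c
#include <assert.h>
#include <string.h>

#define LOG_SIZE	256
#define MAX_CONS	4

static char log_buf[LOG_SIZE];
static size_t log_end;

/* Hypothetical console that remembers how far into log_buf it has printed. */
struct con {
	char out[LOG_SIZE];	/* everything this console has printed */
	size_t out_len;
	size_t start;		/* per-console replacement for the global con_start */
};

static struct con *cons[MAX_CONS];
static int nr_cons;

/* Print whatever each console has not seen yet. */
static void flush_consoles(void)
{
	int i;

	for (i = 0; i < nr_cons; i++) {
		struct con *c = cons[i];
		size_t n = log_end - c->start;

		memcpy(c->out + c->out_len, log_buf + c->start, n);
		c->out_len += n;
		c->start = log_end;
	}
}

static void log_msg(const char *s)
{
	size_t n = strlen(s);

	assert(log_end + n <= LOG_SIZE);	/* no ring-buffer wrap here */
	memcpy(log_buf + log_end, s, n);
	log_end += n;
	flush_consoles();
}

/* replay != 0: show the existing log on this console only. */
static void add_console(struct con *c, int replay)
{
	assert(nr_cons < MAX_CONS);
	c->out_len = 0;
	c->start = replay ? 0 : log_end;
	cons[nr_cons++] = c;
	flush_consoles();
}
```

Registering a second console with replay enabled prints the old log only on that console; the first console's output contains each message exactly once.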

Re: 2.6.22 -mm merge plans -- vm bugfixes

2007-05-02 Thread Nick Piggin


Hugh Dickins wrote:

On Wed, 2 May 2007, Nick Piggin wrote:


[snip]


More on-topic, since you suggest doing more within vmtruncate_range
than the filesystem: no, I'm afraid that's misdesigned, and I want
to move almost all of it into the filesystem ->truncate_range.
Because, if what vmtruncate_range is doing before it gets to the
filesystem isn't to be just a waste of time, the filesystem needs
to know what's going on in advance - just as notify_change warns
the filesystem about a coming truncation.  But easier than inventing
some new notification is to move it all into the filesystem, with
unmap_mapping_range+truncate_inode_pages_range as its library helpers.


Well I would prefer it to follow the same pattern as regular
truncate. I don't think it is misdesigned to call the filesystem
_first_, but I think if you do that then the filesystem should
call the vm to prepare / finish truncate, rather than open code
calls to unmap itself.



But I'm pretty sure (to use your words!) regular truncate was not racy
before: I believe Andrea's sequence count was handling that case fine,
without a second unmap_mapping_range.


OK, I think you're right. I _think_ it should also be OK with the
lock_page version as well: we should not be able to have any pages
after the first unmap_mapping_range call, because of the i_size
write. So if we have no pages, there is nothing to 'cow' from.



I'd be delighted if you can remove those later unmap_mapping_ranges.
As I recall, the important thing for the copy pages is to be holding
the page lock (or whatever other serialization) on the copied page
still while the copy page is inserted into pagetable: that looks
to be so in your __do_fault.


Yeah, I think my thought process went wrong on those... I'll
revisit.



But it is a shame, and leaves me wondering what you gained with the
page lock there.

One thing gained is ease of understanding, and if your later patches
build an edifice upon the knowledge of holding that page lock while
faulting, I've no wish to undermine that foundation.


It also fixes a bug, doesn't it? ;)



Well, I'd come to think that perhaps the bugs would be solved by
that second unmap_mapping_range alone, so the pagelock changes were
just a misleading diversion.

I'm not sure how I feel about that: calling unmap_mapping_range a
second time feels such a cheat, but if (big if) it does solve the
races, and the pagelock method is as expensive as your numbers
now suggest...


Well aside from being terribly ugly, it means we can still drop
the dirty bit where we'd otherwise rather not, so I don't think
we can do that.

I think there may be some way we can do this without taking the
page lock, and I was going to look at it, but I think it is
quite neat to just lock the page...

I don't think performance is _that_ bad. On the P4 it is a couple
of % on the microbenchmarks. The G5 is worse, but even then I
don't think it is too bad. I'll try to improve that and get back to you.

The problem is that lock/unlock_page is expensive on powerpc, and
if we improve that, we improve more than just the fault handler...

The attached patch gets performance up a bit by avoiding some
barriers and some cachelines:

G5        pagefault   fork          exec
2.6.21    1.49-1.51   164.6-170.8   741.8-760.3
+patch    1.71-1.73   175.2-180.8   780.5-794.2
+patch2   1.61-1.63   169.8-175.0   748.6-757.0

So that brings the fork/exec hits down to much less than 5%, and
would likely speed up other things that lock the page, like write
or page reclaim.
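As a rough check of that claim, treat the midpoint of each min-max range in the G5 table as a point estimate (an assumption; the table does not say how the ranges were produced). On that basis +patch2 costs about 8% on pagefault, 2.8% on fork and 0.2% on exec relative to 2.6.21, versus roughly 14.7%, 6.1% and 4.8% for the first patch. The helpers below are hypothetical and just reproduce that arithmetic.

```c
/* Midpoint of a measured min-max range, used as a rough point estimate. */
static double range_mid(double lo, double hi)
{
    return (lo + hi) / 2.0;
}

/* Relative slowdown of the patched kernel over the base kernel, in percent,
 * comparing range midpoints. Positive means slower. */
static double overhead_pct(double base_lo, double base_hi,
                           double new_lo, double new_hi)
{
    double base = range_mid(base_lo, base_hi);

    return (range_mid(new_lo, new_hi) - base) / base * 100.0;
}
```

For example, overhead_pct(164.6, 170.8, 169.8, 175.0) gives about 2.8, the fork cost of +patch2.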

I think we could get further performance improvement by
implementing arch-specific bitops for lock/unlock operations,
so we don't need to use things like smp_mb__before_clear_bit()
where they aren't needed, or full barriers in test_and_set_bit().
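As a userspace analogy only (C11 <stdatomic.h>, not the kernel bitops API), a lock bit can be taken with an acquire-ordered atomic OR and dropped with a release-ordered atomic AND, so neither side needs a separate full barrier. Whether a given architecture can implement lock_page()/unlock_page() that cheaply is the arch-specific question raised above. TOY_PG_LOCKED and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_PG_LOCKED (1UL << 0)   /* hypothetical lock bit in a flags word */

/* Try to set the lock bit. Acquire ordering keeps the critical section's
 * loads and stores from moving above a successful lock. */
static bool toy_trylock_bit(atomic_ulong *flags)
{
    unsigned long old = atomic_fetch_or_explicit(flags, TOY_PG_LOCKED,
                                                 memory_order_acquire);
    return (old & TOY_PG_LOCKED) == 0;
}

/* Clear the lock bit. Release ordering publishes the critical section's
 * stores before the bit is seen clear; other bits are left untouched. */
static void toy_unlock_bit(atomic_ulong *flags)
{
    atomic_fetch_and_explicit(flags, ~TOY_PG_LOCKED, memory_order_release);
}
```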

--
SUSE Labs, Novell Inc.

Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h   2007-04-24 10:39:56.0 +1000
+++ linux-2.6/include/linux/page-flags.h   2007-05-03 08:38:53.0 +1000
@@ -91,6 +91,8 @@
 #define PG_nosave_free 18  /* Used for system suspend/resume */
 #define PG_buddy   19  /* Page is free, on buddy lists */
 
+#define PG_waiters 20  /* Page has PG_locked waiters */
+
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 
Index: linux-2.6/include/linux/pagemap.h
===
--- linux-2.6.orig/include/linux/pagemap.h  2007-04-24 10:39:56.0 +1000
+++ linux-2.6/include/linux/pagemap.h   2007-05-03 08:35:08.0 +1000
@@ -141,7 +141,7 @@
 static inline void lock_page(struct page *page)
 {
might_sleep();
-   if (TestSetPageLocked(page))
+   if (unlikely(TestSetPageLocked(page)))
__lock_page(page);
 }
 
@@ -152,7 +152,7 @@
 static inline void

scheduling oddity on 2.6.20.3 stock

2007-05-02 Thread david

I needed to recompress some files from .bz2 to .gz, so I set up a script to 
do


bunzip2 -c $file.bz2 |gzip -9 >$file.gz

I expected the two CPU-heavy processes to end up on different CPUs on 
this system (a dual-core Opteron), spending a little time shuffling data 
between the two CPUs.


However, what I find instead is that each process gets 50% of one 
CPU while the other CPU is 97% idle.


David Lang

Re: [patch 00/33] m68k patches for 2.6.22

2007-05-02 Thread Roman Zippel

Hi,

On Tuesday 01 May 2007 22:49, Christoph Hellwig wrote:

> Btw, is there any chance you could update m68k to use the generic
> irq code?  The only architectures that don't have a conversion
> in progress are m68k, m68knommu and arm26

I haven't looked seriously into it since it was pretty much rewritten.
What I need is very close control over the lowlevel handling to minimize the 
number of indirect calls and to get the interrupt delivered as fast as 
possible. (E.g. we even patch the assembler entry, so the basic flow control 
is pretty much fixed after boot.)
I don't mind if the general management is a bit more complex, but how do I 
get rid of all the useless code?
On a closer look, irq_desc for example would need a serious diet; it can produce a 
pretty large table...

bye, Roman

Re: [PATCH 2/2] powerpc: change topology_init() to a subsys_initcall

2007-05-02 Thread Michael Ellerman

On Wed, 2007-05-02 at 12:11 -0500, Kevin Corry wrote:
> Change the powerpc version of topology_init() from an __initcall to
> a subsys_initcall to match all other architectures.
> 
> Signed-off-by: Kevin Corry <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.21/arch/powerpc/kernel/sysfs.c
> ===
> --- linux-2.6.21.orig/arch/powerpc/kernel/sysfs.c
> +++ linux-2.6.21/arch/powerpc/kernel/sysfs.c
> @@ -498,4 +498,4 @@ static int __init topology_init(void)
>  
>   return 0;
>  }
> -__initcall(topology_init);
> +subsys_initcall(topology_init);  

topology_init() depends on the register_one_node() stuff being
available, which relies on register_node_type() being called AFAICT -
which is a postcore_initcall(). So that's OK.

It also creates sysfs files, which is OK because long before initcalls
run vfs_caches_init() called mnt_init() which called sysfs_init().

Just to be super safe it'd be good to diff your sysfs before and after
the change. But assuming that shows nothing, this looks fine to me.
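The ordering argument can be pictured with a toy model. In 2.6 kernels __initcall(fn) is an alias for device_initcall(fn), and initcalls run level by level (postcore before subsys before device, among others), in link order within a level. So the patch moves topology_init() earlier, to the subsys level, which still runs after the postcore level that sets up register_node_type(). The level subset, struct and function names below are illustrative; example_driver_init is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A subset of the 2.6-era initcall levels, numbered in run order. */
enum toy_level { TOY_POSTCORE = 2, TOY_SUBSYS = 4, TOY_DEVICE = 6 };

struct toy_initcall {
    const char *name;
    enum toy_level level;
};

/* Stable insertion sort by level: within one level, link order is kept. */
static void toy_sort_initcalls(struct toy_initcall *calls, size_t n)
{
    size_t i, j;

    for (i = 1; i < n; i++) {
        struct toy_initcall key = calls[i];

        for (j = i; j > 0 && calls[j - 1].level > key.level; j--)
            calls[j] = calls[j - 1];
        calls[j] = key;
    }
}

/* Position of `name` in the sorted run order, or -1 if absent. */
static int toy_run_position(const struct toy_initcall *calls, size_t n,
                            const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(calls[i].name, name) == 0)
            return (int)i;
    return -1;
}
```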

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


Re: FEATURE REQUEST: merge MD software raid and LVM in one unique layer.

2007-05-02 Thread david

On Wed, 2 May 2007, Miguel Sousa Filipe wrote:

On 5/2/07, Diego Calleja <[EMAIL PROTECTED]> wrote:

 On Wed, 2 May 2007 20:18:55 +0100, "Miguel Sousa Filipe"
 <[EMAIL PROTECTED]> wrote:

>  I find it highly irritating to have two kernel interfaces and two
>  userland tools that provide the same functionality; which one should I
>  use?

 I doubt users care about the kernel's design; however, the lack of
 unification of userspace tools is a real problem. Just my 2¢.

I believe they do, since the kernel's design dictates the use of
several different tools (that shouldn't be needed). For instance, to
have raid5 with snapshots and dynamic partition resizing we will
always need:

md-raid5
lvm
some FS.

This md-raid5/lvm separation is evidence of how the kernel's
design & API affect usability and the userspace tools that the user is
obliged to use. I cannot have those features without "bumping" into
this kernel design issue.
This is also a problem for any developer who tries to improve
usability in this area by creating some unified userland tools to
manipulate MD & LVM. (Imagining myself implementing some userland tool
to create some "storage devices" + mount points... doesn't seem easy
or fun.)

Why do you care if the userspace tool that does the resizing makes system 
calls to one layer or to two layers? How would you know?

Currently you see that they are separate because you use two different 
tools to manipulate them, but if it were one tool that manipulated both of 
them for the common cases you wouldn't know. (This is the point that Diego 
was trying to make.)

David Lang

Re: [ck] [REPORT] 2.6.21.1 vs 2.6.21-sd046 vs 2.6.21-cfs-v6

2007-05-02 Thread William Lee Irwin III

Con Kolivas wrote:
>> Looks good, thanks. Ingo's been hard at work since then and has v8 out by
>> now. SD has not changed so you wouldn't need to do the whole lot of tests
>> on SD again unless you don't trust some of the results.

On Thu, May 03, 2007 at 02:11:39AM +0300, Al Boldi wrote:
> Well, I tried cfs-v8 and it still shows some nice regressions wrt 
> mainline/sd.  SD's nice-levels look rather solid, implying fairness.

That's odd. The ->load_weight changes should've improved that quite
a bit. There may be something slightly off in how lag is computed,
or maybe the O(n) lag issue Ying Tang spotted is biting you.

Also, I should say that the nice number affairs don't imply fairness
per se. The way that works is that when tasks have "weights" (like
nice levels in UNIX) the definition of fairness changes so that each
task gets shares of CPU bandwidth proportional to its weight instead
of one share for one task.

It takes a bit closer inspection than feel tests to see if weighted
fairness is properly implemented. One thing to try is running a number
of identical CPU hogs at the same time at different nice levels for a
fixed period of time (e.g. 1 or 2 minutes) so they're in competition
with each other and seeing what percent of the CPU each gets. From
there you can figure out how many shares each is getting for its nice
level. Trying different mixtures of nice levels and different numbers
of tasks should give consistent results for the shares of CPU bandwidth
the CPU hogs get for being at a particular nice level. A scheduler gets
"bonus points" (i.e. is considered better at prioritizing) for the user
being able to specify how the weightings come out. The finer-grained
the control, the more bonus points.

Maybe Con might want to take a stab at having users be able to specify
the weights for each nice level individually.

CFS actually has a second set of weights for tasks, namely the
timeslice for a given task. At the moment, they're all equal. It should
be the case that the shorter the timeslice a given task has, the less
latency it gets. So there is a fair amount of room for it to maneuver
with respect to feel tests. It really needs to be done numerically to
get results we can be sure mean something.

The way this goes is task t_i gets a percent of the CPU p_i when the
tasks t_1, t_2, ..., t_n are all competing, and task t_i has nice level
n_i. The share corresponding to nice level n_i is then

  w_i = \frac{p_i}{\sum_{j=1}^{n} p_j}

One thing to check for is that if two tasks have the same nice level
that their weights come out about equal. So for t_i and t_j, if n_i
= n_j then you check that at least approximately, w_i = w_j, or even
p_i = p_j, since we're not starting and stopping tasks in the midst of
the test. Also, you can't simplify sum p_j to 1, since the set of tasks
may not be the only things running.

The other thing to do is try a different number of tasks with a
different mix of nice levels. The weight w_i for a given nice
level n_i should be the same even in a different mix of tasks
and nice levels if the nice levels are the same.
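The checks above reduce to a few lines of arithmetic. A minimal sketch, assuming the %CPU figures p_i have already been collected (e.g. from top) for n competing CPU hogs; nice_weights(), same_nice_consistent() and the relative tolerance are illustrative choices, not part of any existing test.

```c
#include <stddef.h>

/* w_i = p_i / sum_j p_j, where p_i is the measured %CPU of CPU hog i. */
static void nice_weights(const double *pct, double *w, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += pct[i];
    for (i = 0; i < n; i++)
        w[i] = sum > 0.0 ? pct[i] / sum : 0.0;
}

static double dabs(double x) { return x < 0.0 ? -x : x; }
static double dmax(double a, double b) { return a > b ? a : b; }

/* Return 1 if every pair of tasks at the same nice level got weights
 * within a relative tolerance `tol` of each other, else 0. */
static int same_nice_consistent(const int *nice, const double *w,
                                size_t n, double tol)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (nice[i] == nice[j] &&
                dabs(w[i] - w[j]) > tol * dmax(w[i], w[j]))
                return 0;
    return 1;
}
```

Running the same helper on a second mix of tasks and comparing the weights per nice level covers the other check.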

If this sounds too far out, there's nothing to worry about. You can
just run the different numbers of tasks with different mixes of nice
levels and post the %cpu numbers. Or if that's still a bit far out
for you, a test that does all this is eventually going to get written.

-- wli

[ANNOUNCE] Updated PS3 Linux Distributor's Starter Kit released

2007-05-02 Thread Geoff Levand

Hi.

Just to let anyone interested know, an updated PS3 Linux
Distributor's Starter Kit (v1.3) was released.

The release note is here:

  
http://www.kernel.org/pub/linux/kernel/people/geoff/cell/CELL-Linux-CL_20070425-ADDON/README-e.txt

And the CD-ROM iso image is here (238 MiB):

  ftp://ftp.infradead.org/pub/Sony-PS3/CELL-Linux-CL_20070425-ADDON.iso
  http://ftp.uk.linux.org/pub/linux/Sony-PS3/CELL-Linux-CL_20070425-ADDON.iso


-Geoff





Re: [git pull] New firewire stack

2007-05-02 Thread Jonathan Woithe

Kristian wrote:
> Olaf Hering wrote:
> > On Tue, May 01, Kristian Høgsberg wrote:
> > 
> >>   drivers/firewire/Kconfig  |   60 ++
> > 
> > NACK.
> > Upgrade the current drivers/ieee1394/ with the new code, and keep all
> > existing module names.
> 
> What's your reasoning here?  Having different module names allows people to 
> compile both stacks and switch between them as they wish.

While I'm not fussed about the implementation details I agree with those
who have advocated a migration period where both stacks are present.  A
major change such as this is almost certain to turn up bugs when it becomes
more widely tested, and many firewire users are unlikely to test the
new stack without an easy fallback to a known working system.  Yes, I know
development and production systems should be separate, but I (and many
others) can't afford enough hardware for that.

> Another point in favour of different module names is that the new stack
> doesn't actually provide the same user space interfaces as the old stack. 
> Basically, no applications use the raw kernel interfaces and the new stack
> is only compatible at the library level.  In the light of this, I think
> it's fair to change the module names.

Sounds reasonable to me.

However, as a compromise how about renaming the existing stack's modules and
then reusing the existing names for the new stack?  Messy I know, but this
way both stacks would still be available without recompilation for those who
needed them and the sbp2-as-root dilemma raised by Olaf would also be
covered.

> As for putting the new stack in drivers/ieee1394 - I don't know, I think it 
> makes sense to keep the new stack in its own directory.

My immediate thought is that it would be neater and clearer to have both
stacks in different directories, but I could live with either.

Oh yes, it would be nice to have working PCILynx support again (although I
acknowledge it's unlikely to happen).  Some of us do have these cards
installed for sniffing purposes (using nosy) but it would be nice to be able
to use them with libraw1394 as well.  It would for example save me having to
swap cards depending on what I needed to do (I have insufficient PCI slots
to have both the PCILynx and OHCI cards installed simultaneously).

Regards
  jonathan

Re: 2.6.22 -mm merge plans: mm-more-rmap-checking

2007-05-02 Thread Nick Piggin


Hugh Dickins wrote:

On Wed, 2 May 2007, Nick Piggin wrote:


Yes, but IIRC I put that in because there was another check in
SLES9 that I actually couldn't put in, but used this one instead
because it also caught the bug we saw.
... 
This was actually a rare corruption that is also in 2.6.21, and
as few rmap callsites as we have, it was never noticed until the
SLES9 bug check was triggered.



You are being very mysterious.  Please describe this bug (privately
if you think it's exploitable), and let's work on the patch to fix it,
rather than this "debug" patch.


It is exec-fix-remove_arg_zero.patch in Andrew's tree, it's exploitable
in that it leaks memory, but it could also release corrupted pagetables
into quicklists on those architectures that have them...

Anyway, it quite likely would have gone unfixed for several more years
if we didn't have the bug triggers in. Now you could argue that my
patch obviously fixes all bugs in there (but I wouldn't :)), and since it is
the most complex of the few callsites, _now_ we can avoid the bug checks.
However I'd prefer to keep them at least under CONFIG_DEBUG_VM.



Hmm, I didn't notice the do_swap_page change, rather just derived
its safety by looking at the current state of the code (which I
guess must have been post-do_swap_page change)...



Your addition of page_add_new_anon_rmap clarified the situation too.



Do you have a pointer to the patch, for my interest?



The patch which changed do_swap_page?

commit c475a8ab625d567eacf5e30ec35d6d8704558062
Author: Hugh Dickins <[EMAIL PROTECTED]>
Date:   Tue Jun 21 17:15:12 2005 -0700
[PATCH] can_share_swap_page: use page_mapcount



Yeah, this one, thanks. I'm just interested.



Or my intended PG_swapcache to PAGE_MAPPING_SWAP patch,
which does assume PageLocked in page_add_anon_rmap?
Yes, I can send you its current unsplit state if you like
(but have higher priorities before splitting and commenting
it for posting).


I would like to see that too, but when you are ready :)

Thanks,
Nick

--
SUSE Labs, Novell Inc.

Re: Ext3 vs NTFS performance

2007-05-02 Thread David Chinner

On Wed, May 02, 2007 at 03:46:21PM -0400, Chris Mason wrote:
> On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> > On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > > Hello all,
> > > 
> > > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > > video workloads. The Windows CIFS client will attempt a poor-man's
> > > pre-allocation of the file on the server by sending 1-byte writes at
> > > 128K-byte strides, breaking block allocation on ext3 and leading to
> > > fragmentation and poor performance. This will happen for many
> > > applications (including iTunes) as the CIFS client issues these
> > > pre-allocates under the application layer.
> > > 
> > > I've posted a brief paper on Intel's OSS website
> > > (http://softwarecommunity.intel.com/articles/eng/1259.htm). Please give
> > > it a read and let me know what you think. In particular, I'd like to
> > > arrive at the right place to fix this problem: is it in the filesystem,
> > > VFS, or Samba?
> > 
> > As I commented on IRC to Val Henson - the XFS performance indicates
> > that it is not a VFS or Samba problem.
> > 
> > I'd say it's probably delayed allocation that is making the
> > difference here - no allocation occurs on the single byte writes, it
> > occurs when the larger data writes are flushed to disk. Hence no
> > adverse fragmentation will occur and there will be no extra
> > allocations being done.
> > 
> > Hence I think it's probably a filesystem problem - it would be
> > interesting to see how ext4 performs on this workload
> 
> If we rely on delalloc for this, what happens if another proc on the
> same fs is doing synchronous writes to other files? (say for mail
> delivery).  Will random FS commits force delayed allocations to become
> real?

Not on XFS.

> Also, I'd expect a sufficiently loaded server to break down eventually
> as load/users increase.  The cost of a bad delalloc decision gets much
> higher if we're using it as a crutch for this kind of bad userland
> coding.

This only becomes a problem if the system has enough pages dirty to
be triggering throttling so that the 1-byte writes are converted before
the data actually hits the server.
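Why the 1-byte pre-allocation writes hurt an allocator that commits blocks immediately, but not one that delays allocation, can be seen in a deliberately simplified model. This is an assumption-laden sketch, not how ext3 or XFS actually pick blocks: a single writer, a bump allocator, 4K blocks and 128K strides. With immediate allocation each 128K region ends up as two extents (2S extents for S >= 2 strides); with delayed allocation the whole file is one extent. simulate_prealloc() and count_extents() are hypothetical helpers.

```c
#include <stdlib.h>

#define BLOCKS_PER_STRIDE 32   /* 128K stride / 4K blocks (assumed block size) */

/* Count physically contiguous runs (extents) in a logical->physical map. */
static unsigned count_extents(const unsigned *phys, unsigned n)
{
    unsigned i, extents = n ? 1 : 0;

    for (i = 1; i < n; i++)
        if (phys[i] != phys[i - 1] + 1)
            extents++;
    return extents;
}

/* Toy allocator handing out physical blocks in increasing order.
 * delayed == 0: each 1-byte write at a 128K stride allocates its block
 *               immediately, then the bulk data fills the gaps.
 * delayed != 0: nothing is allocated until flush, which allocates in
 *               logical order. Returns the extent count, or 0 on error. */
static unsigned simulate_prealloc(unsigned strides, int delayed)
{
    unsigned n = strides * BLOCKS_PER_STRIDE, next = 0, i, extents;
    unsigned *phys = malloc(n * sizeof(*phys));
    unsigned char *mapped = calloc(n, 1);

    if (!phys || !mapped) {
        free(phys);
        free(mapped);
        return 0;
    }
    if (!delayed)
        for (i = 0; i < n; i += BLOCKS_PER_STRIDE) {
            phys[i] = next++;
            mapped[i] = 1;
        }
    for (i = 0; i < n; i++)         /* bulk data writes, or the flush */
        if (!mapped[i]) {
            phys[i] = next++;
            mapped[i] = 1;
        }
    extents = count_extents(phys, n);
    free(phys);
    free(mapped);
    return extents;
}
```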

Even then, if you are on an XFS filesystem with a sunit/swidth set,
the allocation alignments and speculative allocations will go a long
way toward preventing fragmentation.

If that doesn't work, then set the extent allocation size hint on the
XFS inode to 128k or 256k to set the minimum allocation size for the
file to span the distance between the 1-byte writes. This attribute
can be inherited from the parent directory on create, so it's a
set-and-forget type of thing...
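On newer kernels the hint can be set through the generic FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls from <linux/fs.h> (xfs_io's extsize command does the same from the shell); in 2007 the XFS-specific XFS_IOC_FSSETXATTR from the XFS headers played this role. A minimal sketch, assuming headers recent enough to export the generic names; set_extsize_hint() and the example path are hypothetical. XFS generally refuses to change the hint on a file that already has extents, which is one more reason to set it on the parent directory with the inherit flag.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_* */

/* Hypothetical helper: set an extent size hint of `bytes` on `path`.
 * Directories get FS_XFLAG_EXTSZINHERIT so files created in them inherit
 * the hint; regular files get FS_XFLAG_EXTSIZE. Returns 0 or -1. */
static int set_extsize_hint(const char *path, unsigned int bytes)
{
    struct fsxattr fsx;
    struct stat st;
    int fd, ret = -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0 && ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == 0) {
        fsx.fsx_extsize = bytes;
        fsx.fsx_xflags |= S_ISDIR(st.st_mode) ? FS_XFLAG_EXTSZINHERIT
                                              : FS_XFLAG_EXTSIZE;
        ret = ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
    }
    close(fd);
    return ret == 0 ? 0 : -1;
}
```

For the share in question this would be something like set_extsize_hint("/srv/share", 128 * 1024) on the exported directory before the files are created.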

i.e. XFS has lots of ways to prevent performance from degrading
on these sorts of issues.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [PATCH 2/6] firewire: isochronous and asynchronous I/O

2007-05-02 Thread Kristian Høgsberg


Christoph Hellwig wrote:

+   for (i = 0; i < buffer->page_count; i++) {
+   buffer->pages[i] = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO);
+   if (buffer->pages[i] == NULL)
+   goto out_pages;
+
+   address = dma_map_page(card->device, buffer->pages[i],
+  0, PAGE_SIZE, direction);
+   if (dma_mapping_error(address)) {
+   __free_page(buffer->pages[i]);
+   goto out_pages;
+   }


Are you sure using streaming dma mapping is safe here?  I don't see an
actual user in this patch, but doing the proper ownership protocol
for them is quite difficult if you reuse them, and allocating them
in kernelspace usually means you want to keep reusing them.


What other options are there?  The only user in the stack is the userspace 
interface, which lets you mmap the pages in the iso_buffer from an 
application.  The pages need to stay around and stay mapped as long as that 
buffer is mmapped by userspace.  The buffer can be several megabytes and I 
don't want to set up a kernel side virtual mapping for it.  The pages are only 
used for either outgoing or incoming data, never both, so device/driver 
ownership isn't too difficult to handle.



+#include 


You don't actually seem to use this one ..


+#include 


.. or this one ..


+#include 


.. or this one.


Ah, right, I'll get rid of those.


+   retval = fw_core_add_address_handler(_map,
+_map_region);
+   BUG_ON(retval < 0);
+
+   retval = fw_core_add_address_handler(,
+_region);
+   BUG_ON(retval < 0);
+
+   /* Add the vendor textual descriptor. */
+   retval = fw_core_add_descriptor(_id_descriptor);
+   BUG_ON(retval < 0);
+   retval = fw_core_add_descriptor(_id_descriptor);
+   BUG_ON(retval < 0);


These kinds of bug checks look wrong.  Either the operations
can't fail in which case they should not return an error value
or you should handle them properly.


The fw_core_add_descriptor() checks that the descriptor block it's passed is 
internally consistent and is used for blocks passed in from userspace too.  In 
these two cases, the blocks are static const arrays in the driver and if 
fw_core_add_descriptor returns < 0 it's a bug in the driver.



Both the previous and this patch contain quite a lot of GFP_ATOMIC
allocations, which are a sign of not having very good layering.


Looking through the GFP_ATOMIC allocations, I see a couple that could be 
rolled back to GFP_KERNEL.  But I don't know that it means bad layering; I'm 
typically just using a GFP_ATOMIC kmalloc rather than preallocating some fixed 
number of, say, packets or nodes or whatever.  For example, the old SBP-2 
(storage) driver uses a free-list of packets and grabs one from that list when 
the SCSI stack asks it to send a request.  If that list is empty, it fails 
and lets the SCSI stack retry the command.  I'm using a GFP_ATOMIC kmalloc 
instead in that case, and I believe it's a better approach than implementing 
ad-hoc allocation data structures.
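For comparison, the fixed free-list approach described for the old SBP-2 driver looks roughly like this as a single-threaded userspace sketch (a real driver would need locking, and all names here are hypothetical). The alternative described above replaces toy_pool_get() with an atomic allocation whose failure is handled the same way, by letting the SCSI layer retry.

```c
#include <stddef.h>

#define POOL_SIZE 4

struct toy_packet {
    struct toy_packet *next;        /* free-list link */
    unsigned char payload[64];
};

struct toy_pool {
    struct toy_packet packets[POOL_SIZE];
    struct toy_packet *free_list;
};

/* Thread every preallocated packet onto the free list. */
static void toy_pool_init(struct toy_pool *pool)
{
    size_t i;

    pool->free_list = NULL;
    for (i = 0; i < POOL_SIZE; i++) {
        pool->packets[i].next = pool->free_list;
        pool->free_list = &pool->packets[i];
    }
}

/* Returns NULL when the pool is exhausted; the caller is expected to
 * report a "busy" status so the request is retried later. */
static struct toy_packet *toy_pool_get(struct toy_pool *pool)
{
    struct toy_packet *p = pool->free_list;

    if (p)
        pool->free_list = p->next;
    return p;
}

static void toy_pool_put(struct toy_pool *pool, struct toy_packet *p)
{
    p->next = pool->free_list;
    pool->free_list = p;
}
```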


Thanks for the reviews, I'll look through your other emails.
Kristian

Re: Mysterious RTC hangs on x86_64 - fixed, sort of

2007-05-02 Thread Zachary Amsden


Chuck Ebbert wrote:

Well, turns out this is a heisenbug.  Which is good, since it means the 
nop patch didn't change anything.



Try leaving the spinlocks and just disabling the callbacks. And maybe
enable spinlock debugging...
  


I tried removing all the spinlocks inside the interrupt handler.  It seemed 
to work fine for a while, but still hung (at worst, it looks like missing 
locks mean we might screw up and read/write the wrong CMOS register, 
not hang or crash).


So I took down the 2nd CPU with hotplug (did not yet try a UP kernel though).  
It took a longer time, but still hung.  Seems not to be a spinlock 
problem, but I'll turn on debugging anyway.


  

CONFIG_HPET_EMULATE_RTC=y



Did you try without that?
  


Will do.  That looks much more suspicious.  I thought I had already killed 
it, but I had only disabled this:


# CONFIG_HPET_RTC_IRQ is not set

If that still crashes, I'll try running CMOS accesses in a loop in userspace to see if 
the port I/O is tickling a chipset bug (the only other report I know of is on the 
same chipset, nVidia MCP51).  Maybe the SMM handler is accessing CMOS or doing something 
whacked out.  Being stuck in SMM is not good for CPU thermal throttling ... hopefully 
Turions don't reach nuclear emission point.

That would also maybe explain why the NMI watchdog doesn't seem to notice anything wrong.
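A stress loop of that kind could be written against the RTC driver's RTC_RD_TIME ioctl from <linux/rtc.h>, which on a PC reads the CMOS clock, rather than with raw inb/outb. This is an assumed variant of the test described above: going through /dev/rtc keeps the kernel's own locking in the path, so it may not reproduce a hang that needs unsynchronized port access. rtc_read_loop() is a hypothetical helper.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/rtc.h>

/* Read the RTC time `iterations` times through the rtc driver.
 * Returns the number of successful reads, or -1 if the device
 * cannot be opened. */
static long rtc_read_loop(const char *dev, long iterations)
{
    struct rtc_time tm;
    long i, ok = 0;
    int fd = open(dev, O_RDONLY);

    if (fd < 0)
        return -1;
    for (i = 0; i < iterations; i++)
        if (ioctl(fd, RTC_RD_TIME, &tm) == 0)
            ok++;
    close(fd);
    return ok;
}
```

Something like rtc_read_loop("/dev/rtc", 1000000) left running while the machine is otherwise idle would be the userspace half of the experiment.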


Thanks,
Zach

Re: [PATCH] x86_64: support poll() on /dev/mcelog (try #2)

2007-05-02 Thread Tim Hockin


Newer version coming in a while.  Testing.

On 4/30/07, Tim Hockin <[EMAIL PROTECTED]> wrote:

From: Tim Hockin <[EMAIL PROTECTED]>

Background:
 /dev/mcelog is typically polled manually.  This is less than optimal for
 situations where accurate accounting of MCEs is important.  Calling
 poll() on /dev/mcelog does not work.

Description:
 This patch adds support for poll() to /dev/mcelog.  This results in
 immediate wakeup of user apps whenever the poller finds MCEs.  Because
 the exception handler cannot take any locks, it cannot call the wakeup
 itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
 caught at the next return from interrupt or exit from idle, calling the
 mce_user_notify() routine.

 This patch also does some small cleanup for essentially unused variables,
 and moves the user notification into the body of the poller, so it is
 only called once per poll, rather than once per CPU.

Result:
 Applications can now poll() on /dev/mcelog.  When an error is logged
 (whether through the poller or through an exception) the applications are
 woken up promptly.  This should not affect any previous behaviors.  If no
 MCEs are being logged, there is no overhead.
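With the patch applied, a daemon can sleep in poll() instead of re-reading /dev/mcelog on a timer. A minimal sketch of the waiting side, assuming the driver reports POLLIN when records are pending (the description says poll() is supported but does not spell out the event mask). wait_for_mce() is hypothetical, and actually reading and decoding the records is left to a tool such as mcelog.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for the device to become readable.
 * Returns 1 if records appear to be pending, 0 on timeout, -1 on error. */
static int wait_for_mce(const char *path, int timeout_ms)
{
    struct pollfd pfd;
    int fd, ret;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    ret = poll(&pfd, 1, timeout_ms);
    close(fd);
    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A monitoring loop would call wait_for_mce("/dev/mcelog", -1) and read the log each time it returns 1.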

Alternatives:
 I considered simply supporting poll() through the poller and not using
 TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
 happening and the user application being notified is *the*most* critical
 window for us.  Many uncorrectable errors can be logged to the network if
 given a chance.

 I also considered doing the MCE poll directly from the idle notifier, but
 decided that was overkill.

Testing:
 I used an error-injecting DIMM to create lots of correctable DRAM errors
 and verified that my user app is woken up in sync with the polling interval.
 I also used the northbridge to inject uncorrectable ECC errors, and
 verified (printk() to the rescue) that the notify routine is called and the
 user app does wake up.

Caveats:
 I have seen a soft lockup with a call trace always similar to:
Call Trace:
   [] wake_up_process+0x10/0x20
[] softlockup_tick+0xea/0x110
[] run_local_timers+0x13/0x20
[] update_process_times+0x57/0x90
[] mcheck_check_cpu+0x0/0x40
[] smp_local_timer_interrupt+0x34/0x60
[] smp_apic_timer_interrupt+0x4e/0x70
[] apic_timer_interrupt+0x66/0x70

 I regressed this to the vanilla kernel, and it still happens.  It only
 crops up in the face of multiple uncorrectable errors.

Patch:
 This patch is against 2.6.21-rc7.

Signed-off-by: Tim Hockin <[EMAIL PROTECTED]>

---

This is the second version of this patch.  The TIF_* approach was
suggested by Mike Waychison and Andi did not yell at me when I suggested
it.  Hooking the idle notifier was born of an Andrew Morton suggestion
and, no surprise, seems to work well.


diff -pruN linux-2.6.20+th/arch/x86_64/kernel/entry.S linux-2.6.20+th2v3/arch/x86_64/kernel/entry.S
--- linux-2.6.20+th/arch/x86_64/kernel/entry.S  2007-04-24 22:46:19.0 -0700
+++ linux-2.6.20+th2v3/arch/x86_64/kernel/entry.S   2007-04-30 10:57:43.0 -0700
@@ -282,7 +282,7 @@ sysret_careful:
 sysret_signal:
TRACE_IRQS_ON
sti
-   testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
+   testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz 1f

/* Really a signal */
@@ -375,7 +375,7 @@ int_very_careful:
jmp int_restore_rest

 int_signal:
-   testl $(_TIF_NOTIFY_RESUME|_TIF_SIGPENDING|_TIF_SINGLESTEP),%edx
+   testl $(_TIF_NOTIFY_RESUME|_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz 1f
movq %rsp,%rdi  #  -> arg1
xorl %esi,%esi  # oldset -> arg2
@@ -597,9 +597,9 @@ retint_careful:
cli
TRACE_IRQS_OFF
jmp retint_check
-
+
 retint_signal:
-   testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP),%edx
+   testl $(_TIF_SIGPENDING|_TIF_NOTIFY_RESUME|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz retint_swapgs
TRACE_IRQS_ON
sti
diff -pruN linux-2.6.20+th/arch/x86_64/kernel/mce.c linux-2.6.20+th2v3/arch/x86_64/kernel/mce.c
--- linux-2.6.20+th/arch/x86_64/kernel/mce.c    2007-04-27 14:19:08.0 -0700
+++ linux-2.6.20+th2v3/arch/x86_64/kernel/mce.c 2007-04-30 22:19:25.0 -0700
@@ -20,12 +20,15 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 

 #define MISC_MCELOG_MINOR 227
 #define NR_BANKS 6
@@ -39,8 +42,7 @@ static int mce_dont_init;
 static int tolerant = 1;
 static int banks;
 static unsigned long bank[NR_BANKS] = { [0 ... NR_BANKS-1] = ~0UL };
-static unsigned long console_logged;
-static int notify_user;
+static unsigned long notify_user;
 static int rip_msr;
 static int mce_bootlog = 1;
 static atomic_t

Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

2007-05-02 Thread Jesse Barnes

On Wednesday, May 2, 2007 4:45 pm Robert Hancock wrote:
> Jesse Barnes wrote:
> > On Wednesday, May 2, 2007 7:34 am Robert Hancock wrote:
> >> Jesse Barnes wrote:
> >>> On Tuesday, May 01, 2007, Jesse Barnes wrote:
> > I'm testing it now on my 965...
> 
>  Bah... nevermind Robert, I see you're doing this already in
>  pci_mmcfg_reject_broken.  I'm about to reboot & test now.
> >>>
> >>> Ok, I've tested a bit on my 965 (after re-adding my old patch to
> >>> support it) and the new checks are more complete, but my BIOS
> >>> still appears to be buggy.
> >>>
> >>> The extended config space (as defined by the register) is at
> >>> 0xf0000000 (full value is 0xf0000003 indicating 128M enabled). 
> >>> The ACPI MCFG table has this space reserved according to Robert's
> >>> new code, but the machine hangs due to the address space aliasing
> >>> Olivier mentioned awhile back.  I don't have a PCIe card to test
> >>> with (or any devices that require extended config space that I
> >>> know of) so I can't really tell if Windows supports PCIe on this
> >>> platform, but if it does I don't see how it would w/o having a
> >>> full bridge driver and sophisticated address space allocation
> >>> builtin.
> >>
> >> Windows XP doesn't use MMCONFIG or any extended configuration
> >> space. I believe Vista is supposed to, though. Not sure how they
> >> are handling this issue.
> >
> > Oh right... Vista will be the first to fully support PCIe & mcfg...
> >
> >> Can you post what your board has for PNPACPI reserved resources (I
> >> believe they're in /sys/devices/pnp0/*/resources IIRC, don't have
> >> a Linux box handy right now). Full dmesg would also be useful, I
> >> think it dumps out those reservations at boot nowadays..
> >
> > BIOS update didn't help.  Here's the boot log and a dump of the
> > pnp0 resources.
>
> Curious.. It looks like the ACPI resources have the correct
> reservation for the MMCONFIG window according to what the register
> says the location should be. There's no other reservations that
> overlap with that range (f0000000-f7ffffff), and according to the 965
> datasheet there's nothing that's hard-coded to occupy that memory
> range. I can't really see what this range could be conflicting with.

Yeah, it's strange.  Even /proc/iomem from a working boot looks ok:

d0700000-d07fffff : PCI Bus #04
d0800000-d08fffff : PCI Bus #05
f0000000-f7ffffff : pnp 00:01
fec00000-fec00fff : IOAPIC 0
fed00000-fed003ff : HPET 0

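Robert's observation, that the MMCONFIG window is covered by a motherboard reservation and overlaps nothing else, can be checked mechanically against a resource listing like the one above. This is a minimal sketch, assuming the listing stands for full 32-bit addresses (e.g. f0000000-f7ffffff for pnp 00:01); struct res_range and both helpers are hypothetical names, and this is not the code from Robert's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One /proc/iomem-style entry with an inclusive [start, end] range. */
struct res_range {
    uint32_t start;
    uint32_t end;
    const char *name;
};

/* True if [start, end] lies entirely inside a single listed range. */
static bool range_is_reserved(const struct res_range *r, size_t n,
                              uint32_t start, uint32_t end)
{
    assert(start <= end);
    for (size_t i = 0; i < n; i++)
        if (r[i].start <= start && end <= r[i].end)
            return true;
    return false;
}

/* Number of listed ranges sharing at least one address with [start, end]. */
static size_t count_overlaps(const struct res_range *r, size_t n,
                             uint32_t start, uint32_t end)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (r[i].start <= end && start <= r[i].end)
            hits++;
    return hits;
}
```

Fed the five ranges above, the window f0000000-f7ffffff is contained in the pnp 00:01 entry and overlaps only that one entry.
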
> What happens if you take out the chipset register detection, does the
> MCFG table give you the same result? Wonder if they're doing
> something funny with start/end bus values or something in their
> table. There's some code in my patch that prints out the important
> data from the MCFG table, can you tell me what that shows with the
> chipset detection taken out?

Yeah, I'll look a little more closely.  It could also be that another 
register needs tweaking somewhere to actually get the bridge to decode 
the space.

> If that doesn't provide any useful information, I think we may need
> some assistance from Intel chipset/motherboard people to figure out
> what is going on here..

I'm talking with them now, hopefully they'll shed some light on it.

Thanks,
Jesse


Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources

2007-05-02 Thread Robert Hancock


Jesse Barnes wrote:

On Wednesday, May 2, 2007 7:34 am Robert Hancock wrote:

Jesse Barnes wrote:

On Tuesday, May 01, 2007, Jesse Barnes wrote:

I'm testing it now on my 965...

Bah... nevermind Robert, I see you're doing this already in
pci_mmcfg_reject_broken.  I'm about to reboot & test now.

Ok, I've tested a bit on my 965 (after re-adding my old patch to
support it) and the new checks are more complete, but my BIOS still
appears to be buggy.

The extended config space (as defined by the register) is at
0xf0000000 (full value is 0xf0000003 indicating 128M enabled).  The
ACPI MCFG table has this space reserved according to Robert's new
code, but the machine hangs due to the address space aliasing
Olivier mentioned awhile back.  I don't have a PCIe card to test
with (or any devices that require extended config space that I know
of) so I can't really tell if Windows supports PCIe on this
platform, but if it does I don't see how it would w/o having a full
bridge driver and sophisticated address space allocation builtin.

Windows XP doesn't use MMCONFIG or any extended configuration space.
I believe Vista is supposed to, though. Not sure how they are
handling this issue.


Oh right... Vista will be the first to fully support PCIe & mcfg...


Can you post what your board has for PNPACPI reserved resources (I
believe they're in /sys/devices/pnp0/*/resources IIRC, don't have a
Linux box handy right now). Full dmesg would also be useful, I think
it dumps out those reservations at boot nowadays..


BIOS update didn't help.  Here's the boot log and a dump of the pnp0 
resources.


Curious.. It looks like the ACPI resources have the correct reservation 
for the MMCONFIG window according to what the register says the location 
should be. There's no other reservations that overlap with that range 
(f0000000-f7ffffff), and according to the 965 datasheet there's nothing 
that's hard-coded to occupy that memory range. I can't really see what 
this range could be conflicting with.

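To make the arithmetic behind "0xf0000003 indicating 128M enabled" concrete, here is a minimal C sketch that decodes such a register value. It assumes the 945/965-style PCIEXBAR layout as I understand it (bit 0 = enable, bits 2:1 = window length: 00 = 256MB, 01 = 128MB, 10 = 64MB, 11 reserved) and ignores the register's upper address bits; check the chipset datasheet before relying on it. struct mmcfg_window, decode_pciexbar() and mmcfg_bus_count() are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mmcfg_window {
    uint32_t base;   /* first address of the MMCONFIG window */
    uint32_t size;   /* window length in bytes */
};

/* Decode a 965-style PCIEXBAR value (low 32 bits only). Assumed layout:
 * bit 0 = enable, bits 2:1 = length (00 = 256MB, 01 = 128MB, 10 = 64MB,
 * 11 = reserved); the base is the value aligned down to the window size.
 * Returns false if decoding is disabled or the length is reserved. */
static bool decode_pciexbar(uint32_t reg, struct mmcfg_window *w)
{
    uint32_t size;

    assert(w != NULL);
    if (!(reg & 1u))
        return false;                 /* MMCONFIG window disabled */
    switch ((reg >> 1) & 3u) {
    case 0: size = 256u << 20; break;
    case 1: size = 128u << 20; break;
    case 2: size = 64u << 20; break;
    default: return false;            /* reserved length encoding */
    }
    w->base = reg & ~(size - 1u);     /* size is a power of two */
    w->size = size;
    return true;
}

/* Each PCI bus needs 1MB of config space (32 devices x 8 functions x 4KB). */
static unsigned int mmcfg_bus_count(const struct mmcfg_window *w)
{
    return w->size >> 20;
}
```

With 0xf0000003 this yields a 128MB window at 0xf0000000 ending at 0xf7ffffff, i.e. room for 128 buses, which is the f0000000-f7ffffff range discussed in this thread.
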

What happens if you take out the chipset register detection, does the 
MCFG table give you the same result? Wonder if they're doing something 
funny with start/end bus values or something in their table. There's 
some code in my patch that prints out the important data from the MCFG 
table, can you tell me what that shows with the chipset detection taken out?


If that doesn't provide any useful information, I think we may need some 
assistance from Intel chipset/motherboard people to figure out what is 
going on here..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

Re: [patch] CFS scheduler, -v8


Srivatsa Vaddagiri wrote:

> I briefly went through the paper and my impression is it expects each task
> to specify the length of each new request it initiates. Is that correct?

No, the timeslice l_i here serves as a granularity control w.r.t.
responsiveness (or latency, depending on how you interpret it). As wli said,
it can be expressed as a function of the priority, as we do for the weight
now. It is not related to the length of each new request. A request
may be 1 second long, but the scheduler may still process it using a 10ms
timeslice. A smaller timeslice leads to more accuracy, i.e. it is closer to
the ideal case.
However, the maximum timeslice l_i used by all active tasks
determines the overall responsiveness of the system, which I will explain
in detail later.

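To illustrate the granularity point (a long request served in l_i-sized pieces, with smaller pieces tracking the ideal weighted share more closely), here is a toy weighted-fair-queueing loop driven by per-task virtual time. It is a generic sketch, not CFS's code and not the exact algorithm from the paper: each pick takes the task with the smallest virtual time, runs it for one slice, and advances its virtual time by slice/weight. Task names, weights and slices are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: weight w_i and timeslice l_i in ms. */
struct task {
    const char *name;
    double weight;     /* w_i: larger weight => larger CPU share */
    double slice_ms;   /* l_i: granularity of one scheduling quantum */
    double vtime;      /* virtual time: service received / weight */
    double served_ms;  /* real CPU time received so far */
};

/* Run the CPU for total_ms. Each pick takes the task with the smallest
 * virtual time (ties go to the lower index) and runs it for one slice,
 * so a long request is simply served in slice-sized pieces. */
static void simulate(struct task *t, size_t n, double total_ms)
{
    double now = 0.0;

    while (now < total_ms) {
        size_t best = 0;

        for (size_t i = 1; i < n; i++)
            if (t[i].vtime < t[best].vtime)
                best = i;
        assert(t[best].weight > 0.0);
        double run = t[best].slice_ms;
        if (now + run > total_ms)
            run = total_ms - now;   /* don't run past the window */
        t[best].served_ms += run;
        t[best].vtime += run / t[best].weight;
        now += run;
    }
}
```

With weights 1 and 2 and 10ms slices, 300ms of CPU splits into 100ms and 200ms. Over a 100ms window the same 10ms slices give 40ms/60ms against an ideal of about 33/67, while 100ms slices give 100ms/0ms: in this toy model, the coarser the slice, the further a short window drifts from the weighted ideal.
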
> There is also p->wait_runtime which is taken into account when
> calculating p->fair_key. So if p3 had been waiting in the runqueue for a long
> time before, it can get to run sooner than 10ms later.

Consider the case where p3 is a newly started or woken-up task and carries no
p->wait_runtime.


Ting

Re: [PATCH 2/3] UIO: Documentation

On Thu, 3 May 2007 00:00:28 +0200 Hans-Jürgen Koch wrote:

> Am Mittwoch 02 Mai 2007 schrieb Randy Dunlap:
> > On Wed, 2 May 2007 10:41:35 +0200 Hans-Jürgen Koch wrote:
> > > Am Mittwoch 02 Mai 2007 01:42 schrieb Randy Dunlap:
> > > > > +The Userspace I/O HOWTO
> > > >
> > > > Most of this reads well.  Thanks.
> > > > A few typo corrections are below...
> > >
> > > Thank you for your work. I generated a new patch that includes all your
> > > suggestions and also fixes the build problems.

OK, your fixes all look good, but it still needs one minor fix (below).

Acked-by: Randy Dunlap <[EMAIL PROTECTED]>

---

From: Randy Dunlap <[EMAIL PROTECTED]>

Fix last UIO kernel-doc problem.

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 drivers/uio/uio.c | 1 -
 1 file changed, 1 deletion(-)

--- linux-2.6.21-git4.orig/drivers/uio/uio.c
+++ linux-2.6.21-git4/drivers/uio/uio.c
@@ -582,7 +582,6 @@ static void uio_class_destroy(void)
 
 /**
  * uio_register_device - register a new userspace IO device
- *
  * @owner: module that creates the new device
  * @parent:parent device
  * @info:  UIO device capabilities

Re: [GIT PULL] I/OAT updates for 2.6.22

2007-05-02 Thread Chris Leech

On Wed, 2007-05-02 at 15:44 -0700, David Miller wrote:
> 
> Chrstopher, I really really would like you to post these patches early
> and often to [EMAIL PROTECTED] especially because you are
> touching the TCP code.

You're right, I should have sent this to netdev as well.  I'm sorry.

As for early and often, I have posted all of these patches to netdev,
and made suggested changes, and re-posted.

And when I have other networking changes, you can bet they'll get sent
to netdev for review first before I think about asking that they be
included.

> You aren't doing this, for several rounds, and just submitting your
> stuff directly to Linus, Andrew, and lkml, and it's starting to annoy
> me greatly.

For several rounds, I've been posting patches that go nowhere.  I
honestly don't care if they go straight to Linus, through you, or
through Andrew.

- Chris

Re: arch/i386/boot rewrite, and all the hard-coded video cards

2007-05-02 Thread Rene Herman


On 05/03/2007 12:59 AM, H. Peter Anvin wrote:

> Rene Herman wrote:
>
>> Checking here, and mine also has 132x25 as BIOS mode 0x14 in addition to
>> 0x55. Probably also not universal, and 0x54 (132x43) doesn't seem to be
>> repeated. It's unfortunate that Qemu/Bochs don't have the VESA text modes.
>
> Does it export these modes through the VESA interface, or do you have to
> "select them blind?"


It also provides them as VESA modes, yes. On further inspection, 0x14 is
actually a bit different:


BIOS 0x14 =  132x25, 8x16 char cell on 1056x400
BIOS 0x54 = VESA 0x10A = 132x43, 8x8  char cell on 1056x350
BIOS 0x55 = VESA 0x109 = 132x25, 8x14 char cell on 1056x350

Booting 2.6.20.1 on the machine with vga=ask finds BIOS 0x14 as 0214 and 
VESA 0x10A as 030A but allows me to select 0254, 0255 and 0309 as well.

('scan' finds BIOS 0x14 as 0114 and BIOS 0x54 as 0154)

Rene.


Re: 2.6.22 -mm merge plans

On Wed, 2 May 2007 19:11:04 -0400
Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:

> > I didn't know that this was the plan.
> > 
> > The problem I have with this is that once we've merged one part, we're
> > committed to merging the other parts even though we haven't seen them yet.
> > 
> > What happens if there's a revolt over the next set of patches?  Do we
> > remove the core markers patches again?  We end up in a cant-go-forward,
> > cant-go-backward situation.
> > 
> > I thought the existing code was useful as-is for several projects, without
> > requiring additional patching to core kernel.  If such additional patching
> > _is_ needed to make the markers code useful then I agree that we should
> > continue to buffer the markers code in -mm until the
> > use-markers-for-something patches have been eyeballed.
> > 
> 
> My statement was probably not clear enough. The actual marker code is
> useful as-is without any further kernel patching required : SystemTAP is
> an example where they use external modules to load probes that can
> connect either to markers or through kprobes. LTTng, in its current state,
> has a mostly modular core that also uses the markers.

OK, that's what I thought.

> Although some, like Christoph and myself, think that it would benefit
> the kernel community to have a common infrastructure for more than just
> markers (meaning common serialization and buffering mechanism), it does
> not change the fact that the markers, being in mainline, are usable by
> projects through additional kernel modules.
> 
> If we are looking at current "potential users" that are already in
> mainline, we could change blktrace to make it use the markers.

That'd be a useful demonstration.

[TESTING NEEDED] drivers/serial/sunzilog: Interrupt enable before ISR handler installed

2007-05-02 Thread Mark Fortescue


Hi All,

This patch changes the interrupt enable sequence for the sunzilog driver
so that interrupts are not enabled until after the interrupt handler has
been installed. Without this, some SS1 and SS2 sun4c systems panic on an
unhandled interrupt before the handler is installed, which prevents boot.


It also adds support for the ESCC version of the Zilog chips. The ESCC
detection works, but enabling the FIFO may cause issues with modem and
receive-character status. My reading of the SCC manual and the code
is that it should be OK.

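The ESCC probe added to __load_zsregs() writes R15 with the WR7' enable bit (bit 0) set and reads R15 back; only an ESCC is expected to keep that bit. The sketch below models that assumption with a fake register in plain C so the decision logic can be exercised without hardware. struct mock_scc and the helpers are hypothetical; only the bit-0 test mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define R15_WR7P_EN 0x01   /* bit 0 of R15, called WR7pEN in the patch */

/* Hypothetical stand-in for one Zilog channel's R15 register. */
struct mock_scc {
    bool is_escc;          /* true models an ESCC part */
    unsigned char wr15;    /* last value written to R15 */
};

static void mock_write_r15(struct mock_scc *c, unsigned char v)
{
    c->wr15 = v;
}

/* The assumption the probe relies on: bit 0 reads back as set only on
 * an ESCC; a plain SCC returns it as 0. */
static unsigned char mock_read_r15(const struct mock_scc *c)
{
    return c->is_escc ? c->wr15 : (unsigned char)(c->wr15 & ~R15_WR7P_EN);
}

/* Same decision as the patch: set WR7' enable, read R15, test bit 0. */
static bool probe_escc(struct mock_scc *c, unsigned char r15)
{
    assert(c != NULL);
    mock_write_r15(c, (unsigned char)(r15 | R15_WR7P_EN));
    return (mock_read_r15(c) & R15_WR7P_EN) != 0;
}
```

If real hardware ever read bit 0 back differently from this model, the probe would misclassify the chip, so the model is only as good as that assumption.
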

###

I have been unable to fully test this patch because a kernel bug (introduced
between linux-2.6.14 and 2.6.15) prevents my SS1 clone and SS2 from doing
anything more than running sash, and even that is liable to crash. They fail
with a soft lockup and, in the 2.6.20.x kernels relevant to this driver patch
that I have tested, the lockup also prevents a serial BREAK from getting back
to the boot PROM, so a power-up reset is required.



---
diff -ruNpd linux-2.6.20.9/drivers/serial/sunzilog.c linux-test/drivers/serial/sunzilog.c
--- linux-2.6.20.9/drivers/serial/sunzilog.c  2007-04-28 15:02:39.0 +0100
+++ linux-test/drivers/serial/sunzilog.c  2007-04-28 14:54:49.0 +0100
@@ -10,6 +10,13 @@
  * work there.
  *
  *  Copyright (C) 2002, 2006 David S. Miller ([EMAIL PROTECTED])
+ *
+ * Thu Apr 19 02:59:49 BST 2007
+ * Mark Fortescue ([EMAIL PROTECTED])
+ *
+ * Change R9 handling to ensure interrupts disabled when no
+ * interrupt handler else it breaks sun4c Sparcstations.
+ * Add in handling for ESCC chips (enables FIFO).
  */

 #include 
@@ -93,6 +100,8 @@ struct uart_sunzilog_port {
 #define SUNZILOG_FLAG_REGS_HELD0x0040
 #define SUNZILOG_FLAG_TX_STOPPED   0x0080
 #define SUNZILOG_FLAG_TX_ACTIVE0x0100
+#define SUNZILOG_FLAG_ESCC 0x0200
+#define SUNZILOG_FLAG_ISR_HANDLER  0x0400

unsigned int cflag;

@@ -175,9 +184,11 @@ static void sunzilog_clear_fifo(struct z
 /* This function must only be called when the TX is not busy.  The UART
  * port lock must be held and local interrupts disabled.
  */
-static void __load_zsregs(struct zilog_channel __iomem *channel, unsigned char *regs)
+static int __load_zsregs(struct zilog_channel __iomem *channel, unsigned char *regs)
 {
int i;
+   int escc;
+   unsigned char r15;

/* Let pending transmits finish.  */
for (i = 0; i < 1000; i++) {
@@ -230,11 +241,25 @@ static void __load_zsregs(struct zilog_c
write_zsreg(channel, R14, regs[R14]);

/* External status interrupt control.  */
-   write_zsreg(channel, R15, regs[R15]);
+   write_zsreg(channel, R15, (regs[R15] | WR7pEN) & ~FIFOEN);
+
+   /* ESCC Extension Register */
+   r15 = read_zsreg(channel, R15);
+   if (r15 & 0x01) {
+   write_zsreg(channel, R7,  regs[R7p]);
+
+   /* External status interrupt and FIFO control.  */
+   write_zsreg(channel, R15, regs[R15] & ~WR7pEN);
+   escc = 1;
+   } else {
+/* Clear FIFO bit case it is an issue */
+   regs[R15] &= ~FIFOEN;
+   escc = 0;
+   }

/* Reset external status interrupts.  */
-   write_zsreg(channel, R0, RES_EXT_INT);
-   write_zsreg(channel, R0, RES_EXT_INT);
+   write_zsreg(channel, R0, RES_EXT_INT); /* First Latch  */
+   write_zsreg(channel, R0, RES_EXT_INT); /* Second Latch */

/* Rewrite R3/R5, this time without enables masked.  */
write_zsreg(channel, R3, regs[R3]);
@@ -242,6 +267,8 @@ static void __load_zsregs(struct zilog_c

/* Rewrite R1, this time without IRQ enabled masked.  */
write_zsreg(channel, R1, regs[R1]);
+
+   return escc;
 }

 /* Reprogram the Zilog channel HW registers with the copies found in the
@@ -732,7 +759,7 @@ static void sunzilog_enable_ms(struct ua
up->curregs[R15] = new_reg;

 		/* NOTE: Not subject to 'transmitter active' rule.  */ 
-		write_zsreg(channel, R15, up->curregs[R15]);

+   write_zsreg(channel, R15, up->curregs[R15] & ~WR7pEN);
}
 }

@@ -862,44 +889,44 @@ sunzilog_convert_to_zs(struct uart_sunzi
up->curregs[R14] = BRSRC | BRENAB;

/* Character size, stop bits, and parity. */
-   up->curregs[3] &= ~RxN_MASK;
-   up->curregs[5] &= ~TxN_MASK;
+   up->curregs[R3] &= ~RxN_MASK;
+   up->curregs[R5] &= ~TxN_MASK;
switch (cflag & CSIZE) {
case CS5:
-   up->curregs[3] |= Rx5;
-   up->curregs[5] |= Tx5;
+   up->curregs[R3] |= Rx5;
+   up->curregs[R5] |= Tx5;
up->parity_mask = 0x1f;
break;
case CS6:
-   up->curregs[3] |= Rx6;
-   up->curregs[5] |= Tx6;
+   up->curregs[R3] |= Rx6;
+   up->curregs[R5] |= Tx6;
up->parity_mask

Re: [PATCH] lib/hexdump

On Wed, 2 May 2007 16:06:35 -0700 Andrew Morton wrote:

> On Wed, 02 May 2007 15:56:48 -0700
> Randy Dunlap <[EMAIL PROTECTED]> wrote:
> 
> > Andrew Morton wrote:
> > > On Wed, 2 May 2007 15:35:56 -0700
> > > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > 
> > >> From: Randy Dunlap <[EMAIL PROTECTED]>
> > >>
> > >> Based on ace_dump_mem() from Grant Likely for the Xilinx 
> > >> SystemACE CompactFlash interface.
> > >>
> > >> Add hex_dumper() to lib/hexdump.c and linux/kernel.h.
> > >>
> > >> This patch adds the function 'hex_dumper' which can be used to perform a 
> > >> hex + ASCII dump of data to syslog, in an easily viewable format, thus
> > >> providing a common text hex dump format.
> > >>
> > >> It does not provide a hexdump_to_buffer() function.
> > >> If someone needs that, we'll have to add it.
> > >>
> > >> Example usage:
> > >>  hex_dumper(KERN_DEBUG, data, length);
> > >>
> > > 
> > > Fair enough.  This is the sort of thing one could easily overdesign ;)
> > 
> > The Intel version also returned the number of bytes printed,
> > and they had a hexdump_to_buffer() for sysfs output.
> > 
> 
> Yeah, that's where we get into creature feeping.  Really it should be
> passed the address of a function which performs the per-char output and
> which is passed a bunch of args so it can do its stuff.  But doing printk
> of a single char at a time is a bit inefficient and produces mangled output
> on SMP.  And then we don't know the length of the output and we'd like it
> dynamically allocated and on and on.
> 
> Ho hum.  Perhaps a middle ground is to implement hexdump-to-memory as the
> core function.  hex_dumper() becomes a simple wrapper around that.  (but
> how big is its buffer?  One line would be OK, I guess)

Yeah, I almost did it that way.  We'll see.

> > OK, that's one way to do it.  I'll wait a bit for other comments.
> 
> Good luck ;)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: 2.6.22 -mm merge plans

2007-05-02 Thread Mathieu Desnoyers

* Andrew Morton ([EMAIL PROTECTED]) wrote:
> On Wed, 2 May 2007 16:36:27 -0400
> Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> 
> > * Christoph Hellwig ([EMAIL PROTECTED]) wrote:
> > > On Wed, May 02, 2007 at 09:47:07AM -0700, Andrew Morton wrote:
> > > > > That doesn't constitute using it.
> > > > 
> > > > Andi, there was a huge amount of discussion about all this in September 
> > > > last
> > > > year (subjects: *markers* and *LTTng*). The outcome of all that was, I
> > > > believe, that the kernel should have a static marker infrastructure.
> > > 
> > > Only when it's actually useable.  A prerequisite for merging it is
> > > having an actual trace transport infrastructure as well as a few actually
> > > useful tracing modules in the kernel tree.
> > > 
> > > Let this count as a vote to merge the markers once we have the 
> > > infrastructure
> > > above ready, it'll be very useful then.
> > 
> > Hi Christoph,
> > 
> > The idea is the following : either we integrate the infrastructure for
> > instrumentation / data serialization / buffer management / extraction of
> > data to user space in multiple different steps, which makes code review
> > easier for you guys, or we bring the main pieces of the LTTng project
> > altogether with the Linux Kernel Markers, which would result in a bigger
> > change.
> > 
> > Based on the premise that discussing about logically distinct pieces of
> > infrastructure is easier and can be done more thoroughly when done
> > separately, we decided to submit the markers first, with the other
> > pieces planned in a near future.
> > 
> > I agree that it would be very useful to have the full tracing stack
> > available in the Linux kernel, but we inevitably face the argument :
> > "this change is too big" if we submit all LTTng modules at once or
> > the argument : "we want the whole tracing stack, not just part of it"
> > if we don't.
> > 
> > This is why we chose to push the tracing infrastructure chunk by chunk :
> > to make code review and criticism more efficient.
> > 
> 
> I didn't know that this was the plan.
> 
> The problem I have with this is that once we've merged one part, we're
> committed to merging the other parts even though we haven't seen them yet.
> 
> What happens if there's a revolt over the next set of patches?  Do we
> remove the core markers patches again?  We end up in a cant-go-forward,
> cant-go-backward situation.
> 
> I thought the existing code was useful as-is for several projects, without
> requiring additional patching to core kernel.  If such additional patching
> _is_ needed to make the markers code useful then I agree that we should
> continue to buffer the markers code in -mm until the
> use-markers-for-something patches have been eyeballed.
> 

My statement was probably not clear enough. The actual marker code is
useful as-is without any further kernel patching required : SystemTAP is
an example where they use external modules to load probes that can
connect either to markers or through kprobes. LTTng, in its current state,
has a mostly modular core that also uses the markers.

Although some, like Christoph and myself, think that it would benefit
the kernel community to have a common infrastructure for more than just
markers (meaning common serialization and buffering mechanism), it does
not change the fact that the markers, being in mainline, are usable by
projects through additional kernel modules.

If we are looking at current "potential users" that are already in
mainline, we could change blktrace to make it use the markers.

Mathieu


> In which case we have:
> 
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-alpha.patch
> atomich-complete-atomic_long-operations-in-asm-generic.patch
> atomich-i386-type-safety-fix.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-ia64.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-mips.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-parisc.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-powerpc.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-sparc64.patch
> atomich-add-atomic64-cmpxchg-xchg-and-add_unless-to-x86_64.patch
> atomich-atomic_add_unless-as-inline-remove-systemh-atomich-circular-dependency.patch
> local_t-architecture-independant-extension.patch
> local_t-alpha-extension.patch
> local_t-i386-extension.patch
> local_t-ia64-extension.patch
> local_t-mips-extension.patch
> local_t-parisc-cleanup.patch
> local_t-powerpc-extension.patch
> local_t-sparc64-cleanup.patch
> local_t-x86_64-extension.patch
> 
>   For 2.6.22
> 
> linux-kernel-markers-kconfig-menus.patch
> linux-kernel-markers-architecture-independant-code.patch
> linux-kernel-markers-powerpc-optimization.patch
> linux-kernel-markers-i386-optimization.patch
> markers-add-instrumentation-markers-menus-to-avr32.patch
> linux-kernel-markers-non-optimized-architectures.patch
> markers-alpha-and-avr32-supportadd-alpha-markerh-add-arm26-markerh.patch
>

[patch] sanitize linux/isdn_divertif.h for userspace

2007-05-02 Thread Mike Frysinger

The isdn_divertif.h header contains kernel-only references, so I've wrapped
them in __KERNEL__ and added the proper #include statements.

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 4ff0f57..ab2aaa2 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -91,7 +91,6 @@ header-y += ip_mp_alg.h
 header-y += ipsec.h
 header-y += ipx.h
 header-y += irda.h
-header-y += isdn_divertif.h
 header-y += iso_fs.h
 header-y += ixjuser.h
 header-y += jffs2.h
@@ -238,6 +237,7 @@ unifdef-y += ipv6.h
 unifdef-y += ipv6_route.h
 unifdef-y += isdn.h
 unifdef-y += isdnif.h
+unifdef-y += isdn_divertif.h
 unifdef-y += isdn_ppp.h
 unifdef-y += isicom.h
 unifdef-y += jbd.h
diff --git a/include/linux/isdn_divertif.h b/include/linux/isdn_divertif.h
index 0e7e44c..0df24b2 100644
--- a/include/linux/isdn_divertif.h
+++ b/include/linux/isdn_divertif.h
@@ -24,6 +24,10 @@
 #define DIVERT_REL_ERR  0x04  /* module not registered */
 #define DIVERT_REG_NAME isdn_register_divert
 
+#ifdef __KERNEL__
+#include 
+#include 
+
 /***/
 /* structure exchanging data between isdn hl and divert module */
 /***/ 
@@ -40,3 +43,4 @@ typedef struct
 /* function register */
 /*/
 extern int DIVERT_REG_NAME(isdn_divert_if *);
+#endif

Re: Execute in place

Hugh Dickins wrote:
> On Wed, 2 May 2007, Phillip Susi wrote:
> > Hugh Dickins wrote:
> > > tmpfs doesn't store its stuff in the page cache twice: that's true,
> > > and I didn't mean to imply otherwise.  But tmpfs doesn't contain any
> > > support for rom memory: you'd have to copy from rom to tmpfs to use
> > > it.
> >
> > The question is, when you execute a binary on tmpfs, does its code
> > segment get mapped directly where it's at in the buffer cache, or does
> > it get copied to another page for the executing process?  At least,
> > assuming this is possible due to the vma and file offsets of the segment
> > being aligned.
>
> Its pages are mapped directly into the executing process, without copying.

Thank GOD!  Boy, was I worried there for a second.

Now, if there were only an easy way to make tmpfs persistent?


Thanks!

--
Al


Re: [ck] [REPORT] 2.6.21.1 vs 2.6.21-sd046 vs 2.6.21-cfs-v6

Con Kolivas wrote:
> On Monday 30 April 2007 18:05, Michael Gerdau wrote:
> > meanwhile I've redone my numbercrunching tests with the following
> > kernels: 2.6.21.1 (mainline)
> > 2.6.21-sd046
> > 2.6.21-cfs-v6
> > running on a dualcore x86_64.
> > [I will run the same test with 2.6.21.1-cfs-v7 over the next days,
> > likely tonight]
:
:
> > However from these figures it seems as if sd does provide for the
> > fairest (as in equal share for all) scheduling among the 3 schedulers
> > tested.
>
> Looks good, thanks. Ingo's been hard at work since then and has v8 out by
> now. SD has not changed so you wouldn't need to do the whole lot of tests
> on SD again unless you don't trust some of the results.

Well, I tried cfs-v8 and it still shows some nice regressions wrt 
mainline/sd.  SD's nice-levels look rather solid, implying fairness.


Thanks!

--
Al


Re: [PATCH] synclink_gt add compat_ioctl

2007-05-02 Thread Paul Fulghum


Arnd Bergmann wrote:

The same function contains a copy_from_user(), which cannot
be called with interrupts disabled, so yes, I am very certain
it will not change.


Good point.

--
Paul

Re: [PATCH] lib/hexdump

On Wed, 02 May 2007 15:56:48 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Wed, 2 May 2007 15:35:56 -0700
> > Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > 
> >> From: Randy Dunlap <[EMAIL PROTECTED]>
> >>
> >> Based on ace_dump_mem() from Grant Likely for the Xilinx 
> >> SystemACE CompactFlash interface.
> >>
> >> Add hex_dumper() to lib/hexdump.c and linux/kernel.h.
> >>
> >> This patch adds the function 'hex_dumper' which can be used to perform a 
> >> hex + ASCII dump of data to syslog, in an easily viewable format, thus
> >> providing a common text hex dump format.
> >>
> >> It does not provide a hexdump_to_buffer() function.
> >> If someone needs that, we'll have to add it.
> >>
> >> Example usage:
> >>hex_dumper(KERN_DEBUG, data, length);
> >>
> > 
> > Fair enough.  This is the sort of thing one could easily overdesign ;)
> 
> The Intel version also returned the number of bytes printed,
> and they had a hexdump_to_buffer() for sysfs output.
> 

Yeah, that's where we get into creature feeping.  Really it should be
passed the address of a function which performs the per-char output and
which is passed a bunch of args so it can do its stuff.  But doing printk
of a single char at a time is a bit inefficient and produces mangled output
on SMP.  And then we don't know the length of the output and we'd like it
dynamically allocated and on and on.

Ho hum.  Perhaps a middle ground is to implement hexdump-to-memory as the
core function.  hex_dumper() becomes a simple wrapper around that.  (but
how big is its buffer?  One line would be OK, I guess)

> OK, that's one way to do it.  I'll wait a bit for other comments.

Good luck ;)
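
Andrew's middle ground, a format-to-memory core with the printing function as a thin wrapper that emits one line at a time, might look roughly like the userspace sketch below. printf stands in for printk, the level argument is just a string prefix, and hex_dump_line()/hex_dumper_sketch() are hypothetical names, not the API that was eventually merged. A one-line buffer answers the "how big is its buffer?" question because each line has a fixed maximum width.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define HEX_BYTES_PER_LINE 16
#define HEX_LINE_SIZE 80    /* caller-supplied buffer for one line */

/* 3 chars per byte ("xx "), two '|' delimiters, the ASCII column, NUL. */
static_assert(HEX_LINE_SIZE >= 3 * HEX_BYTES_PER_LINE + 2 + HEX_BYTES_PER_LINE + 1,
              "line buffer too small");

static const char hex_digits[] = "0123456789abcdef";

/* Hypothetical core: format up to HEX_BYTES_PER_LINE bytes of buf as
 * "xx xx ... |ascii|" into out (NUL-terminated). Short lines are padded
 * so the ASCII column stays aligned. Returns the string length. */
static size_t hex_dump_line(const unsigned char *buf, size_t len,
                            char out[HEX_LINE_SIZE])
{
    size_t i, n = 0;

    if (len > HEX_BYTES_PER_LINE)
        len = HEX_BYTES_PER_LINE;
    for (i = 0; i < HEX_BYTES_PER_LINE; i++) {
        out[n++] = i < len ? hex_digits[buf[i] >> 4] : ' ';
        out[n++] = i < len ? hex_digits[buf[i] & 0x0f] : ' ';
        out[n++] = ' ';
    }
    out[n++] = '|';
    for (i = 0; i < len; i++)
        out[n++] = isprint(buf[i]) ? (char)buf[i] : '.';
    out[n++] = '|';
    out[n] = '\0';
    return n;
}

/* Hypothetical wrapper shaped like the proposed hex_dumper(level, data,
 * length); printf stands in for printk, one complete line per call. */
static void hex_dumper_sketch(const char *level, const void *data, size_t length)
{
    const unsigned char *p = data;
    char line[HEX_LINE_SIZE];
    size_t off;

    for (off = 0; off < length; off += HEX_BYTES_PER_LINE) {
        hex_dump_line(p + off, length - off, line);
        printf("%s%04zx: %s\n", level, off, line);
    }
}
```

A sysfs-style caller could call hex_dump_line() directly into its own buffer, while hex_dumper_sketch("debug: ", buf, len) prints one "offset: hex |ascii|" line per 16 bytes, so output from different CPUs is never interleaved mid-line.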


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-02 Thread Thomas Hellström

Eric Anholt wrote:

On Thu, 2007-04-26 at 16:55 +1000, Dave Airlie wrote:

Hi,

The patch is too big to fit on the list and I've no idea how we could
break it down further, it just happens to be a lot of new code..

http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt

The patch header and diffstat are below,

This isn't for integration yet but we'd like an initial review by
anyone with the spare time and inclination; there is a lot of stuff
relying on getting this code beaten into shape and into the kernel, but
any cleanups people can suggest now especially to the user interfaces
would be appreciated as once we set that stuff in stone it'll be a
pain to change... also it doesn't have any driver side code, this is
just the generic pieces. I'll post the intel 915 side code later but
there isn't that much to it..

It applies on top of my drm-2.6 git tree drm-mm branch

This patch brings in the TTM (Translation Table Maps) memory management
system from Thomas Hellstrom at Tungsten Graphics.

This patch only covers the core functionality and changes to the drm
core.

The TTM memory manager enables dynamic mapping of memory objects in
and out of graphics-card-accessible memory (e.g. AGP); this patch implements
the AGP backend for TTM to be used by the i915 driver.

I've been slow responding, but we've been talking a lot on IRC and at
Intel about the TTM interface recently, and trying to come up with a
consensus between us as to what we'd like to see.

1) Multiplexed ioctls hurt
The first issue I have with this version is the userland interface.
You've got two ioctls for buffer management and one for fence
management, yet these 3 ioctls are actually just attempting to be
generic interfaces for around 25 actual functions you want to call
(except for the unimplemented ones, drm_bo_fence and drm_bo_ref_fence).
So there are quasi-generic arguments to these ioctls, where most of the
members are ignored by any given function, but it's not obvious to the
caller which ones. There are no comments or anything as to what the
arguments to these functions are or what exactly they do. We've got 100
generic ioctl numbers allocated and unused still, so I don't think we
should be shy about having separate ioctls for separate functions, if
this is the interface we expect to use going forward.

Right. This interface was in its infancy when there were only (without
looking too deeply) three generic IOCTLS left.

this is definitely a good point and I agree completely.

2) Early microoptimizations
There's also apparently an unused way to chain these function calls in a
single ioctl call for the buffer object ioctl. This is one of a couple
of microoptimizations at the expense of code clarity which have bothered
me while reviewing the TTM code, when I'm guessing no testing was done
to see if it was actually a bottleneck.

Yes. The function chaining is currently only used to validate buffer
lists. I still believe it is
needed for that functionality, a bit depending on what we want to be
able to change when a buffer is validated. But I currently can't see
any other use for it in the future.

3) Fencing and flushing troubles
I'm definitely concerned by the fencing interface. Right now, the
driver is flushing caches and fencing every buffer with EXE and its
driver-specific FLUSHED flag in dispatching command buffers. We almost
surely don't want to be flushing for every batch buffer just in case
someone wants to do CPU reads from something. However, with the current
mechanism, if I fence my operation with just EXE and no flush, then
somebody goes to map that buffer, they'll wait for the fence, but no
flush will be emitted. The interface we've been imagining wouldn't have
driver-specific fence flags, but instead be only a marker of when
command execution has passed a certain point (which is what fencing is
about, anyway). In validating buffers, you would pass whether they're
for READ or WRITE as we currently do, and you'd put it on the unfenced
read/write lists as appropriate. Add one buffer object function for
emitting the flush, which would then determine whether the next
fence-all-unfenced call would cover just the list of unfenced reads or
the list of both unfenced reads and unfenced writes. Then, in mapping,
check if it's on the unfenced-writes list and emit the flush and fence,
and then wait for a fence on the buffer before continuing with the
mapping.
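To make the proposed bookkeeping concrete, here is a small userspace model under stated assumptions: a state field on each buffer object stands in for membership of the unfenced read and write lists, fences are plain sequence numbers, and waiting on a fence is not modeled. All names are hypothetical; this is not TTM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a state field stands in for list membership. */
enum bo_state { BO_IDLE, BO_UNFENCED_READ, BO_UNFENCED_WRITE, BO_FENCED };

struct bo {
	enum bo_state state;
	unsigned int fence_seq;		/* fence that covers this buffer */
};

struct dev {
	bool flush_pending;		/* flush emitted since the last fence */
	unsigned int next_seq;		/* next fence sequence number */
	unsigned int flushes;		/* flushes emitted so far */
};

/* Validation puts the buffer on the unfenced read or write "list". */
static void bo_validate(struct bo *bo, bool write)
{
	bo->state = write ? BO_UNFENCED_WRITE : BO_UNFENCED_READ;
}

static void dev_emit_flush(struct dev *d)
{
	d->flush_pending = true;
	d->flushes++;
}

/*
 * Fence all unfenced buffers. Reads are always covered; writes only if a
 * flush was emitted since the previous fence, otherwise they stay unfenced.
 */
static unsigned int dev_fence_unfenced(struct dev *d, struct bo **bos, size_t n)
{
	unsigned int seq = d->next_seq++;
	size_t i;

	for (i = 0; i < n; i++) {
		if (bos[i]->state == BO_UNFENCED_READ ||
		    (bos[i]->state == BO_UNFENCED_WRITE && d->flush_pending)) {
			bos[i]->state = BO_FENCED;
			bos[i]->fence_seq = seq;
		}
	}
	d->flush_pending = false;
	return seq;
}

/*
 * CPU map: only a buffer with unflushed GPU writes forces a flush and a
 * fence. Returns the fence to wait on (waiting itself is not modeled).
 */
static unsigned int bo_map(struct dev *d, struct bo *bo, struct bo **all, size_t n)
{
	if (bo->state == BO_UNFENCED_WRITE) {
		dev_emit_flush(d);
		dev_fence_unfenced(d, all, n);
	}
	return bo->fence_seq;
}
```

In this model a batch that only reads is fenced without any flush, and a written buffer is flushed lazily, the first time the CPU maps it.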

Right. This functionality is actually available in the current code,
except that we have only one unfenced list and the fence flags indicate
what type of flushes are needed. There's even an implementation of the
intel sync flush mechanism in the fencing code. If the batch-buffer
flush is omitted the driver specific flush flag is not needed and the
fence mechanism will do a sync flush whenever

Re: [patches] [PATCH] [21/22] x86_64: Extend bzImage protocol for relocatable bzImage

2007-05-02 Thread Jeremy Fitzhardinge

H. Peter Anvin wrote:
> I don't know if that would break any programs that are currently
> bypassing the setup.  The existing setup protocol definitely allows
> invoking an entry point which isn't 0x10 (rather, the 32-bit
> entrypoint is defined by code32_start); I'm not sure how Eric's
> relocatable kernel patches (2.05 protocol) affect that, mostly because I
> haven't seen any boot loaders which actually use it so I can't comment
> on what their code looks like.

Yes, I'd expect that code32_start would point into the ELF text
segment.   You could align things so that the entrypoint is still
actually 0x10, or bump it up a bit to fit the ELF headers.  I have
to admit I don't quite understand how all that fits together at the moment.
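For reference, a boot loader that bypasses the real-mode setup finds the 32-bit entry point in the setup header. The sketch below reads code32_start from an in-memory image, assuming the field offsets from the i386 boot protocol document (Documentation/i386/boot.txt): the "HdrS" signature at 0x202, the protocol version at 0x206, and code32_start at 0x214, valid from protocol 2.00. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of real-mode kernel header fields (boot protocol 2.00+). */
#define HDR_MAGIC_OFF   0x202	/* "HdrS" */
#define HDR_VERSION_OFF 0x206	/* protocol version, e.g. 0x0205 */
#define HDR_CODE32_OFF  0x214	/* code32_start: 32-bit entry point */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Return code32_start from the setup header in @img, or 0 if the image
 * is too short, lacks the "HdrS" signature, or predates protocol 2.00.
 */
static uint32_t read_code32_start(const uint8_t *img, size_t len)
{
	uint16_t version;

	if (len < HDR_CODE32_OFF + 4)
		return 0;
	if (img[HDR_MAGIC_OFF] != 'H' || img[HDR_MAGIC_OFF + 1] != 'd' ||
	    img[HDR_MAGIC_OFF + 2] != 'r' || img[HDR_MAGIC_OFF + 3] != 'S')
		return 0;
	version = (uint16_t)(img[HDR_VERSION_OFF] |
			     img[HDR_VERSION_OFF + 1] << 8);
	if (version < 0x0200)
		return 0;
	return get_le32(img + HDR_CODE32_OFF);
}
```

All header fields are little-endian, so the value is assembled byte by byte rather than by casting a pointer, which also avoids unaligned access concerns.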

J


Re: [PATCH 6/6] firewire: add it all to kbuild

2007-05-02 Thread Stefan Richter

Christoph Hellwig wrote:
>> +fw-core-objs := fw-card.o fw-topology.o fw-transaction.o fw-iso.o \
>> +fw-device.o fw-cdev.o
> 
> fw-core-y += ..
> 

Like such?

--- linux.orig/drivers/usb/core/Makefile
+++ linux/drivers/usb/core/Makefile
@@ -2,17 +2,12 @@
 # Makefile for USB Core files and filesystem
 #

-usbcore-objs   := usb.o hub.o hcd.o urb.o message.o driver.o \
+usbcore-y  += usb.o hub.o hcd.o urb.o message.o driver.o \
config.o file.o buffer.o sysfs.o endpoint.o \
devio.o notify.o generic.o quirks.o

-ifeq ($(CONFIG_PCI),y)
-   usbcore-objs+= hcd-pci.o
-endif
-
-ifeq ($(CONFIG_USB_DEVICEFS),y)
-   usbcore-objs+= inode.o devices.o
-endif
+usbcore-$(CONFIG_PCI)  += hcd-pci.o
+usbcore-$(CONFIG_USB_DEVICEFS) += inode.o devices.o

 obj-$(CONFIG_USB)  += usbcore.o


-- 
Stefan Richter
-=-=-=== -=-= ---==
http://arcgraph.de/sr/

Re: [patch 09/22] pollfs: pollable hrtimers

2007-05-02 Thread Davi Arnaut

Thomas Gleixner wrote:
> On Wed, 2007-05-02 at 02:22 -0300, Davi Arnaut wrote:
>> plain text document attachment (pollfs-timer.patch)
>> Per file descriptor high-resolution timers. A classic unix file interface for
>> the POSIX timer_(create|settime|gettime|delete) family of functions.
>>
>> Signed-off-by: Davi E. M. Arnaut <[EMAIL PROTECTED]>
> 
> Nacked-by-me.
> 
> Aside of the fact, that it is a bad clone of the timerfd code, it is
> simply broken and untested.

I wrote it around the same time as timerfd; I even sent it to Davide and
the list. "Clone" is a bit of an overstatement; timerfd is not as buggy as this :)

>> +
>> +struct hrtimerspec {
>> +int flags;
>> +clockid_t clock;
>> +struct itimerspec expr;
>> +};
> 
> How exactly does userspace know what a struct hrtimerspec is? Is the C file
> exported as a header?

Will move them all to another header later.

>> +static ssize_t read(struct pfs_timer *evs, struct itimerspec __user *uspec)
>> +{
>> +int ret = -EAGAIN;
>> +ktime_t remaining = {};
>> +unsigned long overruns = 0;
>> +struct itimerspec spec = {};
>> +struct hrtimer *timer = &evs->timer;
>> +
>> +spin_lock_irq(&evs->lock);
>> +
>> +if (!evs->overruns)
>> +goto out_unlock;
>> +
>> +if (hrtimer_active(timer))
>> +remaining = hrtimer_get_remaining(timer);
>> +else if (evs->interval.tv64 > 0)
>> +overruns = hrtimer_forward(timer, hrtimer_cb_get_time(timer),
>> +   evs->interval);
> 
> Where is the logic here ? 

Return the remaining time until the timer fires, or rearm the timer. And it's
pretty broken because of the first if, and I forgot to reset overruns.

> If no overrun, return remaining time = 0
> 
> If active, return the real remaining time. This path is never hit, as
> the timer is nowhere restarted.
> 
> If not active, return remaining time = 0. How does the caller know how
> many events are missed ? 
> 
>> +ret = -EOVERFLOW;
>> +if (overruns > (ULONG_MAX - evs->overruns))
>> +goto out_unlock;
>> +else
>> +evs->overruns += overruns;
> 
> Interesting feature. evs->overruns is adding up forever and then limited
> to ULONG_MAX

See third comment!

>> +static enum hrtimer_restart timer_fn(struct hrtimer *timer)
>> +{
>> +struct pfs_timer *evs = container_of(timer, struct pfs_timer, timer);
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(&evs->lock, flags);
>> +/* timer tick, interval has elapsed */
>> +if (!evs->overruns++)
>> +wake_up_all(&evs->wait);
> 
> Cool. Waiters, which came after the first event are stuck. Simply
> because there is no second event.

See third comment!

>> +static ssize_t write(struct pfs_timer *evs,
>> + const struct hrtimerspec __user *uspec)
>> +{
>> +struct hrtimerspec spec;
> 
> See first comment !
> 
>> +if (copy_from_user(&spec, uspec, sizeof(spec)))
>> +return -EFAULT;
>> +
>> +if (spec_invalid(&spec))
>> +return -EINVAL;
>> +
>> +rearm_timer(evs, &spec);
>> +
>> +return 0;
>> +}
>> +
>> +static int poll(struct pfs_timer *evs)
>> +{
>> +int ret;
>> +
>> +ret = evs->overruns ? POLLIN : 0;
>> +
>> +return ret;
>> +}
> 
> Creative lockless programming style with 4 lines overhead and a
> guaranteed return POLLIN after the first timer event. This is really
> cute as it covers the missing timer restart and guarantees 100% CPU load
> for ever. Hmm, maybe it's correct: polling should loop for ever,
> shouldn't it ?

See third comment! -- It will remain lockless, as reading and setting
this data type is guaranteed to happen atomically.
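One way to get the semantics being asked for, sketched as a single-threaded userspace model (hypothetical names, no locking, and the hrtimer rearm itself is not modeled): count every expiration, wake waiters on every tick, report readability only while the count is nonzero, and have read() return the count and reset it. This is an illustration, not either patch's code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-descriptor state; locking and the hrtimer are omitted. */
struct tick_counter {
	unsigned long expirations;	/* expirations since the last read */
	unsigned long wakeups;		/* stands in for wake_up_all() calls */
};

/* Timer callback: count every expiration and wake waiters every time. */
static void counter_tick(struct tick_counter *c)
{
	if (c->expirations != ULONG_MAX)	/* saturate rather than wrap */
		c->expirations++;
	c->wakeups++;
}

/* poll(): readable only while unread expirations remain. */
static bool counter_readable(const struct tick_counter *c)
{
	return c->expirations != 0;
}

/* read(): hand back the count and reset it; 0 would map to -EAGAIN. */
static unsigned long counter_read(struct tick_counter *c)
{
	unsigned long n = c->expirations;

	c->expirations = 0;
	return n;
}
```

Resetting on read is what keeps poll() from reporting POLLIN forever after the first event, and waking on every tick means a waiter that arrives after the first expiration is still woken by the next one.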

>> +static const struct pfs_operations timer_ops = {
>> +.read = PFS_READ(read, struct pfs_timer, struct itimerspec),
>> +.write = PFS_WRITE(write, struct pfs_timer, struct hrtimerspec),
>> +.poll = PFS_POLL(poll, struct pfs_timer),
>> +.release = PFS_RELEASE(release, struct pfs_timer),
>> +.rsize = sizeof(struct itimerspec),
>> +.wsize = sizeof(struct hrtimerspec),
> 
> See first comment !

This has nothing to do with user space, or you got lost in the comment
references.

--
Davi Arnaut


Re: [RELEASE] Lguest for 2.6.21

2007-05-02 Thread Rusty Russell

On Thu, 2007-05-03 at 03:33 +0800, WANG Cong wrote:
> Hi Rusty!
> 
> I found you forgot to check the return value of copy_from_user, and
> here is the fix for drivers/lguest/interrupts_and_traps.c.
> 
> Signed-off-by: WANG Cong <[EMAIL PROTECTED]>

Hi Wang!

Thanks for the patch.  This omission (in several places) was
deliberate.  We can't really do anything sensible if the user unmapped
the page.  I assume you saw a gcc warning from this code?

We could also use lgread() in these places which does this check and
kills the guest if something goes wrong.  I'll check the benchmarks to
make sure the (slight) extra overhead doesn't cause a regression...

Thanks!
Rusty.


Re: arch/i386/boot rewrite, and all the hard-coded video cards

Rene Herman wrote:
> On 05/03/2007 12:11 AM, H. Peter Anvin wrote:
> 
>> The problem is to detect the ones that have it from the ones that don't.
> 
> Checking here, and mine also has 132x25 as BIOS mode 0x14 in addition to
> 0x55. Probably also not universal, and 0x54 (132x43) doesn't seem to be
> repeated. Unfortunate that Qemu/Bochs don't have the VESA text modes.
> 

Does it export these modes through the VESA interface, or do you have to
"select them blind?"

-hpa

Re: Mysterious RTC hangs on x86_64 - fixed, sort of

2007-05-02 Thread Chuck Ebbert

Zachary Amsden wrote:
> With this patch, /sbin/hwclock no longer hangs my AMD64 machine when run
> after reaching multiuser.  What I don't understand is why.  I have the
> RTC based sound sequencer timer as a module, but not loaded, and the
> error message I added to indicate broken rtc control does not fire.
> 
> So why is it that if I stop taking the rtc_task_lock and issuing the
> callbacks which should never be held or exist that my system no longer
> hard freezes?
> 
> --- /tmp/a  2007-05-03 15:36:07.451256181 -0700
> +++ drivers/char/rtc.c  2007-05-03 15:27:49.0 -0700
> @@ -265,10 +265,10 @@
> spin_unlock (&rtc_lock);
>  
> /* Now do the rest of the actions */
> -   spin_lock(&rtc_task_lock);
> -   if (rtc_callback)
> -   rtc_callback->func(rtc_callback->private_data);
> -   spin_unlock(&rtc_task_lock);
> +/* spin_lock(&rtc_task_lock); */
> +// if (rtc_callback)
> +// rtc_callback->func(rtc_callback->private_data);
> +/* spin_unlock(&rtc_task_lock); */
> wake_up_interruptible(&rtc_wait);   

Try leaving the spinlocks and just disabling the callbacks. And maybe
enable spinlock debugging...

> 
> CONFIG_HPET_EMULATE_RTC=y

Did you try without that?


Re: Why ssse3?

2007-05-02 Thread Ulrich Drepper

Andi Kleen wrote:
> Nope. SSE3 != SSSE3. The additional S means Supplemental.
> 
> It's probably because the few changes didn't justify a SSE4

OK, the problem is that the actual sse3 bit is misnamed.  Intel's docs
call bit 0 of ECX "sse3"; the kernel uses "pni".  Too bad.
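For illustration, a decoder for the two CPUID leaf 1 ECX bits in question: bit 0 is what Intel's documentation calls SSE3 and what /proc/cpuinfo prints as "pni", and bit 9 is SSSE3, printed as "ssse3". The function name is hypothetical; it only maps the bits to the names the kernel prints.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_SSE3   (UINT32_C(1) << 0)	/* "pni" in /proc/cpuinfo */
#define CPUID1_ECX_SSSE3  (UINT32_C(1) << 9)	/* "ssse3" in /proc/cpuinfo */

/*
 * Write the /proc/cpuinfo-style names for the two bits above into @out,
 * space separated. @out must hold at least 11 bytes.
 */
static void ecx_flag_names(uint32_t ecx, char *out)
{
	out[0] = '\0';
	if (ecx & CPUID1_ECX_SSE3)
		strcat(out, "pni ");
	if (ecx & CPUID1_ECX_SSSE3)
		strcat(out, "ssse3 ");
}
```

On x86 with GCC or Clang, the ECX value itself can be obtained with __get_cpuid(1, &eax, &ebx, &ecx, &edx) from <cpuid.h>.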

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: More JMicron troubles with 2.6.21

2007-05-02 Thread Alan Cox

> The old IDE driver for JMicron didn't work well for me either,
> running on Kubuntu Edgy 2.6.17.  It would find the PATA DVD drive,
> and work okay for the first half of a DVD movie playback, and then go bonkers.

"go bonkers" being ? (and did you keep any dumps of it ?)

> 
> I eventually gave up on it and installed a pure SATA DVD drive on ICH7(8?)
> and just disabled the JMicron completely.


Alan

Re: [PATCH] lib/hexdump


Andrew Morton wrote:

On Wed, 2 May 2007 15:35:56 -0700
Randy Dunlap <[EMAIL PROTECTED]> wrote:


From: Randy Dunlap <[EMAIL PROTECTED]>

Based on ace_dump_mem() from Grant Likely for the Xilinx 
SystemACE CompactFlash interface.


Add hex_dumper() to lib/hexdump.c and linux/kernel.h.

This patch adds the function 'hex_dumper' which can be used to perform a 
hex + ASCII dump of data to syslog, in an easily viewable format, thus
providing a common text hex dump format.

It does not provide a hexdump_to_buffer() function.
If someone needs that, we'll have to add it.

Example usage:
hex_dumper(KERN_DEBUG, data, length);



Fair enough.  This is the sort of thing one could easily overdesign ;)


The Intel version also returned the number of bytes printed,
and they had a hexdump_to_buffer() for sysfs output.


 include/linux/kernel.h |1 
 lib/Makefile   |2 -
 lib/hexdump.c  |   51 +
 3 files changed, 53 insertions(+), 1 deletion(-)

--- /dev/null
+++ linux-2.6.21-git4/lib/hexdump.c
@@ -0,0 +1,51 @@



+/**
+ * hex_dumper - print a text hex dump to syslog for a binary blob of data
+ * @level: kernel log level (e.g. KERN_DEBUG)
+ * @buf: data blob to dump
+ * @len: number of bytes in the @buf
+ *
+ * Given a buffer of u8 data, hex_dumper() will print a hex + ASCII dump
+ * to the kernel log at the specified kernel log level.
+ *
+ * E.g.:
+ * hex_dumper(KERN_DEBUG, frame->data, frame->len);
+ *
+ * Prints the offsets of the block of memory, not addresses:
+ * 0009ab42: 40414243 44454647 48494a4b 4c4d4e4f  @ABCDEFG HIJKLMNO


But I suspect it should be printing the addresses, for many callers.

In which case we'd need a separate arg (base_address or somesuch) so that
callers who want to show real virtual addresses can pass in `base' and
callers who want to display relative offsets can pass in 0.


OK, that's one way to do it.  I'll wait a bit for other comments.


Which implies that the address will need to be printed as a 16-digit number
on 64-bit kernels.


Yep.
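The base-address argument and the 16-digit point combine naturally if the field width is derived from the pointer size. A minimal sketch; the function name is hypothetical, and it uses C99's uintptr_t and PRIxPTR rather than anything kernel-specific (in the kernel, %0*lx with an unsigned long would play the same role, since unsigned long is pointer-sized there).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical line-prefix formatter: print @base + @offset zero-padded to
 * the width of a pointer (8 hex digits on 32-bit, 16 on 64-bit).
 * base == 0 gives relative offsets; base == (uintptr_t)buf gives addresses.
 */
static int format_line_prefix(char *out, size_t outlen,
			      uintptr_t base, size_t offset)
{
	return snprintf(out, outlen, "%0*" PRIxPTR ": ",
			(int)(2 * sizeof(void *)), base + (uintptr_t)offset);
}
```

A caller that wants real virtual addresses passes (uintptr_t)buf as the base; one that wants relative offsets passes 0, and the same loop serves both.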


--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: arch/i386/boot rewrite, and all the hard-coded video cards

2007-05-02 Thread Rene Herman


On 05/03/2007 12:11 AM, H. Peter Anvin wrote:


The problem is to detect the ones that have it from the ones that don't.


Checking here, and mine also has 132x25 as BIOS mode 0x14 in addition to 
0x55. Probably also not universal, and 0x54 (132x43) doesn't seem to be 
repeated. Unfortunate that Qemu/Bochs don't have the VESA text modes.


Rene.


Re: [4/6] go BUG on vmallocspace in __pa()

William Lee Irwin III wrote:
>> +unsigned long __kvaddr_to_paddr(unsigned long kvaddr)
>> +{
>> +if (high_memory)
>> +BUG_ON(kvaddr >= VMALLOC_START);
>> +else
>> +BUG_ON(kvaddr >= (unsigned long)__va(MAXMEM));
>> +return kvaddr - PAGE_OFFSET;
>> +}

On Wed, May 02, 2007 at 03:31:03PM -0700, Jeremy Fitzhardinge wrote:
> Needs to be exported so that modules can use __pa.  Though I suspect
> most modules doing so are buggy:

Done.
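A userspace model of the check being exported here, assuming the default i386 3G/1G split (PAGE_OFFSET 0xC0000000) and a hypothetical vmalloc_start value supplied by the caller: an address translates linearly only if it lies in the direct-mapped range, and anything at or above the start of vmalloc space is rejected. Unlike the patch, the sketch returns false instead of calling BUG(), and it also rejects addresses below PAGE_OFFSET, which the patch does not check.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: default i386 3G/1G split. */
#define ASSUMED_PAGE_OFFSET 0xC0000000UL

/*
 * Model of the __pa() sanity check: translate @kvaddr linearly only if it
 * lies in [PAGE_OFFSET, vmalloc_start). Returns false instead of BUG().
 */
static bool kvaddr_to_paddr(unsigned long kvaddr, unsigned long vmalloc_start,
			    unsigned long *paddr)
{
	if (kvaddr < ASSUMED_PAGE_OFFSET)	/* extra check, not in the patch */
		return false;
	if (kvaddr >= vmalloc_start)		/* vmalloc space: not linear */
		return false;
	*paddr = kvaddr - ASSUMED_PAGE_OFFSET;
	return true;
}
```

An on-stack buffer under CONFIG_DEBUG_STACK lives in vmalloc space, which is exactly the case the check is meant to flag for drivers that DMA off the stack.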


-- wli

This patch introduces CONFIG_DEBUG_STACK, which vmalloc()'s task and IRQ
stacks in order to establish guard pages. In such a manner any stack
overflow that references pages immediately adjacent to the stack is
immediately trapped with a fault, which precludes silent memory corruption
or difficult-to-decipher failure modes resulting from stack corruption.

It furthermore adds a check to __pa() to catch drivers trying to DMA off
the stack, which more generally flags incorrect attempts to use __pa()
on vmallocspace addresses.

Signed-off-by: William Irwin <[EMAIL PROTECTED]>
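The guard-page idea can be tried out in userspace. This sketch is an analog, not the patch's code: it uses POSIX mmap() and mprotect() instead of get_vm_area() and map_vm_area(), places a PROT_NONE page directly below a small region (where an overflow of a downward-growing stack would land), and checks that a store one byte past the bottom traps. Catching the fault with sigsetjmp()/siglongjmp() is only for the demonstration; in the kernel the fault is meant to be fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);	/* abandon the faulting store */
}

/*
 * Map @usable_pages of read/write memory preceded by one PROT_NONE guard
 * page. Stacks grow down on i386, so the guard sits below the usable
 * region. Returns the start of the usable region, or NULL on failure.
 */
static unsigned char *alloc_guarded(size_t usable_pages, size_t *total_len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (usable_pages + 1) * page;
	unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base, page, PROT_NONE) != 0) {	/* the guard page */
		munmap(base, len);
		return NULL;
	}
	*total_len = len;
	return base + page;
}

/* Returns 1 if storing one byte just below @usable traps, else 0. */
static int overflow_traps(unsigned char *usable)
{
	struct sigaction sa, old_segv, old_bus;
	int trapped;

	sa.sa_handler = on_fault;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);	/* some systems raise SIGBUS */

	if (sigsetjmp(fault_env, 1) == 0) {
		*(volatile unsigned char *)(usable - 1) = 0xAA;
		trapped = 0;
	} else {
		trapped = 1;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return trapped;
}
```

Passing 1 as the second argument of sigsetjmp() saves the signal mask, so the fault signal is unblocked again after the jump out of the handler.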


Index: stack-paranoia/arch/i386/Kconfig.debug
===
--- stack-paranoia.orig/arch/i386/Kconfig.debug 2007-05-01 10:18:50.942170611 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug  2007-05-01 10:19:47.145373449 -0700
@@ -35,6 +35,16 @@
 
  This option will slow down process creation somewhat.
 
+config DEBUG_STACK
+   bool "Debug stack overflows"
+   depends on DEBUG_KERNEL
+   help
+ Allocates the stack physically discontiguously and from high
+ memory. Furthermore an unmapped guard page follows the stack,
+ which results in immediately trapping stack overflows instead
+ of silent corruption. This is not for end-users. It's intended
+ to trigger fatal system errors under various forms of stack abuse.
+
 comment "Page alloc debug is incompatible with Software Suspend on i386"
depends on DEBUG_KERNEL && SOFTWARE_SUSPEND
 
Index: stack-paranoia/arch/i386/kernel/process.c
===
--- stack-paranoia.orig/arch/i386/kernel/process.c  2007-05-01 10:18:50.950171067 -0700
+++ stack-paranoia/arch/i386/kernel/process.c   2007-05-01 10:19:47.145373449 -0700
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -322,6 +323,58 @@
show_trace(NULL, regs, &regs->esp);
 }
 
+#ifdef CONFIG_DEBUG_STACK
+struct thread_info *alloc_thread_info(struct task_struct *unused)
+{
+   int i;
+   struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+   struct vm_struct *area;
+
+   /*
+* passing VM_IOREMAP for the sake of alignment is why
+* all this is done by hand.
+*/
+   area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
+   if (!area)
+   return NULL;
+   for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i) {
+   pages[i] = alloc_page(GFP_HIGHUSER);
+   if (!pages[i])
+   goto out_free_pages;
+   }
+   /* implicitly transfer page refcounts to the vm_struct */
+   if (map_vm_area(area, PAGE_KERNEL, &tmp))
+   goto out_free_pages;
+   /* it may be worth poisoning, save thread_info proper */
+   return (struct thread_info *)area->addr;
+out_free_pages:
+   /* unmap and release the vm_struct on every failure path */
+   free_vm_area(area);
+   /* then free only the pages that were actually allocated */
+   while (--i >= 0)
+   __free_page(pages[i]);
+   return NULL;
+}
+
+static void work_free_thread_info(struct work_struct *work)
+{
+   int i;
+   void *p = work;
+
+   for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
+   __free_page(vmalloc_to_page(p + PAGE_SIZE*i));
+   vfree(p);
+}
+
+void free_thread_info(struct thread_info *info)
+{
+   struct work_struct *work = (struct work_struct *)info;
+
+   INIT_WORK(work, work_free_thread_info);
+   schedule_work(work);
+}
+#endif
+
 /*
  * This gets run with %ebx containing the
  * function to call, and %edx containing
Index: stack-paranoia/include/asm-i386/module.h
===
--- stack-paranoia.orig/include/asm-i386/module.h   2007-05-01 10:18:50.998173802 -0700
+++ stack-paranoia/include/asm-i386/module.h    2007-05-01 10:19:47.145373449 -0700
@@ -68,6 +68,13 @@
 #define MODULE_STACKSIZE ""
 #endif
 
-#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE
+#ifdef CONFIG_DEBUG_STACK
+#define MODULE_DEBUG_STACK "DEBUG_STACKS "
+#else
+#define MODULE_DEBUG_STACK ""
+#endif
+
+#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE \
+   MODULE_DEBUG_STACK
 
 #endif /* _ASM_I386_MODULE_H */
Index: stack-paranoia/include/asm-i386/thread_info.h
===
--- stack-paranoia.orig/include/asm-i386/thread_info.h  2007-05-01 10:18:51.006174258 -0700
+++
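The module.h hunk above relies on C's compile-time concatenation of adjacent string literals, so a kernel built with CONFIG_DEBUG_STACK carries an extra token in its vermagic string, and a module built without it fails the loader's vermagic comparison unless loading is forced. A self-contained illustration, with the configuration symbols defined by hand as assumptions (an i686 build with 4K stacks):

```c
#include <assert.h>
#include <string.h>

/* Assumed configuration for this example: i686, 4K stacks, DEBUG_STACK. */
#define CONFIG_4KSTACKS 1
#define CONFIG_DEBUG_STACK 1

#define MODULE_PROC_FAMILY "686 "

#ifdef CONFIG_4KSTACKS
#define MODULE_STACKSIZE "4KSTACKS "
#else
#define MODULE_STACKSIZE ""
#endif

#ifdef CONFIG_DEBUG_STACK
#define MODULE_DEBUG_STACK "DEBUG_STACKS "
#else
#define MODULE_DEBUG_STACK ""
#endif

/* Adjacent string literals are joined into one literal at compile time. */
#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE \
	MODULE_DEBUG_STACK

static const char arch_vermagic[] = MODULE_ARCH_VERMAGIC;
```

Because each optional token ends in a space, the pieces can be appended in any combination without extra separators.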

[patch] hide spinlock in linux/quota.h behind __KERNEL__

2007-05-02 Thread Mike Frysinger

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 77db80a..6243982 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -44,8 +44,6 @@
 typedef __kernel_uid32_t qid_t; /* Type in which we store ids in memory */
 typedef __u64 qsize_t;  /* Type in which we store sizes */
 
-extern spinlock_t dq_data_lock;
-
 /* Size of blocks in which are counted size limits */
 #define QUOTABLOCK_BITS 10
 #define QUOTABLOCK_SIZE (1 << QUOTABLOCK_BITS)
@@ -139,6 +137,8 @@ struct if_dqinfo {
 #include 
 #include 
 
+extern spinlock_t dq_data_lock;
+
 /* Maximal numbers of writes for quota operation (insert/delete/update)
  * (over VFS all formats) */
 #define DQUOT_INIT_ALLOC max(V1_INIT_ALLOC, V2_INIT_ALLOC)

[patch] pull in the linux/input.h header in linux/uinput.h

2007-05-02 Thread Mike Frysinger

uinput.h relies on structures only found in input.h, so pull in the header

Signed-off-by: Mike Frysinger <[EMAIL PROTECTED]>
---
diff --git a/include/linux/uinput.h b/include/linux/uinput.h
index 1fd61ee..a6c1e8e 100644
--- a/include/linux/uinput.h
+++ b/include/linux/uinput.h
@@ -32,6 +32,8 @@
  * - first public version
  */
 
+#include <linux/input.h>
+
 #define UINPUT_VERSION 3
 
 #ifdef __KERNEL__

Re: Why ssse3?

2007-05-02 Thread Andi Kleen

On Thursday 03 May 2007 00:41:22 Ulrich Drepper wrote:
> Note the extra 's'.  We use "sse" and "sse2", but "ssse3".  I assume
> it's a typo.

Nope. SSE3 != SSSE3. The additional S means Supplemental.

It's probably because the few changes didn't justify an SSE4.

-Andi

Re: [PATCH] synclink_gt add compat_ioctl

2007-05-02 Thread Arnd Bergmann

On Thursday 03 May 2007, Paul Fulghum wrote:
> >> +
> >> +spin_lock_irqsave(&info->lock, flags);
> > 
> > no need for _irqsave, just use spin_{,un}lock_irq() when you know that
> > interrupts are enabled.
> 
> That makes me a little uneasy. The locking
> mechanisms (and just about everything else) above the driver
> seem to change frequently. This involves not just the VFS but
> the tty core as well.
> 
> If you are confident this will not change, I will
> switch to spin_lock(). I used spin_lock_irqsave() to be
> more robust against changes to behavior outside my driver.

The same function contains a copy_from_user(), which cannot
be called with interrupts disabled, so yes, I am very certain
it will not change.

Arnd <><

Re: [PATCH] lib/hexdump