Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Andrew Morton
On Wed, 07 Mar 2007 15:39:27 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:

> Thanks a lot, could you please give me a script just to kill this
> whitespace? So I can do it before sending you patches.


It's pretty simple:

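#!/bin/sh
#
# Strip any trailing whitespace which a unified diff adds.
#

strip1()
{
    # mktemp needs a template with enough X's to create a unique file
    TMP=$(mktemp /tmp/XXXXXX) || exit 1
    cp "$1" "$TMP"
    # Only touch added lines (those starting with "+")
    sed -e '/^+/s/[[:space:]]*$//' < "$TMP" > "$1"
    rm "$TMP"
}

for i in "$@"
do
    strip1 "$i"
done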


that'll be in
http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20/patch-scripts-0.20.tar.gz
too
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-06 Thread Thomas Gleixner
On Tue, 2007-03-06 at 17:44 -0800, Dan Hecht wrote:
> >>> 2) As I said above. The time accounting for virtualization needs to be
> >>> fixed in a generic way.
> >>>
> >>> I'm not going to accept some weird hackery for virtualization, which is
> >>> of exactly ZERO value for the kernel itself. Quite the contrary it will
> >>> make the cleanup harder and introduce another hard to remove thing,
> >>> which will in the worst case last for ever.
> >>>
> >> Okay, to confirm I'm on the same page as you, you want to move process 
> >> time accounting from being periodic sampled based to being trace based? 
> >> i.e. at the system-call/interrupt boundaries, read clocksource and 
> >> compute directly the amount of system/user/process time?
> > 
> > At least for the paravirt guests this is the correct approach. Once the
> > CPU vendors come up with a sane solution for a reliable and fast clock
> > source we might use that on real hardware as well.
> > 
> 
> I thought your preference was to not do things differently from real 
> hardware?  I guess this case you are okay with since you'd like to see 
> the real hardware case follow eventually?

Real hardware _IS_ broken and slow. If we add the facilities for
virtualization we want it in a way, which is usable by real hardware as
well.

> > Yes, with todays hardware it is simply a PITA. PowerPC has some basic
> > support for this though, IIRC.
> > 
> 
> I think S390 maybe too.

One more reason to make it a generic solution rather than some extra
hackery.

tglx




Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Wu, Bryan
On Tue, 2007-03-06 at 23:14 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 07:58:22 +0100 Jean Delvare <[EMAIL PROTECTED]> wrote:
> 
> > > +config BFIN_SDA
> > 
> > I2C_BLACKFIN_SDA
> 
> The blackfin architecture uses "bfin" pretty much universally, so this
> usage is consistent.
> 
> box:/usr/src/25> grep -i blackfin patches/blackfin*|wc -l
>1608
> box:/usr/src/25> grep -i bfin patches/blackfin*|wc -l
>6198
> 

Thanks for your understanding, but we now want to move to
CONFIG_BLACKFIN options. There is a new task in our development plan to
change things to CONFIG_BLACKFIN.

At the moment we provide both CONFIG_BFIN and CONFIG_BLACKFIN. Once
everything that relies on CONFIG_BFIN/bfin has been changed to
CONFIG_BLACKFIN/blackfin, CONFIG_BFIN will be removed.

So here I will follow Jean's comments.

> Let's just hope nobody makes a bluefin.

So it is OK for both blackfin and bluefin. But I think Black is cooler
than Blue. -:)

> 
> > > + range 0 15 if (BF533 || BF532 || BF531) 
> > 
> > Trailing whitespace.
> 
> I always remove that when merging a patch.

Thanks a lot, could you please give me a script just to kill this
whitespace? So I can do it before sending you patches.

Thanks Jean and Andrew.
-Bryan


Re: [Bug 8136] 2.6.21-rc2-mm2 won't boot

2007-03-06 Thread Nicolas Mailhot
On Tuesday 06 March 2007 at 16:15 -0800, Andrew Morton wrote:

> So rc2-mm2 panics due to "MP-BIOS bug: 8254 timer not connected to IO-APIC"
> and rc2-mm1 does not.
> 
> Could be ACPI, could be x86_64 timer changes, could be something else.
> 
> Would you have time to bisect it?
> http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
> explains how.
> 
> If so, I'd suggest you drill in on the patches between
> x86_64-mm-defconfig-update.patch and
> optimize-and-simplify-get_cycles_sync.patch: the x86 changes.

I may have some more debug time this evening (CET), probably not enough
for a full bisection. I'd really love to have timer/clock problems
nailed once and for all on this box (MP BIOS, RTC, HPET, whatever)

-- 
Nicolas Mailhot




[PATCH 8/20] x86_64: 64bit PIC SMP trampoline

2007-03-06 Thread Vivek Goyal


This modifies the SMP trampoline and all of the associated code so
it can jump to a 64bit kernel loaded at an arbitrary address.

The dependencies on having an identity mapped page in the kernel
page tables for SMP bootup have all been removed.

In addition, the trampoline has been modified to verify
that long mode is supported.  Asking if long mode is implemented is
downright silly, but we have traditionally had some of these checks,
and they can't hurt anything.  So when the totally ludicrous happens
we just might handle it correctly.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head.S   |1 
 arch/x86_64/kernel/setup.c  |9 --
 arch/x86_64/kernel/trampoline.S |  168 
 3 files changed, 156 insertions(+), 22 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline 
arch/x86_64/kernel/head.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline
2007-03-07 01:25:32.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S   2007-03-07 
01:25:32.0 +0530
@@ -101,6 +101,7 @@ startup_32:
.org 0x100  
.globl startup_64
 startup_64:
+ENTRY(secondary_startup_64)
/* We come here either from startup_32
 * or directly from a 64bit bootloader.
 * Since we may have come directly from a bootloader we
diff -puN arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline 
arch/x86_64/kernel/setup.c
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline
   2007-03-07 01:25:32.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/setup.c  2007-03-07 
01:25:32.0 +0530
@@ -329,15 +329,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 #ifdef CONFIG_SMP
-   /*
-* But first pinch a few for the stack/trampoline stuff
-* FIXME: Don't need the extra page at 4K, but need to fix
-* trampoline before removing it. (see the GDT stuff)
-*/
-   reserve_bootmem_generic(PAGE_SIZE, PAGE_SIZE);
-
/* Reserve SMP trampoline */
-   reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, PAGE_SIZE);
+   reserve_bootmem_generic(SMP_TRAMPOLINE_BASE, 2*PAGE_SIZE);
 #endif
 
 #ifdef CONFIG_ACPI_SLEEP
diff -puN arch/x86_64/kernel/trampoline.S~x86_64-64bit-PIC-SMP-trampoline 
arch/x86_64/kernel/trampoline.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/trampoline.S~x86_64-64bit-PIC-SMP-trampoline
  2007-03-07 01:25:32.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/trampoline.S 2007-03-07 
01:25:32.0 +0530
@@ -3,6 +3,7 @@
  * Trampoline.SDerived from Setup.S by Linus Torvalds
  *
  * 4 Jan 1997 Michael Chastain: changed to gnu as.
+ * 15 Sept 2005 Eric Biederman: 64bit PIC support
  *
  * Entry: CS:IP point to the start of our code, we are 
  * in real mode with no stack, but the rest of the 
@@ -17,15 +18,20 @@
  * and IP is zero.  Thus, data addresses need to be absolute
  * (no relocation) and are taken with regard to r_base.
  *
+ * With the addition of trampoline_level4_pgt this code can
+ * now enter a 64bit kernel that lives at arbitrary 64bit
+ * physical addresses.
+ *
  * If you work on this file, check the object module with objdump
  * --full-contents --reloc to make sure there are no relocation
- * entries. For the GDT entry we do hand relocation in smpboot.c
- * because of 64bit linker limitations.
+ * entries.
  */
 
 #include 
-#include 
+#include 
 #include 
+#include 
+#include 
 
 .data
 
@@ -33,15 +39,31 @@
 
 ENTRY(trampoline_data)
 r_base = .
+   cli # We should be safe anyway
wbinvd  
mov %cs, %ax# Code and data in the same place
mov %ax, %ds
+   mov %ax, %es
+   mov %ax, %ss
 
-   cli # We should be safe anyway
 
movl$0xA5A5A5A5, trampoline_data - r_base
# write marker for master knows we're running
 
+   # Setup stack
+   movw$(trampoline_stack_end - r_base), %sp
+
+   callverify_cpu  # Verify the cpu supports long mode
+
+   mov %cs, %ax
+   movzx   %ax, %esi   # Find the 32bit trampoline location
+   shll$4, %esi
+
+   # Fixup the vectors
+   addl%esi, startup_32_vector - r_base
+   addl%esi, startup_64_vector - r_base
+   addl%esi, tgdt + 2 - r_base # Fixup the gdt pointer
+
/*
 * GDT tables in non default location kernel can be beyond 16MB and
 * lgdt will not be able to load the address as in real mode default
@@ -49,23 +71,141 @@ r_base = .
 * to 32 bit.
 */
 
-   lidtl   idt_48 - r_base # load idt with 0, 0
-  

[PATCH 17/20] x86_64: __pa and __pa_symbol address space separation

2007-03-06 Thread Vivek Goyal


Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa, which is much more common, can be implemented much more cheaply
if it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel, as
__pa_symbol needs to perform an extra variable read to resolve
the address.

A third macro, __pa_vsymbol, is added for the vsyscall data; it finds
the physical addresses of vsyscall pages.

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.

swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgd.  The
physical address changed.

Except for the EMPTY_ZERO page, all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).
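
The split described above can be sketched in user-space C. This is only an illustration, not the patch's code: the two base constants are assumed values for this kernel generation, and `phys_base` is a hypothetical parameter standing in for the extra variable a relocated kernel has to read.

```c
#include <stdint.h>

/* Assumed, illustrative bases; the real definitions live in the kernel headers. */
#define SKETCH_PAGE_OFFSET      0xffff810000000000ULL /* direct physical-memory map */
#define SKETCH_START_KERNEL_MAP 0xffffffff80000000ULL /* kernel text/data map */

/* Cheap case: a pointer into the direct mapping needs one subtraction. */
static inline uint64_t sketch_pa(uint64_t vaddr)
{
	return vaddr - SKETCH_PAGE_OFFSET;
}

/*
 * Symbol case: the kernel image may have been loaded at a different
 * physical address, so the load offset (phys_base, hypothetical here)
 * has to be fetched and added.
 */
static inline uint64_t sketch_pa_symbol(uint64_t vaddr, uint64_t phys_base)
{
	return vaddr - SKETCH_START_KERNEL_MAP + phys_base;
}
```

Keeping the two address spaces apart is what lets the common helper stay a single subtraction.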

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/kernel/alternative.c |8 
 arch/i386/mm/init.c|   15 ---
 arch/x86_64/kernel/machine_kexec.c |   14 +++---
 arch/x86_64/kernel/setup.c |9 +
 arch/x86_64/kernel/smp.c   |2 +-
 arch/x86_64/kernel/vsyscall.c  |9 +++--
 arch/x86_64/mm/init.c  |   21 +++--
 arch/x86_64/mm/pageattr.c  |   16 
 include/asm-x86_64/page.h  |6 ++
 include/asm-x86_64/pgtable.h   |4 ++--
 10 files changed, 55 insertions(+), 49 deletions(-)

diff -puN 
arch/i386/kernel/alternative.c~x86_64-__pa-and-__pa_symbol-address-space-separation
 arch/i386/kernel/alternative.c
--- 
linux-2.6.21-rc2-reloc/arch/i386/kernel/alternative.c~x86_64-__pa-and-__pa_symbol-address-space-separation
  2007-03-07 01:31:03.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/i386/kernel/alternative.c  2007-03-07 
01:31:03.0 +0530
@@ -389,8 +389,8 @@ void __init alternative_instructions(voi
if (no_replacement) {
printk(KERN_INFO "(SMP-)alternatives turned off\n");
free_init_pages("SMP alternatives",
-   (unsigned long)__smp_alt_begin,
-   (unsigned long)__smp_alt_end);
+   __pa_symbol(&__smp_alt_begin),
+   __pa_symbol(&__smp_alt_end));
return;
}
 
@@ -419,8 +419,8 @@ void __init alternative_instructions(voi
_text, _etext);
}
free_init_pages("SMP alternatives",
-   (unsigned long)__smp_alt_begin,
-   (unsigned long)__smp_alt_end);
+   __pa_symbol(&__smp_alt_begin),
+   __pa_symbol(&__smp_alt_end));
} else {
alternatives_smp_save(__smp_alt_instructions,
  __smp_alt_instructions_end);
diff -puN 
arch/i386/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation 
arch/i386/mm/init.c
--- 
linux-2.6.21-rc2-reloc/arch/i386/mm/init.c~x86_64-__pa-and-__pa_symbol-address-space-separation
 2007-03-07 01:31:03.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/i386/mm/init.c 2007-03-07 
01:31:03.0 +0530
@@ -774,10 +774,11 @@ void free_init_pages(char *what, unsigne
unsigned long addr;
 
for (addr = begin; addr < end; addr += PAGE_SIZE) {
-   ClearPageReserved(virt_to_page(addr));
-   init_page_count(virt_to_page(addr));
-   memset((void *)addr, POISON_FREE_INITMEM, PAGE_SIZE);
-   free_page(addr);
+   struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
+   ClearPageReserved(page);
+   init_page_count(page);
+   memset(page_address(page), POISON_FREE_INITMEM, PAGE_SIZE);
+   __free_page(page);
totalram_pages++;
}
printk(KERN_INFO "Freeing %s: %ldk freed\n", what, (end - begin) >> 10);
@@ -786,14 +787,14 @@ void free_init_pages(char *what, unsigne
 void free_initmem(void)
 {
free_init_pages("unused kernel memory",
-   (unsigned long)(&__init_begin),
-   (unsigned long)(&__init_end));
+   __pa_symbol(&__init_begin),
+   __pa_symbol(&__init_end));
 }
 
 #ifdef 

Re: [BUGFIX][PATCH] fix NULL pointer in ia64/irq_chip-mask/unmask function

2007-03-06 Thread KAMEZAWA Hiroyuki
On Tue, 6 Mar 2007 22:57:10 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 15:23:17 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> 
> > This patch fixes boot failure because irq_desc->mask() is NULL.
> > 
> > - Added mask/unmask functions to ia64's irq desc function table.
> >   But I'm not sure this fix is correct or not. please review.
> > 
> > - rename hw_interrupt_type to irq_chip. hw_interrupt_type is old name.
> 
> Thanks.
> 
> This bug is present in mainline too, isn't it?
> 
Yes, I confirmed rc3 has this bug.

-Kame



[PATCH 19/20] x86_64: Extend bzImage protocol for relocatable bzImage

2007-03-06 Thread Vivek Goyal


o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
  load the protected mode kernel at a non-1MB address. The protected mode
  component is now relocatable and can be loaded at non-1MB addresses.

o As of today kdump uses it to run a second kernel from a reserved memory
  area.
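
As a rough sketch of how a loader might consume the extended header, the C below checks the signature, the protocol version and the new relocatable flag. The byte offsets are the ones given in the Linux/x86 boot protocol documentation; the function name and the buffer-based interface are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Setup header offsets per the x86 boot protocol documentation. */
#define HDR_SIGNATURE_OFF   0x202 /* "HdrS" */
#define HDR_VERSION_OFF     0x206 /* 2 bytes, little endian */
#define HDR_KERNEL_ALIGN    0x230 /* 4 bytes, protocol 2.05 and later */
#define HDR_RELOCATABLE_OFF 0x234 /* 1 byte, protocol 2.05 and later */

/*
 * Returns 1 and stores the required alignment if the image advertises a
 * relocatable protected mode kernel, 0 otherwise.
 */
static int sketch_is_relocatable(const uint8_t *image, size_t len,
				 uint32_t *align_out)
{
	uint16_t version;

	if (len < HDR_RELOCATABLE_OFF + 1)
		return 0;
	if (memcmp(image + HDR_SIGNATURE_OFF, "HdrS", 4) != 0)
		return 0;

	version = (uint16_t)(image[HDR_VERSION_OFF] |
			     (image[HDR_VERSION_OFF + 1] << 8));
	if (version < 0x0205)
		return 0; /* fields below do not exist in older protocols */
	if (image[HDR_RELOCATABLE_OFF] == 0)
		return 0;

	*align_out = (uint32_t)image[HDR_KERNEL_ALIGN] |
		     ((uint32_t)image[HDR_KERNEL_ALIGN + 1] << 8) |
		     ((uint32_t)image[HDR_KERNEL_ALIGN + 2] << 16) |
		     ((uint32_t)image[HDR_KERNEL_ALIGN + 3] << 24);
	return 1;
}
```

A loader that gets 0 back would fall back to the traditional 1MB load address.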

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/boot/setup.S |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff -puN 
arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage 
arch/x86_64/boot/setup.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage
  2007-03-07 01:32:01.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/setup.S2007-03-07 
01:32:01.0 +0530
@@ -80,7 +80,7 @@ start:
 # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
.ascii  "HdrS"  # header signature
-   .word   0x0204  # header version number (>= 0x0105)
+   .word   0x0205  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
 start_sys_seg: .word   SYSSEG
@@ -155,7 +155,16 @@ cmd_line_ptr:  .long 0 # (Header versio
# low memory 0x10000 or higher.
 
 ramdisk_max:   .long 0xffffffff
-   
+kernel_alignment:  .long 0x200000   # physical addr alignment required for
+   # protected mode relocatable kernel
+#ifdef CONFIG_RELOCATABLE
+relocatable_kernel:.byte 1
+#else
+relocatable_kernel:.byte 0
+#endif
+pad2:  .byte 0
+pad3:  .word 0
+
 trampoline:callstart_of_setup
.align 16
# The offset at this point is 0x240
_


[PATCH 7/20] x86_64: Add EFER to the register set saved by save_processor_state

2007-03-06 Thread Vivek Goyal


EFER varies like %cr4 depending on the cpu capabilities, and on which cpu
capabilities we want to make use of.  So save/restore it to make certain
we have the same EFER value when we are done.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/suspend.c |3 ++-
 include/asm-x86_64/suspend.h |1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff -puN 
arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state
 arch/x86_64/kernel/suspend.c
--- 
linux-2.6.19-rc6-reloc/arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state
  2006-11-17 00:08:16.0 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/suspend.c2006-11-17 
00:08:16.0 -0500
@@ -33,7 +33,6 @@ void __save_processor_state(struct saved
asm volatile ("str %0"  : "=m" (ctxt->tr));
 
/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
-   /* EFER should be constant for kernel version, no need to handle it. */
/*
 * segment registers
 */
@@ -50,6 +49,7 @@ void __save_processor_state(struct saved
/*
 * control registers 
 */
+   rdmsrl(MSR_EFER, ctxt->efer);
asm volatile ("movq %%cr0, %0" : "=r" (ctxt->cr0));
asm volatile ("movq %%cr2, %0" : "=r" (ctxt->cr2));
asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
@@ -75,6 +75,7 @@ void __restore_processor_state(struct sa
/*
 * control registers
 */
+   wrmsrl(MSR_EFER, ctxt->efer);
asm volatile ("movq %0, %%cr8" :: "r" (ctxt->cr8));
asm volatile ("movq %0, %%cr4" :: "r" (ctxt->cr4));
asm volatile ("movq %0, %%cr3" :: "r" (ctxt->cr3));
diff -puN 
include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state
 include/asm-x86_64/suspend.h
--- 
linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state
  2006-11-17 00:08:16.0 -0500
+++ linux-2.6.19-rc6-reloc-root/include/asm-x86_64/suspend.h2006-11-17 
00:08:16.0 -0500
@@ -17,6 +17,7 @@ struct saved_context {
u16 ds, es, fs, gs, ss;
unsigned long gs_base, gs_kernel_base, fs_base;
unsigned long cr0, cr2, cr3, cr4, cr8;
+   unsigned long efer;
u16 gdt_pad;
u16 gdt_limit;
unsigned long gdt_base;
_


[PATCH 13/20] x86_64: Modify discover_ebda to use virtual addresses

2007-03-06 Thread Vivek Goyal


Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/setup.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN 
arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses 
arch/x86_64/kernel/setup.c
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses
  2007-03-07 01:28:51.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/setup.c  2007-03-07 
01:28:51.0 +0530
@@ -205,10 +205,10 @@ static void discover_ebda(void)
 * there is a real-mode segmented pointer pointing to the 
 * 4K EBDA area at 0x40E
 */
-   ebda_addr = *(unsigned short *)EBDA_ADDR_POINTER;
+   ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
ebda_addr <<= 4;
 
-   ebda_size = *(unsigned short *)(unsigned long)ebda_addr;
+   ebda_size = *(unsigned short *)__va(ebda_addr);
 
/* Round EBDA up to pages */
if (ebda_size == 0)
_


[PATCH 16/20] swsusp: do not use virt_to_page on kernel data address

2007-03-06 Thread Vivek Goyal


o virt_to_page() call should be used on kernel linear addresses and not
  on kernel text and data addresses. Swsusp code uses it on kernel data
  (statically allocated swsusp_header).

o Allocate swsusp_header dynamically so that virt_to_page() can be used
  safely.

o I am changing this because in the next few patches __pa() on x86_64 will
  no longer support kernel text and data addresses, which would break
  hibernation.
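
The change from a static, page-aligned object to a dynamically allocated page can be sketched in user-space C as follows. This is an analogy under stated assumptions, not the kernel code: `aligned_alloc()` stands in for `__get_free_page()`, the 4096-byte page size is assumed, and all `sketch_` names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Exactly one page: padding first, the two signatures at the very end. */
struct sketch_header {
	char reserved[SKETCH_PAGE_SIZE - 20];
	char orig_sig[10];
	char sig[10];
};

/* A pointer replaces the former statically allocated object. */
static struct sketch_header *sketch_header;

/* Allocate one page-aligned page; returns 0 on success, -1 on failure. */
static int sketch_header_init(void)
{
	sketch_header = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	return sketch_header ? 0 : -1;
}

static void sketch_header_clear(void)
{
	/*
	 * Size of the object, not of the pointer: once the header becomes
	 * a pointer, sizeof(sketch_header) would only be pointer-sized.
	 */
	memset(sketch_header, 0, sizeof(*sketch_header));
}
```

Every former `header.field` access becomes `header->field`, and every size computed from the object has to be rechecked.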

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 kernel/power/swap.c |   42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff -puN 
kernel/power/swap.c~swsusp-do-not-use-virt_to_page-on-kernel-data-addr 
kernel/power/swap.c
--- 
linux-2.6.21-rc2-reloc/kernel/power/swap.c~swsusp-do-not-use-virt_to_page-on-kernel-data-addr
   2007-03-07 01:30:43.0 +0530
+++ linux-2.6.21-rc2-reloc-root/kernel/power/swap.c 2007-03-07 
01:30:43.0 +0530
@@ -33,12 +33,14 @@ extern char resume_file[];
 
 #define SWSUSP_SIG "S1SUSPEND"
 
-static struct swsusp_header {
+struct swsusp_header {
char reserved[PAGE_SIZE - 20 - sizeof(sector_t)];
sector_t image;
charorig_sig[10];
charsig[10];
-} __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
+} __attribute__((packed));
+
+static struct swsusp_header *swsusp_header;
 
 /*
  * General things
@@ -141,14 +143,14 @@ static int mark_swapfiles(sector_t start
 {
int error;
 
-   bio_read_page(swsusp_resume_block, &swsusp_header, NULL);
-   if (!memcmp("SWAP-SPACE",swsusp_header.sig, 10) ||
-   !memcmp("SWAPSPACE2",swsusp_header.sig, 10)) {
-   memcpy(swsusp_header.orig_sig,swsusp_header.sig, 10);
-   memcpy(swsusp_header.sig,SWSUSP_SIG, 10);
-   swsusp_header.image = start;
+   bio_read_page(swsusp_resume_block, swsusp_header, NULL);
+   if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
+   !memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
+   memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
+   memcpy(swsusp_header->sig,SWSUSP_SIG, 10);
+   swsusp_header->image = start;
error = bio_write_page(swsusp_resume_block,
-   &swsusp_header, NULL);
+   swsusp_header, NULL);
} else {
printk(KERN_ERR "swsusp: Swap header not found!\n");
error = -ENODEV;
@@ -564,7 +566,7 @@ int swsusp_read(void)
if (error < PAGE_SIZE)
return error < 0 ? error : -EFAULT;
header = (struct swsusp_info *)data_of(snapshot);
-   error = get_swap_reader(&handle, swsusp_header.image);
+   error = get_swap_reader(&handle, swsusp_header->image);
if (!error)
error = swap_read_page(&handle, header, NULL);
if (!error)
@@ -591,17 +593,17 @@ int swsusp_check(void)
resume_bdev = open_by_devnum(swsusp_resume_device, FMODE_READ);
if (!IS_ERR(resume_bdev)) {
set_blocksize(resume_bdev, PAGE_SIZE);
-   memset(&swsusp_header, 0, sizeof(swsusp_header));
+   memset(swsusp_header, 0, PAGE_SIZE);
error = bio_read_page(swsusp_resume_block,
-   &swsusp_header, NULL);
+   swsusp_header, NULL);
if (error)
return error;
 
-   if (!memcmp(SWSUSP_SIG, swsusp_header.sig, 10)) {
-   memcpy(swsusp_header.sig, swsusp_header.orig_sig, 10);
+   if (!memcmp(SWSUSP_SIG, swsusp_header->sig, 10)) {
+   memcpy(swsusp_header->sig, swsusp_header->orig_sig, 10);
/* Reset swap signature now */
error = bio_write_page(swsusp_resume_block,
-   &swsusp_header, NULL);
+   swsusp_header, NULL);
} else {
return -EINVAL;
}
@@ -632,3 +634,13 @@ void swsusp_close(void)
 
blkdev_put(resume_bdev);
 }
+
+static int swsusp_header_init(void)
+{
+   swsusp_header = (struct swsusp_header*) __get_free_page(GFP_KERNEL);
+   if (!swsusp_header)
+   panic("Could not allocate memory for swsusp_header\n");
+   return 0;
+}
+
+core_initcall(swsusp_header_init);
_


[PATCH 20/20] x86_64: Move cpu verification code to common file

2007-03-06 Thread Vivek Goyal


o This patch moves the code to verify long mode and SSE to a common file.
  This code is now shared by trampoline.S, wakeup.S, boot/setup.S and
  boot/compressed/head.S

o Until now we did only a very limited check in trampoline.S, wakeup.S and
  the 32bit entry point. Now all the entry paths are forced to do the
  exhaustive check, including SSE, because verify_cpu is shared.

o I am keeping this patch last in the x86 relocatable series because the
  previous patches have had quite a lot of testing and I don't want
  to disturb that. If this patch introduces a problem, at
  least it can be easily isolated.
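
The feature test that the shared routine performs can be sketched in C. The masks are the ones used by the check being removed from boot/setup.S in this patch; the function name and the idea of passing in the two CPUID EDX words are hypothetical, and the AMD-only fallback that tries to enable SSE through an MSR is left out.

```c
#include <stdint.h>

/* Feature bit masks as they appear in the old boot/setup.S check. */
#define SSE_MASK       ((1u << 25) | (1u << 26))
#define REQUIRED_MASK1 ((1u << 0) | (1u << 3) | (1u << 4) | (1u << 5) | \
			(1u << 6) | (1u << 8) | (1u << 13) | (1u << 15) | \
			(1u << 24))
#define REQUIRED_MASK2 (1u << 29) /* long mode, extended leaf */

/*
 * std_edx: EDX from CPUID leaf 1; ext_edx: EDX from the extended
 * feature leaf. Returns 0 if the CPU is usable, nonzero otherwise,
 * matching the "testl %eax, %eax; jnz no_longmode" convention.
 */
static int sketch_verify_cpu(uint32_t std_edx, uint32_t ext_edx)
{
	if ((std_edx & REQUIRED_MASK1) != REQUIRED_MASK1)
		return 1;
	if ((ext_edx & REQUIRED_MASK2) != REQUIRED_MASK2)
		return 1;
	if ((std_edx & SSE_MASK) != SSE_MASK)
		return 1;
	return 0;
}
```

Because every entry path calls the same routine, a CPU lacking SSE is now rejected on the trampoline and wakeup paths as well.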

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/boot/compressed/head.S |   19 ++
 arch/x86_64/boot/setup.S   |   65 ++---
 arch/x86_64/kernel/acpi/wakeup.S   |   30 +-
 arch/x86_64/kernel/trampoline.S|   51 +
 arch/x86_64/kernel/verify_cpu.S|  110 +
 5 files changed, 152 insertions(+), 123 deletions(-)

diff -puN 
arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file
 arch/x86_64/boot/compressed/head.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file
   2007-03-07 01:32:27.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/compressed/head.S  
2007-03-07 01:32:27.0 +0530
@@ -54,6 +54,15 @@ startup_32:
 1: popl%ebp
subl$1b, %ebp
 
+/* setup a stack and make sure cpu supports long mode. */
+   movl$user_stack_end, %eax
+   addl%ebp, %eax
+   movl%eax, %esp
+
+   callverify_cpu
+   testl   %eax, %eax
+   jnz no_longmode
+
 /* Compute the delta between where we were compiled to run at
  * and where the code will actually run at.
  */
@@ -159,13 +168,21 @@ startup_32:
/* Jump from 32bit compatibility mode into 64bit mode. */
lret
 
+no_longmode:
+   /* This isn't an x86-64 CPU so hang */
+1:
+   hlt
+   jmp 1b
+
+#include "../../kernel/verify_cpu.S"
+
/* Be careful here startup_64 needs to be at a predictable
 * address so I can export it in an ELF header.  Bootloaders
 * should look at the ELF header to find this address, as
 * it may change in the future.
 */
.code64
-   .org 0x100
+   .org 0x200
 ENTRY(startup_64)
/* We come here either from startup_32 or directly from a
 * 64bit bootloader.  If we come here from a bootloader we depend on
diff -puN 
arch/x86_64/boot/setup.S~x86_64-move-cpu-verfication-code-to-common-file 
arch/x86_64/boot/setup.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/boot/setup.S~x86_64-move-cpu-verfication-code-to-common-file
 2007-03-07 01:32:27.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/setup.S2007-03-07 
01:32:27.0 +0530
@@ -299,64 +299,10 @@ loader_ok:
movw%cs,%ax
movw%ax,%ds

-   /* minimum CPUID flags for x86-64 */
-   /* see http://www.x86-64.org/lists/discuss/msg02971.html */ 
-#define SSE_MASK ((1<<25)|(1<<26))
-#define REQUIRED_MASK1 ((1<<0)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<8)|\
-  (1<<13)|(1<<15)|(1<<24))
-#define REQUIRED_MASK2 (1<<29)
-
-   pushfl  /* standard way to check for cpuid */
-   popl%eax
-   movl%eax,%ebx
-   xorl$0x200000,%eax
-   pushl   %eax
-   popfl
-   pushfl
-   popl%eax
-   cmpl%eax,%ebx
-   jz  no_longmode /* cpu has no cpuid */
-   movl$0x0,%eax
-   cpuid
-   cmpl$0x1,%eax
-   jb  no_longmode /* no cpuid 1 */
-   xor %di,%di
-   cmpl$0x68747541,%ebx/* AuthenticAMD */
-   jnz noamd
-   cmpl$0x69746e65,%edx
-   jnz noamd
-   cmpl$0x444d4163,%ecx
-   jnz noamd
-   mov $1,%di  /* cpu is from AMD */
-noamd: 
-   movl$0x1,%eax
-   cpuid
-   andl$REQUIRED_MASK1,%edx
-   xorl$REQUIRED_MASK1,%edx
-   jnz no_longmode
-   movl$0x80000000,%eax
-   cpuid
-   cmpl$0x80000001,%eax
-   jb  no_longmode /* no extended cpuid */
-   movl$0x80000001,%eax
-   cpuid
-   andl$REQUIRED_MASK2,%edx
-   xorl$REQUIRED_MASK2,%edx
-   jnz no_longmode
-sse_test:  
-   movl$1,%eax
-   cpuid
-   andl$SSE_MASK,%edx
-   cmpl$SSE_MASK,%edx
-   je  sse_ok
-   test%di,%di
-   jz  no_longmode /* only try to force SSE on AMD */ 
-   movl$0xc0010015,%ecx/* HWCR */
-   rdmsr
-   btr $15,%eax/* enable SSE */
-   wrmsr
-   xor %di,%di /* 

[PATCH 2/20] x86_64: Kill temp boot pmds

2007-03-06 Thread Vivek Goyal


Early in the boot process we need the ability to set
up temporary mappings, before our normal mechanisms are
initialized.  Currently this is used to map pages that
are part of the page tables we are building and pages
during the dmi scan.

The core problem is that we are using the user portion of
the page tables to implement this.  Which means that while
this mechanism is active we cannot catch NULL pointer dereferences
and we deviate from the normal ways of handling things.

In this patch I modify early_ioremap to map pages into
the kernel portion of the address space, roughly where
we will later put modules, and I make the discovery of
which addresses we can use dynamic, which removes all
kinds of static limits and removes the dependencies
on implementation details between different parts of the code.

Now alloc_low_page() and unmap_low_page() use
early_ioremap() and early_iounmap() to allocate/map and
unmap a page.
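
The new early_ioremap() first works out how many 2MB PMD entries a request spans. A small user-space sketch of that arithmetic, assuming the usual x86_64 PMD size of 2MB and using hypothetical `SKETCH_` names:

```c
#include <stdint.h>

#define SKETCH_PMD_SIZE (1ULL << 21)              /* assumed: 2MB per PMD entry */
#define SKETCH_PMD_MASK (~(SKETCH_PMD_SIZE - 1))

/*
 * Number of PMD entries needed to cover [addr, addr + size):
 * add the offset of addr inside its 2MB page to the size, then
 * round up to a whole number of 2MB pages.
 */
static uint64_t sketch_pmds_needed(uint64_t addr, uint64_t size)
{
	return ((addr & ~SKETCH_PMD_MASK) + size + ~SKETCH_PMD_MASK) /
	       SKETCH_PMD_SIZE;
}
```

A single 4K page normally needs one entry, while a range that straddles a 2MB boundary needs two, which is why the search looks for that many consecutive free entries.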

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head.S |3 -
 arch/x86_64/mm/init.c |  100 --
 2 files changed, 45 insertions(+), 58 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds 
arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds 
2007-03-07 01:21:26.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S   2007-03-07 
01:21:26.0 +0530
@@ -288,9 +288,6 @@ NEXT_PAGE(level2_ident_pgt)
.quad   i << 21 | 0x083
i = i + 1
.endr
-   /* Temporary mappings for the super early allocator in arch/x86_64/mm/init.c */
-   .globl temp_boot_pmds
-temp_boot_pmds:
.fill   492,8,0

 NEXT_PAGE(level2_kernel_pgt)
diff -puN arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds arch/x86_64/mm/init.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds 
2007-03-07 01:21:26.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/mm/init.c   2007-03-07 
01:21:26.0 +0530
@@ -167,23 +167,9 @@ __set_fixmap (enum fixed_addresses idx, 
 
 unsigned long __initdata table_start, table_end; 
 
-extern pmd_t temp_boot_pmds[]; 
-
-static  struct temp_map { 
-   pmd_t *pmd;
-   void  *address; 
-   intallocated; 
-} temp_mappings[] __initdata = { 
-   { &temp_boot_pmds[0], (void *)(40UL * 1024 * 1024) },
-   { &temp_boot_pmds[1], (void *)(42UL * 1024 * 1024) }, 
-   {}
-}; 
-
-static __meminit void *alloc_low_page(int *index, unsigned long *phys)
+static __meminit void *alloc_low_page(unsigned long *phys)
 { 
-   struct temp_map *ti;
-   int i; 
-   unsigned long pfn = table_end++, paddr; 
+   unsigned long pfn = table_end++;
void *adr;
 
if (after_bootmem) {
@@ -194,57 +180,63 @@ static __meminit void *alloc_low_page(in
 
if (pfn >= end_pfn) 
panic("alloc_low_page: ran out of memory"); 
-   for (i = 0; temp_mappings[i].allocated; i++) {
-   if (!temp_mappings[i].pmd) 
-   panic("alloc_low_page: ran out of temp mappings"); 
-   } 
-   ti = &temp_mappings[i];
-   paddr = (pfn << PAGE_SHIFT) & PMD_MASK; 
-   set_pmd(ti->pmd, __pmd(paddr | _KERNPG_TABLE | _PAGE_PSE)); 
-   ti->allocated = 1; 
-   __flush_tlb(); 
-   adr = ti->address + ((pfn << PAGE_SHIFT) & ~PMD_MASK); 
+
+   adr = early_ioremap(pfn * PAGE_SIZE, PAGE_SIZE);
memset(adr, 0, PAGE_SIZE);
-   *index = i; 
-   *phys  = pfn * PAGE_SIZE;  
-   return adr; 
-} 
+   *phys  = pfn * PAGE_SIZE;
+   return adr;
+}
 
-static __meminit void unmap_low_page(int i)
+static __meminit void unmap_low_page(void *adr)
 { 
-   struct temp_map *ti;
 
if (after_bootmem)
return;
 
-   ti = &temp_mappings[i];
-   set_pmd(ti->pmd, __pmd(0));
-   ti->allocated = 0; 
+   early_iounmap(adr, PAGE_SIZE);
 } 
 
 /* Must run before zap_low_mappings */
 __init void *early_ioremap(unsigned long addr, unsigned long size)
 {
-   unsigned long map = round_down(addr, LARGE_PAGE_SIZE); 
-
-   /* actually usually some more */
-   if (size >= LARGE_PAGE_SIZE) { 
-   return NULL;
+   unsigned long vaddr;
+   pmd_t *pmd, *last_pmd;
+   int i, pmds;
+
+   pmds = ((addr & ~PMD_MASK) + size + ~PMD_MASK) / PMD_SIZE;
+   vaddr = __START_KERNEL_map;
+   pmd = level2_kernel_pgt;
+   last_pmd = level2_kernel_pgt + PTRS_PER_PMD - 1;
+   for (; pmd <= last_pmd; pmd++, vaddr += PMD_SIZE) {
+   for (i = 0; i < pmds; i++) {
+   if (pmd_present(pmd[i]))
+   goto next;
+   }
+   vaddr += addr & ~PMD_MASK;
+   addr &= PMD_MASK;
+   for (i = 0; i < pmds; i++, addr += PMD_SIZE)
+   set_pmd(pmd + i,__pmd(addr | 

[PATCH 0/20] x86_64 Relocatable bzImage support (V4)

2007-03-06 Thread Vivek Goyal
Hi,

Here is another attempt at the x86_64 relocatable bzImage patches (V4). This
patchset makes a bzImage relocatable, so that the same kernel binary can be
loaded and run from different physical addresses.

As of now, this mainly helps distros, which have to ship an extra kernel compiled
for a different physical address to capture the kernel crash dump. This
patchset will allow distros and kdump users to use the production kernel itself
as the dump capture kernel, so there is no need to ship/build an extra kernel.
I am hopeful people will find other interesting uses down the line.

Eric has done all the heavy lifting required to make this patchset
work. Last time I posted this patchset (V3), there were minor comments,
which I have taken care of. The changes since V3 are as follows.

- Reduced the usage of _AC() macro to only shift operations, as per 
  Andi's comment.
- Restored the CONFIG_PHYSICAL_START option.
- Fixed a few bugs in the suspend-to-disk code path.

It would be good if these patches got into -mm so that they can undergo more
testing. I have been testing them and they work fine for me.

Thanks
Vivek


[PATCH 10/20] x86_64: wakeup.S rename registers to reflect right names

2007-03-06 Thread Vivek Goyal


o Use appropriate names for 64bit registers.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/acpi/wakeup.S |   36 ++--
 include/asm-x86_64/suspend.h |   12 ++--
 2 files changed, 24 insertions(+), 24 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names 2007-03-07 01:26:55.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S 2007-03-07 01:26:55.0 +0530
@@ -211,16 +211,16 @@ wakeup_long64:
    movw %ax, %es
    movw %ax, %fs
    movw %ax, %gs
-   movq saved_esp, %rsp
+   movq saved_rsp, %rsp
 
    movw $0x0e00 + 'x', %ds:(0xb8018)
-   movq saved_ebx, %rbx
-   movq saved_edi, %rdi
-   movq saved_esi, %rsi
-   movq saved_ebp, %rbp
+   movq saved_rbx, %rbx
+   movq saved_rdi, %rdi
+   movq saved_rsi, %rsi
+   movq saved_rbp, %rbp
 
    movw $0x0e00 + '!', %ds:(0xb801a)
-   movq saved_eip, %rax
+   movq saved_rip, %rax
    jmp *%rax
 
 .code32
@@ -408,13 +408,13 @@ do_suspend_lowlevel:
movq %r15, saved_context_r15(%rip)
pushfq ; popq saved_context_eflags(%rip)
 
-   movq $.L97, saved_eip(%rip)
+   movq $.L97, saved_rip(%rip)
 
-   movq %rsp,saved_esp
-   movq %rbp,saved_ebp
-   movq %rbx,saved_ebx
-   movq %rdi,saved_edi
-   movq %rsi,saved_esi
+   movq %rsp,saved_rsp
+   movq %rbp,saved_rbp
+   movq %rbx,saved_rbx
+   movq %rdi,saved_rdi
+   movq %rsi,saved_rsi
 
    addq $8, %rsp
    movl $3, %edi
@@ -461,12 +461,12 @@ do_suspend_lowlevel:

 .data
 ALIGN
-ENTRY(saved_ebp)   .quad   0
-ENTRY(saved_esi)   .quad   0
-ENTRY(saved_edi)   .quad   0
-ENTRY(saved_ebx)   .quad   0
+ENTRY(saved_rbp)   .quad   0
+ENTRY(saved_rsi)   .quad   0
+ENTRY(saved_rdi)   .quad   0
+ENTRY(saved_rbx)   .quad   0
 
-ENTRY(saved_eip)   .quad   0
-ENTRY(saved_esp)   .quad   0
+ENTRY(saved_rip)   .quad   0
+ENTRY(saved_rsp)   .quad   0
 
 ENTRY(saved_magic) .quad   0
diff -puN include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names include/asm-x86_64/suspend.h
--- linux-2.6.21-rc2-reloc/include/asm-x86_64/suspend.h~x86_64-wakeup.S-rename-registers-to-reflect-right-names 2007-03-07 01:26:55.0 +0530
+++ linux-2.6.21-rc2-reloc-root/include/asm-x86_64/suspend.h 2007-03-07 01:26:55.0 +0530
@@ -45,12 +45,12 @@ extern unsigned long saved_context_eflag
 extern void fix_processor_context(void);
 
 #ifdef CONFIG_ACPI_SLEEP
-extern unsigned long saved_eip;
-extern unsigned long saved_esp;
-extern unsigned long saved_ebp;
-extern unsigned long saved_ebx;
-extern unsigned long saved_esi;
-extern unsigned long saved_edi;
+extern unsigned long saved_rip;
+extern unsigned long saved_rsp;
+extern unsigned long saved_rbp;
+extern unsigned long saved_rbx;
+extern unsigned long saved_rsi;
+extern unsigned long saved_rdi;
 
 /* routines for saving/restoring kernel state */
 extern int acpi_save_state_mem(void);
_


Re: [RFC][PATCH 0/7] Resource controllers based on process containers

2007-03-06 Thread Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> This patchset adds RSS, accounting and control and
>> limiting the number of tasks and files within container.
>>
>> Based on top of Paul Menage's container subsystem v7
>>
>> RSS controller includes per-container RSS accounter,
>> reclamation and OOM killer. It behaves like standalone
>> machine - when container runs out of resources it tries
>> to reclaim some pages and if it doesn't succeed in it
>> kills some task which mm_struct belongs to container in
>> question.
>>
>> Num tasks and files containers are very simple and
>> self-descriptive from code.
>>
>> As discussed before when a task moves from one container
>> to another no resources follow it - they keep holding the
>> container they were allocated in.
>>
> 
> I have one problem with the patchset, I cannot compile
> the patches individually and some of the code is hard
> to read as it depends on functions from future patches.
> Patch 2, 3 and 4 fail to compile without patch 5 applied.
> 
> Patch 1 failed to apply with a reject in kernel/Makefile
> I applied it on top of 2.6.20 with all of Paul Menage's
> patches (all 7).

This sounds weird to me :( I've taken a stock 2.6.20
and applied Paul's patches. That is what this patchset
applies to.


[PATCH 4/20] x86_64: Fix early printk to use standard ISA mapping

2007-03-06 Thread Vivek Goyal



Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/early_printk.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -puN arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping arch/x86_64/kernel/early_printk.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping 2007-03-07 01:22:33.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/early_printk.c   2007-03-07 01:22:33.0 +0530
@@ -11,11 +11,10 @@
 
 #ifdef __i386__
 #include 
-#define VGABASE (__ISA_IO_base + 0xb8000)
 #else
 #include 
-#define VGABASE ((void __iomem *)0x800b8000UL)
 #endif
+#define VGABASE (__ISA_IO_base + 0xb8000)
 
 static int max_ypos = 25, max_xpos = 80;
 static int current_ypos = 25, current_xpos = 0;
_


[PATCH 1/20] x86_64: Assembly safe page.h and pgtable.h

2007-03-06 Thread Vivek Goyal


This patch makes pgtable.h and page.h safe to include
in assembly files like head.S, allowing us to use
symbolic constants instead of hard coded numbers when
referring to the page tables.

This patch copies asm-sparc64/const.h to asm-x86_64 to
get a definition of _AC(), a very convenient macro that
allows us to force the type when we are compiling the
code in C and to drop all of the type information when
we are using the constant in assembly.  Previously this
was done with multiple definitions of the same constant.
const.h was modified slightly so that it works when given
CONFIG options as arguments.

This patch adds #ifndef __ASSEMBLY__ ... #endif
and _AC(1,UL) where appropriate so the assembler won't
choke on the header files.  Otherwise nothing
should have changed.
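
As a rough illustration of the _AC() idea described above, here is a
minimal C sketch. The macro definitions mirror the const.h added by this
patch; example_page_base() is a hypothetical helper used only for this
illustration and is not part of the patch. The two-level __AC()/_AC()
form is presumably what lets an argument that is itself a macro (such as
a CONFIG option) be expanded before the suffix is pasted on.

```c
/* Mirrors the const.h added by this patch. */
#ifdef __ASSEMBLY__
#define _AC(X,Y)	X		/* assembler: drop the type suffix */
#else
#define __AC(X,Y)	(X##Y)		/* C: paste the suffix, e.g. 1 -> 1UL */
#define _AC(X,Y)	__AC(X,Y)	/* extra level so X is expanded first */
#endif

#define PAGE_SHIFT	12
#define PAGE_SIZE	(_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE-1))

/* Hypothetical helper, not in the patch: round an address down to its page. */
unsigned long example_page_base(unsigned long addr)
{
	return addr & PAGE_MASK;
}
```

In C the constant keeps its unsigned long type, so PAGE_MASK covers the
full width of an address; with __ASSEMBLY__ defined the same header
would hand the assembler a plain 1.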

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 include/asm-x86_64/const.h   |   20 
 include/asm-x86_64/page.h|   28 ++--
 include/asm-x86_64/pgtable.h |   33 +
 3 files changed, 51 insertions(+), 30 deletions(-)

diff -puN /dev/null include/asm-x86_64/const.h
--- /dev/null   2007-03-07 00:46:17.354096448 +0530
+++ linux-2.6.21-rc2-reloc-root/include/asm-x86_64/const.h  2007-03-07 01:20:54.0 +0530
@@ -0,0 +1,20 @@
+/* const.h: Macros for dealing with constants.  */
+
+#ifndef _X86_64_CONST_H
+#define _X86_64_CONST_H
+
+/* Some constant macros are used in both assembler and
+ * C code.  Therefore we cannot annotate them always with
+ * 'UL' and other type specificers unilaterally.  We
+ * use the following macros to deal with this.
+ */
+
+#ifdef __ASSEMBLY__
+#define _AC(X,Y)   X
+#else
+#define __AC(X,Y)  (X##Y)
+#define _AC(X,Y)   __AC(X,Y)
+#endif
+
+
+#endif /* !(_X86_64_CONST_H) */
diff -puN include/asm-x86_64/page.h~x86_64-Assembly-safe-page.h-and-pgtable.h include/asm-x86_64/page.h
--- linux-2.6.21-rc2-reloc/include/asm-x86_64/page.h~x86_64-Assembly-safe-page.h-and-pgtable.h 2007-03-07 01:20:54.0 +0530
+++ linux-2.6.21-rc2-reloc-root/include/asm-x86_64/page.h   2007-03-07 01:20:54.0 +0530
@@ -1,14 +1,11 @@
 #ifndef _X86_64_PAGE_H
 #define _X86_64_PAGE_H
 
+#include 
 
 /* PAGE_SHIFT determines the page size */
 #define PAGE_SHIFT 12
-#ifdef __ASSEMBLY__
-#define PAGE_SIZE  (0x1 << PAGE_SHIFT)
-#else
-#define PAGE_SIZE  (1UL << PAGE_SHIFT)
-#endif
+#define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
 #define PAGE_MASK  (~(PAGE_SIZE-1))
 #define PHYSICAL_PAGE_MASK (~(PAGE_SIZE-1) & __PHYSICAL_MASK)
 
@@ -33,10 +30,10 @@
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
 #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
-#define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
+#define LARGE_PAGE_SIZE (_AC(1,UL) << PMD_SHIFT)
 
 #define HPAGE_SHIFT PMD_SHIFT
-#define HPAGE_SIZE ((1UL) << HPAGE_SHIFT)
+#define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT)
 #define HPAGE_MASK (~(HPAGE_SIZE - 1))
 #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT)
 
@@ -76,29 +73,24 @@ typedef struct { unsigned long pgprot; }
 #define __pgd(x) ((pgd_t) { (x) } )
 #define __pgprot(x)((pgprot_t) { (x) } )
 
-#define __PHYSICAL_START   ((unsigned long)CONFIG_PHYSICAL_START)
-#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
-#define __START_KERNEL_map 0x8000UL
-#define __PAGE_OFFSET   0x8100UL
+#endif /* !__ASSEMBLY__ */
 
-#else
 #define __PHYSICAL_START   CONFIG_PHYSICAL_START
 #define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map 0x8000
 #define __PAGE_OFFSET   0x8100
-#endif /* !__ASSEMBLY__ */
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)   (((addr)+PAGE_SIZE-1)&PAGE_MASK)
 
 /* See Documentation/x86_64/mm.txt for a description of the memory map. */
 #define __PHYSICAL_MASK_SHIFT  46
-#define __PHYSICAL_MASK ((1UL << __PHYSICAL_MASK_SHIFT) - 1)
+#define __PHYSICAL_MASK ((_AC(1,UL) << __PHYSICAL_MASK_SHIFT) - 1)
 #define __VIRTUAL_MASK_SHIFT   48
-#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
+#define __VIRTUAL_MASK ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 
-#define KERNEL_TEXT_SIZE  (40UL*1024*1024)
-#define KERNEL_TEXT_START 0x8000UL 
+#define KERNEL_TEXT_SIZE  (40*1024*1024)
+#define KERNEL_TEXT_START 0x8000
 
 #ifndef __ASSEMBLY__
 
@@ -106,7 +98,7 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* __ASSEMBLY__ */
 
-#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
+#define PAGE_OFFSET __PAGE_OFFSET
 
 /* Note: __pa(_visible_to_c) should be always replaced with __pa_symbol.
Otherwise you risk miscompilation. */ 
diff -puN include/asm-x86_64/pgtable.h~x86_64-Assembly-safe-page.h-and-pgtable.h include/asm-x86_64/pgtable.h
--- 

[PATCH 18/20] x86_64: Relocatable Kernel Support

2007-03-06 Thread Vivek Goyal


This patch modifies the x86_64 kernel so that it can be loaded and run
at any 2M aligned address, below 512G.  The technique used is to
compile the decompressor with -fPIC and modify it so the decompressor
is fully relocatable.  For the main kernel the page tables are
modified so the kernel remains at the same virtual address.  In
addition a variable phys_base is kept that holds the physical address
the kernel is loaded at.  __pa_symbol is modified to add that when
we take the address of a kernel symbol.

When loaded with a normal bootloader the decompressor will decompress
the kernel to 2M and it will run there.  This both ensures the
relocation code is always working, and makes it easier to use 2M
pages for the kernel and the cpu.
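
To make the address arithmetic above a bit more concrete, here is a
minimal C sketch under stated assumptions. All example_* names are
hypothetical and not from the patch. EXAMPLE_START_KERNEL_MAP assumes
the usual x86_64 kernel text mapping base, and phys_base is treated
here as the offset between the compile-time physical address and the
address the kernel was actually loaded at; the patch itself is the
authority on both.

```c
#include <stdint.h>

#define EXAMPLE_LARGE_PAGE_SIZE	(UINT64_C(1) << 21)	/* 2M */
#define EXAMPLE_LARGE_PAGE_MASK	(~(EXAMPLE_LARGE_PAGE_SIZE - 1))

/* Assumption: the usual x86_64 kernel text mapping base. */
#define EXAMPLE_START_KERNEL_MAP	UINT64_C(0xffffffff80000000)

/* Round a load address up to the next 2M boundary. */
uint64_t example_round_up_2m(uint64_t load_addr)
{
	return (load_addr + EXAMPLE_LARGE_PAGE_SIZE - 1) & EXAMPLE_LARGE_PAGE_MASK;
}

/*
 * Sketch of the __pa_symbol() idea: strip the fixed virtual base of the
 * kernel mapping, then add the relocation offset.
 */
uint64_t example_pa_symbol(uint64_t vaddr, uint64_t phys_base)
{
	return (vaddr - EXAMPLE_START_KERNEL_MAP) + phys_base;
}
```

With phys_base equal to zero this degenerates to the old fixed
translation, which is why the virtual addresses of kernel symbols do
not have to change when the kernel is moved.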

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/Kconfig |   50 
 arch/x86_64/boot/compressed/Makefile|   12 -
 arch/x86_64/boot/compressed/head.S  |  322 +++-
 arch/x86_64/boot/compressed/misc.c  |  251 +---
 arch/x86_64/boot/compressed/vmlinux.lds |   44 
 arch/x86_64/boot/compressed/vmlinux.scr |9 
 arch/x86_64/kernel/head.S   |  225 --
 arch/x86_64/kernel/suspend_asm.S|7 
 include/asm-x86_64/page.h   |6 
 9 files changed, 596 insertions(+), 330 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support 2007-03-07 01:31:35.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/compressed/head.S  2007-03-07 01:31:35.0 +0530
@@ -26,116 +26,262 @@
 
 #include 
 #include 
+#include 
 #include 
+#include 
 
+.section ".text.head"
.code32
.globl startup_32
-   
+
 startup_32:
cld
cli
-   movl $(__KERNEL_DS),%eax
-   movl %eax,%ds
-   movl %eax,%es
-   movl %eax,%fs
-   movl %eax,%gs
-
-   lss stack_start,%esp
-   xorl %eax,%eax
-1: incl %eax   # check that A20 really IS enabled
-   movl %eax,0x00  # loop forever if it isn't
-   cmpl %eax,0x10
-   je 1b
+   movl $(__KERNEL_DS), %eax
+   movl %eax, %ds
+   movl %eax, %es
+   movl %eax, %ss
+
+/* Calculate the delta between where we were compiled to run
+ * at and where we were actually loaded at.  This can only be done
+ * with a short local call on x86.  Nothing  else will tell us what
+ * address we are running at.  The reserved chunk of the real-mode
+ * data at 0x34-0x3f are used as the stack for this calculation.
+ * Only 4 bytes are needed.
+ */
+   leal 0x40(%esi), %esp
+   call 1f
+1: popl %ebp
+   subl $1b, %ebp
+
+/* Compute the delta between where we were compiled to run at
+ * and where the code will actually run at.
+ */
+/* %ebp contains the address we are loaded at by the boot loader and %ebx
+ * contains the address where we should move the kernel image temporarily
+ * for safe in-place decompression.
+ */
+
+#ifdef CONFIG_RELOCATABLE
+   movl %ebp, %ebx
+   addl $(LARGE_PAGE_SIZE -1), %ebx
+   andl $LARGE_PAGE_MASK, %ebx
+#else
+   movl $CONFIG_PHYSICAL_START, %ebx
+#endif
+
+   /* Replace the compressed data size with the uncompressed size */
+   subl input_len(%ebp), %ebx
+   movl output_len(%ebp), %eax
+   addl %eax, %ebx
+   /* Add 8 bytes for every 32K input block */
+   shrl $12, %eax
+   addl %eax, %ebx
+   /* Add 32K + 18 bytes of extra slack and align on a 4K boundary */
+   addl $(32768 + 18 + 4095), %ebx
+   andl $~4095, %ebx
 
 /*
- * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
- * confuse the debugger if this code is traced.
- * XXX - best to initialize before switching to protected mode.
+ * Prepare for entering 64 bit mode
  */
-   pushl $0
-   popfl
+
+   /* Load new GDT with the 64bit segments using 32bit descriptor */
+   leal gdt(%ebp), %eax
+   movl %eax, gdt+2(%ebp)
+   lgdt gdt(%ebp)
+
+   /* Enable PAE mode */
+   xorl %eax, %eax
+   orl $(1 << 5), %eax
+   movl %eax, %cr4
+
+ /*
+  * Build early 4G boot pagetable
+  */
+   /* Initialize Page tables to 0*/
+   leal pgtable(%ebx), %edi
+   xorl %eax, %eax
+   movl $((4096*6)/4), %ecx
+   rep stosl
+
+   /* Build Level 4 */
+   leal pgtable + 0(%ebx), %edi
+   leal 0x1007 (%edi), %eax
+   movl %eax, 0(%edi)
+
+   /* Build Level 3 */
+   leal pgtable + 0x1000(%ebx), %edi
+   leal 0x1007(%edi), %eax
+   movl $4, %ecx
+1: movl %eax, 0x00(%edi)
+   addl $0x1000, %eax
+   addl $8, %edi
+   decl %ecx
+

[PATCH 6/20] x86_64: cleanup segments

2007-03-06 Thread Vivek Goyal


Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
used when entering the kernel so putting it first is useful when
trying to keep boot gdt sizes to a minimum.

Set the accessed bit on all gdt entries.  We don't care
so there is no need for the cpu to burn the extra cycles,
and it potentially allows the pages to be immutable.  Plus
it is confusing when debugging and your gdt entries mysteriously
change.
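
For readers less familiar with the descriptor layout, the two changes
above can be sketched in C. This is only an illustration: the example_*
names are hypothetical, and the sketch looks just at the access byte of
a descriptor and at the selector value, not at full 8 byte entries.

```c
/* Bit 0 of a segment descriptor's access byte is the "accessed" flag. */
#define EXAMPLE_DESC_ACCESSED	0x01u

/* Pre-set the accessed flag so the cpu never has to write it back. */
unsigned int example_set_accessed(unsigned int access_byte)
{
	return access_byte | EXAMPLE_DESC_ACCESSED;
}

/*
 * A selector's low three bits are the RPL and table indicator; the
 * remaining bits are the index of the entry in the descriptor table.
 */
unsigned int example_selector_index(unsigned int selector)
{
	return selector >> 3;
}
```

This matches what the diff below does to the table: access bytes go
from 0x9a to 0x9b and from 0x92 to 0x93, and __KERNEL32_CS moves from
selector 0x38 (entry 7) to 0x08 (entry 1, the previously unused slot).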

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head.S|   12 ++--
 include/asm-x86_64/segment.h |2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-cleanup-segments arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-cleanup-segments 2007-03-07 01:24:53.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S   2007-03-07 01:24:53.0 +0530
@@ -362,13 +362,13 @@ gdt:

 ENTRY(cpu_gdt_table)
.quad   0x  /* NULL descriptor */
+   .quad   0x00cf9b00  /* __KERNEL32_CS */
+   .quad   0x00af9b00  /* __KERNEL_CS */
+   .quad   0x00cf9300  /* __KERNEL_DS */
+   .quad   0x00cffb00  /* __USER32_CS */
+   .quad   0x00cff300  /* __USER_DS, __USER32_DS  */
+   .quad   0x00affb00  /* __USER_CS */
.quad   0x0 /* unused */
-   .quad   0x00af9a00  /* __KERNEL_CS */
-   .quad   0x00cf9200  /* __KERNEL_DS */
-   .quad   0x00cffa00  /* __USER32_CS */
-   .quad   0x00cff200  /* __USER_DS, __USER32_DS  */   
-   .quad   0x00affa00  /* __USER_CS */
-   .quad   0x00cf9a00  /* __KERNEL32_CS */
.quad   0,0 /* TSS */
.quad   0,0 /* LDT */
.quad   0,0,0   /* three TLS descriptors */ 
diff -puN include/asm-x86_64/segment.h~x86_64-cleanup-segments include/asm-x86_64/segment.h
--- linux-2.6.21-rc2-reloc/include/asm-x86_64/segment.h~x86_64-cleanup-segments 2007-03-07 01:24:53.0 +0530
+++ linux-2.6.21-rc2-reloc-root/include/asm-x86_64/segment.h 2007-03-07 01:24:53.0 +0530
@@ -6,7 +6,7 @@
 #define __KERNEL_CS 0x10
 #define __KERNEL_DS 0x18
 
-#define __KERNEL32_CS   0x38
+#define __KERNEL32_CS   0x08
 
 /* 
  * we cannot use the same code segment descriptor for user and kernel
_


[PATCH 9/20] x86_64: Get rid of dead code in suspend resume

2007-03-06 Thread Vivek Goyal


o Get rid of dead code in wakeup.S

o We never restore from saved_gdt, saved_idt, saved_ldt, saved_tss, saved_cr3,
  saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2. Get rid
  of the associated code.

o Get rid of bogus_magic, bogus_31_magic and bogus_magic2. No longer being
  used.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/acpi/wakeup.S |   57 ---
 1 file changed, 1 insertion(+), 56 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume 2007-03-07 01:26:21.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S 2007-03-07 01:26:21.0 +0530
@@ -258,8 +258,6 @@ gdt_48a:
    .word   0, 0 # gdt base (filled in later)


-real_save_gdt: .word 0
-   .quad 0
 real_magic: .quad 0
 video_mode: .quad 0
 video_flags:   .quad 0
@@ -272,10 +270,6 @@ bogus_32_magic:
    movb $0xb3,%al   ;  outb %al,$0x80
    jmp bogus_32_magic
 
-bogus_31_magic:
-   movb $0xb1,%al   ;  outb %al,$0x80
-   jmp bogus_31_magic
-
 bogus_cpu:
    movb $0xbc,%al   ;  outb %al,$0x80
    jmp bogus_cpu
@@ -346,16 +340,6 @@ check_vesaa:
 
 _setbada: jmp setbada
 
-   .code64
-bogus_magic:
-   movw $0x0e00 + 'B', %ds:(0xb8018)
-   jmp bogus_magic
-
-bogus_magic2:
-   movw $0x0e00 + '2', %ds:(0xb8018)
-   jmp bogus_magic2
-   
-
 wakeup_stack_begin: # Stack grows down
 
 .org   0xff0
@@ -373,28 +357,11 @@ ENTRY(wakeup_end)
 #
 # Returned address is location of code in low memory (past data and stack)
 #
+   .code64
 ENTRY(acpi_copy_wakeup_routine)
pushq   %rax
-   pushq   %rcx
pushq   %rdx
 
-   sgdt saved_gdt
-   sidt saved_idt
-   sldt saved_ldt
-   str saved_tss
-
-   movq %cr3, %rdx
-   movq %rdx, saved_cr3
-   movq %cr4, %rdx
-   movq %rdx, saved_cr4
-   movq %cr0, %rdx
-   movq %rdx, saved_cr0
-   sgdt real_save_gdt - wakeup_start (,%rdi)
-   movl $MSR_EFER, %ecx
-   rdmsr
-   movl %eax, saved_efer
-   movl %edx, saved_efer2
-
    movl saved_video_mode, %edx
    movl %edx, video_mode - wakeup_start (,%rdi)
    movl acpi_video_flags, %edx
@@ -407,17 +374,8 @@ ENTRY(acpi_copy_wakeup_routine)
    cmpl $0x9abcdef0, %eax
    jne bogus_32_magic
 
-   # make sure %cr4 is set correctly (features, etc)
-   movl saved_cr4 - __START_KERNEL_map, %eax
-   movq %rax, %cr4
-
-   movl saved_cr0 - __START_KERNEL_map, %eax
-   movq %rax, %cr0
-   jmp 1f  # Flush pipelines
-1:
    # restore the regs we used
    popq %rdx
-   popq %rcx
    popq %rax
 ENTRY(do_suspend_lowlevel_s4bios)
ret
@@ -512,16 +470,3 @@ ENTRY(saved_eip)   .quad   0
 ENTRY(saved_esp)   .quad   0
 
 ENTRY(saved_magic) .quad   0
-
-ALIGN
-# saved registers
-saved_gdt: .quad   0,0
-saved_idt: .quad   0,0
-saved_ldt: .quad   0
-saved_tss: .quad   0
-
-saved_cr0: .quad 0
-saved_cr3: .quad 0
-saved_cr4: .quad 0
-saved_efer: .quad 0
-saved_efer2:   .quad 0
_


[PATCH 11/20] x86_64: wakeup.S misc cleanups

2007-03-06 Thread Vivek Goyal


o Various cleanups. One of the main purposes of the cleanups is to make
  wakeup.S as close as possible to trampoline.S.

o Following are the changes:
- Indentations for comments.
- Changed the gdt table to compact form and to resemble the
  one in trampoline.S
- Take the jump to 32bit from real mode using ljmpl. Makes code
  more readable.
- After enabling long mode, directly take a long jump for 64bit
  mode. No need to take an extra jump to "reach_compatibility_mode".
- Stack is not used after real mode. So don't load stack in
  32 bit mode.
- No need to enable PGE here.
- No need to do extra EFER read, anyway we trash the read contents.
- No need to enable system call (EFER_SCE). Anyway it will be 
  enabled when original EFER is restored.
- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
  reload the original cr0 while restoring the processor state.
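
As a rough C rendering of the EFER handling described in the list
above (set LME unconditionally, set NX only when the cpu reports it, no
longer set SCE), here is a minimal sketch. The example_* names are
hypothetical; the bit positions are the architectural ones (LME is
EFER bit 8, NX is EFER bit 11, and the NX feature flag is bit 20 of
the extended CPUID feature word that the assembly keeps in %edi).

```c
#include <stdint.h>

#define EXAMPLE_EFER_LME_BIT	8	/* Long Mode Enable */
#define EXAMPLE_EFER_NX_BIT	11	/* No-Execute Enable */
#define EXAMPLE_CPUID_NX_BIT	20	/* NX flag in the extended feature word */

/*
 * Compute the low 32 bits written to MSR_EFER on wakeup; the high
 * 32 bits are simply zero. SCE is deliberately left clear here, since
 * the original EFER is restored later anyway.
 */
uint32_t example_wakeup_efer(uint32_t ext_features)
{
	uint32_t efer = UINT32_C(1) << EXAMPLE_EFER_LME_BIT;

	if (ext_features & (UINT32_C(1) << EXAMPLE_CPUID_NX_BIT))
		efer |= UINT32_C(1) << EXAMPLE_EFER_NX_BIT;

	return efer;
}
```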

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/acpi/wakeup.S |  112 +--
 1 file changed, 40 insertions(+), 72 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups 2007-03-07 01:27:32.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S 2007-03-07 01:27:32.0 +0530
@@ -30,11 +30,12 @@ wakeup_code:
cld
# setup data segment
    movw %cs, %ax
-   movw %ax, %ds # Make ds:0 point to wakeup_start
+   movw %ax, %ds # Make ds:0 point to wakeup_start
    movw %ax, %ss
-   mov $(wakeup_stack - wakeup_code), %sp  # Private stack is needed for ASUS board
+   # Private stack is needed for ASUS board
+   mov $(wakeup_stack - wakeup_code), %sp
 
-   pushl   $0  # Kill any dangerous flags
+   pushl   $0  # Kill any dangerous flags
    popfl
 
    movl real_magic - wakeup_code, %eax
@@ -45,7 +46,7 @@ wakeup_code:
jz  1f
lcall   $0xc000,$3
    movw %cs, %ax
-   movw %ax, %ds # Bios might have played with that
+   movw %ax, %ds # Bios might have played with that
    movw %ax, %ss
 1:
 
@@ -75,9 +76,12 @@ wakeup_code:
jmp 1f
 1:
 
-   .byte 0x66, 0xea # prefix + jmpi-opcode
-   .long   wakeup_32 - __START_KERNEL_map
-   .word   __KERNEL_CS
+   ljmpl   *(wakeup_32_vector - wakeup_code)
+
+   .balign 4
+wakeup_32_vector:
+   .long   wakeup_32 - __START_KERNEL_map
+   .word   __KERNEL32_CS, 0
 
.code32
 wakeup_32:
@@ -96,65 +100,50 @@ wakeup_32:
jnc bogus_cpu
    movl %edx,%edi

-   movw $__KERNEL_DS, %ax
-   movw %ax, %ds
-   movw %ax, %es
-   movw %ax, %fs
-   movw %ax, %gs
+   movl $__KERNEL_DS, %eax
+   movl %eax, %ds
 
-   movw $__KERNEL_DS, %ax   
-   movw %ax, %ss
-
-   mov $(wakeup_stack - __START_KERNEL_map), %esp
    movl saved_magic - __START_KERNEL_map, %eax
    cmpl $0x9abcdef0, %eax
    jne bogus_32_magic
 
+   movw $0x0e00 + 'i', %ds:(0xb8012)
+   movb $0xa8, %al  ;  outb %al, $0x80;
+
/*
 * Prepare for entering 64bits mode
 */
 
-   /* Enable PAE mode and PGE */
+   /* Enable PAE */
    xorl %eax, %eax
    btsl $5, %eax
-   btsl $7, %eax
    movl %eax, %cr4
 
    /* Setup early boot stage 4 level pagetables */
    movl $(wakeup_level4_pgt - __START_KERNEL_map), %eax
    movl %eax, %cr3
 
-   /* Setup EFER (Extended Feature Enable Register) */
-   movl $MSR_EFER, %ecx
-   rdmsr
-   /* Fool rdmsr and reset %eax to avoid dependences */
-   xorl %eax, %eax
    /* Enable Long Mode */
+   xorl %eax, %eax
    btsl $_EFER_LME, %eax
-   /* Enable System Call */
-   btsl $_EFER_SCE, %eax
 
-   /* No Execute supported? */ 
+   /* No Execute supported? */
    btl $20,%edi
    jnc 1f
    btsl $_EFER_NX, %eax
-1: 

    /* Make changes effective */
+1: movl $MSR_EFER, %ecx
+   xorl %edx, %edx
    wrmsr
-   wbinvd
 
    xorl %eax, %eax
    btsl $31, %eax   /* Enable paging and in turn activate Long Mode */
    btsl $0, %eax /* Enable protected mode */
-   btsl $1, %eax 

[PATCH 15/20] Move swsusp __pa() dependent code to arch portion

2007-03-06 Thread Vivek Goyal


o __pa() should be used only on kernel linearly mapped virtual addresses
  and not on kernel text and data addresses.

o Hibernation code needs to determine the physical address associated
  with a kernel symbol to mark a section boundary that contains pages which
  don't have to be saved and restored during a hibernate/resume operation.

o Move this piece of code into the arch dependent section, so that architectures
  which don't have kernel text/data mapped into the kernel linearly mapped
  region can come up with their own ways of determining the physical addresses
  associated with kernel text.
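
The range check that each architecture now provides can be sketched in
plain C as follows. This is only an illustration under stated
assumptions: example_pfn_in_range() is a hypothetical stand-in that
takes the two physical addresses as arguments, whereas the real
pfn_is_nosave() in the diff below derives them from __nosave_begin and
__nosave_end using __pa_symbol() or __pa(). A 4K page size is assumed.

```c
#define EXAMPLE_PAGE_SHIFT	12
#define EXAMPLE_PAGE_SIZE	(1UL << EXAMPLE_PAGE_SHIFT)
#define EXAMPLE_PAGE_ALIGN(addr) \
	(((addr) + EXAMPLE_PAGE_SIZE - 1) & ~(EXAMPLE_PAGE_SIZE - 1))

/*
 * Return nonzero if pfn lies in [begin_pa, end_pa), with the end
 * rounded up to a page boundary so a partially used last page counts.
 */
int example_pfn_in_range(unsigned long pfn, unsigned long begin_pa,
			 unsigned long end_pa)
{
	unsigned long begin_pfn = begin_pa >> EXAMPLE_PAGE_SHIFT;
	unsigned long end_pfn = EXAMPLE_PAGE_ALIGN(end_pa) >> EXAMPLE_PAGE_SHIFT;

	return (pfn >= begin_pfn) && (pfn < end_pfn);
}
```

The only arch specific part is how the two physical addresses are
obtained, which is the reason for moving the function out of
kernel/power/snapshot.c.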

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/i386/power/suspend.c |   14 ++
 arch/powerpc/kernel/Makefile  |1 +
 arch/powerpc/kernel/suspend.c |   24 
 arch/x86_64/kernel/suspend.c  |   14 ++
 kernel/power/power.h  |5 ++---
 kernel/power/snapshot.c   |   11 ---
 6 files changed, 55 insertions(+), 14 deletions(-)

diff -puN arch/i386/power/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion arch/i386/power/suspend.c
--- linux-2.6.21-rc2-reloc/arch/i386/power/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion 2007-03-07 01:30:18.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/i386/power/suspend.c   2007-03-07 01:30:18.0 +0530
@@ -16,6 +16,9 @@
 /* Defined in arch/i386/power/swsusp.S */
 extern int restore_image(void);
 
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
 /* Pointer to the temporary resume page tables */
 pgd_t *resume_pg_dir;
 
@@ -156,3 +159,14 @@ int swsusp_arch_resume(void)
restore_image();
return 0;
 }
+
+/*
+ * pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+   unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+   unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+   return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
diff -puN arch/powerpc/kernel/Makefile~move-swsusp-__pa-dependent-code-to-arch-portion arch/powerpc/kernel/Makefile
--- linux-2.6.21-rc2-reloc/arch/powerpc/kernel/Makefile~move-swsusp-__pa-dependent-code-to-arch-portion 2007-03-07 01:30:18.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/powerpc/kernel/Makefile 2007-03-07 01:30:18.0 +0530
@@ -37,6 +37,7 @@ obj-$(CONFIG_CRASH_DUMP)  += crash_dump.o
 obj-$(CONFIG_6xx)  += idle_6xx.o l2cr_6xx.o cpu_setup_6xx.o
 obj-$(CONFIG_TAU)  += tau_6xx.o
 obj32-$(CONFIG_SOFTWARE_SUSPEND) += swsusp_32.o
+obj-$(CONFIG_SOFTWARE_SUSPEND) += suspend.o
 obj32-$(CONFIG_MODULES)+= module_32.o
 
 ifeq ($(CONFIG_PPC_MERGE),y)
diff -puN /dev/null arch/powerpc/kernel/suspend.c
--- /dev/null   2007-03-07 00:46:17.354096448 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/powerpc/kernel/suspend.c   2007-03-07 01:30:18.0 +0530
@@ -0,0 +1,24 @@
+/*
+ * Suspend support specific for power.
+ *
+ * Distribute under GPLv2
+ *
+ * Copyright (c) 2002 Pavel Machek <[EMAIL PROTECTED]>
+ * Copyright (c) 2001 Patrick Mochel <[EMAIL PROTECTED]>
+ */
+
+#include 
+
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
+/*
+ * pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+   unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >> PAGE_SHIFT;
+   unsigned long nosave_end_pfn = PAGE_ALIGN(__pa(&__nosave_end)) >> PAGE_SHIFT;
+   return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
diff -puN arch/x86_64/kernel/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion arch/x86_64/kernel/suspend.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion 2007-03-07 01:30:18.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/suspend.c 2007-03-07 01:30:18.0 +0530
@@ -13,6 +13,9 @@
 #include 
 #include 
 
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
 struct saved_context saved_context;
 
 unsigned long saved_context_eax, saved_context_ebx, saved_context_ecx, saved_context_edx;
@@ -220,4 +223,15 @@ int swsusp_arch_resume(void)
restore_image();
return 0;
 }
+
+/*
+ * pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+   unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+   unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+   return (pfn >= nosave_begin_pfn) && (pfn < nosave_end_pfn);
+}
 #endif /* CONFIG_SOFTWARE_SUSPEND */
diff -puN kernel/power/power.h~move-swsusp-__pa-dependent-code-to-arch-portion 
kernel/power/power.h
--- 

[PATCH 3/20] x86_64: Clean up the early boot page table

2007-03-06 Thread Vivek Goyal


- Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
  is broken as soon as mm/init.c:init_memory_mapping is run.
- As physmem_pgt is gone don't export it in pgtable.h.
- Use defines from pgtable.h for page permissions.
- Fix the physical memory identity mapping so it is at the correct
  address.
- Remove the physical memory mapping from wakeup_level4_pgt; it
  is at the wrong address, so we can't possibly be using it.
- Simplify NEXT_PAGE.  The work to calculate the phys_ alias
  of the labels was very cool.  Unfortunately it was a brittle
  special-purpose hack that makes maintenance more difficult.
  Instead just use label - __START_KERNEL_map like we do
  everywhere else in assembly.
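
As a rough illustration of the PMDS() helper that the head.S hunk below introduces, here is a user-space C sketch of the values it emits: COUNT 64-bit entries, each equal to START plus i times 2 MB plus the permission bits. The function name fill_pmds is hypothetical, and the flag value 0x083 in the usage note is simply the constant from the old open-coded loop that the patch removes; the patch itself passes __PAGE_KERNEL_LARGE_EXEC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMD_SHIFT 21            /* each PMD entry maps 2 MB */

/*
 * Hypothetical user-space mirror of the assembler PMDS() macro:
 * entry i = start + (i << 21) + perm.
 */
static void fill_pmds(uint64_t *table, uint64_t start, uint64_t perm,
                      size_t count)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++)
        table[i] = start + ((uint64_t)i << PMD_SHIFT) + perm;
}
```

Calling fill_pmds(table, 0, 0x083, 512) fills one full page of entries, which covers 512 * 2 MB = 1 GB, matching the "map the first 1G" comment in the hunk.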

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head.S|   61 +++
 include/asm-x86_64/pgtable.h |1 
 2 files changed, 28 insertions(+), 34 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table 
arch/x86_64/kernel/head.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table
   2007-03-07 01:22:07.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S   2007-03-07 
01:22:07.0 +0530
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -260,52 +261,48 @@ ljumpvector:
 ENTRY(stext)
 ENTRY(_stext)
 
-   $page = 0
 #define NEXT_PAGE(name) \
-   $page = $page + 1; \
-   .org $page * 0x1000; \
-   phys_/**/name = $page * 0x1000 + __PHYSICAL_START; \
+   .balign PAGE_SIZE; \
 ENTRY(name)
 
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)   \
+   i = 0 ; \
+   .rept (COUNT) ; \
+   .quad   (START) + (i << 21) + (PERM) ;  \
+   i = i + 1 ; \
+   .endr
+
 NEXT_PAGE(init_level4_pgt)
/* This gets initialized in x86_64_start_kernel */
.fill   512,8,0
 
 NEXT_PAGE(level3_ident_pgt)
-   .quad   phys_level2_ident_pgt | 0x007
+   .quad   level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
.fill   511,8,0
 
 NEXT_PAGE(level3_kernel_pgt)
.fill   510,8,0
/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-   .quad   phys_level2_kernel_pgt | 0x007
+   .quad   level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
.fill   1,8,0
 
 NEXT_PAGE(level2_ident_pgt)
-   /* 40MB for bootup. */
-   i = 0
-   .rept 20
-   .quad   i << 21 | 0x083
-   i = i + 1
-   .endr
-   .fill   492,8,0
+   /* Since I easily can, map the first 1G.
+* Don't set NX because code runs from these pages.
+*/
+   PMDS(0x, __PAGE_KERNEL_LARGE_EXEC, PTRS_PER_PMD)

 NEXT_PAGE(level2_kernel_pgt)
/* 40MB kernel mapping. The kernel code cannot be bigger than that.
   When you change this change KERNEL_TEXT_SIZE in page.h too. */
/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
-   i = 0
-   .rept 20
-   .quad   i << 21 | 0x183
-   i = i + 1
-   .endr
+   PMDS(0x, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
+   KERNEL_TEXT_SIZE/PMD_SIZE)
/* Module mapping starts here */
-   .fill   492,8,0
-
-NEXT_PAGE(level3_physmem_pgt)
-   .quad   phys_level2_kernel_pgt | 0x007  /* so that __va works even before pagetable_init */
-   .fill   511,8,0
+   .fill   (PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0
 
+#undef PMDS
 #undef NEXT_PAGE
 
.data
@@ -313,12 +310,10 @@ NEXT_PAGE(level3_physmem_pgt)
 #ifdef CONFIG_ACPI_SLEEP
.align PAGE_SIZE
 ENTRY(wakeup_level4_pgt)
-   .quad   phys_level3_ident_pgt | 0x007
-   .fill   255,8,0
-   .quad   phys_level3_physmem_pgt | 0x007
-   .fill   254,8,0
+   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+   .fill   510,8,0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-   .quad   phys_level3_kernel_pgt | 0x007
+   .quad   level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
 #endif
 
 #ifndef CONFIG_HOTPLUG_CPU
@@ -332,12 +327,12 @@ ENTRY(wakeup_level4_pgt)
 */
.align PAGE_SIZE
 ENTRY(boot_level4_pgt)
-   .quad   phys_level3_ident_pgt | 0x007
-   .fill   255,8,0
-   .quad   phys_level3_physmem_pgt | 0x007
-   .fill   254,8,0
+   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+   .fill   257,8,0
+   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+   .fill   252,8,0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-   .quad   phys_level3_kernel_pgt | 0x007
+   .quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
.data
 
diff -puN 

Re: [RFC][PATCH 0/7] Resource controllers based on process containers

2007-03-06 Thread Pavel Emelianov
Paul Menage wrote:
> On 3/6/07, Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>> 2. Extended containers may register themselves too late.
>>Kernel threads/helpers start forking, opening files
>>and touching pages much earlier. This patchset
>>workarounds this in not-so-cute manner and I'm waiting
>>for Paul's comments on this issue.
>>
> 
> Can we not make sure that each subsystem registers itself before any
> of its resources become usable? So the file counting subsystem should

Actually all the subsystems I've sent become usable very early,
much earlier than initcalls start. I haven't found where exactly,
but I can track it down if we really need it.

> register at some point before filp_open() becomes usable, and the
> process counting subsystem should register before it's possible to
> fork, etc.
> 
> Paul
> 



[PATCH 5/20] x86_64: modify copy_bootdata to use virtual addresses

2007-03-06 Thread Vivek Goyal

Use virtual addresses instead of physical addresses
in copy_bootdata.  In addition, fix the implementation
of the old bootloader convention.  Everything is
at real_mode_data always.  It is just that sometimes
real_mode_data was relocated by setup.S to not sit at
0x9.
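
The old boot loader convention described above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the constants are the post-patch values from the hunk below, the helper name old_style_cmdline is hypothetical, and the real code goes through __pa()/__va() rather than plain pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OLD_CL_MAGIC       0xA33F
#define OLD_CL_MAGIC_ADDR  0x20   /* offset inside the real-mode data */
#define OLD_CL_OFFSET      0x22   /* offset inside the real-mode data */

/*
 * Hypothetical helper: return a pointer to the old-style command line
 * inside the real-mode data block, or NULL if the magic is absent.
 */
static const char *old_style_cmdline(const unsigned char *real_mode_data)
{
    uint16_t magic, offset;

    assert(real_mode_data != NULL);
    memcpy(&magic, real_mode_data + OLD_CL_MAGIC_ADDR, sizeof(magic));
    if (magic != OLD_CL_MAGIC)
        return NULL;
    /* The 16-bit value at OLD_CL_OFFSET is relative to the block itself. */
    memcpy(&offset, real_mode_data + OLD_CL_OFFSET, sizeof(offset));
    return (const char *)real_mode_data + offset;
}
```

The point of the sketch is that both the magic and the command line offset are found relative to wherever real_mode_data actually sits, instead of at fixed physical addresses.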

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head64.c |   17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff -puN 
arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses
 arch/x86_64/kernel/head64.c
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses
 2007-03-07 01:23:55.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head64.c 2007-03-07 
01:23:55.0 +0530
@@ -29,25 +29,24 @@ static void __init clear_bss(void)
 }
 
 #define NEW_CL_POINTER 0x228   /* Relative to real mode data */
-#define OLD_CL_MAGIC_ADDR  0x90020
+#define OLD_CL_MAGIC_ADDR  0x20
 #define OLD_CL_MAGIC0xA33F
-#define OLD_CL_BASE_ADDR0x9
-#define OLD_CL_OFFSET   0x90022
+#define OLD_CL_OFFSET   0x22
 
 static void __init copy_bootdata(char *real_mode_data)
 {
-   int new_data;
+   unsigned long new_data;
char * command_line;
 
memcpy(x86_boot_params, real_mode_data, BOOT_PARAM_SIZE);
-   new_data = *(int *) (x86_boot_params + NEW_CL_POINTER);
+   new_data = *(u32 *) (x86_boot_params + NEW_CL_POINTER);
if (!new_data) {
-   if (OLD_CL_MAGIC != * (u16 *) OLD_CL_MAGIC_ADDR) {
+   if (OLD_CL_MAGIC != *(u16 *)(real_mode_data + OLD_CL_MAGIC_ADDR)) {
return;
}
-   new_data = OLD_CL_BASE_ADDR + * (u16 *) OLD_CL_OFFSET;
+   new_data = __pa(real_mode_data) + *(u16 *)(real_mode_data + OLD_CL_OFFSET);
}
-   command_line = (char *) ((u64)(new_data));
+   command_line = __va(new_data);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 }
 
@@ -74,7 +73,7 @@ void __init x86_64_start_kernel(char * r
cpu_pda(i) = _cpu_pda[i];
 
pda_init(0);
-   copy_bootdata(real_mode_data);
+   copy_bootdata(__va(real_mode_data));
 #ifdef CONFIG_SMP
cpu_set(0, cpu_online_map);
 #endif
_


[PATCH 12/20] x86_64: 64bit ACPI wakeup trampoline

2007-03-06 Thread Vivek Goyal


o Moved wakeup_level4_pgt into the wakeup routine so we can
  run the kernel above 4G.

o Now we first go to 64bit mode and continue to run from the trampoline,
  and only then start accessing kernel symbols and restore processor context.
  This enables us to resume even in a relocatable kernel context, when the
  kernel might not be loaded at the physical address it was compiled for.

o Removed the need for modifying any existing kernel page table.

o Increased the size of the wakeup routine to 8K. This is required as
  the wakeup page tables are on the trampoline itself and they have to be
  at a 4K boundary, hence one page is not sufficient.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/acpi/sleep.c  |   22 ++
 arch/x86_64/kernel/acpi/wakeup.S |   59 ---
 arch/x86_64/kernel/head.S|9 -
 3 files changed, 41 insertions(+), 49 deletions(-)

diff -puN arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline 
arch/x86_64/kernel/acpi/sleep.c
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline
  2007-03-07 01:28:11.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/sleep.c 2007-03-07 
01:28:11.0 +0530
@@ -60,17 +60,6 @@ extern char wakeup_start, wakeup_end;
 
 extern unsigned long acpi_copy_wakeup_routine(unsigned long);
 
-static pgd_t low_ptr;
-
-static void init_low_mapping(void)
-{
-   pgd_t *slot0 = pgd_offset(current->mm, 0UL);
-   low_ptr = *slot0;
-   set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-   WARN_ON(num_online_cpus() != 1);
-   local_flush_tlb();
-}
-
 /**
  * acpi_save_state_mem - save kernel state
  *
@@ -79,8 +68,6 @@ static void init_low_mapping(void)
  */
 int acpi_save_state_mem(void)
 {
-   init_low_mapping();
-
memcpy((void *)acpi_wakeup_address, &wakeup_start,
   &wakeup_end - &wakeup_start);
acpi_copy_wakeup_routine(acpi_wakeup_address);
@@ -93,8 +80,6 @@ int acpi_save_state_mem(void)
  */
 void acpi_restore_state_mem(void)
 {
-   set_pgd(pgd_offset(current->mm, 0UL), low_ptr);
-   local_flush_tlb();
 }
 
 /**
@@ -107,10 +92,11 @@ void acpi_restore_state_mem(void)
  */
 void __init acpi_reserve_bootmem(void)
 {
-   acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE);
-   if ((&wakeup_end - &wakeup_start) > PAGE_SIZE)
+   acpi_wakeup_address = (unsigned long)alloc_bootmem_low(PAGE_SIZE*2);
+   if ((&wakeup_end - &wakeup_start) > (PAGE_SIZE*2))
printk(KERN_CRIT
-  "ACPI: Wakeup code way too big, will crash on attempt to suspend\n");
+  "ACPI: Wakeup code way too big, will crash on attempt"
+  " to suspend\n");
 }
 
 static int __init acpi_sleep_setup(char *str)
diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-64bit-ACPI-wakeup-trampoline 
arch/x86_64/kernel/acpi/wakeup.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-64bit-ACPI-wakeup-trampoline
 2007-03-07 01:28:11.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S
2007-03-07 01:28:11.0 +0530
@@ -1,6 +1,7 @@
 .text
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -62,12 +63,15 @@ wakeup_code:
 
movb$0xa2, %al  ;  outb %al, $0x80

-   lidt%ds:idt_48a - wakeup_code
-   xorl%eax, %eax
-   movw%ds, %ax# (Convert %ds:gdt to a linear ptr)
-   shll$4, %eax
-   addl$(gdta - wakeup_code), %eax
-   movl%eax, gdt_48a +2 - wakeup_code
+   mov %ds, %ax# Find 32bit wakeup_code addr
+   movzx   %ax, %esi   # (Convert %ds:gdt to a linear ptr)
+   shll$4, %esi
+   # Fix up the vectors
+   addl%esi, wakeup_32_vector - wakeup_code
+   addl%esi, wakeup_long64_vector - wakeup_code
+   addl%esi, gdt_48a + 2 - wakeup_code # Fixup the gdt pointer
+
+   lidtl   %ds:idt_48a - wakeup_code
lgdtl   %ds:gdt_48a - wakeup_code   # load gdt with whatever is
# appropriate
 
@@ -80,7 +84,7 @@ wakeup_code:
 
.balign 4
 wakeup_32_vector:
-   .long   wakeup_32 - __START_KERNEL_map
+   .long   wakeup_32 - wakeup_code
.word   __KERNEL32_CS, 0
 
.code32
@@ -103,10 +107,6 @@ wakeup_32:
movl$__KERNEL_DS, %eax
movl%eax, %ds
 
-   movlsaved_magic - __START_KERNEL_map, %eax
-   cmpl$0x9abcdef0, %eax
-   jne bogus_32_magic
-
movw$0x0e00 + 'i', %ds:(0xb8012)
movb$0xa8, %al  ;  outb %al, $0x80;
 
@@ -120,7 +120,7 @@ wakeup_32:
movl%eax, %cr4
 
/* Setup early boot stage 4 level pagetables */
-   movl$(wakeup_level4_pgt - __START_KERNEL_map), %eax
+   leal   

[PATCH 14/20] x86_64: Remove the identity mapping as early as possible

2007-03-06 Thread Vivek Goyal


With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.

So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code.  The functions
are subtly different, thus the name change.

This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed.  Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S
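
To see why clearing a single top-level entry is enough to drop the identity mapping, it may help to compute which slot of the level-4 table covers a given virtual address. The sketch below assumes 4-level paging with 9 index bits per level and takes 0xffffffff80000000 as the value of __START_KERNEL_map; that value is an assumption here, since the constants are truncated in this archive. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT   39    /* each top-level entry covers 2^39 bytes */
#define PTRS_PER_PGD  512

/* Index of the top-level (PGD) slot that translates a virtual address. */
static unsigned int pgd_index_of(uint64_t vaddr)
{
    return (unsigned int)((vaddr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Sanity check of the layout described in the patch comments. */
static void check_layout(void)
{
    assert(pgd_index_of(0) == 0);                       /* identity map */
    assert(pgd_index_of(0xffffffff80000000ULL) == 511); /* kernel text  */
}
```

Address 0 lands in slot 0 and the kernel text in slot 511, so zap_identity_mappings() can clear slot 0 without touching the mapping the kernel is running from.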

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/head.S|   39 ++-
 arch/x86_64/kernel/head64.c  |   17 +++--
 arch/x86_64/kernel/setup.c   |2 --
 arch/x86_64/kernel/setup64.c |1 -
 arch/x86_64/mm/init.c|   24 
 include/asm-x86_64/pgtable.h |1 -
 include/asm-x86_64/proto.h   |2 --
 7 files changed, 25 insertions(+), 61 deletions(-)

diff -puN 
arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible
 arch/x86_64/kernel/head64.c
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible
  2007-03-07 01:29:50.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head64.c 2007-03-07 
01:29:50.0 +0530
@@ -18,8 +18,16 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+static void __init zap_identity_mappings(void)
+{
+   pgd_t *pgd = pgd_offset_k(0UL);
+   pgd_clear(pgd);
+   __flush_tlb();
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized yet. */
 static void __init clear_bss(void)
@@ -57,18 +65,15 @@ void __init x86_64_start_kernel(char * r
/* clear bss before set_intr_gate with early_idt_handler */
clear_bss();
 
+   /* Make NULL pointers segfault */
+   zap_identity_mappings();
+
for (i = 0; i < IDT_ENTRIES; i++)
set_intr_gate(i, early_idt_handler);
asm volatile("lidt %0" :: "m" (idt_descr));
 
early_printk("Kernel alive\n");
 
-   /*
-* switch to init_level4_pgt from boot_level4_pgt
-*/
-   memcpy(init_level4_pgt, boot_level4_pgt, PTRS_PER_PGD*sizeof(pgd_t));
-   asm volatile("movq %0,%%cr3" :: "r" (__pa_symbol(&init_level4_pgt)));
-
for (i = 0; i < NR_CPUS; i++)
cpu_pda(i) = _cpu_pda[i];
 
diff -puN 
arch/x86_64/kernel/head.S~x86_64-Remove-the-identity-mapping-as-early-as-possible
 arch/x86_64/kernel/head.S
--- 
linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-Remove-the-identity-mapping-as-early-as-possible
2007-03-07 01:29:50.0 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S   2007-03-07 
01:29:50.0 +0530
@@ -71,7 +71,7 @@ startup_32:
movl%eax, %cr4
 
/* Setup early boot stage 4 level pagetables */
-   movl$(boot_level4_pgt - __START_KERNEL_map), %eax
+   movl$(init_level4_pgt - __START_KERNEL_map), %eax
movl%eax, %cr3
 
/* Setup EFER (Extended Feature Enable Register) */
@@ -115,7 +115,7 @@ ENTRY(secondary_startup_64)
movq%rax, %cr4
 
/* Setup early boot stage 4 level pagetables. */
-   movq$(boot_level4_pgt - __START_KERNEL_map), %rax
+   movq$(init_level4_pgt - __START_KERNEL_map), %rax
movq%rax, %cr3
 
/* Check if nx is implemented */
@@ -274,9 +274,19 @@ ENTRY(name)
i = i + 1 ; \
.endr
 
+   /*
+* This default setting generates an ident mapping at address 0x10
+* and a mapping for the kernel that precisely maps virtual address
+* 0x8000 to physical address 0x00. (always using
+* 2Mbyte large pages provided by PAE mode)
+*/
 NEXT_PAGE(init_level4_pgt)
-   /* This gets initialized in x86_64_start_kernel */
-   .fill   512,8,0
+   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+   .fill   257,8,0
+   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+   .fill   252,8,0
+   /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+   .quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
 
 NEXT_PAGE(level3_ident_pgt)
.quad   level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
@@ -307,27 +317,6 @@ NEXT_PAGE(level2_kernel_pgt)
 #undef NEXT_PAGE
 
.data
-
-#ifndef CONFIG_HOTPLUG_CPU
-   __INITDATA
-#endif
-   /*
-* This default setting generates an ident mapping at address 0x10
-* and a mapping for the kernel that precisely maps virtual address
-* 0x8000 to physical address 0x00. (always using
-* 2Mbyte large pages provided by PAE mode)
-*/
-   .align PAGE_SIZE
-ENTRY(boot_level4_pgt)
-   .quad   

Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings

2007-03-06 Thread Nick Piggin
On Tue, Mar 06, 2007 at 11:08:41PM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 07:57:27 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > > 
> > > Why was truncate_inode_pages_range() altered to unmap the page if it got
> > > mapped again?
> > > 
> > > Oh.  Because the unmap_mapping_range() call got removed from 
> > > vmtruncate(). 
> > > Why?  (Please send suitable updates to the changelog).
> > 
> > We have to ensure it is unmapped, and be prepared to unmap it while under
> > the page lock.
> 
> But vmtruncate() dropped i_size, so nobody will map this page into
> pagetables from then on.

But there could be a fault in progress... the only way to know is
locking the page.

> > > I guess truncate of a mmapped area isn't sufficiently common to worry 
> > > about
> > > the inefficiency of this change.
> > 
> > Yeah, and it should be more efficient for files that aren't mmapped,
> > because we don't have to take i_mmap_lock for them.
> > 
> > > Lots of memory barriers got removed in memory.c, unchangeloggedly.
> > 
> > Yeah they were all for the lockless truncate_count checks. Now that
> > we use the page lock, we don't need barriers.
> > 
> > > Gratuitous renaming of locals in do_no_page() makes the change hard to
> > > review.  Should have been a separate patch.
> > > 
> > > In fact, the patch would have been heaps clearer if that renaming had been
> > > a separate patch.
> > 
> > Shall I?
> 
> If you don't have anything better to do, yes please ;)

OK.


Re: [RFC][PATCH 2/7] RSS controller core

2007-03-06 Thread Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> This includes setup of RSS container within generic
>> process containers, all the declarations used in RSS
>> accounting, and core code responsible for accounting.
>>
>>
>> 
>>
>> diff -upr linux-2.6.20.orig/include/linux/rss_container.h
>> linux-2.6.20-0/include/linux/rss_container.h
>> --- linux-2.6.20.orig/include/linux/rss_container.h2007-03-06
>> 13:39:17.0 +0300
>> +++ linux-2.6.20-0/include/linux/rss_container.h2007-03-06
>> 13:33:28.0 +0300
>> @@ -0,0 +1,68 @@
>> +#ifndef __RSS_CONTAINER_H__
>> +#define __RSS_CONTAINER_H__
>> +/*
>> + * RSS container
>> + *
>> + * Copyright 2007 OpenVZ SWsoft Inc
>> + *
>> + * Author: Pavel Emelianov <[EMAIL PROTECTED]>
>> + *
>> + */
>> +
>> +struct page_container;
>> +struct rss_container;
>> +
>> +#ifdef CONFIG_RSS_CONTAINER
>> +int container_rss_prepare(struct page *, struct vm_area_struct *vma,
>> +struct page_container **);
>> +
>> +void container_rss_add(struct page_container *);
>> +void container_rss_del(struct page_container *);
>> +void container_rss_release(struct page_container *);
>> +
>> +int mm_init_container(struct mm_struct *mm, struct task_struct *tsk);
>> +void mm_free_container(struct mm_struct *mm);
>> +
>> +unsigned long container_isolate_pages(unsigned long nr_to_scan,
>> +struct rss_container *rss, struct list_head *dst,
>> +int active, unsigned long *scanned);
>> +unsigned long container_nr_physpages(struct rss_container *rss);
>> +
>> +unsigned long container_try_to_free_pages(struct rss_container *);
>> +void container_out_of_memory(struct rss_container *);
>> +
>> +void container_rss_init_early(void);
>> +#else
>> +static inline int container_rss_prepare(struct page *pg,
>> +struct vm_area_struct *vma, struct page_container **pc)
>> +{
>> +*pc = NULL; /* to make gcc happy */
>> +return 0;
>> +}
>> +
>> +static inline void container_rss_add(struct page_container *pc)
>> +{
>> +}
>> +
>> +static inline void container_rss_del(struct page_container *pc)
>> +{
>> +}
>> +
>> +static inline void container_rss_release(struct page_container *pc)
>> +{
>> +}
>> +
>> +static inline int mm_init_container(struct mm_struct *mm, struct
>> task_struct *t)
>> +{
>> +return 0;
>> +}
>> +
>> +static inline void mm_free_container(struct mm_struct *mm)
>> +{
>> +}
>> +
>> +static inline void container_rss_init_early(void)
>> +{
>> +}
>> +#endif
>> +#endif
>> diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
>> --- linux-2.6.20.orig/init/Kconfig2007-03-06 13:33:28.0 +0300
>> +++ linux-2.6.20-0/init/Kconfig2007-03-06 13:33:28.0 +0300
>> @@ -265,6 +265,13 @@ config CPUSETS
>>  bool
>>  select CONTAINERS
>>
>> +config RSS_CONTAINER
>> +bool "RSS accounting container"
>> +select RESOURCE_COUNTERS
>> +help
>> +  Provides a simple Resource Controller for monitoring and
>> +  controlling the total Resident Set Size of the tasks in a
>> container
>> +
> 
> The wording looks very familiar :-). It would be useful to add
> "The reclaim logic is now container aware, when the container goes
> overlimit
> the page reclaimer reclaims pages belonging to this container. If we are
> unable to reclaim enough pages to satisfy the request, the process is
> killed with an out of memory warning"

OK. Thanks.

> 
>>  config SYSFS_DEPRECATED
>>  bool "Create deprecated sysfs files"
>>  default y
>> diff -upr linux-2.6.20.orig/mm/Makefile linux-2.6.20-0/mm/Makefile
>> --- linux-2.6.20.orig/mm/Makefile2007-02-04 21:44:54.0 +0300
>> +++ linux-2.6.20-0/mm/Makefile2007-03-06 13:33:28.0 +0300
>> @@ -29,3 +29,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
>>  obj-$(CONFIG_FS_XIP) += filemap_xip.o
>>  obj-$(CONFIG_MIGRATION) += migrate.o
>>  obj-$(CONFIG_SMP) += allocpercpu.o
>> +
>> +obj-$(CONFIG_RSS_CONTAINER) += rss_container.o
>> diff -upr linux-2.6.20.orig/mm/rss_container.c
>> linux-2.6.20-0/mm/rss_container.c
>> --- linux-2.6.20.orig/mm/rss_container.c2007-03-06
>> 13:39:17.0 +0300
>> +++ linux-2.6.20-0/mm/rss_container.c2007-03-06 13:33:28.0
>> +0300
>> @@ -0,0 +1,307 @@
>> +/*
>> + * RSS accounting container
>> + *
>> + * Copyright 2007 OpenVZ SWsoft Inc
>> + *
>> + * Author: Pavel Emelianov <[EMAIL PROTECTED]>
>> + *
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +static struct container_subsys rss_subsys;
>> +
>> +struct rss_container {
>> +struct res_counter res;
>> +struct list_head page_list;
>> +struct container_subsys_state css;
>> +};
>> +
>> +struct page_container {
>> +struct page *page;
>> +struct rss_container *cnt;
>> +struct list_head list;
>> +};
>> +
> 
> Yes, this is what I was planning to get to -- a per container LRU list.
> But you have just one list, don't you need active and inactive 

Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-06 Thread Bill Irwin
On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote:
> Does anybody really pass a NULL `type' arg into filemap_nopage()?

The major vs. minor fault accounting patch that introduced the argument
didn't make non-NULL type arguments a requirement. It's essentially an
optional second return value and the NULL pointer represents the caller
choosing to ignore it. I'm not sure I actually liked that aspect of it,
but that's how it ended up going in. I think it had something to do
with driver churn clashing with the sweep at the time of the merge. I'd
rather the argument be mandatory and defaulted to VM_FAULT_MINOR.
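
The convention described above, an optional second return value where NULL means "caller does not care", looks roughly like this in plain C. Everything in the sketch is a stand-in: lookup_page, FAULT_MINOR and FAULT_MAJOR are hypothetical names and values, not the kernel's ->nopage() interface.

```c
#include <assert.h>
#include <stddef.h>

enum { FAULT_MINOR = 1, FAULT_MAJOR = 2 };   /* stand-ins, not kernel values */

/*
 * Hypothetical handler: 'type' is an optional second return value.
 * Callers that do not care pass NULL.
 */
static int lookup_page(int in_cache, int *type)
{
    if (type != NULL)
        *type = in_cache ? FAULT_MINOR : FAULT_MAJOR;
    return 0;   /* success */
}

/* Usage: one caller ignores the type, the other asks for it. */
static void example_callers(void)
{
    int type = 0;

    assert(lookup_page(1, NULL) == 0);
    assert(lookup_page(0, &type) == 0 && type == FAULT_MAJOR);
}
```

Making the argument mandatory, as suggested above, would remove the NULL test from the handler at the cost of touching every caller.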

It's something of a non-answer, though, since it only discusses a
convention as opposed to reviewing specific callers of filemap_nopage().
NULL type arguments to ->nopage() are rare at most, and could be easily
eliminated, at least for in-tree drivers.

egrep -nr 'nopage.*NULL' . 2>/dev/null | grep -v '^Bin' on a current
git tree yields zero matches.


-- wli


Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet

Eric Dumazet wrote:


I would definitely *love* saving dentries for pipes (and sockets too),
but how are you going to get the inode?


pipes()/sockets() can use read()/write()/rw_verify_area() and thus need 
file->f_path.dentry->d_inode (so each pipe needs a separate dentry)


Are you suggesting adding a new "struct file_operations" member to get 
the inode ?

Or reintroducing an inode pointer in struct file?


Crazy ideas : (some readers are going to kill me)

1) Use the low order bit of f_path.dentry to say : this pointer is not a 
pointer to a dentry but the inode pointer (with the low order bit set to 1)


OR

2) file->f_path.dentry set to NULL for these special files (so that we don't
need to dput() and cache line ping pong the common dentry each time we
__fput() a pipe/socket).


The same trick could be used for file->f_path.mnt, because there is a big SMP
cache line ping/pong to maintain a mnt_count on the pipe/socket mountpoint,
while these file systems cannot be unmounted.


If dentry is NULL, we get the inode pointer from an overlay of struct
file_ra_state f_ra; (because for these special files readahead is unused)


This adds some conditional branches of course, but being able to save ram and 
better use cpu caches might be worth them.
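
Idea 1) above is ordinary low-bit pointer tagging. A minimal user-space sketch follows, assuming both structures are at least 2-byte aligned so that bit 0 of a real pointer is always clear. The struct definitions are placeholders and path_slot_t and the slot_* helpers are hypothetical names, not anything in struct file.

```c
#include <assert.h>
#include <stdint.h>

struct dentry { int dummy; };   /* placeholders, not the kernel types */
struct inode  { int dummy; };

/* Hypothetical slot: holds either a dentry pointer or a tagged inode pointer. */
typedef uintptr_t path_slot_t;

#define INODE_TAG ((uintptr_t)1)

static path_slot_t slot_from_dentry(struct dentry *dentry)
{
    assert(((uintptr_t)dentry & INODE_TAG) == 0);
    return (uintptr_t)dentry;
}

static path_slot_t slot_from_inode(struct inode *inode)
{
    assert(((uintptr_t)inode & INODE_TAG) == 0);  /* needs bit 0 free */
    return (uintptr_t)inode | INODE_TAG;
}

static int slot_is_inode(path_slot_t slot)
{
    return (slot & INODE_TAG) != 0;
}

static struct inode *slot_inode(path_slot_t slot)
{
    return (struct inode *)(slot & ~INODE_TAG);
}

static struct dentry *slot_dentry(path_slot_t slot)
{
    return (struct dentry *)slot;
}
```

Every reader of the slot then pays one extra test of bit 0, which is the "conditional branches" cost mentioned above.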





Re: [RFC][PATCH 1/7] Resource counters

2007-03-06 Thread Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> Introduce generic structures and routines for
>> resource accounting.
>>
>> Each resource accounting container is supposed to
>> aggregate it, container_subsystem_state and its
>> resource-specific members within.
>>
>>
>> 
>>
>> diff -upr linux-2.6.20.orig/include/linux/res_counter.h
>> linux-2.6.20-0/include/linux/res_counter.h
>> --- linux-2.6.20.orig/include/linux/res_counter.h2007-03-06
>> 13:39:17.0 +0300
>> +++ linux-2.6.20-0/include/linux/res_counter.h2007-03-06
>> 13:33:28.0 +0300
>> @@ -0,0 +1,83 @@
>> +#ifndef __RES_COUNTER_H__
>> +#define __RES_COUNTER_H__
>> +/*
>> + * resource counters
>> + *
>> + * Copyright 2007 OpenVZ SWsoft Inc
>> + *
>> + * Author: Pavel Emelianov <[EMAIL PROTECTED]>
>> + *
>> + */
>> +
>> +#include 
>> +
>> +struct res_counter {
>> +unsigned long usage;
>> +unsigned long limit;
>> +unsigned long failcnt;
>> +spinlock_t lock;
>> +};
>> +
>> +enum {
>> +RES_USAGE,
>> +RES_LIMIT,
>> +RES_FAILCNT,
>> +};
>> +
>> +ssize_t res_counter_read(struct res_counter *cnt, int member,
>> +const char __user *buf, size_t nbytes, loff_t *pos);
>> +ssize_t res_counter_write(struct res_counter *cnt, int member,
>> +const char __user *buf, size_t nbytes, loff_t *pos);
>> +
>> +static inline void res_counter_init(struct res_counter *cnt)
>> +{
>> +spin_lock_init(&cnt->lock);
>> +cnt->limit = (unsigned long)LONG_MAX;
>> +}
>> +
> 
> Is there any way to indicate that there are no limits on this container.

Yes - LONG_MAX is essentially a "no limit" value as no
container will ever have so many files :)

> LONG_MAX is quite huge, but still when the administrator wants to
> configure a container to *un-limited usage*, it becomes hard for
> the administrator.
> 
>> +static inline int res_counter_charge_locked(struct res_counter *cnt,
>> +unsigned long val)
>> +{
>> +if (cnt->usage <= cnt->limit - val) {
>> +cnt->usage += val;
>> +return 0;
>> +}
>> +
>> +cnt->failcnt++;
>> +return -ENOMEM;
>> +}
>> +
>> +static inline int res_counter_charge(struct res_counter *cnt,
>> +unsigned long val)
>> +{
>> +int ret;
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(&cnt->lock, flags);
>> +ret = res_counter_charge_locked(cnt, val);
>> +spin_unlock_irqrestore(&cnt->lock, flags);
>> +return ret;
>> +}
>> +
> 
> Will atomic counters help here.

I'm afraid not. We have to atomically check the limit and alter
one of usage or failcnt depending on the result of the check. Doing
this with atomic_xxx ops would require at least two ops.

If we remove failcnt, this would look like
   while (atomic_cmpxchg(...))
which is also not that good.

Moreover - in RSS accounting patches I perform page list
manipulations under this lock, so this also saves one atomic op.
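
For comparison, the cmpxchg variant mentioned above might look roughly like the following user-space sketch. It uses C11 atomics rather than the kernel's atomic_cmpxchg(), the struct and function names are hypothetical, and failcnt is dropped, as in the scenario described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct counter {
    atomic_ulong usage;
    unsigned long limit;
};

/*
 * Hypothetical lock-free charge: retry until usage is updated atomically
 * or the limit would be exceeded. There is no failcnt here, because
 * updating a second field would need a second atomic operation.
 */
static int counter_charge(struct counter *cnt, unsigned long val)
{
    unsigned long old;

    assert(cnt != NULL);
    old = atomic_load(&cnt->usage);
    do {
        /* The val > limit test avoids unsigned wrap-around in limit - val. */
        if (val > cnt->limit || old > cnt->limit - val)
            return -ENOMEM;
        /* On failure, 'old' is reloaded with the current usage. */
    } while (!atomic_compare_exchange_weak(&cnt->usage, &old, old + val));
    return 0;
}
```

The loop may spin under contention, and it cannot also cover the page list manipulation that the RSS patches do under the same spinlock.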

>> +static inline void res_counter_uncharge_locked(struct res_counter *cnt,
>> +unsigned long val)
>> +{
>> +if (unlikely(cnt->usage < val)) {
>> +WARN_ON(1);
>> +val = cnt->usage;
>> +}
>> +
>> +cnt->usage -= val;
>> +}
>> +
>> +static inline void res_counter_uncharge(struct res_counter *cnt,
>> +unsigned long val)
>> +{
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(&cnt->lock, flags);
>> +res_counter_uncharge_locked(cnt, val);
>> +spin_unlock_irqrestore(&cnt->lock, flags);
>> +}
>> +
>> +#endif
>> diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
>> --- linux-2.6.20.orig/init/Kconfig2007-03-06 13:33:28.0 +0300
>> +++ linux-2.6.20-0/init/Kconfig2007-03-06 13:33:28.0 +0300
>> @@ -265,6 +265,10 @@ config CPUSETS
>>
>>Say N if unsure.
>>
>> +config RESOURCE_COUNTERS
>> +bool
>> +select CONTAINERS
>> +
>>  config SYSFS_DEPRECATED
>>  bool "Create deprecated sysfs files"
>>  default y
>> diff -upr linux-2.6.20.orig/kernel/Makefile
>> linux-2.6.20-0/kernel/Makefile
>> --- linux-2.6.20.orig/kernel/Makefile2007-03-06 13:33:28.0
>> +0300
>> +++ linux-2.6.20-0/kernel/Makefile2007-03-06 13:33:28.0 +0300
>> @@ -51,6 +51,7 @@ obj-$(CONFIG_RELAY) += relay.o
>>  obj-$(CONFIG_UTS_NS) += utsname.o
>>  obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
>>  obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
>> +obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
>>
>>  ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
>>  # According to Alan Modra <[EMAIL PROTECTED]>, the
>> -fno-omit-frame-pointer is
>> diff -upr linux-2.6.20.orig/kernel/res_counter.c
>> linux-2.6.20-0/kernel/res_counter.c
>> --- linux-2.6.20.orig/kernel/res_counter.c2007-03-06
>> 13:39:17.0 +0300
>> +++ linux-2.6.20-0/kernel/res_counter.c2007-03-06
>> 13:33:28.0 +0300
>> @@ -0,0 +1,72 @@
>> +/*
>> + * resource containers
>> + *
>> + * Copyright 2007 OpenVZ SWsoft Inc
>> + *
>> + * Author: Pavel Emelianov 

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Wed, 7 Mar 2007, Eric Dumazet wrote:

> I would definitly *love* saving dentries for pipes (and sockets too), but how
> are you going to get the inode ?

I was not planning to touch anything but epoll, signalfd and timerfd 
files.


> pipes()/sockets() can use read()/write()/rw_verify_area() and thus need
> file->f_path.dentry->d_inode (so each pipe needs a separate dentry)

Currently, they use a single inode, and multiple dentries (to give the 
name of the class). But this could be changed to a single dentry like 
Linus was suggesting. I'll wait for Al's reply before doing anything.
The memory saving could be worthwhile, on top of the already big win of 
avoiding code duplication.



- Davide




Re: [patch v2] epoll use a single inode ...

2007-03-06 Thread H. Peter Anvin

Eric Dumazet wrote:
> Linus Torvalds wrote:
>>
>> On Tue, 6 Mar 2007, Eric Dumazet wrote:
>>> I did a user space program, attached to this mail.
>>>
>>> I rewrote the reciprocal_div() for i386 so that one multiply is used.
>>
>> Ok, this is definitely faster on Core 2 as well, so "numbers talk, 
>> bullshit walks". No more objections.
>
> And the numbers were ? :)
>
>> (That said, I bet you could do even better for octal and hex numbers, 
>> so if you *really* want to speed things up, you should just make a 
>> special-case routine for each base (there's just three of them), and 
>> you can then also optimize the base-10 thing much better (you can do 
>> two digits at a time by dividing by 100, etc)
>
> Well, given that sprintf() is frequently called only for pipe/sockets 
> creation, we had probably better:
>
> 1) wait for a very clever idea to suppress the individual dentry per 
> pipe/socket (no more sprintf() at pipe/socket setup)
>
> 2) delay the sprintf() until it is needed, as you mentioned in a previous 
> mail (when someone wants ls -l /proc/pid/fd/); since their dentries 
> are no longer inserted in the global dcache hash, they could stay with 
> a (nul) dname.


Yes, the right thing to do is probably to only generate these strings 
when someone tries to list them, not on every socket/pipe/epoll 
creation.  One can assign a counter and keep it as a binary value at the 
start, but create the strings when necessary.


-hpa
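
The deferral described above can be sketched in user-space C. This is only an illustration under stated assumptions, not code from the patch under discussion: struct pseudo_file, pseudo_file_init() and pseudo_file_name() are hypothetical names, and the plain global counter is not thread-safe. Creation stores nothing but a binary id; the string is formatted only when somebody asks for it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a pipe/socket/epoll pseudo-file. */
struct pseudo_file {
	unsigned long id;	/* binary counter value assigned at creation */
	const char *kind;	/* e.g. "pipe" or "socket" */
};

/* Assumption: a single global counter; a real implementation would need locking. */
static unsigned long next_id = 1;

/* Creation path: no sprintf(), just remember the counter value. */
static void pseudo_file_init(struct pseudo_file *f, const char *kind)
{
	f->id = next_id++;
	f->kind = kind;
}

/* Listing path: build the name only when it is requested. */
static int pseudo_file_name(const struct pseudo_file *f, char *buf, size_t len)
{
	return snprintf(buf, len, "%s:[%lu]", f->kind, f->id);
}
```

The formatting cost is then paid by whoever lists the descriptors rather than by every pipe/socket creation.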



Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Andrew Morton
On Wed, 7 Mar 2007 07:58:22 +0100 Jean Delvare <[EMAIL PROTECTED]> wrote:

> > +config BFIN_SDA
> 
> I2C_BLACKFIN_SDA

The blackfin architecture uses "bfin" pretty much universally, so this
usage is consistent.

box:/usr/src/25> grep -i blackfin patches/blackfin*|wc -l
   1608
box:/usr/src/25> grep -i bfin patches/blackfin*|wc -l
   6198

Let's just hope nobody makes a bluefin.

> > +   range 0 15 if (BF533 || BF532 || BF531) 
> 
> Trailing whitespace.

I always remove that when merging a patch.


Re: [RFC] ARP notify option

2007-03-06 Thread Pekka Savola

On Tue, 6 Mar 2007, Chris Friesen wrote:
> Stephen Hemminger wrote:
>>  +arp_notify - BOOLEAN
>>  +  Define mode for notification of address and device changes.
>>  +  0 - (default): do nothing
>>  +  1 - Generate gratuitous arp replies when device is brought up
>>  +  or hardware address changes.
>
> Did you consider using gratuitous arp requests instead?  I remember reading 
> about some hardware that updated its arp cache on gratuitous requests but not 
> gratuitous replies.


You might be interested in taking a look at:

http://tools.ietf.org/id/draft-cheshire-ipv4-acd

There has been some follow-up discussion on this in the thread 
starting at:


http://www1.ietf.org/mail-archive/web/int-area/current/msg00611.html

In particular, you may be interested in this comment about ARP 
request and ARP reply for gratuitous ARP:


http://www1.ietf.org/mail-archive/web/int-area/current/msg00669.html

--
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


Re: [patch 1/4] signalfd v1 - signalfd core ...

2007-03-06 Thread Davide Libenzi
On Wed, 7 Mar 2007, Stephen Rothwell wrote:

> On Tue, 6 Mar 2007 17:36:56 -0800 (PST) Davide Libenzi 
>  wrote:
> >
> > The read(2) call will read u32 signal numbers that landed over the
> > signalfd. It returns the size of the data copied, or zero if the sighand
> > we are attached to, has been detached.
> 
> So what about signals that the user asked for a siginfo_t to be returned
> with?

O-Ren:   "You didn't think it was gonna be that easy, did you?"
B-Kiddo: "You know, for a second there, yeah, I kinda did."

:)

I could do that, since where I placed the signalfd_notify() I have the 
siginfo. But that is going to make the code a little more complex, since 
the simple bitmask needs to become a queue.



- Davide
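
Why a bitmask cannot carry siginfo can be shown with a small user-space sketch. It is not the signalfd code: struct sig_rec is a tiny hypothetical stand-in for siginfo_t, and SIGQ_LEN, mask_post(), mask_take(), queue_post() and queue_take() are names made up for the illustration. A mask remembers only that signal N is pending, so two deliveries collapse into one bit and their payloads are lost; a queue keeps one record per delivery.

```c
#include <stdint.h>
#include <stddef.h>

#define SIGQ_LEN 8		/* arbitrary capacity for the sketch */

struct sig_rec {		/* tiny stand-in for siginfo_t */
	int signo;
	int value;
};

struct sig_mask {		/* one bit per signal number, 1..64 */
	uint64_t pending;
};

struct sig_queue {		/* FIFO ring of full records */
	struct sig_rec rec[SIGQ_LEN];
	size_t head;
	size_t count;
};

static void mask_post(struct sig_mask *m, int signo)
{
	m->pending |= (uint64_t)1 << (signo - 1);
}

/* Returns the lowest pending signal number and clears it, or 0 if none. */
static int mask_take(struct sig_mask *m)
{
	int signo;

	for (signo = 1; signo <= 64; signo++) {
		uint64_t bit = (uint64_t)1 << (signo - 1);

		if (m->pending & bit) {
			m->pending &= ~bit;
			return signo;
		}
	}
	return 0;
}

/* Returns 0 on success, -1 if the queue is full. */
static int queue_post(struct sig_queue *q, int signo, int value)
{
	size_t tail;

	if (q->count == SIGQ_LEN)
		return -1;
	tail = (q->head + q->count) % SIGQ_LEN;
	q->rec[tail].signo = signo;
	q->rec[tail].value = value;
	q->count++;
	return 0;
}

/* Returns 1 and fills *out if a record was available, 0 otherwise. */
static int queue_take(struct sig_queue *q, struct sig_rec *out)
{
	if (q->count == 0)
		return 0;
	*out = q->rec[q->head];
	q->head = (q->head + 1) % SIGQ_LEN;
	q->count--;
	return 1;
}
```

The extra state (ring storage, head, count, and a full-queue policy) is the added complexity the reply refers to.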




Re: [2.6.21-rc3] cpufreq: p4-clockmod.c compilation error

2007-03-06 Thread David Rientjes
On Wed, 7 Mar 2007, Dave Jones wrote:

> diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> index d155e81..74747d9 100644
> --- a/drivers/cpufreq/Kconfig
> +++ b/drivers/cpufreq/Kconfig
> @@ -16,7 +16,7 @@ config CPU_FREQ
>  if CPU_FREQ
>  
>  config CPU_FREQ_TABLE
> -   tristate
> +   bool
>  
>  config CPU_FREQ_DEBUG
> bool "Enable CPUfreq debugging"
> 
> 
> 

That did the trick, thanks.

Acked-by: David Rientjes <[EMAIL PROTECTED]>


Re: [RFC][PATCH 6/7] Account for the number of tasks within container

2007-03-06 Thread Pavel Emelianov
Paul Menage wrote:
> Hi Pavel,
> 
> On 3/6/07, Pavel Emelianov <[EMAIL PROTECTED]> wrote:
>> diff -upr linux-2.6.20.orig/include/linux/sched.h
>> linux-2.6.20-0/include/linux/sched.h
>> --- linux-2.6.20.orig/include/linux/sched.h 2007-03-06
>> 13:33:28.0 +0300
>> +++ linux-2.6.20-0/include/linux/sched.h2007-03-06
>> 13:33:28.0 +0300
>> @@ -1052,6 +1055,9 @@ struct task_struct {
>>  #ifdef CONFIG_FAULT_INJECTION
>> int make_it_fail;
>>  #endif
>> +#ifdef CONFIG_PROCESS_CONTAINER
>> +   struct numproc_container *numproc_cnt;
>> +#endif
>>  };
> 
> Why do you need a pointer added to task_struct? One of the main points
> of the generic containers is to avoid every different subsystem and
> resource controller having to add new pointers there.
> 
>> +
>> +   rcu_read_lock();
>> +   np = numproc_from_cont(task_container(current, _subsys));
>> +   css_get_current(>css);
> 
> There's no need to hold a reference here - by definition, the task's
> container can't go away while the task is in it.
> 
> Also, shouldn't you have an attach() method to move the count from one
> container to another when a task moves?

The idea is:

A task may be both "the entity that allocates the resources" and "the
entity that is itself an allocated resource".

When a task acts as the first kind of entity, it may move across
containers (that is implemented in your patches). When a task is a
resource, it shouldn't move across containers, just as files or pages
don't.

More generally, allocated resources hold a reference to their original
container until they die. No resource migration is performed.

Did I express my idea clearly?

> Paul
> 
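
The accounting rule stated above (a resource stays charged to the container it was allocated in, even if the allocating task later moves) can be modelled in a few lines of C. This is a sketch under assumptions, not the res_counter code from the patch: struct container_sim, struct task_sim, struct resource_sim and the charge()/uncharge()/task_move() helpers are hypothetical names.

```c
#include <stddef.h>

struct container_sim {
	const char *name;
	long usage;			/* resources currently charged here */
};

struct task_sim {
	struct container_sim *cnt;	/* container the task lives in now */
};

struct resource_sim {
	struct container_sim *owner;	/* fixed at allocation time */
};

/* Charge the resource to the allocating task's current container. */
static void charge(struct resource_sim *r, const struct task_sim *t)
{
	r->owner = t->cnt;
	r->owner->usage++;
}

/* Moving the task does not touch resources it allocated earlier. */
static void task_move(struct task_sim *t, struct container_sim *to)
{
	t->cnt = to;
}

/* When the resource dies, the original container is uncharged. */
static void uncharge(struct resource_sim *r)
{
	r->owner->usage--;
	r->owner = NULL;
}
```

Under this model, a task that allocates in container A and then moves to B leaves A's usage unchanged until the resource is freed; only new allocations are charged to B.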



Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings

2007-03-06 Thread Andrew Morton
On Wed, 7 Mar 2007 07:57:27 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> > 
> > Why was truncate_inode_pages_range() altered to unmap the page if it got
> > mapped again?
> > 
> > Oh.  Because the unmap_mapping_range() call got removed from vmtruncate(). 
> > Why?  (Please send suitable updates to the changelog).
> 
> We have to ensure it is unmapped, and be prepared to unmap it while under
> the page lock.

But vmtruncate() dropped i_size, so nobody will map this page into
pagetables from then on.

> > I guess truncate of a mmapped area isn't sufficiently common to worry about
> > the inefficiency of this change.
> 
> Yeah, and it should be more efficient for files that aren't mmapped,
> because we don't have to take i_mmap_lock for them.
> 
> > Lots of memory barriers got removed in memory.c, unchangeloggedly.
> 
> Yeah they were all for the lockless truncate_count checks. Now that
> we use the page lock, we don't need barriers.
> 
> > Gratuitous renaming of locals in do_no_page() makes the change hard to
> > review.  Should have been a separate patch.
> > 
> > In fact, the patch would have been heaps clearer if that renaming had been
> > a separate patch.
> 
> Shall I?

If you don't have anything better to do, yes please ;)



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-06 Thread Nick Piggin
On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote:
> On Wed, 21 Feb 2007 05:50:17 +0100 (CET) Nick Piggin <[EMAIL PROTECTED]> 
> wrote:
> 
> > Nonlinear mappings are (AFAIKS) simply a virtual memory concept that
> > encodes the virtual address -> file offset differently from linear
> > mappings.
> > 
> > I can't see why the filesystem/pagecache code should need to know anything
> > about it, except for the fact that the ->nopage handler didn't quite pass
> > down enough information (ie. pgoff). But it is more logical to pass pgoff
> > rather than have the ->nopage function calculate it itself anyway. And
> > having the nopage handler install the pte itself is sort of nasty.
> > 
> > This patch introduces a new fault handler that replaces ->nopage and
> > ->populate and (later) ->nopfn. Most of the old mechanism is still in place
> > so there is a lot of duplication and nice cleanups that can be removed if
> > everyone switches over.
> > 
> > The rationale for doing this in the first place is that nonlinear mappings
> > are subject to the pagefault vs invalidate/truncate race too, and it seemed
> > stupid to duplicate the synchronisation logic rather than just consolidate
> > the two.
> > 
> 
> It's awkward to layer a largely do-nothing patch like this on top of a
> significant functional change.  Makes it harder to isolate the source of
> regressions, harder to revert the do-something patch.
> 
> > After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
> > pagecache. Seems like a fringe functionality anyway.
> 
> Does Ingo agree?

I cc'ed him when first posting it. He didn't disagree.

> > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > no users have hit mainline yet.
> 
> Did benh agree with that?

Yes.

> The patch unchangeloggedly adds a basic new structure to core mm
> (fault_data).  Would be nice to document its fields, especially `flags'.

OK. This is actually something that I would like more people to review.
Do we need any different fields? Should it be passed as arguments instead
of a structure?

> Please add less pointless blank lines.
> 
> 
> How well has this been tested?  The ocfs2 changes?  gfs2?  We should at
> least give those guys a heads-up.

Yes we should. Not all those filesystem changes have been tested.

> Does anybody really pass a NULL `type' arg into filemap_nopage()?

Dunno, it's exported. I remove that completely in a subsequent patch
anyway.

> This patch seems to churn things around an awful lot for minimal benefit.

Well it fixes the whole design of the nonlinear fault path.


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Andrew Morton
On Wed, 07 Mar 2007 13:57:58 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:

> Here is the updated blackfin i2c driver.
> 
> [PATCH] Blackfin: blackfin i2c driver
> 
> The i2c linux driver for blackfin architecture which supports both GPIO
> i2c operation and blackfin on-chip TWI controller i2c operation.
> 
> Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
> ---
>  drivers/i2c/busses/Kconfig |   47 
>  drivers/i2c/busses/i2c-bfin-gpio.c |   98 +
>  drivers/i2c/busses/i2c-bfin-twi.c  |  589 
> 
>  3 files changed, 734 insertions(+)
> 
> Index: linux-2.6/drivers/i2c/busses/Kconfig
> ===
> --- linux-2.6.orig/drivers/i2c/busses/Kconfig 2007-03-07 13:32:02.0 
> +0800
> +++ linux-2.6/drivers/i2c/busses/Kconfig  2007-03-07 13:44:19.0 
> +0800
> @@ -5,6 +5,53 @@
>  menu "I2C Hardware Bus support"
>   depends on I2C
>  
> +config I2C_BFIN_GPIO
> + tristate "Generic Blackfin and HHBF533/561 development board I2C 
> support"
> + depends on I2C && EXPERIMENTAL
> + select I2C_ALGOBIT
> + help
> + --
> +
> +menu "BFIN I2C SDA/SCL Selection"
> + depends on I2C_BFIN_GPIO
> +config BFIN_SDA
> + int "SDA is GPIO Number"
> + range 0 15 if (BF533 || BF532 || BF531) 
> + range 0 47 if (BF534 || BF536 || BF537)
> + range 0 47 if BF561
> + default 2 if (BF533 || BF532 || BF531) 
> +
> +config BFIN_SCL
> + int "SCL is GPIO Number"
> + range 0 15 if (BF533 || BF532 || BF531) 
> + range 0 47 if (BF534 || BF536 || BF537)
> + range 0 47 if BF561
> + default 3 
> +endmenu
> +
> +config I2C_BFIN_GPIO_CYCLE_DELAY
> + int "Cycle Delay in usec"
> + depends on I2C_BFIN_GPIO
> + range 1 100 
> + default 40
> +
> +config I2C_BFIN_TWI
> + tristate "Blackfin TWI I2C support"
> + depends on I2C && (BF534 || BF536 || BF537)
> + help
> +   This the TWI I2C device driver for Blackfin 534/536/537.
> +
> +   This driver can also be built as a module.  If so, the module
> +   will be called i2c-bfin-twi.
> +
> +config TWICLK_KHZ
> + int "TWI clock (kHZ)"
> + depends on I2C_BFIN_TWI
> + default 50
> + help
> +   The unit of the TWI clock is kilo HZ. Please divide the clock 
> +   by 1024 if you count it in HZ. The value should be less than 400.
> +

Well that's cute.  This patch causes an i386 `make allmodconfig' to spew
these:

SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 
SDA is GPIO Number (BFIN_SDA) [] (NEW) 

out at about 1,000,000/sec, infinitely.

I'll put a `depends on BFIN' in there to shut it up, but I think you've
tickled a Kconfig bug.



Unused floppy1 going bonkers in 2.6.20-rds

2007-03-06 Thread Gene Heskett
Greetings;

Kernel 2.6.20-rds (Cons patch), biostar mobo. 1GB of memory.

I have an elderly 5.25" floppy drive mounted in this box, something I use 
for sneakernet duties to get to a 'legacy' machine occasionally.  It 
hasn't been used for anything in about a month or more.  No disk in it 
now.

Uptime 0d:15:47

The messages file is being cluttered with this:

Mar  7 01:17:22 coyote kernel:
Mar  7 01:17:22 coyote kernel: floppy driver state
Mar  7 01:17:22 coyote kernel: ---
Mar  7 01:17:22 coyote kernel: now=54902555 last interrupt=54899556 
diff=2999 last called handler=f89f833f
Mar  7 01:17:22 coyote kernel: timeout_message=floppy start
Mar  7 01:17:22 coyote kernel: last output bytes:
Mar  7 01:17:22 coyote kernel:  0 90 54899555
Mar  7 01:17:22 coyote kernel: 13 90 54899555
Mar  7 01:17:22 coyote kernel:  0 90 54899555
Mar  7 01:17:22 coyote kernel: 1a 90 54899555
Mar  7 01:17:22 coyote kernel:  0 90 54899555
Mar  7 01:17:22 coyote kernel:  3 90 54899555
Mar  7 01:17:22 coyote kernel: c1 90 54899555
Mar  7 01:17:22 coyote kernel:  8 90 54899555
Mar  7 01:17:22 coyote kernel:  7 80 54899555
Mar  7 01:17:22 coyote kernel:  1 90 54899556
Mar  7 01:17:22 coyote kernel:  8 82 54899556
Mar  7 01:17:22 coyote kernel: e6 80 54899556
Mar  7 01:17:22 coyote kernel:  1 90 54899556
Mar  7 01:17:22 coyote kernel:  0 90 54899556
Mar  7 01:17:22 coyote kernel:  0 90 54899556
Mar  7 01:17:22 coyote kernel:  1 90 54899556
Mar  7 01:17:22 coyote kernel:  2 90 54899556
Mar  7 01:17:22 coyote kernel:  9 90 54899556
Mar  7 01:17:22 coyote kernel: 2a 90 54899556
Mar  7 01:17:22 coyote kernel: ff 90 54899556
Mar  7 01:17:22 coyote kernel: last result at 54899556
Mar  7 01:17:22 coyote kernel: last redo_fd_request at 54899555
Mar  7 01:17:22 coyote kernel: 21  0
Mar  7 01:17:22 coyote kernel: status=50
Mar  7 01:17:22 coyote kernel: fdc_busy=1
Mar  7 01:17:22 coyote kernel: do_floppy=f89f3323
Mar  7 01:17:22 coyote kernel: fd_timer.function=f89f51b2
Mar  7 01:17:22 coyote kernel: cont=f89fc5ec
Mar  7 01:17:22 coyote kernel: current_req=e8eaa9c8
Mar  7 01:17:22 coyote kernel: command_status=-1
Mar  7 01:17:22 coyote kernel:
Mar  7 01:17:22 coyote kernel: floppy1: floppy timeout called
Mar  7 01:17:22 coyote kernel: end_request: I/O error, dev fd1, sector 0
Mar  7 01:17:22 coyote kernel: Buffer I/O error on device fd1, logical 
block 0

This has happened about 8 or 9 times since the reboot 16 hours ago, and it 
is very intermittent: it might skip 3 hours, then show two of these stanzas 
in 3 seconds.

The drive type is properly set in the bios if that actually means 
anything.

The only mention of floppy in /var/log/dmesg is the ide-floppy driver 
signing in during boot, because I've got one of those 100MB floppies I 
occasionally use too.  It is not installed at the moment.

Has anyone else seen this in a box with two floppy drives in it?

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Function reject.


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Jean Delvare
Hi Bryan,

On Wed, 07 Mar 2007 13:57:58 +0800, Wu, Bryan wrote:
> Here is the updated blackfin i2c driver.
> 
> [PATCH] Blackfin: blackfin i2c driver
> 
> The i2c linux driver for blackfin architecture which supports both GPIO
> i2c operation and blackfin on-chip TWI controller i2c operation.
> 
> Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
> ---
>  drivers/i2c/busses/Kconfig |   47 
>  drivers/i2c/busses/i2c-bfin-gpio.c |   98 +
>  drivers/i2c/busses/i2c-bfin-twi.c  |  589 
> 

I'd prefer i2c-blackfin-gpio and i2c-blackfin-twi. Abbreviations tend to
confuse newcomers.

>  3 files changed, 734 insertions(+)
> 
> Index: linux-2.6/drivers/i2c/busses/Kconfig
> ===
> --- linux-2.6.orig/drivers/i2c/busses/Kconfig 2007-03-07 13:32:02.0 
> +0800
> +++ linux-2.6/drivers/i2c/busses/Kconfig  2007-03-07 13:44:19.0 
> +0800
> @@ -5,6 +5,53 @@
>  menu "I2C Hardware Bus support"
>   depends on I2C
>  
> +config I2C_BFIN_GPIO

I2C_BLACKFIN_GPIO

Please move the entries to the right location. The list is sorted
alphabetically if you didn't notice.

> + tristate "Generic Blackfin and HHBF533/561 development board I2C 
> support"

You can drop the trailing "I2C support", the user is in a menu named
"I2C hardware bus support" so it's pretty clear what we're talking
about.

> + depends on I2C && EXPERIMENTAL
> + select I2C_ALGOBIT
> + help
> + --
> +
> +menu "BFIN I2C SDA/SCL Selection"
> + depends on I2C_BFIN_GPIO
> +config BFIN_SDA

I2C_BLACKFIN_SDA

> + int "SDA is GPIO Number"

"SDA GPIO pin number"

> + range 0 15 if (BF533 || BF532 || BF531) 

Trailing whitespace.

> + range 0 47 if (BF534 || BF536 || BF537)
> + range 0 47 if BF561
> + default 2 if (BF533 || BF532 || BF531) 

Trailing whitespace.

No default for the other cases?

> +
> +config BFIN_SCL

I2C_BLACKFIN_SCL
Etc etc, all the options should start with I2C_BLACKFIN.

> + int "SCL is GPIO Number"

"SCL GPIO pin number"

> + range 0 15 if (BF533 || BF532 || BF531) 

Trailing whitespace, and many more after that. Please fix them all!

> + range 0 47 if (BF534 || BF536 || BF537)
> + range 0 47 if BF561
> + default 3 
> +endmenu
> +
> +config I2C_BFIN_GPIO_CYCLE_DELAY
> + int "Cycle Delay in usec"
> + depends on I2C_BFIN_GPIO
> + range 1 100 
> + default 40

This should really not be a kernel configuration option. Please turn it
into a kernel module parameter or a sysfs attribute if you really need
it. Also note that we already have an interface to change this
value from user-space (using an ioctl on /dev/i2c-N) and that might be
sufficient for your needs.

And allowing 1 usec delay is probably not a good idea, I don't
recommend values below 6 usec with i2c-algo-bit.

> +
> +config I2C_BFIN_TWI
> + tristate "Blackfin TWI I2C support"
> + depends on I2C && (BF534 || BF536 || BF537)
> + help
> +   This the TWI I2C device driver for Blackfin 534/536/537.
> +
> +   This driver can also be built as a module.  If so, the module
> +   will be called i2c-bfin-twi.
> +
> +config TWICLK_KHZ
> + int "TWI clock (kHZ)"

kHz

> + depends on I2C_BFIN_TWI
> + default 50
> + help
> +   The unit of the TWI clock is kilo HZ. Please divide the clock 
> +   by 1024 if you count it in HZ. The value should be less than 400.

Why don't you use "range" here too to ensure that the value is actually
less than 400? Either way, same as above, IMHO this should not be a
compilation-time decision.

A kHz is really 1000 Hz, not 1024. And everybody skilled enough to
configure a kernel should know that, I doubt it's worth reminding.

> +
>  config I2C_ALI1535
>   tristate "ALI 1535"
>   depends on I2C && PCI

All these options won't work really well until you also change
drivers/i2c/busses/Makefile to make something useful with them...

-- 
Jean Delvare


Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings

2007-03-06 Thread Nick Piggin
On Tue, Mar 06, 2007 at 10:36:41PM -0800, Andrew Morton wrote:
> On Wed, 21 Feb 2007 05:50:05 +0100 (CET) Nick Piggin <[EMAIL PROTECTED]> 
> wrote:
> 
> > Fix the race between invalidate_inode_pages and do_no_page.
> > 
> > Andrea Arcangeli identified a subtle race between invalidation of
> > pages from pagecache with userspace mappings, and do_no_page.
> > 
> > The issue is that invalidation has to shoot down all mappings to the
> > page, before it can be discarded from the pagecache. Between shooting
> > down ptes to a particular page, and actually dropping the struct page
> > from the pagecache, do_no_page from any process might fault on that
> > page and establish a new mapping to the page just before it gets
> > discarded from the pagecache.
> > 
> > The most common case where such invalidation is used is in file
> > truncation. This case was catered for by doing a sort of open-coded
> > seqlock between the file's i_size, and its truncate_count.
> > 
> > Truncation will decrease i_size, then increment truncate_count before
> > unmapping userspace pages; do_no_page will read truncate_count, then
> > find the page if it is within i_size, and then check truncate_count
> > under the page table lock and back out and retry if it had
> > subsequently been changed (ptl will serialise against unmapping, and
> > ensure a potentially updated truncate_count is actually visible).
> > 
> > Complexity and documentation issues aside, the locking protocol fails
> > in the case where we would like to invalidate pagecache inside i_size.
> > do_no_page can come in anytime and filemap_nopage is not aware of the
> > invalidation in progress (as it is when it is outside i_size). The
> > end result is that dangling (->mapping == NULL) pages that appear to
> > be from a particular file may be mapped into userspace with nonsense
> > data. Valid mappings to the same place will see a different page.
> > 
> > Andrea implemented two working fixes, one using a real seqlock,
> > another using a page->flags bit. He also proposed using the page lock
> > in do_no_page, but that was initially considered too heavyweight.
> > However, it is not a global or per-file lock, and the page cacheline
> > is modified in do_no_page to increment _count and _mapcount anyway, so
> > a further modification should not be a large performance hit.
> > Scalability is not an issue.
> > 
> > This patch implements this latter approach. ->nopage implementations
> > return with the page locked if it is possible for their underlying
> > file to be invalidated (in that case, they must set a special vm_flags
> > bit to indicate so). do_no_page only unlocks the page after setting
> > up the mapping completely. invalidation is excluded because it holds
> > the page lock during invalidation of each page (and ensures that the
> > page is not mapped while holding the lock).
> > 
> > This also allows significant simplifications in do_no_page, because
> > we have the page locked in the right place in the pagecache from the
> > start.
> > 
> 
> Why was truncate_inode_pages_range() altered to unmap the page if it got
> mapped again?
> 
> Oh.  Because the unmap_mapping_range() call got removed from vmtruncate(). 
> Why?  (Please send suitable updates to the changelog).

We have to ensure it is unmapped, and be prepared to unmap it while under
the page lock.

> I guess truncate of a mmapped area isn't sufficiently common to worry about
> the inefficiency of this change.

Yeah, and it should be more efficient for files that aren't mmapped,
because we don't have to take i_mmap_lock for them.

> Lots of memory barriers got removed in memory.c, unchangeloggedly.

Yeah they were all for the lockless truncate_count checks. Now that
we use the page lock, we don't need barriers.

> Gratuitous renaming of locals in do_no_page() makes the change hard to
> review.  Should have been a separate patch.
> 
> In fact, the patch would have been heaps clearer if that renaming had been
> a separate patch.

Shall I?
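
The truncate_count protocol described in the quoted changelog, and the case it misses, can be modelled single-threaded in C. This is a simplified sketch, not the mm code: struct mapping_sim, sim_truncate(), sim_invalidate() and sim_fault() are hypothetical names, sizes are counted in pages, and a "racer" callback stands in for another CPU running between the fault's two checks.

```c
#include <stddef.h>

struct mapping_sim {
	unsigned long i_size;		/* file size, in pages for simplicity */
	unsigned int truncate_count;	/* bumped by every truncation */
};

enum fault_result { FAULT_MAPPED, FAULT_BEYOND_EOF, FAULT_RETRY };

/* Truncation: shrink i_size first, then bump the counter. */
static void sim_truncate(struct mapping_sim *m, unsigned long new_size)
{
	m->i_size = new_size;
	m->truncate_count++;
}

/* Invalidation inside i_size changes neither field in this model. */
static void sim_invalidate(struct mapping_sim *m)
{
	(void)m;
}

/* Fault path: sample the counter, check i_size, then re-check the counter. */
static enum fault_result sim_fault(struct mapping_sim *m, unsigned long pgoff,
				   void (*racer)(struct mapping_sim *))
{
	unsigned int seen = m->truncate_count;

	if (pgoff >= m->i_size)
		return FAULT_BEYOND_EOF;
	if (racer)
		racer(m);	/* stand-in for a concurrent truncate/invalidate */
	if (seen != m->truncate_count)
		return FAULT_RETRY;
	return FAULT_MAPPED;
}

/* Example racer: truncate the file down to two pages. */
static void racer_truncate_to_2(struct mapping_sim *m)
{
	sim_truncate(m, 2);
}
```

In this model a racing truncation is caught by the counter re-check, while a racing invalidation inside i_size is not: sim_fault() still reports the page as mapped, which is the hole that taking the page lock is meant to close.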


Re: [BUGFIX][PATCH] fix NULL pointer in ia64/irq_chip-mask/unmask function

2007-03-06 Thread Andrew Morton
On Wed, 7 Mar 2007 15:23:17 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> This patch fixes boot failure because irq_desc->mask() is NULL.
> 
> - Added mask/unmask functions to ia64's irq desc function table.
>   But I'm not sure this fix is correct or not. please review.
> 
> - rename hw_interrupt_type to irq_chip. hw_interrupt_type is old name.

Thanks.

This bug is present in mainline too, isn't it?


Re: [patch v2] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet

Linus Torvalds wrote:
>
> On Tue, 6 Mar 2007, Eric Dumazet wrote:
>> I did a user space program, attached to this mail.
>>
>> I rewrote the reciprocal_div() for i386 so that one multiply is used.
>
> Ok, this is definitely faster on Core 2 as well, so "numbers talk, 
> bullshit walks". No more objections.

And the numbers were ? :)

> (That said, I bet you could do even better for octal and hex numbers, so 
> if you *really* want to speed things up, you should just make a 
> special-case routine for each base (there's just three of them), and you 
> can then also optimize the base-10 thing much better (you can do two 
> digits at a time by dividing by 100, etc)

Well, given that sprintf() is frequently called only for pipe/sockets 
creation, we had probably better:

1) wait for a very clever idea to suppress the individual dentry per 
pipe/socket (no more sprintf() at pipe/socket setup)

2) delay the sprintf() until it is needed, as you mentioned in a previous 
mail (when someone wants ls -l /proc/pid/fd/); since their dentries are no 
longer inserted in the global dcache hash, they could stay with a (nul) dname.
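
The two ideas mentioned in the quoted text (division done with one multiply, and emitting two decimal digits per step by dividing by 100) can be combined in a short user-space sketch. It is not the program attached to the original mail and not the kernel's reciprocal_div(); div100() and u32_to_dec() are hypothetical names. The constant 0x51EB851F is ceil(2^37 / 100), and multiplying by it and shifting right by 37 gives the exact quotient for every 32-bit input.

```c
#include <stdint.h>
#include <stddef.h>

/* x / 100 for any 32-bit x, using one 64-bit multiply and a shift. */
static uint32_t div100(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * 0x51EB851FULL) >> 37);
}

/*
 * Write x in base 10 into buf (at least 11 bytes, including the NUL).
 * Digits are produced two at a time, least significant pair first,
 * then copied out in reverse. Returns the number of digits written.
 */
static size_t u32_to_dec(uint32_t x, char *buf)
{
	char tmp[10];		/* a 32-bit value has at most 10 digits */
	size_t n = 0;
	size_t len;

	while (x >= 100) {
		uint32_t q = div100(x);
		uint32_t r = x - q * 100;	/* remainder, 0..99 */

		tmp[n++] = (char)('0' + r % 10);
		tmp[n++] = (char)('0' + r / 10);
		x = q;
	}
	if (x >= 10) {
		tmp[n++] = (char)('0' + x % 10);
		tmp[n++] = (char)('0' + x / 10);
	} else {
		tmp[n++] = (char)('0' + x);
	}

	len = n;
	while (n)
		*buf++ = tmp[--n];
	*buf = '\0';
	return len;
}
```

Halving the number of loop iterations is where the saving would come from; whether it is measurable in practice is exactly the kind of question the benchmark in this thread was written to answer.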





Re: [2.6.21-rc3] cpufreq: p4-clockmod.c compilation error

2007-03-06 Thread Dave Jones
On Tue, Mar 06, 2007 at 10:33:05PM -0800, David Rientjes wrote:
 > arch/x86_64/kernel/built-in.o: In function 
 > `cpufreq_p4_verify':p4-clockmod.c:(.text.cpufreq_p4_verify+0x8): undefined 
 > reference to `cpufreq_frequency_table_verify'
 > arch/x86_64/kernel/built-in.o: In function 
 > `cpufreq_p4_cpu_exit':p4-clockmod.c:(.text.cpufreq_p4_cpu_exit+0x8): 
 > undefined reference to `cpufreq_frequency_table_put_attr'
 > arch/x86_64/kernel/built-in.o: In function 
 > `cpufreq_p4_cpu_init':p4-clockmod.c:(.text.cpufreq_p4_cpu_init+0x13b): 
 > undefined reference to `cpufreq_frequency_table_get_attr'
 > :p4-clockmod.c:(.text.cpufreq_p4_cpu_init+0x163): undefined reference to 
 > `cpufreq_frequency_table_cpuinfo'
 > arch/x86_64/kernel/built-in.o: In function 
 > `cpufreq_p4_target':p4-clockmod.c:(.text.cpufreq_p4_target+0x21): undefined 
 > reference to `cpufreq_frequency_table_target'
 > arch/x86_64/kernel/built-in.o: In function 
 > `k8nops':alternative.c:(.data+0x2b70): undefined reference to 
 > `cpufreq_freq_attr_scaling_available_freqs'

 > CONFIG_CPU_FREQ=y
 > CONFIG_CPU_FREQ_TABLE=m

 > CONFIG_X86_P4_CLOCKMOD=y

So P4_CLOCKMOD does a 'select CPU_FREQ_TABLE', but for some reason, that
makes it =m, not the same as whatever the option that is doing the 'select'
is set to (which is what I thought it did).

Given the cpufreq table code is tiny anyway, I'm wondering if it's worth the
pain of having it be modular, instead of just making it built-in to cpufreq.

Give the diff below a shot?

Dave

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index d155e81..74747d9 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -16,7 +16,7 @@ config CPU_FREQ
 if CPU_FREQ
 
 config CPU_FREQ_TABLE
-   tristate
+   bool
 
 config CPU_FREQ_DEBUG
    bool "Enable CPUfreq debugging"


-- 
http://www.codemonkey.org.uk


Re: [PATCH] Blackfin: blackfin on-chip SPI controller driver

2007-03-06 Thread Wu, Bryan
Hi all,

Could you please give some feedback on this patch? I noticed some
coding style issues and will update the patch according to your kind
review.

Thanks 
-Bryan Wu


Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet

Linus Torvalds wrote:


> I assume that the *only* reason for having multiple dentries is really 
> just the output in /proc//fd/, right? Or is there any other reason to 
> have separate dentries for these pseudo-files?
>
> It's a bit sad to waste that much memory (and time) on something like 
> that. I bet that the dentry setup is a noticeable part of the whole 
> sigfd()/timerfd() setup. It's likely also a big part of any memory 
> footprint if you have lots of them.
>
> So how about just doing:
>  - do a single dentry
>  - make a "struct file_operations" member function that prints out the 
>    name of the thing in /proc//fd/, and which *defaults* to just 
>    doing the d_path() on the dentry, but special filesystems like this 
>    could do something else (like print out a fake inode number from the 
>    "file->f_private_data" information)
>
> There seems to really be no downsides to that approach. No existing 
> filesystem will even notice (they'll all have NULL in the new f_op 
> member), and it would allow pipes etc to be sped up and use less memory.
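
The "optional member with a default" idea quoted above can be sketched in userspace C as follows. This is a rough illustration only: every type and function name here (fake_file, fake_file_ops, show_name, file_show_name) is hypothetical and stands in for kernel structures, and the pipe name format is just an example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of these are real kernel types. */
struct fake_file;

struct fake_file_ops {
	/* Optional: format the name shown for this file; NULL means default. */
	int (*show_name)(const struct fake_file *f, char *buf, size_t len);
};

struct fake_file {
	const struct fake_file_ops *ops;
	const char *dentry_path;	/* what d_path() would have produced */
	unsigned long private_id;	/* stands in for per-file private data */
};

/* Default behaviour: just report the dentry path. */
static int default_show_name(const struct fake_file *f, char *buf, size_t len)
{
	return snprintf(buf, len, "%s", f->dentry_path);
}

/* Special filesystem: synthesize a name from per-file data instead. */
static int pipe_show_name(const struct fake_file *f, char *buf, size_t len)
{
	return snprintf(buf, len, "pipe:[%lu]", f->private_id);
}

static const struct fake_file_ops pipe_ops = {
	.show_name = pipe_show_name,
};

/* What the fd listing code would call: use the hook if set, else default. */
static int file_show_name(const struct fake_file *f, char *buf, size_t len)
{
	assert(f && buf && len > 0);
	if (f->ops && f->ops->show_name)
		return f->ops->show_name(f, buf, len);
	return default_show_name(f, buf, len);
}
```

Files whose ops table leaves show_name as NULL keep the existing path-based output, so only the special filesystems need to change. Note that this only covers the naming; it does not by itself answer where the inode comes from.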




I would definitely *love* saving dentries for pipes (and sockets too), but how 
are you going to get the inode?

pipes()/sockets() can use read()/write()/rw_verify_area() and thus need 
file->f_path.dentry->d_inode (so each pipe needs a separate dentry)

Are you suggesting adding a new "struct file_operations" member to get the 
inode?
Or reintroducing an inode pointer in struct file?



Re: [RFC][PATCH 0/7] Resource controllers based on process containers

2007-03-06 Thread Balbir Singh

Pavel Emelianov wrote:

> This patchset adds RSS, accounting and control and
> limiting the number of tasks and files within container.
>
> Based on top of Paul Menage's container subsystem v7
>
> RSS controller includes per-container RSS accounter,
> reclamation and OOM killer. It behaves like standalone
> machine - when container runs out of resources it tries
> to reclaim some pages and if it doesn't succeed in it
> kills some task which mm_struct belongs to container in
> question.
>
> Num tasks and files containers are very simple and
> self-descriptive from code.
>
> As discussed before when a task moves from one container
> to another no resources follow it - they keep holding the
> container they were allocated in.



I have one problem with the patchset: I cannot compile
the patches individually, and some of the code is hard
to read as it depends on functions from future patches.
Patches 2, 3 and 4 fail to compile without patch 5 applied.

Patch 1 failed to apply with a reject in kernel/Makefile.
I applied it on top of 2.6.20 with all of Paul Menage's
patches (all 7).



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-06 Thread Andrew Morton
On Wed, 21 Feb 2007 05:50:17 +0100 (CET) Nick Piggin <[EMAIL PROTECTED]> wrote:

> Nonlinear mappings are (AFAIKS) simply a virtual memory concept that
> encodes the virtual address -> file offset differently from linear
> mappings.
> 
> I can't see why the filesystem/pagecache code should need to know anything
> about it, except for the fact that the ->nopage handler didn't quite pass
> down enough information (ie. pgoff). But it is more logical to pass pgoff
> rather than have the ->nopage function calculate it itself anyway. And
> having the nopage handler install the pte itself is sort of nasty.
> 
> This patch introduces a new fault handler that replaces ->nopage and
> ->populate and (later) ->nopfn. Most of the old mechanism is still in place
> so there is a lot of duplication and nice cleanups that can be removed if
> everyone switches over.
> 
> The rationale for doing this in the first place is that nonlinear mappings
> are subject to the pagefault vs invalidate/truncate race too, and it seemed
> stupid to duplicate the synchronisation logic rather than just consolidate
> the two.
> 
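
To make the pgoff point in the quoted changelog concrete: for a linear mapping the file page offset is plain arithmetic on the faulting address, so it is cheap for the caller to compute and pass down. A minimal sketch, assuming 4 KiB pages and a pared-down stand-in for the vma (the struct and function names below are hypothetical, not the kernel's):

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Hypothetical, pared-down stand-in for a vma; not the kernel struct. */
struct sketch_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_end;	/* one past the last virtual address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Linear mapping: the page offset follows directly from the address. */
static unsigned long linear_pgoff(const struct sketch_vma *vma,
				  unsigned long address)
{
	assert(address >= vma->vm_start && address < vma->vm_end);
	return vma->vm_pgoff +
	       ((address - vma->vm_start) >> SKETCH_PAGE_SHIFT);
}
```

A nonlinear mapping cannot use this formula, which is why handing the already-resolved pgoff to the handler lets the same handler serve both kinds of mapping.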

It's awkward to layer a largely do-nothing patch like this on top of a
significant functional change.  Makes it harder to isolate the source of
regressions, harder to revert the do-something patch.

> After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
> pagecache. Seems like a fringe functionality anyway.

Does Ingo agree?

> NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> no users have hit mainline yet.

Did benh agree with that?


The patch unchangeloggedly adds a basic new structure to core mm
(fault_data).  Would be nice to document its fields, especially `flags'.


Please add less pointless blank lines.


How well has this been tested?  The ocfs2 changes?  gfs2?  We should at
least give those guys a heads-up.


Does anybody really pass a NULL `type' arg into filemap_nopage()?


This patch seems to churn things around an awful lot for minimal benefit.



Re: Wanted: simple, safe x86 stack overflow detection

2007-03-06 Thread Bill Irwin
At some point in the past, I wrote:
>> I'm certainly in favor of the move; IRQ stacks could be made
>> rather deep and cheaply at that. I may get around to writing it this
>> week if no one else does it first.

On Tue, Mar 06, 2007 at 08:28:35PM -0800, Arjan van de Ven wrote:
> the irq stacks aren't the problem; RH at some point accidentally shipped
> a kernel with 4k *shared* irq/user context stack and even that gave
> almost no issues.
> irq's really shouldn't actually nest; it's bad for just about everything
> to do that (but that's another story, I would *love* to get rid of the
> "enable irqs" thing in the x86 irq path, it hurts just about anything in
> reality)

What do you see as the obstacle to eliminating nested IRQ's? It doesn't
seem so far out to test for being on the interrupt stack and defer the
call to do_IRQ() until after the currently-running instance of do_IRQ()
has returned, or to move to per-irq stacks modulo special arrangements
for the per-cpu IRQ's. Or did you have other methods in mind?
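
The first option (defer instead of nest) can be modelled in userspace C roughly as below. This is a toy sketch under stated assumptions: the per-CPU state, the pending bitmask and all names are hypothetical, real code would have to deal with masking and acking at the interrupt controller, and a bitmask coalesces repeated arrivals of the same IRQ.

```c
#include <assert.h>

#define SKETCH_NR_IRQS 32

/* Hypothetical per-CPU state for the sketch; not kernel data structures. */
struct irq_cpu_state {
	int on_irq_stack;		/* nonzero while a handler is running */
	unsigned int pending;		/* bitmask of IRQs deferred meanwhile */
	int depth;			/* current handler nesting depth */
	int max_depth;			/* deepest nesting observed */
	int handled[SKETCH_NR_IRQS];	/* how often each IRQ ran */
	void (*handler)(struct irq_cpu_state *cpu, unsigned int irq);
};

static void sketch_do_irq(struct irq_cpu_state *cpu, unsigned int irq);

/* Example device behaviour: while IRQ 3 is being handled, IRQ 5 arrives. */
static void raise_5_during_3(struct irq_cpu_state *cpu, unsigned int irq)
{
	if (irq == 3)
		sketch_do_irq(cpu, 5);
}

/* Run one handler and record how deeply handlers are nested. */
static void run_handler(struct irq_cpu_state *cpu, unsigned int irq)
{
	cpu->depth++;
	if (cpu->depth > cpu->max_depth)
		cpu->max_depth = cpu->depth;
	cpu->handled[irq]++;
	if (cpu->handler)
		cpu->handler(cpu, irq);	/* may raise further IRQs */
	cpu->depth--;
}

/* Entry point: defer if a handler is already running, else run and drain. */
static void sketch_do_irq(struct irq_cpu_state *cpu, unsigned int irq)
{
	assert(irq < SKETCH_NR_IRQS);

	if (cpu->on_irq_stack) {
		cpu->pending |= 1u << irq;	/* remember it, do not nest */
		return;
	}

	cpu->on_irq_stack = 1;
	run_handler(cpu, irq);
	while (cpu->pending) {
		unsigned int next = 0;

		while (!(cpu->pending & (1u << next)))
			next++;			/* lowest pending IRQ first */
		cpu->pending &= ~(1u << next);
		run_handler(cpu, next);
	}
	cpu->on_irq_stack = 0;
}
```

With raise_5_during_3 installed as the handler, delivering IRQ 3 runs both handlers back to back and the nesting depth never exceeds one, so the stack requirement is that of a single handler.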


-- wli


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Wu, Bryan
> > 
> > OK, I change it into yield(). So, current process will be move to the
> > tail of the run queue. Is that OK with you?
> 
> Nope, yield is terribly bad when there are busy processes running: it can
> stall for a very long time indeed,
> 
> Is this hardware not capable of generating an interrupt when BUSBUSY gets
> negated?
> 
> I guess not, in which case you're stuck with having to poll it - probably
> use a cond_resched() in the loop, and an angry comment.

Thanks, we fixed it. Please pick up this one.

[PATCH] Blackfin: blackfin i2c driver

This is the I2C Linux driver for the Blackfin architecture. It supports both
GPIO I2C operation and Blackfin on-chip TWI controller I2C operation.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---

 drivers/i2c/busses/Kconfig |   47 
 drivers/i2c/busses/i2c-bfin-gpio.c |   98 +
 drivers/i2c/busses/i2c-bfin-twi.c  |  589 

 3 files changed, 734 insertions(+)

Index: linux-2.6/drivers/i2c/busses/Kconfig
===
--- linux-2.6.orig/drivers/i2c/busses/Kconfig   2007-03-07 13:32:02.0 +0800
+++ linux-2.6/drivers/i2c/busses/Kconfig        2007-03-07 13:44:19.0 +0800
@@ -5,6 +5,53 @@
 menu "I2C Hardware Bus support"
depends on I2C
 
+config I2C_BFIN_GPIO
+   tristate "Generic Blackfin and HHBF533/561 development board I2C support"
+   depends on I2C && EXPERIMENTAL
+   select I2C_ALGOBIT
+   help
+   --
+
+menu "BFIN I2C SDA/SCL Selection"
+   depends on I2C_BFIN_GPIO
+config BFIN_SDA
+   int "SDA is GPIO Number"
+   range 0 15 if (BF533 || BF532 || BF531) 
+   range 0 47 if (BF534 || BF536 || BF537)
+   range 0 47 if BF561
+   default 2 if (BF533 || BF532 || BF531) 
+
+config BFIN_SCL
+   int "SCL is GPIO Number"
+   range 0 15 if (BF533 || BF532 || BF531) 
+   range 0 47 if (BF534 || BF536 || BF537)
+   range 0 47 if BF561
+   default 3 
+endmenu
+
+config I2C_BFIN_GPIO_CYCLE_DELAY
+   int "Cycle Delay in usec"
+   depends on I2C_BFIN_GPIO
+   range 1 100 
+   default 40
+
+config I2C_BFIN_TWI
+   tristate "Blackfin TWI I2C support"
+   depends on I2C && (BF534 || BF536 || BF537)
+   help
+ This is the TWI I2C device driver for Blackfin 534/536/537.
+
+ This driver can also be built as a module.  If so, the module
+ will be called i2c-bfin-twi.
+
+config TWICLK_KHZ
+   int "TWI clock (kHZ)"
+   depends on I2C_BFIN_TWI
+   default 50
+   help
+ The unit of the TWI clock is kilo HZ. Please divide the clock 
+ by 1024 if you count it in HZ. The value should be less than 400.
+
 config I2C_ALI1535
tristate "ALI 1535"
depends on I2C && PCI
Index: linux-2.6/drivers/i2c/busses/i2c-bfin-gpio.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/i2c/busses/i2c-bfin-gpio.c        2007-03-07 13:44:19.0 +0800
@@ -0,0 +1,98 @@
+/*
+ * Description:
+ *
+ * Maintainer: Meihui Fan <[EMAIL PROTECTED]>
+ *
+ * CopyRight (c)  2004  HHTech
+ *   www.hhcn.com, www.hhcn.org
+ *   All rights reserved.
+ *
+ * This file is free software;
+ *   you are free to modify and/or redistribute it
+ *   under the terms of the GNU General Public Licence (GPL).
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define I2C_HW_B_HHBF   0x13
+
+static void hhbf_setsda(void *data, int state)
+{
+   if (state) {
+       gpio_direction_input(CONFIG_BFIN_SDA);
+
+   } else {
+       gpio_direction_output(CONFIG_BFIN_SDA);
+       gpio_set_value(CONFIG_BFIN_SDA, 0);
+   }
+}
+
+static void hhbf_setscl(void *data, int state)
+{
+   gpio_set_value(CONFIG_BFIN_SCL, state);
+}
+
+static int hhbf_getsda(void *data)
+{
+   return (gpio_get_value(CONFIG_BFIN_SDA) != 0);
+}
+
+
+static struct i2c_algo_bit_data bit_hhbf_data = {
+   .setsda  = hhbf_setsda,
+   .setscl  = hhbf_setscl,
+   .getsda  = hhbf_getsda,
+   .udelay  = CONFIG_I2C_BFIN_GPIO_CYCLE_DELAY,
+   .timeout = HZ
+};
+
+static struct i2c_adapter hhbf_ops = {
+   .owner  = THIS_MODULE,
+   .id = 

Re: [patch 3/6] mm: fix fault vs invalidate race for linear mappings

2007-03-06 Thread Andrew Morton
On Wed, 21 Feb 2007 05:50:05 +0100 (CET) Nick Piggin <[EMAIL PROTECTED]> wrote:

> Fix the race between invalidate_inode_pages and do_no_page.
> 
> Andrea Arcangeli identified a subtle race between invalidation of
> pages from pagecache with userspace mappings, and do_no_page.
> 
> The issue is that invalidation has to shoot down all mappings to the
> page, before it can be discarded from the pagecache. Between shooting
> down ptes to a particular page, and actually dropping the struct page
> from the pagecache, do_no_page from any process might fault on that
> page and establish a new mapping to the page just before it gets
> discarded from the pagecache.
> 
> The most common case where such invalidation is used is in file
> truncation. This case was catered for by doing a sort of open-coded
> seqlock between the file's i_size, and its truncate_count.
> 
> Truncation will decrease i_size, then increment truncate_count before
> unmapping userspace pages; do_no_page will read truncate_count, then
> find the page if it is within i_size, and then check truncate_count
> under the page table lock and back out and retry if it had
> subsequently been changed (ptl will serialise against unmapping, and
> ensure a potentially updated truncate_count is actually visible).
> 
> Complexity and documentation issues aside, the locking protocol fails
> in the case where we would like to invalidate pagecache inside i_size.
> do_no_page can come in anytime and filemap_nopage is not aware of the
> invalidation in progress (as it is when it is outside i_size). The
> end result is that dangling (->mapping == NULL) pages that appear to
> be from a particular file may be mapped into userspace with nonsense
> data. Valid mappings to the same place will see a different page.
> 
> Andrea implemented two working fixes, one using a real seqlock,
> another using a page->flags bit. He also proposed using the page lock
> in do_no_page, but that was initially considered too heavyweight.
> However, it is not a global or per-file lock, and the page cacheline
> is modified in do_no_page to increment _count and _mapcount anyway, so
> a further modification should not be a large performance hit.
> Scalability is not an issue.
> 
> This patch implements this latter approach. ->nopage implementations
> return with the page locked if it is possible for their underlying
> file to be invalidated (in that case, they must set a special vm_flags
> bit to indicate so). do_no_page only unlocks the page after setting
> up the mapping completely. invalidation is excluded because it holds
> the page lock during invalidation of each page (and ensures that the
> page is not mapped while holding the lock).
> 
> This also allows significant simplifications in do_no_page, because
> we have the page locked in the right place in the pagecache from the
> start.
> 

Why was truncate_inode_pages_range() altered to unmap the page if it got
mapped again?

Oh.  Because the unmap_mapping_range() call got removed from vmtruncate(). 
Why?  (Please send suitable updates to the changelog).

I guess truncate of a mmapped area isn't sufficiently common to worry about
the inefficiency of this change.

Lots of memory barriers got removed in memory.c, unchangeloggedly.

Gratuitous renaming of locals in do_no_page() makes the change hard to
review.  Should have been a separate patch.

In fact, the patch would have been heaps clearer if that renaming had been
a separate patch.



[2.6.21-rc3] cpufreq: p4-clockmod.c compilation error

2007-03-06 Thread David Rientjes
arch/x86_64/kernel/built-in.o: In function 
`cpufreq_p4_verify':p4-clockmod.c:(.text.cpufreq_p4_verify+0x8): undefined 
reference to `cpufreq_frequency_table_verify'
arch/x86_64/kernel/built-in.o: In function 
`cpufreq_p4_cpu_exit':p4-clockmod.c:(.text.cpufreq_p4_cpu_exit+0x8): undefined 
reference to `cpufreq_frequency_table_put_attr'
arch/x86_64/kernel/built-in.o: In function 
`cpufreq_p4_cpu_init':p4-clockmod.c:(.text.cpufreq_p4_cpu_init+0x13b): 
undefined reference to `cpufreq_frequency_table_get_attr'
:p4-clockmod.c:(.text.cpufreq_p4_cpu_init+0x163): undefined reference to 
`cpufreq_frequency_table_cpuinfo'
arch/x86_64/kernel/built-in.o: In function 
`cpufreq_p4_target':p4-clockmod.c:(.text.cpufreq_p4_target+0x21): undefined 
reference to `cpufreq_frequency_table_target'
arch/x86_64/kernel/built-in.o: In function 
`k8nops':alternative.c:(.data+0x2b70): undefined reference to 
`cpufreq_freq_attr_scaling_available_freqs'

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc3
# Tue Mar  6 21:53:36 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
# CONFIG_SYSVIPC is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
# CONFIG_TASK_DELAY_ACCT is not set
# CONFIG_TASK_XACCT is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_EMBEDDED=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PRINTK is not set
CONFIG_BUG=y
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
# CONFIG_MODULE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y

#
# Block layer
#
# CONFIG_BLOCK is not set

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
CONFIG_MPSC=y
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_MICROCODE=y
CONFIG_MICROCODE_OLD_INTERFACE=y
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
# CONFIG_MTRR is not set
# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
# CONFIG_IOMMU is not set
# CONFIG_X86_MCE is not set
CONFIG_KEXEC=y
CONFIG_PHYSICAL_START=0x20
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
CONFIG_REORDER=y
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
# CONFIG_ACPI_SLEEP is not set
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_AC=y
# CONFIG_ACPI_BATTERY is not set
# CONFIG_ACPI_BUTTON is not set
CONFIG_ACPI_VIDEO=m
CONFIG_ACPI_FAN=m
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_ASUS=m
# CONFIG_ACPI_IBM is not set
CONFIG_ACPI_TOSHIBA=m
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_X86_PM_TIMER is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=m
# CONFIG_CPU_FREQ_DEBUG is not set
# CONFIG_CPU_FREQ_STAT is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_GOV_PERFORMANCE is not set

Re: [2.6.22 patch] the scheduled removal of OBSOLETE_OSS options

2007-03-06 Thread Willy Tarreau
On Tue, Mar 06, 2007 at 06:55:04PM +0100, Adrian Bunk wrote:
> On Tue, Mar 06, 2007 at 12:46:22PM -0500, Bill Davidsen wrote:
> > Adrian Bunk wrote:
> > >This patch contains the scheduled removal of the OBSOLETE_OSS options 
> > >for 2.6.22.
> > >
> > If these are drivers for which there are thought to be useful ALSA 
> > drivers, would it be reasonable to leave a stub for a help file naming 
> > the driver which claims to support the hardware?
> > 
> > I'm not objection to the removal of the drivers, just noting that 
> > identifying the new drivers can be made easier.
> 
> People compiling their own kernels aren't completely dumb - if you know 
> about people having problems finding the right ALSA driver for their 
> hardware, please name the concrete problems so that we can improve the 
> description and/or help text of these ALSA options.

The real problem is that we can expect several "sound does not work anymore"
reports, because people doing "make oldconfig" will get no warning at all about
the removed options. Remember people complaining about their keyboard not working?
Perhaps the real problem is more Kconfig than OSS, but it would be good if
we found a way to list the options which have been removed
when they do their make oldconfig.

Regards,
Willy



Re: 2.6.21-rc2-mm2

2007-03-06 Thread Jean Delvare
Hi J.A.,

On Tue, 6 Mar 2007 16:46:09 +0100, J.A. Magallón wrote:
> On Tue, 6 Mar 2007 00:44:08 -0800, Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > Temporarily at
> > 
> >   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
> 
> I have another question about i2c...
> 
> The 'sensors' program gives me a strange message:
> 
> w83627thf-i2c-9191-290
> Can't get adapter name for bus 9191<-
> VCore: +1.49 V  (min =  +1.94 V, max =  +1.94 V)   ALARM  
> +12.0V:   +11.86 V  (min = +10.82 V, max = +13.19 V)  
> + 3.3V:+3.30 V  (min =  +3.14 V, max =  +3.47 V)  
> ...
> 
> And gnome-sensors-applet can't read the sensors. If using libsensors, no

As a side note, this is bad design from gnome-sensors-applet. They
should definitely not plain stop just because they failed to retrieve
the i2c_adapter name, when all the monitored values are otherwise
available.

> value is displayed (I suppose an application bug). And the access to sensors
> directly through i2c-dev gives an error like this:
> 
> Error opening sensor device file:
> /sys/devices/platform/i2c-9191/9191-0290/
> 
> In fact, the real path is 
> 
> /sys/devices/platform/i2c-adapter:i2c-9191/9191-0290/
> 
> I supposed it was a kernel change not tracked by userspace, but the strange
> thing is that looking at the code the sensors applet lists the sensors
> reading directories and files in /sys (AFAICS in the code).
> So perhaps there is a little inconsistency, /sys says in some place the
> sensor is at x, when it really is at y.
> Or the 'i2c-adapter:' is a bug and should be 'i2c-adapter/'.
> 
> ???

See:
http://bugzilla.kernel.org/show_bug.cgi?id=8115

lm-sensors SVN should work fine:
http://dl.lm-sensors.org/lm-sensors/snapshots/lm-sensors-r4338-20070305.tar.bz2
If not, please report.

We will release it as lm-sensors 2.10.3 soon.

-- 
Jean Delvare


Re: BUG() during suspend to disk (2.6.21-rc2, x86_64)

2007-03-06 Thread Vivek Goyal
On Tue, Mar 06, 2007 at 09:56:46PM +0100, Rafael J. Wysocki wrote:
> Hi,
> 
> On Tuesday, 6 March 2007 11:32, Vivek Goyal wrote:
> > Hi,
> > 
> > I see following BUG() on serial console while hibernating on a x86_64
> > machine. I am using 2.6.21-rc2 kernel.
> 
> I see it too.
> 
> > BUG: at arch/x86_64/kernel/acpi/sleep.c:70 init_low_mapping()
> > 
> > Call Trace:
> >  [] acpi_save_state_mem+0x70/0xd6
> >  [] acpi_pm_enter+0x23/0xc1
> >  [] pm_suspend_disk+0x1ac/0x228
> >  [] enter_state+0x50/0x1e6
> >  [] acpi_system_write_sleep+0x5c/0x79
> >  [] vfs_write+0xad/0x136
> >  [] sys_write+0x45/0x6e
> >  [] system_call+0x7e/0x83
> 
> Hm, it doesn't like the fact that nonboot CPUs are online at that point, but
> we don't do anything to disable them.  Should we?
> 
> I think the appended patch might work.
> 

Yes. It does work for me. Thanks

Vivek


[BUGFIX][PATCH] fix NULL pointer in ia64/irq_chip-mask/unmask function

2007-03-06 Thread KAMEZAWA Hiroyuki
This patch fixes a boot failure caused by irq_desc->mask() being NULL.

- Add mask/unmask functions to ia64's irq desc function table.
  I'm not sure whether this fix is correct, so please review.

- Rename hw_interrupt_type to irq_chip; hw_interrupt_type is the old name.

Signed-Off-By: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

The original BUG I hit (shown below) was triggered at irq 57:uhci_hcd:usb3.
 xxBUG DESCRIPTIONxx
 Unable to handle kernel NULL pointer dereference (address )
yum-updatesd[3461]: Oops 11012296146944 [1]
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 vfat fat 
dm_mirror dm_mod button parport_pc lp parport sg tg3 e100 shpchp mii 
usb_storage lpfc scsi_transport_fc mptspi mptscsih mptbase scsi_transport_spi 
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd

Pid: 3461, CPU 5, comm: yum-updatesd
psr : 121008022018 ifs : 8286 ip  : []Not 
tainted
ip is at move_native_irq+0x91/0x140
unat:  pfs : 0205 rsc : 0003
rnat:  bsps:  pr  : 005a56a9
ldrs:  ccv :  fpsr: 0009804c0270033f
csd :  ssd : 
b0  : a00100050430 b6  : a0021b49d0c0 b7  : a00100050400
f6  : 1003e0de0 f7  : 1003e0060
f8  : 1003e0025 f9  : 1003e22e0
f10 : 1003e0060 f11 : 1003e005d
r1  : a00100d8d230 r2  : a001009f8c30 r3  : 5580
r8  : a00100b8e178 r9  : 00ab r10 : 0039
r11 : 0020 r12 : e1404b6f7e30 r13 : e1404b6f
r14 : 0020 r15 : a001009f8c00 r16 : a00100b5b160
r17 : dead4ead r18 : a001009f8c4c r19 : 
r20 : a00100b5b110 r21 : a00100b5b140 r22 : a00100b5b110
r23 : 005020050874 r24 : a00100ba98c0 r25 : a0021b53d768
r26 : e0018007e030 r27 : a00100786238 r28 : e00040004ae0
r29 : a001009f8c50 r30 : 0005 r31 : a001009f8c58

Call Trace:
 [] show_stack+0x40/0xa0
sp=e1404b6f79c0 bsp=e1404b6f1030
 [] show_regs+0x840/0x880
sp=e1404b6f7b90 bsp=e1404b6f0fd0
 [] die+0x1c0/0x2a0
sp=e1404b6f7b90 bsp=e1404b6f0f88
 [] ia64_do_page_fault+0x8d0/0xa00
sp=e1404b6f7bb0 bsp=e1404b6f0f38
 [] ia64_leave_kernel+0x0/0x270
sp=e1404b6f7c60 bsp=e1404b6f0f38
 [] move_native_irq+0x90/0x140
sp=e1404b6f7e30 bsp=e1404b6f0f08
 [] iosapic_end_level_irq+0x30/0xe0
sp=e1404b6f7e30 bsp=e1404b6f0ee8
 [] __do_IRQ+0x390/0x3c0
sp=e1404b6f7e30 bsp=e1404b6f0ea8
 [] ia64_handle_irq+0x1e0/0x2e0
sp=e1404b6f7e30 bsp=e1404b6f0e78
 [] ia64_leave_kernel+0x0/0x270
sp=e1404b6f7e30 bsp=e1404b6f0e78
Kernel panic - not syncing: Aiee, killing interrupt handler!


---
 arch/ia64/kernel/iosapic.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: devel-tree/arch/ia64/kernel/iosapic.c
===
--- devel-tree.orig/arch/ia64/kernel/iosapic.c
+++ devel-tree/arch/ia64/kernel/iosapic.c
@@ -446,7 +446,7 @@ iosapic_end_level_irq (unsigned int irq)
 #define iosapic_disable_level_irq  mask_irq
 #define iosapic_ack_level_irq  nop
 
-struct hw_interrupt_type irq_type_iosapic_level = {
+struct irq_chip irq_type_iosapic_level = {
    .name = "IO-SAPIC-level",
    .startup =  iosapic_startup_level_irq,
    .shutdown = iosapic_shutdown_level_irq,
@@ -454,6 +454,8 @@ struct hw_interrupt_type irq_type_iosapi
    .disable =  iosapic_disable_level_irq,
    .ack =  iosapic_ack_level_irq,
    .end =  iosapic_end_level_irq,
+   .mask = mask_irq,
+   .unmask =   unmask_irq,
    .set_affinity = iosapic_set_affinity
 };
 
@@ -493,7 +495,7 @@ iosapic_ack_edge_irq (unsigned int irq)
 #define iosapic_disable_edge_irq   nop
 #define iosapic_end_edge_irq   nop
 
-struct hw_interrupt_type irq_type_iosapic_edge = {
+struct irq_chip irq_type_iosapic_edge = {
    .name = "IO-SAPIC-edge",
    .startup =  iosapic_startup_edge_irq,
    .shutdown = iosapic_disable_edge_irq,
@@ -501,6 +503,8 @@ struct hw_interrupt_type irq_type_iosapi
    .disable =  iosapic_disable_edge_irq,
    .ack =  iosapic_ack_edge_irq,
    .end =  iosapic_end_edge_irq,
+   .mask = mask_irq,
+   .unmask =   unmask_irq,
    .set_affinity = iosapic_set_affinity
 };
 


Update to cube root benchmark code

2007-03-06 Thread Willy Tarreau
Hi Stephen,

Thanks for this code, it's easy to experiment with it.
Let me propose this simple update with a variation on your ncubic() function.
I noticed that all intermediate results were far below 32 bits, so I did a
new version which is 30% faster on my athlon with the same results. This is
because we only use x and a/x^2 in the function, with x very close to cbrt(a).
So a/x^2 is very close to cbrt(a) which is at most 22 bits. So we only use
the 32 lower bits of the result of div64_64(), and all intermediate
computations can be done on 32 bits (including multiplies and divides).
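
To illustrate the point about operand widths, here is a minimal integer Newton-Raphson cube root in the same spirit. It is a sketch, not the ncubic32() from the benchmark below: the function name cbrt32_sketch, the power-of-two initial guess and the iterate-until-stable loop are my own simplifications. Only the divisor x*x and the dividend need 64 bits; the quotient and everything after it fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integer cube root by Newton-Raphson: x <- (2*x + a/x^2) / 3.
 * Starting from a guess that is >= cbrt(a), x decreases until it reaches
 * floor(cbrt(a)), at which point the next estimate no longer gets smaller.
 */
static uint32_t cbrt32_sketch(uint64_t a)
{
	uint32_t x, y, q;
	uint64_t t;
	int bits = 0;

	if (a < 2)
		return (uint32_t)a;

	/* a < 2^bits, so 2^ceil(bits/3) is an upper bound for cbrt(a). */
	for (t = a; t; t >>= 1)
		bits++;
	x = (uint32_t)1 << ((bits + 2) / 3);
	assert(x >= 2);

	for (;;) {
		/* 64-bit divide, but the quotient is close to x: keep 32 bits. */
		q = (uint32_t)(a / ((uint64_t)x * x));
		y = (2 * x + q) / 3;	/* 32-bit arithmetic only */
		if (y >= x)
			break;
		x = y;
	}
	return x;
}
```

For a 64-bit input the initial guess is at most 2^22, so 2*x + q stays far below 2^32 throughout.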

[EMAIL PROTECTED]:~$ ./bictcp 
Calibrating
Function   clocks  mean(us)  max(us)  std(us)  Avg error
bictcp       1085      0.70    28.19     2.30     0.172%
ocubic        869      0.56    22.76     1.23     0.274%
ncubic        637      0.41    16.29     1.41     0.247%
ncubic32      435      0.28    11.18     1.03     0.247%
acbrt         824      0.53    21.03     0.85     0.275%
hcbrt         547      0.35    13.96     0.42     1.580%

I also tried to improve it a bit by checking for early convergence and
returning before the last divide, but it is not worth it: early convergence
almost never happens, so it does not make the code any faster.

Here's the code. I think it would be fine to merge this
version since it should behave better on most 32-bit machines.

Best regards,
Willy

/*
Here is a better version of the benchmark code.
It has the original code used in 2.4 version of Cubic for comparison

---
*/
/* Test and measure perf of cube root algorithms.  */
#include 
#include 
#include 
#include 
#include 

#ifdef __x86_64

#define rdtscll(val) do { \
 unsigned int __a,__d; \
 asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
 (val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
} while(0)

# define do_div(n,base) ({  \
uint32_t __base = (base);   \
uint32_t __rem; \
__rem = ((uint64_t)(n)) % __base;   \
(n) = ((uint64_t)(n)) / __base; \
__rem;  \
 })


/**
 * __ffs - find first bit in word.
 * @word: The word to search
 *
 * Undefined if no bit exists, so code should check against 0 first.
 */
static __inline__ unsigned long __ffs(unsigned long word)
{
__asm__("bsfq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/*
 * __fls: find last bit set.
 * @word: The word to search
 *
 * Undefined if no set bit exists, so code should check against 0 first.
 */
static inline unsigned long __fls(unsigned long word)
{
__asm__("bsrq %1,%0"
:"=r" (word)
:"rm" (word));
return word;
}

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz (man ffs).
 */
static __inline__ int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"cmovzl %2,%0" 
: "=r" (r) : "rm" (x), "r" (-1));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"cmovzl %2,%0"
: "=&r" (r) : "rm" (x), "rm" (-1));
return r+1;
}

/**
 * fls64 - find last bit set in 64 bit word
 * @x: the word to search
 *
 * This is defined the same way as fls.
 */
static inline int fls64(uint64_t x)
{
if (x == 0)
return 0;
return __fls(x) + 1;
}

static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
return dividend / divisor;
}

#elif __i386

#define rdtscll(val) \
 __asm__ __volatile__("rdtsc" : "=A" (val))

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz() (man ffs).
 */
static inline int ffs(int x)
{
int r;

__asm__("bsfl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs().
 */
static inline int fls(int x)
{
int r;

__asm__("bsrl %1,%0\n\t"
"jnz 1f\n\t"
"movl $-1,%0\n"
"1:" : "=r" (r) : "rm" (x));
return r+1;
}

static inline int fls64(uint64_t x)
{
uint32_t h = x >> 32;
if (h)
return fls(h) + 32;
return fls(x);
}


#define do_div(n,base) ({ \
unsigned long __upper, __low, __high, __mod, __base; \
__base = 

Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Andrew Morton
On Wed, 7 Mar 2007 13:17:57 +0800 "Sonic Zhang" <[EMAIL PROTECTED]> wrote:

> On 3/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Tue, 06 Mar 2007 14:54:18 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:
> >
> > > Hi folks,
> > >
> > > [PATCH] Blackfin: blackfin i2c driver
> > >
> 
> > > + struct i2c_msg *pmsg;
> > > + int i, ret;
> > > + int rc = 0;
> > > +
> > > + if (!(bfin_read_TWI_CONTROL() & TWI_ENA))
> > > + return -ENXIO;
> > > +
> > > + down(>twi_lock);
> > > +
> > > + while (bfin_read_TWI_MASTER_STAT() & BUSBUSY) {
> > > + up(>twi_lock);
> > > + schedule();
> > > + down(>twi_lock);
> > > + }
> >
> > That's a busy loop until this task's timeslice has expired.  It'll work,
> > but it'll suck a bit.  (Repeated in several places)
> >
> 
> OK, I'll change it to yield(), so the current process will be moved to the
> tail of the run queue. Is that OK with you?

Nope, yield is terribly bad when there are busy processes running: it can
stall for a very long time indeed.

Is this hardware not capable of generating an interrupt when BUSBUSY gets
negated?

I guess not, in which case you're stuck with having to poll it - probably
use a cond_resched() in the loop, and an angry comment.
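
A rough userspace analogue of such a polling loop, for illustration only. The names busy_fn, countdown_busy() and wait_bus_idle() and the timeout are hypothetical and do not come from the driver; in kernel code the short sleep marked below would be the cond_resched() call suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical status reader: returns nonzero while the bus is busy. */
typedef int (*busy_fn)(void *ctx);

/* Fake bus for experiments: reports busy for the next *ctx polls. */
static int countdown_busy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 1;
	}
	return 0;
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll until the bus goes idle, letting other work run between polls.
 * Returns 0 on success or -ETIMEDOUT if the bus stays busy too long.
 */
static int wait_bus_idle(busy_fn busy, void *ctx, long timeout_ms)
{
	const struct timespec pause = { 0, 1000000 };	/* 1 ms */
	long long deadline = now_ms() + timeout_ms;

	while (busy(ctx)) {
		if (now_ms() >= deadline)
			return -ETIMEDOUT;
		/* Userspace stand-in for cond_resched() in the kernel. */
		nanosleep(&pause, NULL);
	}
	return 0;
}
```

The timeout is an addition of this sketch: a poll loop with no upper bound hangs the caller if the bus never goes idle.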



Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Wu, Bryan
Dear Andrew and Alexey:

Thanks a lot for the review.

Here is the updated blackfin i2c driver.

[PATCH] Blackfin: blackfin i2c driver

The i2c linux driver for blackfin architecture which supports both GPIO
i2c operation and blackfin on-chip TWI controller i2c operation.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---
 drivers/i2c/busses/Kconfig |   47 
 drivers/i2c/busses/i2c-bfin-gpio.c |   98 +
 drivers/i2c/busses/i2c-bfin-twi.c  |  589 

 3 files changed, 734 insertions(+)

Index: linux-2.6/drivers/i2c/busses/Kconfig
===
--- linux-2.6.orig/drivers/i2c/busses/Kconfig   2007-03-07 13:32:02.0 +0800
+++ linux-2.6/drivers/i2c/busses/Kconfig        2007-03-07 13:44:19.0 +0800
@@ -5,6 +5,53 @@
 menu "I2C Hardware Bus support"
depends on I2C
 
+config I2C_BFIN_GPIO
+   tristate "Generic Blackfin and HHBF533/561 development board I2C support"
+   depends on I2C && EXPERIMENTAL
+   select I2C_ALGOBIT
+   help
+   --
+
+menu "BFIN I2C SDA/SCL Selection"
+   depends on I2C_BFIN_GPIO
+config BFIN_SDA
+   int "SDA is GPIO Number"
+   range 0 15 if (BF533 || BF532 || BF531) 
+   range 0 47 if (BF534 || BF536 || BF537)
+   range 0 47 if BF561
+   default 2 if (BF533 || BF532 || BF531) 
+
+config BFIN_SCL
+   int "SCL is GPIO Number"
+   range 0 15 if (BF533 || BF532 || BF531) 
+   range 0 47 if (BF534 || BF536 || BF537)
+   range 0 47 if BF561
+   default 3 
+endmenu
+
+config I2C_BFIN_GPIO_CYCLE_DELAY
+   int "Cycle Delay in usec"
+   depends on I2C_BFIN_GPIO
+   range 1 100 
+   default 40
+
+config I2C_BFIN_TWI
+   tristate "Blackfin TWI I2C support"
+   depends on I2C && (BF534 || BF536 || BF537)
+   help
+ This is the TWI I2C device driver for Blackfin 534/536/537.
+
+ This driver can also be built as a module.  If so, the module
+ will be called i2c-bfin-twi.
+
+config TWICLK_KHZ
+   int "TWI clock (kHZ)"
+   depends on I2C_BFIN_TWI
+   default 50
+   help
+ The unit of the TWI clock is kilo HZ. Please divide the clock 
+ by 1024 if you count it in HZ. The value should be less than 400.
+
 config I2C_ALI1535
tristate "ALI 1535"
depends on I2C && PCI
Index: linux-2.6/drivers/i2c/busses/i2c-bfin-gpio.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/i2c/busses/i2c-bfin-gpio.c        2007-03-07 13:44:19.0 +0800
@@ -0,0 +1,98 @@
+/
+ * Description: *
+ *  *
+ * Maintainer: Meihui Fan <[EMAIL PROTECTED]>  *
+ *  *
+ * CopyRight (c)  2004  HHTech  *
+ *   www.hhcn.com, www.hhcn.org *
+ *   All rights reserved.   *
+ *  *
+ * This file is free software;  *
+ *   you are free to modify and/or redistribute it *
+ *   under the terms of the GNU General Public Licence (GPL).   *
+ *  *
+ /
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define I2C_HW_B_HHBF   0x13
+
+static void hhbf_setsda(void *data, int state)
+{
+   if (state) {
+   gpio_direction_input(CONFIG_BFIN_SDA);
+
+   } else {
+   gpio_direction_output(CONFIG_BFIN_SDA);
+   gpio_set_value(CONFIG_BFIN_SDA, 0);
+   }
+}
+
+static void hhbf_setscl(void *data, int state)
+{
+   gpio_set_value(CONFIG_BFIN_SCL, state);
+}
+
+static int hhbf_getsda(void *data)
+{
+   return (gpio_get_value(CONFIG_BFIN_SDA) != 0);
+}
+
+
+static struct i2c_algo_bit_data bit_hhbf_data = {
+   .setsda  = hhbf_setsda,
+   .setscl  = hhbf_setscl,
+   .getsda  = hhbf_getsda,
+   .udelay  = CONFIG_I2C_BFIN_GPIO_CYCLE_DELAY,
+   .timeout = HZ
+};
+
+static struct i2c_adapter hhbf_ops = {
+   .owner  = THIS_MODULE,
+   .id = I2C_HW_B_HHBF,
+   .algo_data  = &bit_hhbf_data,
+   .name   = "HHBF I2C driver",
+};
+
+static int __init i2c_hhbf_init(void)
+{
+
+   if (gpio_request(CONFIG_BFIN_SCL, NULL)) {
+   printk(KERN_ERR "%s: gpio_request GPIO %d failed \n",__func__, CONFIG_BFIN_SCL);
+   return -1;
+   }
+
+   if (gpio_request(CONFIG_BFIN_SDA, NULL)) {
+   printk(KERN_ERR "%s: gpio_request 

Re: kernel-headers

2007-03-06 Thread Arjan van de Ven
On Wed, 2007-03-07 at 13:14 +0800, zhangxiliang wrote:
> hello,
> do you know where some problems about kernel-headers-*.rpm are discussed?

Hi,

the answer to your question depends on which distro you are using

if it's a distro that gets the headers from the kernel's "make
headers_install" (Fedora at least) I suspect this mailing list is the
right place. If not, you probably should use a mailing list for your
distribution.


Greetings,
   Arjan van de Ven



Re: [patch v2] epoll use a single inode ...

2007-03-06 Thread H. Peter Anvin

Linus Torvalds wrote:
> On Tue, 6 Mar 2007, Eric Dumazet wrote:
> > I did a user space program, attached to this mail.
> >
> > I rewrote the reciprocal_div() for i386 so that one multiply is used.
>
> Ok, this is definitely faster on Core 2 as well, so "numbers talk,
> bullshit walks". No more objections.
>
> (That said, I bet you could do even better for octal and hex numbers, so
> if you *really* want to speed things up, you should just make a
> special-case routine for each base (there's just three of them), and you
> can then also optimize the base-10 thing much better (you can do two
> digits at a time by dividing by 100, etc)

Of course you can do better for octal and hex -- it's just shift and mask.

Decimal is trickier; however, at least on i386 it might make sense to 
divide by 100 and then use the AAM instruction, or a table lookup, to 
split it into individual digits.
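
A small userspace sketch of the two ideas above: hex by shift and mask, and decimal two digits at a time via a divide by 100 plus a table lookup. The function names fmt_hex() and fmt_dec() are made up for this example and are not kernel routines; octal would follow the hex pattern with a 3-bit shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hex: no division at all, just shift and mask. Returns the length. */
static size_t fmt_hex(uint32_t v, char *buf)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[8];
	size_t n = 0, i;

	do {
		tmp[n++] = digits[v & 0xf];
		v >>= 4;
	} while (v);

	for (i = 0; i < n; i++)
		buf[i] = tmp[n - 1 - i];
	buf[n] = '\0';
	return n;
}

/* Decimal: each divide by 100 yields two digits through a lookup table. */
static size_t fmt_dec(uint32_t v, char *buf)
{
	static const char pairs[] =
		"00010203040506070809"
		"10111213141516171819"
		"20212223242526272829"
		"30313233343536373839"
		"40414243444546474849"
		"50515253545556575859"
		"60616263646566676869"
		"70717273747576777879"
		"80818283848586878889"
		"90919293949596979899";
	char tmp[10];
	size_t n = 0, i;

	while (v >= 100) {
		uint32_t r = v % 100;

		v /= 100;
		tmp[n++] = pairs[2 * r + 1];	/* ones digit */
		tmp[n++] = pairs[2 * r];	/* tens digit */
	}
	if (v >= 10) {
		tmp[n++] = pairs[2 * v + 1];
		tmp[n++] = pairs[2 * v];
	} else {
		tmp[n++] = (char)('0' + v);
	}

	/* Digits were produced least significant first; reverse them. */
	for (i = 0; i < n; i++)
		buf[i] = tmp[n - 1 - i];
	buf[n] = '\0';
	return n;
}
```

The caller is assumed to supply a buffer of at least 9 bytes for fmt_hex() and 11 bytes for fmt_dec().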


-hpa


Re: kref refcounting breakage in mainline

2007-03-06 Thread Mike Galbraith
On Tue, 2007-03-06 at 13:04 -0800, Greg KH wrote:
> On Tue, Mar 06, 2007 at 06:43:22AM +0100, Mike Galbraith wrote:
> > On Mon, 2007-03-05 at 16:25 -0800, Greg KH wrote:
> > 
> > > Mike, I've reverted this patch, and I don't see any references leaking.
> > > And, as your patch released the reference on the driver, and the
> > > module_add_driver() call would not grab a reference to the driver, only
> > > the module kobject, I don't see what you were trying to fix with this
> > > patch.
> > > 
> > > Do you have a test case that this fixes?
> > 
> > What it fixed for me was the hard hang reported below.
> > 
> > http://lkml.org/lkml/2007/2/16/96
> 
> What specific module are you trying to unload that causes the hang?  I
> think it might just be a problem with that module, and not with all
> others.

It's ipmi_si that's hanging, waits for completion that never comes.

> So, I'm going to revert your patch and work to try to find the real
> cause of this problem.

Yeah, my stab at it seems busted.  I'll take another poke at it to see
if I can find out why (post 725522b5453dd680412f2b6463a988e4fd148757)
I'm left with a reference.

-Mike



Re: [RFC][PATCH 2/7] RSS controller core

2007-03-06 Thread Balbir Singh

Pavel Emelianov wrote:

This includes setup of RSS container within generic
process containers, all the declarations used in RSS
accounting, and core code responsible for accounting.




diff -upr linux-2.6.20.orig/include/linux/rss_container.h linux-2.6.20-0/include/linux/rss_container.h
--- linux-2.6.20.orig/include/linux/rss_container.h 2007-03-06 13:39:17.0 +0300
+++ linux-2.6.20-0/include/linux/rss_container.h    2007-03-06 13:33:28.0 +0300
@@ -0,0 +1,68 @@
+#ifndef __RSS_CONTAINER_H__
+#define __RSS_CONTAINER_H__
+/*
+ * RSS container
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+struct page_container;
+struct rss_container;
+
+#ifdef CONFIG_RSS_CONTAINER
+int container_rss_prepare(struct page *, struct vm_area_struct *vma,
+   struct page_container **);
+
+void container_rss_add(struct page_container *);
+void container_rss_del(struct page_container *);
+void container_rss_release(struct page_container *);
+
+int mm_init_container(struct mm_struct *mm, struct task_struct *tsk);
+void mm_free_container(struct mm_struct *mm);
+
+unsigned long container_isolate_pages(unsigned long nr_to_scan,
+   struct rss_container *rss, struct list_head *dst,
+   int active, unsigned long *scanned);
+unsigned long container_nr_physpages(struct rss_container *rss);
+
+unsigned long container_try_to_free_pages(struct rss_container *);
+void container_out_of_memory(struct rss_container *);
+
+void container_rss_init_early(void);
+#else
+static inline int container_rss_prepare(struct page *pg,
+   struct vm_area_struct *vma, struct page_container **pc)
+{
+   *pc = NULL; /* to make gcc happy */
+   return 0;
+}
+
+static inline void container_rss_add(struct page_container *pc)
+{
+}
+
+static inline void container_rss_del(struct page_container *pc)
+{
+}
+
+static inline void container_rss_release(struct page_container *pc)
+{
+}
+
+static inline int mm_init_container(struct mm_struct *mm, struct task_struct *t)
+{
+   return 0;
+}
+
+static inline void mm_free_container(struct mm_struct *mm)
+{
+}
+
+static inline void container_rss_init_early(void)
+{
+}
+#endif
+#endif
diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
--- linux-2.6.20.orig/init/Kconfig  2007-03-06 13:33:28.0 +0300
+++ linux-2.6.20-0/init/Kconfig 2007-03-06 13:33:28.0 +0300
@@ -265,6 +265,13 @@ config CPUSETS
bool
select CONTAINERS

+config RSS_CONTAINER
+   bool "RSS accounting container"
+   select RESOURCE_COUNTERS
+   help
+ Provides a simple Resource Controller for monitoring and
+ controlling the total Resident Set Size of the tasks in a container
+


The wording looks very familiar :-). It would be useful to add
"The reclaim logic is now container aware: when the container goes over
its limit, the page reclaimer reclaims pages belonging to this container.
If we are unable to reclaim enough pages to satisfy the request, the
process is killed with an out of memory warning".


 config SYSFS_DEPRECATED
bool "Create deprecated sysfs files"
default y
diff -upr linux-2.6.20.orig/mm/Makefile linux-2.6.20-0/mm/Makefile
--- linux-2.6.20.orig/mm/Makefile   2007-02-04 21:44:54.0 +0300
+++ linux-2.6.20-0/mm/Makefile  2007-03-06 13:33:28.0 +0300
@@ -29,3 +29,5 @@ obj-$(CONFIG_MEMORY_HOTPLUG) += memory_h
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
+
+obj-$(CONFIG_RSS_CONTAINER) += rss_container.o
diff -upr linux-2.6.20.orig/mm/rss_container.c linux-2.6.20-0/mm/rss_container.c
--- linux-2.6.20.orig/mm/rss_container.c    2007-03-06 13:39:17.0 +0300
+++ linux-2.6.20-0/mm/rss_container.c   2007-03-06 13:33:28.0 +0300
@@ -0,0 +1,307 @@
+/*
+ * RSS accounting container
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct container_subsys rss_subsys;
+
+struct rss_container {
+   struct res_counter res;
+   struct list_head page_list;
+   struct container_subsys_state css;
+};
+
+struct page_container {
+   struct page *page;
+   struct rss_container *cnt;
+   struct list_head list;
+};
+


Yes, this is what I was planning to get to -- a per container LRU list.
But you have just one list; don't you need active and inactive lists?
When the global LRU is manipulated, shouldn't this list be updated as
well, so that reclaim will pick the best pages?


+static inline struct rss_container *rss_from_cont(struct container *cnt)
+{
+   return container_of(container_subsys_state(cnt, &rss_subsys),
+   struct rss_container, css);
+}
+
+int mm_init_container(struct mm_struct *mm, struct task_struct 

Re: [PATCH] INPUT/keyboard: PXA27x keyboard support

2007-03-06 Thread Dmitry Torokhov
Hi Rodolfo,

On Friday 02 March 2007 11:05, Rodolfo Giometti wrote:
> Hello, here my last patch for the PXA27x keyboard support updated to
> linux-2.6.21-rc2.
> 
> I added power management support (suspend/resume code).

The patch has a bunch of issues that are hard to list because it was sent as
an attachment... Examples are: REL_WHEEL does not belong to evbit, using
input_free_device() is not allowed after input_unregister_device(), etc.
I tried to fix everything I noticed; if you could try the patch below
and verify that it still works I will apply it to the input tree.

Thanks.

-- 
Dmitry

From: Rodolfo Giometti <[EMAIL PROTECTED]>

Input: add support for PXA27x keyboard controller

Signed-off-by: Rodolfo Giometti <[EMAIL PROTECTED]>
Signed-off-by: Dmitry Torokhov <[EMAIL PROTECTED]>
---

 drivers/input/keyboard/Kconfig |9 +
 drivers/input/keyboard/Makefile|1 
 drivers/input/keyboard/pxa27x_keyboard.c   |  258 +
 include/asm-arm/arch-pxa/pxa27x_keyboard.h |   13 +
 4 files changed, 281 insertions(+)

Index: work/drivers/input/keyboard/Kconfig
===
--- work.orig/drivers/input/keyboard/Kconfig
+++ work/drivers/input/keyboard/Kconfig
@@ -203,6 +203,15 @@ config KEYBOARD_OMAP
  To compile this driver as a module, choose M here: the
  module will be called omap-keypad.
 
+config KEYBOARD_PXA27x
+   tristate "PXA27x keyboard support"
+   depends on PXA27x
+   help
+ Enable support for PXA27x matrix keyboard controller
+
+ To compile this driver as a module, choose M here: the
+ module will be called pxa27x_keyboard.
+
 config KEYBOARD_AAED2000
tristate "AAED-2000 keyboard"
depends on MACH_AAED2000
Index: work/drivers/input/keyboard/Makefile
===
--- work.orig/drivers/input/keyboard/Makefile
+++ work/drivers/input/keyboard/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_KEYBOARD_SPITZ)  += spitzkb
 obj-$(CONFIG_KEYBOARD_HIL) += hil_kbd.o
 obj-$(CONFIG_KEYBOARD_HIL_OLD) += hilkbd.o
 obj-$(CONFIG_KEYBOARD_OMAP)+= omap-keypad.o
+obj-$(CONFIG_KEYBOARD_PXA27x)  += pxa27x_keyboard.o
 obj-$(CONFIG_KEYBOARD_AAED2000)+= aaed2000_kbd.o
 obj-$(CONFIG_KEYBOARD_GPIO)+= gpio_keys.o
 
Index: work/drivers/input/keyboard/pxa27x_keyboard.c
===
--- /dev/null
+++ work/drivers/input/keyboard/pxa27x_keyboard.c
@@ -0,0 +1,258 @@
+/*
+ * linux/drivers/input/keyboard/pxa27x_keyboard.c
+ *
+ * Driver for the pxa27x matrix keyboard controller.
+ *
+ * Created:Feb 22, 2007
+ * Author: Rodolfo Giometti <[EMAIL PROTECTED]>
+ *
+ * Based on a previous implementations by Kevin O'Connor
+ *  and Alex Osborne <[EMAIL PROTECTED]> and
+ * on some suggestions by Nicolas Pitre <[EMAIL PROTECTED]>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define DRIVER_NAME    "pxa27x-keyboard"
+
+#define KPASMKP(col)   (col/2 == 0 ? KPASMKP0 : \
+col/2 == 1 ? KPASMKP1 : \
+col/2 == 2 ? KPASMKP2 : KPASMKP3)
+#define KPASMKPx_MKC(row, col) (1 << (row + 16 * (col % 2)))
+
+static irqreturn_t pxakbd_irq_handler(int irq, void *dev_id)
+{
+   struct platform_device *pdev = dev_id;
+   struct pxa27x_keyboard_platform_data *pdata = pdev->dev.platform_data;
+   struct input_dev *input_dev = platform_get_drvdata(pdev);
+   unsigned long kpc = KPC;
+   int p, row, col, rel;
+
+   if (kpc & KPC_DI) {
+   unsigned long kpdk = KPDK;
+
+   if (!(kpdk & KPDK_DKP)) {
+   /* better luck next time */
+   } else if (kpc & KPC_REE0) {
+   unsigned long kprec = KPREC;
+   KPREC = 0x7f;
+
+   if (kprec & KPREC_OF0)
+   rel = (kprec & 0xff) + 0x7f;
+   else if (kprec & KPREC_UF0)
+   rel = (kprec & 0xff) - 0x7f - 0xff;
+   else
+   rel = (kprec & 0xff) - 0x7f;
+
+   if (rel) {
+   input_report_rel(input_dev, REL_WHEEL, rel);
+   input_sync(input_dev);
+   }
+   }
+   }
+
+   if (kpc & KPC_MI) {
+   /* report the status of every button */
+   for (row = 0; row < pdata->nr_rows; row++) {
+   for 

Sleeping thread not receive signal until it wakes up

2007-03-06 Thread Luong Ngo

Hi all,

I am having this problem. I have a process with two threads. One of
the threads repeatedly calls ioctl() to get information from the
kernel and blocks if there is no new information. When information is
returned, the thread checks it for errors and, if it finds one,
triggers an action. Since we have no way to know when the error has
gone away (the hardware provides no signal), when we trigger the
action we also set a timer using alarm() and register a SIGALRM
handler in the thread using sigaction(). After setting the alarm, the
thread loops back and calls ioctl() again, which can put it to sleep.
The problem is that the SIGALRM handler is not invoked while the
thread is blocked in ioctl(). If we generate some event so that the
ioctl() returns with new information, the SIGALRM handler is invoked
right away. However, as I read the manual, a thread/process should be
woken up even when it sleeps if a signal is delivered to it. Am I
right?
One thing I don't know whether it matters is that I am not using
sigwait() to block and wait for the signal, because the thread needs
to go back to the ioctl() call and sleep there. So I used sigaction()
to register the signal handler, hoping that the kernel will invoke it
when a SIGALRM is delivered to the thread.
Could anyone tell me if I did something wrong and what the correct
way to achieve this is? I tried to avoid creating another thread that
calls sigwait() and blocks until the ioctl thread explicitly sends it
a signal, because I want to use a timer.
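
For reference, a single-threaded userspace sketch of the behaviour being asked about, using read() on an empty pipe as a stand-in for the blocking ioctl(). This is only an illustration under assumptions: the real driver is not shown in this thread, and the function name blocked_read_interrupted() is made up. Two things are worth checking against it. A handler installed without SA_RESTART makes an interruptible blocking call fail with EINTR, but that only happens if the driver sleeps interruptibly. Also, the signal from alarm() is directed at the whole process, so in a multithreaded program any thread that has not blocked SIGALRM may end up running the handler.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
	(void)sig;
	alarm_fired = 1;
}

/*
 * Block in read() on an empty pipe with a timer armed (delay_ms > 0).
 * Returns the errno left by read() (EINTR is expected), 0 if read()
 * unexpectedly succeeded, or -1 if the setup failed.
 */
static int blocked_read_interrupted(long delay_ms)
{
	struct sigaction sa;
	struct itimerval it;
	int fds[2], err;
	char c;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;	/* sa_flags == 0: no SA_RESTART */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;
	if (pipe(fds) < 0)
		return -1;

	memset(&it, 0, sizeof(it));
	it.it_value.tv_sec = delay_ms / 1000;
	it.it_value.tv_usec = (delay_ms % 1000) * 1000;
	if (setitimer(ITIMER_REAL, &it, NULL) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* Nothing is ever written, so only the signal can end this call. */
	err = read(fds[0], &c, 1) < 0 ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

If the equivalent test against the real device never returns EINTR, the likely places to look are the driver's wait primitive and the signal masks of the other threads.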


Thank you in advance,
LNgo


Re: 2.6.21-rc2 : Oops in rtc_cmos...

2007-03-06 Thread David Brownell
On Tuesday 06 March 2007 8:42 pm, Paul Rolland wrote:

> It seems to me that the DRV_RTC_CMOS and the "standard" CONFIG_RTC
> shouldn't be used at the same time... Am I correct on that ? 

Yes.  I recall not forcing that because I couldn't be sure the new code
was functionally identical to the legacy driver ... and in areas like
HPET, knew it was not.  Of course, since then I see that someone has
kicked in a patch making Linux stop using HPET in legacy-replacement
mode, so that particular issue now seems moot.

Another area it's not functionally identical is CONFIG_SND_RTCTIMER,
where the ALSA code doesn't know how to use the new RTC framework.
(Or, probably, cope with the fact that not all RTCs can give periodic
IRQs at the rates it wants ...)

So it wasn't clear to me that distros might not need to have both
options, to help cope with strange hardware.  Phasing out legacy code
tends to be done a bit cautiously.

On the other hand, the distro vendors have been slow to look at this
issue, and haven't even upgraded their copies of "hwclock" to be
able to recognize /dev/rtc0, so I'm not holding my breath there.


> Wouldn't it be better to have this dependency enforced ?

Feel free to submit a patch updating the Kconfig for both drivers.

Merging it might complicate distro efforts to move away from that
legacy driver (by preventing systems that can run with either one),
but I don't see that happening very quickly anyway.

- Dave


Re: [PATCH 2/3] msi: Fixup the msi enable/disable logic

2007-03-06 Thread Eric W. Biederman
Michael Ellerman <[EMAIL PROTECTED]> writes:

>
> Hi Eric, comments below ..
>
>
> I get the reasoning for disabling MSI before we start writing back the
> config space, but don't we want to re-enable MSI on the way out?

We are restoring the entire msi flags register which includes the enable bit,
setting it a second time is gratuitous.

In addition, if we are restoring the register when the enable bit is not set
(because we don't have a mask bit), enabling the msi state is actually
the wrong thing to do.  But I admit that case can only happen after
the additions in my last patch.

Eric



Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-06 Thread Sonic Zhang

On 3/6/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Tue, 06 Mar 2007 14:54:18 +0800 "Wu, Bryan" <[EMAIL PROTECTED]> wrote:
>
> > Hi folks,
> >
> > [PATCH] Blackfin: blackfin i2c driver
> >

> > + struct i2c_msg *pmsg;
> > + int i, ret;
> > + int rc = 0;
> > +
> > + if (!(bfin_read_TWI_CONTROL() & TWI_ENA))
> > + return -ENXIO;
> > +
> > + down(>twi_lock);
> > +
> > + while (bfin_read_TWI_MASTER_STAT() & BUSBUSY) {
> > + up(>twi_lock);
> > + schedule();
> > + down(>twi_lock);
> > + }
>
> That's a busy loop until this task's timeslice has expired.  It'll work,
> but it'll suck a bit.  (Repeated in several places)
>

OK, I'll change it to yield(), so the current process will be moved to the
tail of the run queue. Is that OK with you?


Re: should RTS init in serial core be tied to CRTSCTS

2007-03-06 Thread Oleksiy Kebkal

> > shouldnt TIOCM_RTS be passed down only when the 'r' is appended to the
> > boot cmdline ?
>
> How would it be useful?
>
> CRTSCTS is for CTS only (i.e., the transmission is paused when CTS is
> inactive), not for RTS. DTR and RTS should be active when the port is
> open even without CRTSCTS (= without handshaking), it's used for
> various purposes such as providing +12V to the device (and two pins
> can supply more power than one - sure, it isn't the best idea).


The name of the option is not CCTS, but CRTSCTS, isn't it? So you may
want not only to pause your own transmission when CTS is inactive, but
also to control the transmission flow from the remote side. Why should
RTS be active when the port is open even without CRTSCTS? You may still
assert RTS manually if it is used to provide +12V to the device. But
as I understand it, that is not a common use of this pin, is it?

And the question is not only about supporting legacy equipment but also
about embedded hardware where RTS/CTS handshaking is handshaking, not
something else...

-Oleksiy


Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

2007-03-06 Thread Jeremy Fitzhardinge
Thomas Gleixner wrote:
> Ooops. I completely forgot, that you get the absolute expiry time
> already in ktime_t format (nanoseconds) when dev->set_next_event() is
> called.
>
>   dev->next_event = expires;
>
> is done right before the call. 
>
> So it's already there for free.
>   

OK, but a trap for young players (ie, me): the absolute time is in ns
since kernel boot, but the hypervisor wants an absolute time in ns since
system boot.  Everything works reasonably well for the first guest
started early, so be sure to take a snapshot of hypervisor time early in
order to get the correction...
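
The correction described above amounts to capturing one offset and adding it to every expiry. A minimal sketch, with hypothetical names (struct clock_offset, snapshot_offset(), to_hv_time()) that are not taken from any hypervisor interface:

```c
#include <stdint.h>

/* Offset between the two clocks, captured once early in boot. */
struct clock_offset {
	int64_t ns;	/* hypervisor time minus kernel time */
};

/* Take the snapshot: both readings must be sampled back to back. */
static struct clock_offset snapshot_offset(uint64_t hv_now_ns,
					   uint64_t kernel_now_ns)
{
	struct clock_offset off = { (int64_t)(hv_now_ns - kernel_now_ns) };

	return off;
}

/* Convert a kernel-relative absolute expiry into hypervisor time. */
static uint64_t to_hv_time(struct clock_offset off, uint64_t kernel_expiry_ns)
{
	return kernel_expiry_ns + (uint64_t)off.ns;
}
```

For a guest started right at system boot the offset is close to zero, which would explain why the missing correction goes unnoticed there.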

J



Linux v2.6.21-rc3

2007-03-06 Thread Linus Torvalds

We've finally hopefully started to put a dent in the regressions, 
especially the suspend/resume problems introduced since 2.6.20.

So 2.6.21-rc3 is out there now, and there's some hope that it will work 
more widely than -rc1 and -rc2 did. Please do give it a good testing, and 
update Adrian and the mailing list (and me) about any regressions 
(hopefully many more of the "it's fixed now" than other kinds, but all 
regressions are interesting).

The appended shortlog gives a reasonable overview. In general we're 
definitely calming down, and most of the changes are fairly small and 
obvious fixes. 

Let's keep the fixes to a minimum, especially since I'm planning on biting 
people's heads off if I get any more pull requests for things that aren't 
real and obvious fixes. 

Linus

---

Adam Litke (1):
  Fix get_unmapped_area and fsync for hugetlb shm segments

Adrian Bunk (8):
  HID: hid-debug.c should #include 
  arch/arm26/kernel/entry.S: remove dead code
  make ipc/shm.c:shm_nopage() static
  mm/{,tiny-}shmem.c cleanups
  drivers/video/sm501fb.c: make 4 functions static
  fix the SYSCTL=n compilation
  arch/i386/kernel/vmi.c must #include 
  remove arch/i386/kernel/tsc.c:custom_sched_clock

Ahmed S. Darwish (1):
  KVM: Use ARRAY_SIZE macro instead of manual calculation.

Akira Iguchi (1):
  scc_pata: bugfix for checking DMA IRQ status

Alan Cox (4):
  libata-core: Fix simplex handling
  pata_qdi: Fix initialisation
  siimage: DRAC4 note
  ide: remove a ton of pointless #undef REALLY_SLOW_IO

Alexandr Andreev (1):
  [IA64] sync compat getdents

Alexey Dobriyan (1):
  geode-aes: use unsigned long for spin_lock_irqsave

Allan Graves (1):
  uml: enable RAW

Andres Salomon (3):
  i386: make x86_64 tsc header require i386 rather than vice-versa
  hrtimers: fix HRTIMER_CB_IRQSAFE_NO_SOFTIRQ description
  hrtimers: hrtimer_clock_base description typo

Andrew Morton (7):
  throttle_vm_writeout(): don't loop on GFP_NOFS and GFP_NOIO allocations
  ide: fix pmac breakage
  KVM: Move kvmfs magic number to 
  cyclades: return closing_wait
  revert "drivers/net/tulip/dmfe: support basic carrier detection"
  sis900 warning fixes
  fix build with CONFIG_NO_IDLE_HZ=n

Andrzej Zaborowski (1):
  ARM: OMAP: correct misc 15xx and non-15xx platform code

Antonino A. Daplas (2):
  MAINTAINERS: Update email address
  atyfb: Fix kconfig error

Aristeu Sergio Rozanski Filho (1):
  tty_io: fix race in master pty close/slave pty close path

Arnaldo Carvalho de Melo (1):
  [TCP]: Fix minisock tcp_create_openreq_child() typo.

Arnaud Patard (1):
  ARM: OMAP: board-nokia770: correct lcd name

Atsushi Nemoto (4):
  [MIPS] jmr3927: build fix
  [MIPS] Convert to RTC-class ds1742 driver
  [MIPS] No need to write c0_compare in plat_timer_setup
  [MIPS] TX39: Remove redundant tx39_blast_icache() calls

Avi Kivity (13):
  KVM: mmu: add missing dirty page tracking cases
  KVM: Cosmetics
  KVM: Add hypercall host support for svm
  KVM: Wire up hypercall handlers to a central arch-independent location
  KVM: svm: init cr0 with the wp bit set
  KVM: More 0 -> NULL conversions
  KVM: Add internal filesystem for generating inodes
  KVM: Create an inode per virtual machine
  KVM: Rename some kvm_dev_ioctl_*() functions to kvm_vm_ioctl_*()
  KVM: Move kvm_vm_ioctl_create_vcpu() around
  KVM: Per-vcpu inodes
  KVM: Bump API version
  KVM: Fix bogus failure in kvm.ko module initialization

Bartlomiej Zolnierkiewicz (3):
  ide: remove some obsoleted kernel params (v2)
  ide: make legacy IDE VLB modules check for the "probe" kernel params (v2)
  pata_pdc202xx_old: fix data corruption and other problems

Ben Dooks (2):
  [ARM] 4238/1: S3C24XX: docs: update suspend and resume
  [ARM] 4239/1: S3C24XX: Update kconfig entries for PM

Brice Goglin (1):
  myri10ge: fix copyright and license

Catalin Marinas (1):
  [ARM] 4241/1: Define mb() as compiler barrier on a uniprocessor system

Christian Krafft (1):
  ipmi: check, if default ports are accessible on PPC

Christoph Lameter (1):
  Page migration: Fix vma flag checking

Con Kolivas (1):
  sched: remove SMT nice

Cornelia Huck (3):
  [S390] cio: Fix locking when calling notify function.
  [S390] cio: Use path verification to check for path state.
  [S390] cio: Call cancel_halt_clear even when actl == 0.

Dale Farnsworth (2):
  mv643xx_eth: move mac_addr inside mv643xx_eth_platform_data
  mv643xx_eth: Place explicit port number in mv643xx_eth_platform_data

Dan Aloni (1):
  [VLAN]: Avoid a 4-order allocation.

Daniel Walker (2):
  update timekeeping_is_continuous comment
  fix vsyscall settimeofday

Dave Johnson (1):
  [MIPS] Fix __raw_read_trylock() to allow multiple readers

Dave Jones (2):
  Fix mv643xx_eth 

Re: [patch 1/4] signalfd v1 - signalfd core ...

2007-03-06 Thread Stephen Rothwell
On Tue, 6 Mar 2007 17:36:56 -0800 (PST) Davide Libenzi wrote:
>
> The read(2) call will read u32 signal numbers that landed over the
> signalfd. It returns the size of the data copied, or zero if the sighand
> we are attached to, has been detached.

So what about signals that the user asked for a siginfo_t to be returned
with?

--
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)

2007-03-06 Thread Paul Collins
David Woodhouse <[EMAIL PROTECTED]> writes:

> On Tue, 2007-03-06 at 14:53 +1300, Paul Collins wrote:
>> In case it's of interest, 2.6.20 has been running fine on my
>> PowerBook5,4. 
>
> How much memory? What if you boot with mem=512M or mem=256M?

1GB.  Also works fine when booted with those options.

-- 
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood


RE: 2.6.21-rc2 : Oops in rtc_cmos...

2007-03-06 Thread Paul Rolland
Hello,

> > Yes, it does, so it's a Good One (tm),
> 
> And points out that $SUBJECT is misleading; the root cause of
> the oops isn't rtc_cmos.  Workaround, don't enable the legacy
> driver for this hardware.

Well, sorry for that, but my point was that without enabling
CONFIG_DRV_RTC_CMOS and only using CONFIG_RTC, my dmesg says :

drivers/rtc/hctosys.c: unable to open rtc device (rtc0)

Having seen that, I went through all the options, trying to find what I
could have forgotten, and added the RTC_CMOS one, which resulted
in an Oops...

> One of the good things about getting rtc-cmos merged:  it
> exposes this new RTC framework to new mistakes, which helps
> fix some of the remaining rough spots.  

Good ;)
 
> > pnp: Device 00:03 does not support disabling.
> 
> Blame the PNP stack for that particular useless message.
> I'll send a fix for that one too.
OK, ready to test ! 

> > drivers/rtc/hctosys.c: unable to open rtc device (rtc0) 
> Because probing 00:03 failed, was never fully usable.
> So then rtc0 couldn't be found.  You'd get the same
> message if, say, the RTC was loaded as a module.

It seems to me that the DRV_RTC_CMOS and the "standard" CONFIG_RTC
shouldn't be used at the same time... Am I correct on that?
Wouldn't it be better to have this dependency enforced?

Regards,
Paul



[kj]Patch8:replace pci_find_device in drivers/telephony/ixj.c

2007-03-06 Thread Surya
Hi,
   Cleaning up of pci_find_device in drivers/telephony/ixj.c.
Applies and compiles clean on Linus tree. No hardware hence not tested!!

I was unable to find a suitable maintainer for this subsection in the
MAINTAINERS file, and I am not sure whether it is orphaned or maintained.

Can somebody help me identify the actual maintainer?

thank you.


Signed-off-by: Surya Prabhakar <[EMAIL PROTECTED]>
--- 


diff --git a/drivers/telephony/ixj.c b/drivers/telephony/ixj.c
index 71cb64e..c7b0a35 100644
--- a/drivers/telephony/ixj.c
+++ b/drivers/telephony/ixj.c
@@ -7692,7 +7692,7 @@ static int __init ixj_probe_pci(int *cnt
IXJ *j = NULL;
 
for (i = 0; i < IXJMAX - *cnt; i++) {
-   pci = pci_find_device(PCI_VENDOR_ID_QUICKNET,
+   pci = pci_get_device(PCI_VENDOR_ID_QUICKNET,
  PCI_DEVICE_ID_QUICKNET_XJ, pci);
if (!pci)
break;
@@ -7712,6 +7712,7 @@ static int __init ixj_probe_pci(int *cnt
printk(KERN_INFO "ixj: found Internet PhoneJACK PCI at 0x%x\n", j->DSPbase);
++*cnt;
}
+   pci_dev_put(pci);
return probe;
 }
 

-- 
surya.


Re: qla2xxx BUG: workqueue leaked lock or atomic

2007-03-06 Thread Andrew Morton
On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <[EMAIL PROTECTED]> wrote:

> On 16:18, Andre Noll wrote:
> 
> > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > writing to both raid systems at the same time via lvm still locks up
> > the system within minutes.
> 
> Screenshot of the resulting kernel panic:
> 
>   http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> 

It died in CFQ.  Please try a different IO scheduler.  Use something
like

echo deadline > /sys/block/sda/queue/scheduler

This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
or it could be a block bug, or it could be an LVM bug.

Adrian, can we please track this as a regression?


Re: [PATCH] mm: don't use ZONE_DMA unless CONFIG_ZONE_DMA is set in setup.c

2007-03-06 Thread Andres Salomon
Dave Jones wrote:
> On Tue, Mar 06, 2007 at 05:52:46PM -0800, Andrew Morton wrote:
>  > On Tue, 06 Mar 2007 18:52:59 -0500
>  > Andres Salomon <[EMAIL PROTECTED]> wrote:
>  > 
>  > > If CONFIG_ZONE_DMA is ever undefined, ZONE_DMA will also not be defined,
>  > > and setup.c won't compile.  This wraps it with an #ifdef.
>  > > 
>  > 
>  > I guess if anyone tries to disable ZONE_DMA on i386 they'll pretty quickly
>  > discover that.  But I don't think we need to "fix" it yet?

Oh, it's certainly not urgent.  I sent it simply for correctness reasons.

It would've been nice to see the ZONE_DMA removal patches just #define
ZONE_DMA regardless, and include less #ifdefs scattered about; but at
this point, I'd just as soon prefer to see a proper way to allocate
things based on address constraints (as discussed in
http://www.gelato.unsw.edu.au/archives/linux-ia64/0609/19036.html).


> 
> CONFIG_ZONE_DMA isn't even optional on i386, so I'm curious how
> you could hit this compile failure.
> 

Why, with custom code of course ;)



Re: Wanted: simple, safe x86 stack overflow detection

2007-03-06 Thread Arjan van de Ven
> I'm certainly in favor of the move; IRQ stacks could be made
> rather deep and cheaply at that. I may get around to writing it this
> week if no one else does it first.

the irq stacks aren't the problem; RH at some point accidentally shipped
a kernel with 4k *shared* irq/user context stack and even that gave
almost no issues.

irq's really shouldn't actually nest; it's bad for just about everything
to do that (but that's another story, I would *love* to get rid of the
"enable irqs" thing in the x86 irq path, it hurts just about anything in
reality)



Re: [PATCH] proc: maps protection

2007-03-06 Thread Arjan van de Ven

> [Adding Cc:lkml]

> How about using a reduced check, as is done for fd and environ?  This 
> would allow root-running system monitors to still do their job.  
> Effectively, this changes the test from "is ptracing" to just "can 
> ptrace".
> 
> If this still isn't considered safe, I'll add the maps_protect file...


btw I consider it an information leak that any user can see which
files/libraries any other user and root has mmap'd. (and with glibc's
stdio mmap feature that goes even beyond direct mmap to fopen()'d).

If root or some other user wants to watch
hillary-vs-obama-in-the-mud.avi, no other user has ANY business even
seeing that. So at minimum it's a privacy issue showing the filenames...


Re: [PATCH] tcp_cubic: faster cube root

2007-03-06 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:47:06 -0800

> The Newton-Raphson method is quadratically convergent so
> only a small fixed number of steps are necessary.
> Therefore it is faster to unroll the loop. Since div64_64 is no longer
> inline it won't cause code explosion.
> 
> Also fixes a bug that can occur if x^2 was bigger than 32 bits.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.
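
For readers who want to see the iteration the patch description refers to, here is a minimal user-space sketch of an integer cube root using Newton-Raphson. It is not the tcp_cubic code: the function name cube_root_sketch is made up for illustration, and it loops until convergence instead of unrolling a fixed number of steps. The square of the estimate is kept in 64 bits, which is the kind of overflow the description mentions.

```c
#include <stdint.h>

/*
 * Integer cube root via Newton-Raphson: x' = (2*x + a / x^2) / 3.
 * Illustrative only; hypothetical name, not the kernel implementation.
 */
static uint32_t cube_root_sketch(uint64_t a)
{
	uint64_t x, y, t;
	unsigned int bits = 0;

	if (a == 0)
		return 0;

	/* Count significant bits so the first guess 2^ceil(bits/3) is >= cbrt(a). */
	for (t = a; t != 0; t >>= 1)
		bits++;
	x = (uint64_t)1 << ((bits + 2) / 3);

	/* Starting from an overestimate, each step decreases until floor(cbrt(a)). */
	for (;;) {
		/* x is at most 2^22 here, so x * x fits comfortably in 64 bits. */
		y = (2 * x + a / (x * x)) / 3;
		if (y >= x)
			break;
		x = y;
	}
	return (uint32_t)x;
}
```

Because convergence is quadratic, the loop body runs only a handful of times even for the largest inputs, which is why a fixed, unrolled step count is practical.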


Re: SATA resume slowness, e1000 MSI warning

2007-03-06 Thread Eric W. Biederman
"Kok, Auke" <[EMAIL PROTECTED]> writes:

> Ingo Molnar wrote:
>> * Kok, Auke <[EMAIL PROTECTED]> wrote:
>>
>>>>> BUG: at drivers/pci/msi.c:611 pci_enable_msi()
>>
>>>> I would poke Eric Biederman(sp?) about this one.  Maybe its even solved by
>>>> the MSI-enable-related patch he posted in the past 24-48 hours.
>>> I tried the 3-patch series "[PATCH 0/3] Basic msi bug fixes.." and they fix
>>> this problem for me. Were you expecting the OOPS in the first place? [...]
>>
>> the bug was the warning message (a WARN_ON()) above - not an oops. So that
>> warning message is gone in your testing?
>
> yes.

Sorry for the slow reply.  I was out of town for my brother's wedding the
last few days.

I wasn't exactly expecting the WARN_ON to trigger.  What I fixed was
an inconsistency in handling our state bits.  Fixing that
inconsistency appears to have fixed the e1000 usage scenario mostly by
accident.

The basic issue is that pci_save_state saves the current msi state
along with other registers, and then the e1000 driver goes and
disables the msi irq after we have saved the irq state as on.

My code notices that the msi irq was disabled before restore time, so
it skips the restore.  However we now have a leak of the msi saved cap
because we are not freeing it. 

This leaves us with some basic questions.
- Does it make sense for suspend/resume methods to request/free irqs?
- Does it make sense for suspend/resume methods to allocate/free msi irqs?
- Do we want pci_save/restore_cap to save/restore msi state?

The path of least resistance is to just free the extra state and we
are good.  I'm just not quite certain that is sane and it has been a
long day.

Eric


Re: [PATCH] mm: don't use ZONE_DMA unless CONFIG_ZONE_DMA is set in setup.c

2007-03-06 Thread Dave Jones
On Tue, Mar 06, 2007 at 05:52:46PM -0800, Andrew Morton wrote:
 > On Tue, 06 Mar 2007 18:52:59 -0500
 > Andres Salomon <[EMAIL PROTECTED]> wrote:
 > 
 > > If CONFIG_ZONE_DMA is ever undefined, ZONE_DMA will also not be defined,
 > > and setup.c won't compile.  This wraps it with an #ifdef.
 > > 
 > 
 > I guess if anyone tries to disable ZONE_DMA on i386 they'll pretty quickly
 > discover that.  But I don't think we need to "fix" it yet?

CONFIG_ZONE_DMA isn't even optional on i386, so I'm curious how
you could hit this compile failure.

Dave

-- 
http://www.codemonkey.org.uk


Re: [RFC][PATCH 1/7] Resource counters

2007-03-06 Thread Balbir Singh

Pavel Emelianov wrote:

Introduce generic structures and routines for
resource accounting.

Each resource accounting container is supposed to
aggregate it, container_subsystem_state and its
resource-specific members within.




diff -upr linux-2.6.20.orig/include/linux/res_counter.h linux-2.6.20-0/include/linux/res_counter.h
--- linux-2.6.20.orig/include/linux/res_counter.h   2007-03-06 13:39:17.0 +0300
+++ linux-2.6.20-0/include/linux/res_counter.h  2007-03-06 13:33:28.0 +0300
@@ -0,0 +1,83 @@
+#ifndef __RES_COUNTER_H__
+#define __RES_COUNTER_H__
+/*
+ * resource counters
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+
+struct res_counter {
+   unsigned long usage;
+   unsigned long limit;
+   unsigned long failcnt;
+   spinlock_t lock;
+};
+
+enum {
+   RES_USAGE,
+   RES_LIMIT,
+   RES_FAILCNT,
+};
+
+ssize_t res_counter_read(struct res_counter *cnt, int member,
+   const char __user *buf, size_t nbytes, loff_t *pos);
+ssize_t res_counter_write(struct res_counter *cnt, int member,
+   const char __user *buf, size_t nbytes, loff_t *pos);
+
+static inline void res_counter_init(struct res_counter *cnt)
+{
+   spin_lock_init(&cnt->lock);
+   cnt->limit = (unsigned long)LONG_MAX;
+}
+


Is there any way to indicate that there are no limits on this container?
LONG_MAX is quite huge, but when the administrator wants to configure
a container for *unlimited usage*, having to express that as a number
makes it hard for the administrator.


+static inline int res_counter_charge_locked(struct res_counter *cnt,
+   unsigned long val)
+{
+   if (cnt->usage <= cnt->limit - val) {
+   cnt->usage += val;
+   return 0;
+   }
+
+   cnt->failcnt++;
+   return -ENOMEM;
+}
+
+static inline int res_counter_charge(struct res_counter *cnt,
+   unsigned long val)
+{
+   int ret;
+   unsigned long flags;
+
+   spin_lock_irqsave(&cnt->lock, flags);
+   ret = res_counter_charge_locked(cnt, val);
+   spin_unlock_irqrestore(&cnt->lock, flags);
+   return ret;
+}
+


Would atomic counters help here?


+static inline void res_counter_uncharge_locked(struct res_counter *cnt,
+   unsigned long val)
+{
+   if (unlikely(cnt->usage < val)) {
+   WARN_ON(1);
+   val = cnt->usage;
+   }
+
+   cnt->usage -= val;
+}
+
+static inline void res_counter_uncharge(struct res_counter *cnt,
+   unsigned long val)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&cnt->lock, flags);
+   res_counter_uncharge_locked(cnt, val);
+   spin_unlock_irqrestore(&cnt->lock, flags);
+}
+
+#endif
diff -upr linux-2.6.20.orig/init/Kconfig linux-2.6.20-0/init/Kconfig
--- linux-2.6.20.orig/init/Kconfig  2007-03-06 13:33:28.0 +0300
+++ linux-2.6.20-0/init/Kconfig 2007-03-06 13:33:28.0 +0300
@@ -265,6 +265,10 @@ config CPUSETS

  Say N if unsure.

+config RESOURCE_COUNTERS
+   bool
+   select CONTAINERS
+
 config SYSFS_DEPRECATED
bool "Create deprecated sysfs files"
default y
diff -upr linux-2.6.20.orig/kernel/Makefile linux-2.6.20-0/kernel/Makefile
--- linux-2.6.20.orig/kernel/Makefile   2007-03-06 13:33:28.0 +0300
+++ linux-2.6.20-0/kernel/Makefile  2007-03-06 13:33:28.0 +0300
@@ -51,6 +51,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o

 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <[EMAIL PROTECTED]>, the -fno-omit-frame-pointer is
diff -upr linux-2.6.20.orig/kernel/res_counter.c linux-2.6.20-0/kernel/res_counter.c
--- linux-2.6.20.orig/kernel/res_counter.c  2007-03-06 13:39:17.0 +0300
+++ linux-2.6.20-0/kernel/res_counter.c 2007-03-06 13:33:28.0 +0300
@@ -0,0 +1,72 @@
+/*
+ * resource containers
+ *
+ * Copyright 2007 OpenVZ SWsoft Inc
+ *
+ * Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static inline unsigned long *res_counter_member(struct res_counter *cnt, int member)
+{
+   switch (member) {
+   case RES_USAGE:
+   return &cnt->usage;
+   case RES_LIMIT:
+   return &cnt->limit;
+   case RES_FAILCNT:
+   return &cnt->failcnt;
+   };
+
+   BUG();
+   return NULL;
+}
+
+ssize_t res_counter_read(struct res_counter *cnt, int member, 
+		const char __user *userbuf, size_t nbytes, loff_t *pos)

+{
+   unsigned long *val;
+   char buf[64], *s;
+
+   s = buf;
+   val = res_counter_member(cnt, member);
+   s += sprintf(s, "%lu\n", *val);
+   return simple_read_from_buffer((void __user *)userbuf, nbytes,
+

Re: passing function pointers through platform devices?

2007-03-06 Thread Ben Nizette

NZG wrote:

> I'm developing an SPI-bus -> MMC/SD block driver translation layer.
> As part of this layer the write protect and card detect lines need to be read.
> The method for determining the state of these lines will be board specific.
>
> Is it appropriate to pass a function pointer through a platform device
> (declared in the mach initialization) to implement card_available and
> write_protect function calls?
>
> Or is there a cleaner way to do it?


Once the generic GPIO framework migrates upstream from -mm you should 
just pass the GPIO token from board-specific code and gpio_get_value() it.


--Ben.


Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

2007-03-06 Thread Roland McGrath
> > Yeah, I guess that's right.  It should still return NOTIFY_STOP when
> > args->err has no other bits set, so notifiers aren't called with zero.
> 
> In practice that might not work.  On my machine, at least, reads of DR6
> return ones in all the reserved bit positions.

Does that mean asm("mov %1,%%dr6; mov %%dr6,%0" : "=r" (mask) : "r" (0)); 
puts in mask the set of reserved bits?  We could collect that value at CPU
startup and mask it off args->err, then OR it back into vdr6.
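
A rough user-space sketch of the masking idea above, under stated assumptions: the helper names are invented for illustration, and the reserved-bit mask is passed in as a parameter because the real value would be probed on the CPU at startup, as suggested. The constant 0xffff0ff0 is used here only as an example of what such a probe might return.

```c
/* Example value only; the real mask would be read back from DR6 at CPU startup. */
#define DR6_RESERVED_EXAMPLE 0xffff0ff0UL

/* Strip the always-set reserved bits so "no event bits left" compares equal to zero. */
static unsigned long dr6_event_bits(unsigned long raw_dr6, unsigned long reserved_mask)
{
	return raw_dr6 & ~reserved_mask;
}

/* OR the reserved bits back in so the virtualized value looks like a hardware read. */
static unsigned long dr6_merge_virtual(unsigned long vdr6, unsigned long events,
				       unsigned long reserved_mask)
{
	return vdr6 | events | reserved_mask;
}
```

With the reserved bits stripped, a handler can test the remaining bits against zero before deciding whether any other notifier needs to be called.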


Thanks,
Roland


Re: [patch v2] epoll use a single inode ...

2007-03-06 Thread Linus Torvalds


On Tue, 6 Mar 2007, Eric Dumazet wrote:
> 
> I did a user space program, attached to this mail.
> 
> I rewrote the reciprocal_div() for i386 so that one multiply is used.

Ok, this is definitely faster on Core 2 as well, so "numbers talk, 
bullshit walks". No more objections.

(That said, I bet you could do even better for octal and hex numbers, so 
if you *really* want to speed things up, you should just make a 
special-case routine for each base (there's just three of them), and you 
can then also optimize the base-10 thing much better (you can do two 
digits at a time by dividing by 100, etc.))
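
The "two digits at a time" idea can be sketched in user space as follows. This is only an illustration of the technique, not the kernel's number-formatting code; the function name u64_to_dec_sketch and its interface are assumptions made for the example. The large value is divided by 100 once per pair of digits, and only the small remainder is split into its two digits.

```c
#include <stddef.h>

/*
 * Write n in decimal into out (needs room for 21 bytes) and return the
 * number of digits. Hypothetical helper, for illustration only.
 */
static size_t u64_to_dec_sketch(unsigned long long n, char *out)
{
	char tmp[20];	/* 2^64 - 1 has 20 decimal digits */
	size_t len = 0, i;

	/* Peel off two digits per division of the large value. */
	while (n >= 100) {
		unsigned int pair = (unsigned int)(n % 100);

		n /= 100;
		tmp[len++] = (char)('0' + pair % 10);
		tmp[len++] = (char)('0' + pair / 10);
	}

	/* One or two leading digits remain. */
	if (n >= 10) {
		tmp[len++] = (char)('0' + n % 10);
		tmp[len++] = (char)('0' + n / 10);
	} else {
		tmp[len++] = (char)('0' + n);
	}

	/* Digits were produced least significant first; reverse them. */
	for (i = 0; i < len; i++)
		out[i] = tmp[len - 1 - i];
	out[len] = '\0';
	return len;
}
```

A lookup table of the 100 two-character pairs would remove the remaining small divisions; that refinement is left out here to keep the sketch short.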

Linus


Re: [PATCH] x86_64 irq: keep consistent for changing IRQ0_VECTOR from 0x20 to 0x30

2007-03-06 Thread Linus Torvalds


On Mon, 5 Mar 2007, Yinghai Lu wrote:
>
> please check the patch

Hmm.. It doesn't look *wrong*, but could you please

 - split it up a bit (some of it is 100% obvious, ie the comment fixes)

 - write an explanation for the individually split up patches

 - not use attachments, but just make it inline. It's practically 
   impossible to reply and quote part of the patch now.

Eric/Ingo - did you go through and check the patch?

Thanks,

Linus


Re: [PATCH -mm] utrace: nommu fixup support utrace

2007-03-06 Thread Roland McGrath
That old ptrace check seems pretty questionable to me.  I think what you
want is for the nommu world's get_user_pages/access_process_vm when called
with force=1,write=1 on a read-only MAP_PRIVATE page to do something more
morally similar to the mmu world's COW than it does now.


Thanks,
Roland


Re: [PATCH] proc: maps protection

2007-03-06 Thread Kees Cook
On Tue, Mar 06, 2007 at 06:59:42PM -0800, Andrew Morton wrote:
> On Tue, 6 Mar 2007 18:13:35 -0800 Kees Cook <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Mar 06, 2007 at 05:56:09PM -0800, Andrew Morton wrote:
> > > On Tue, 6 Mar 2007 17:22:34 -0800
> > > Kees Cook <[EMAIL PROTECTED]> wrote:
> > > 
> > > > This is a continuation of a much earlier discussion[1].  As I 
> > > > understand, the problem is:
> > > 
> > > This sounds like a really good way of breaking lots and lots of people's
> > > expensively-developed stuff.  In ways which we won't discover until a year
> > > after we shipped it.
> > > 
> > > So nope, sorry.  Need to find a compatible way of doing this.  Perhaps a
> > > kernel boot parameter or a /proc knob.
> > 
> > Do you have examples of things in the kernel that I can use as a 
> > starting point?
> 
> No, I don't think this has precedent.
> 
> >  Would something like /proc/sys/kernel/maps_protect be 
> > reasonable?
> 
> Yes, that sounds reasonable.
> 
> An alternative is to do it with elf headers, perhaps - let the process
> specify what protections it wants in some manner.
> 
> > If an acceptable toggle is made, would you consider it being enabled by 
> > default (i.e. "tighter security by default")?
> 
> Again, that sounds risky.

[Adding Cc:lkml]

How about using a reduced check, as is done for fd and environ?  This 
would allow root-running system monitors to still do their job.  
Effectively, this changes the test from "is ptracing" to just "can 
ptrace".

If this still isn't considered safe, I'll add the maps_protect file...

--- 
task_mmu.c   |   16 +++-
 task_nommu.c |6 ++
 2 files changed, 21 insertions(+), 1 deletion(-)
---
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7445980..7c9aad3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -134,6 +134,9 @@ static int show_map_internal(struct seq_file *m, void *v, struct mem_size_stats
dev_t dev = 0;
int len;
 
+   if (!ptrace_may_attach(task))
+   return -EACCES;
+
if (file) {
struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
dev = inode->i_sb->s_dev;
@@ -444,11 +447,22 @@ const struct file_operations proc_maps_operations = {
 #ifdef CONFIG_NUMA
 extern int show_numa_map(struct seq_file *m, void *v);
 
+static int show_numa_map_checked(struct seq_file *m, void *v)
+{
+   struct proc_maps_private *priv = m->private;
+   struct task_struct *task = priv->task;
+
+   if (!ptrace_may_attach(task))
+   return -EACCES;
+   
+   return show_numa_map(m, v);
+}
+
 static struct seq_operations proc_pid_numa_maps_op = {
 .start  = m_start,
 .next   = m_next,
 .stop   = m_stop,
-.show   = show_numa_map
+.show   = show_numa_map_checked
 };
 
 static int numa_maps_open(struct inode *inode, struct file *file)
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 7cddf6b..c5783b7 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -143,6 +143,12 @@ out:
 static int show_map(struct seq_file *m, void *_vml)
 {
struct vm_list_struct *vml = _vml;
+   struct proc_maps_private *priv = m->private;
+   struct task_struct *task = priv->task;
+   
+   if (!ptrace_may_attach(task))
+   return -EACCES;
+
return nommu_vma_show(m, vml->vma);
 }
 



-- 
Kees Cook@outflux.net


Re: [PATCH] Fix get_order()

2007-03-06 Thread Linus Torvalds


On Tue, 6 Mar 2007, David Howells wrote:
>  /**
> + * ilog2_up - rounded up log of base 2 of 32-bit or a 64-bit unsigned value
> + * @n - parameter
> + *
> + * constant-capable log of base 2 calculation
> + * - this can be used to initialise global variables from constant data, hence
> + *   the massive ternary operator construction
> + * - the result is rounded up
> + * - the result is undefined when n < 1
> + *
> + * selects the appropriately-sized optimised version depending on sizeof(n)
> + */
> +#define ilog2_up(n) ((n) == 1 ? 0 : ilog2((n) - 1) + 1)

This is wrong. It uses "n" twice, which makes it unsafe as a macro.

It would need to be an inline function, but then the global initializer 
comment is wrong.

Or it could use a "__builtin_constant_p()" (which gcc defines to not have 
side effects) to allow the multiple use for constant data.

Or we could require that "ilog2(0)" returns -1, and then we could just say

#define ilog2_up(n) (ilog2((n)-1)+1)

Or.. ?
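
To make the double-evaluation hazard and the last alternative concrete, here is a small user-space sketch. All names are hypothetical stand-ins: ilog2_sketch() is a plain loop that returns -1 for 0 (the convention proposed above), not the kernel's ilog2(), and ILOG2_UP_MACRO mirrors the shape of the macro under review.

```c
/* Stand-in for ilog2(): floor(log2(n)), with ilog2(0) defined as -1. */
static inline int ilog2_sketch(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/* Same shape as the macro being reviewed: the argument appears twice. */
#define ILOG2_UP_MACRO(n) ((n) == 1 ? 0 : ilog2_sketch((n) - 1) + 1)

/* Inline-function form: the argument is evaluated exactly once. */
static inline int ilog2_up_sketch(unsigned long n)
{
	return ilog2_sketch(n - 1) + 1;
}

/* Counts how often the macro evaluates an argument that has a side effect. */
static int macro_argument_evaluations(void)
{
	unsigned long x = 4;

	(void)ILOG2_UP_MACRO(x++);
	return (int)(x - 4);
}
```

The inline function is safe with side effects but, as noted, cannot be used in a constant initializer; the macro form can, at the cost of evaluating its argument twice whenever it is not 1.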

The whole "get_order()" macro also has some serious lack of parentheses. 
In general, commit 39d61db0edb34d60b83c5e0d62d0e906578cc707 just was 
pretty damn bad!

I'm becoming a bit disgruntled about this whole thing, I have to admit. 
I'm just not sure the bugs here are worth it. Especially considering that 
__get_order() has apparently never even tested these things to begin with, 
since nobody but FRV has ever #defined the ARCH_HAS_ILOG2_U?? macros.

The whole *reason* for that mess seems to be bogus too, since at least 
ia64 still has its own inline "get_order()", which means that nobody can 
use get_order() for constant initializers *anyway*, quite unlike the 
comments say.

So the whole thing is:
 - buggy
 - untested
 - has untrue comments
 - makes no real sense

and I'm inclined to just revert 39d61db0 instead of adding more and more 
breakage to it, since it's simply not going to help with the fundamental 
problems!

Linus


Re: [SLUB 2/3] Large kmalloc pass through. Removal of large general slabs

2007-03-06 Thread Christoph Lameter
On Tue, 6 Mar 2007, Matt Mackall wrote:

> I've been meaning to do this in SLOB as well. Perhaps it warrants
> doing in stock kmalloc? I've got a grand total of 18 of these objects
> here.

The number increases with the number of NUMA nodes. We have had trouble with
the maximum kmalloc size before and this will get rid of it for good.
 


Re: [patch] disable NMI watchdog by default

2007-03-06 Thread Roland Dreier
 > --- linux.orig/include/asm-x86_64/nmi.h
 > +++ linux/include/asm-x86_64/nmi.h
 > @@ -63,7 +63,7 @@ extern int setup_nmi_watchdog(char *);
 >  
 >  extern atomic_t nmi_active;
 >  extern unsigned int nmi_watchdog;
 > -#define NMI_DEFAULT -1
 > +#define NMI_DEFAULT 0

Maybe I'm missing something obvious, but this patch doesn't seem
correct to me.  The sentiment of disabling the NMI watchdog by default
is fine, and I agree with it, but I don't think this patch does what
it says.  First of all, I have a system running a kernel with this
patch applied (v2.6.21-rc2-gc3442e2), and I see NMIs in
/proc/interrupts and "testing NMI watchdog ... OK." in the log.

And second, looking at the NMI code, it seems that this change
actually makes it impossible to turn off the NMI watchdog!  In
arch/x86_64/kernel/nmi.c, we have:

void nmi_watchdog_default(void)
{
if (nmi_watchdog != NMI_DEFAULT)
return;
if (nmi_known_cpu())
nmi_watchdog = NMI_LOCAL_APIC;
else
nmi_watchdog = NMI_IO_APIC;
}

so it seems changing the value of NMI_DEFAULT has no effect on this
logic, really: if nmi_watchdog is left at the default, then the kernel
chooses LAPIC or IO-APIC.  And if someone passes "nmi_watchdog=0" on
the command line, nmi_watchdog is still NMI_DEFAULT and so the same
logic triggers.
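
The effect can be modelled in a few lines of user-space C. This is a sketch of the decision logic only: the function name and the idea of passing the default as a parameter are mine, and the numeric values chosen for the IO-APIC and LAPIC modes are assumptions made for the example.

```c
/* Assumed mode values, for illustration only. */
enum {
	NMI_NONE_SKETCH = 0,
	NMI_IO_APIC_SKETCH = 1,
	NMI_LOCAL_APIC_SKETCH = 2,
};

/*
 * Models nmi_watchdog_default(): an explicit setting is kept, while a value
 * equal to the default sentinel is replaced by an automatically chosen mode.
 */
static int resolve_nmi_watchdog(int nmi_watchdog, int nmi_default, int known_cpu)
{
	if (nmi_watchdog != nmi_default)
		return nmi_watchdog;
	return known_cpu ? NMI_LOCAL_APIC_SKETCH : NMI_IO_APIC_SKETCH;
}
```

With a sentinel of -1, resolve_nmi_watchdog(0, -1, 1) keeps the watchdog off; with the patched sentinel of 0, resolve_nmi_watchdog(0, 0, 1) selects the LAPIC mode, because "off" can no longer be told apart from "not set".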

Ingo, I assume you tested this, so what am I missing?

 - R.


Re: Xen & VMI?

2007-03-06 Thread Zachary Amsden

Rusty Russell wrote:
> On Tue, 2007-03-06 at 21:37 +0100, Ingo Molnar wrote:
>> maybe i shouldnt call it 'VMI' but 'the paravirt ABI'. I dont mind if
>> it's the Xen ABI or the VMWare ABI or a mesh of the two - everyone can
>> map their own internals to that /one/ ABI.
>
> I think it's an excellent aim, but it's *HARD*.  I rejected this
> approach earlier because I'm just not smart enough.  (Yet?)

With VMI, I think we came within 90% of getting a cross vendor 
paravirt-ABI that satisfied everyone's needs.  Nobody is smart enough to 
figure out the last 10% - it needs cooperation, trial, error, and 
experience dealing with each other's hypervisors.

> The Linux side is fairly stable.  The hardware side is changing, and the
> hypervisor side is changing.  This means the ABI will churn fairly fast.
> The hypervisors are very different, which means the ABI will be very
> wide.
>
> We could start with VMI and try to support Xen, KVM and lguest.  It
> would at least give us a better idea of the scope of the problem.  But
> IMHO it's a *huge* job.

Surely, given time, the technical issues can be worked out.  In the 
meantime, the hardware has evolved, and many of the points that are now 
important have changed - and new issues have come into play that we 
can't anticipate yet.  At some point, we will hopefully converge, but we 
might not, and it is a huge job.  UDI had similarly lofty goals.  It was 
started in 1998.  Where is it today?


But this isn't the problem.  The problem is that nobody wants a single 
ABI.  Just like no hardware vendors want a fixed ABI for their 
hardware.  They need to innovate independently, and time to market and 
features are more important than being binary compatible with a bunch of 
competing vendors.  They want to differentiate, and break away from an 
ABI, and as history repeats, again and again, this happens eventually 
with every ABI.


So once the ivory tower is built, and you let all the kids in to play, 
they are going to have a party and you are going to start noticing chips 
and eventually cracks, and eventually the tower will go into disrepair 
and fall because somebody else has built a new and better one further 
down the road.  Why go through that exercise if nobody sees any tangible 
benefit from it today?


Paravirt-ops avoids this because it is an API, and because it is 
flexible, and because it can change with the kernel, and because it 
doesn't lock you into a legacy way of doing things, it allows you to 
fork and adapt and push legacy and future compatibility issues into the 
vendor backend modules, like VMI, where they should belong.


Zach


Re: [SLUB 2/3] Large kmalloc pass through. Removal of large general slabs

2007-03-06 Thread Matt Mackall
On Tue, Mar 06, 2007 at 06:35:16PM -0800, Christoph Lameter wrote:
> Unlimited kmalloc size and removal of general caches >=4K.
> 
> We can directly use the page allocator for all allocations 4K and larger. This
> means that no general slabs are necessary and the size of the allocation passed
> to kmalloc() can be arbitrarily large. Remove the useless general caches over
> 4k.

I've been meaning to do this in SLOB as well. Perhaps it warrants
doing in stock kmalloc? I've got a grand total of 18 of these objects
here.

The downside is this makes them suddenly disappear off the slabinfo
radar.
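
For reference, the pass-through being discussed can be sketched in user
space roughly as follows. This assumes 4K pages; sketch_get_order() and
sketch_kmalloc_route() are hypothetical helpers written for this
example, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12			/* assumption: 4K pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static int sketch_get_order(size_t size)
{
	int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Returns the page order a request would be handed to the page
 * allocator with, or -1 if it would still be served from a slab cache.
 */
static int sketch_kmalloc_route(size_t size)
{
	if (size >= SKETCH_PAGE_SIZE)
		return sketch_get_order(size);
	return -1;
}
```

Anything that takes the first branch never touches a slab cache, which
is why such objects would no longer show up in slabinfo.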

-- 
Mathematics is the supreme nostalgia of our time.


[SLUB 3/3] Guarantee minimum number of objects in a slab

2007-03-06 Thread Christoph Lameter
Guarantee a minimum number of objects per slab

The number of objects per slab is important for SLUB because it determines
the number of allocations that can be performed without having to consult
per node slab lists. Add another boot option "slub_min_objects=xx" that
allows the configuration of the minimum number of objects per slab. This
is similar to SLAB's queue configuration.

Set the default minimum to 4 objects. This will increase the page order
for certain slab objects.
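
The effect on the page order can be sketched in user space as follows.
This is a simplified stand-in, not the kernel function: it assumes 4K
pages and a maximum order of 11, always starts at order 0 instead of
honouring slub_min_order, and leaves out the remainder check that the
real calculate_order() performs. sketch_order() is a hypothetical name.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */
#define SKETCH_MAX_ORDER 11	/* assumption: illustrative limit */

/*
 * Smallest page order whose slab can hold at least min_objects objects
 * of the given size, or -1 if no order below the limit is large enough.
 */
static int sketch_order(unsigned long size, unsigned long min_objects)
{
	int order;

	assert(size > 0 && min_objects > 0);
	for (order = 0; order < SKETCH_MAX_ORDER; order++) {
		unsigned long slab_size = SKETCH_PAGE_SIZE << order;

		/* Same comparison as the patched calculate_order(). */
		if (slab_size < min_objects * size)
			continue;
		return order;
	}
	return -1;
}
```

Under these assumptions a 2048-byte object fits an order-0 slab when
only one object is required, but needs order 1 once four objects are
required; a 3000-byte object goes from order 0 to order 2.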

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc2-mm1/mm/slub.c
===
--- linux-2.6.21-rc2-mm1.orig/mm/slub.c 2007-03-06 17:57:11.0 -0800
+++ linux-2.6.21-rc2-mm1/mm/slub.c  2007-03-06 17:57:15.0 -0800
@@ -1201,6 +1201,12 @@ static __always_inline struct page *get_
 static int slub_min_order = 0;
 
 /*
+ * Minimum number of objects per slab. This is necessary in order to
+ * reduce locking overhead. Similar to the queue size in SLAB.
+ */
+static int slub_min_objects = 4;
+
+/*
  * Merge control. If this is set then no merging of slab caches will occur.
  */
 static int slub_nomerge = 0;
@@ -1232,7 +1238,7 @@ static int calculate_order(int size)
order < MAX_ORDER; order++) {
unsigned long slab_size = PAGE_SIZE << order;
 
-   if (slab_size < size)
+   if (slab_size < slub_min_objects * size)
continue;
 
rem = slab_size % size;
@@ -1624,6 +1630,15 @@ static int __init setup_slub_min_order(c
 
 __setup("slub_min_order=", setup_slub_min_order);
 
+static int __init setup_slub_min_objects(char *str)
+{
+   get_option (&str, &slub_min_objects);
+
+   return 1;
+}
+
+__setup("slub_min_objects=", setup_slub_min_objects);
+
 static int __init setup_slub_nomerge(char *str)
 {
slub_nomerge = 1;

