Re: [PATCH] X86: fix typo PAT to X86_PAT

2008-01-18 Thread Yinghai Lu
On Friday 18 January 2008 07:28:49 pm Dave Jones wrote:
> On Fri, Jan 18, 2008 at 10:02:10PM +0100, Ingo Molnar wrote:
>  > 
>  > * Dave Jones <[EMAIL PROTECTED]> wrote:
>  > 
>  > >  > you mean modifies MTRRs? Which code is that? (besides the 
>  > >  > /proc/mtrr userspace API)
>  > > 
>  > > This exclusion is going to be a real pain in the ass for distro 
>  > > kernels. It's impossible for example to build a kernel that will now 
>  > > support the MTRR-alike registers on the AMD K6/early Cyrix etc and 
>  > > also support PAT.
>  > > 
>  > > Additionally, given people tend to update their kernels a lot more 
>  > > often than they update to a whole new version of X, it means until 
>  > > userspace has caught up, we can't ship a kernel with PAT supported, or 
>  > > else X gets a lot slower due to the missing mtrr support.
>  > 
>  > there's no exclusion enforced right now, and if a CPU is PAT-incapable 
>  > (or if the kernel is booted nopat) then the MTRR bits should be usable. 
>  > But if we boot with PAT enabled, and Xorg gets /proc/mtrr wrong, we'll 
>  > see nasty crashes. If it gets them right, it should all still work just 
>  > fine. Is this ok? Then, in a year or two, distros can disable write 
>  > support to /proc/mtrr. Hm?
> 
> A crazy idea just occurred to me..  We could make /proc/mtrr an interface
> to set PAT on a range of memory.  This would make it transparently work
> without any changes in X or anything else that sets them in userspace.

good idea...

we need to make X86_PAT depend on MTRR in arch/x86/Kconfig

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
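
The idea of routing /proc/mtrr writes to PAT could look roughly like the
sketch below. It is not code from this thread: mtrr_type_to_pat_type() and
pat_msr_pack() are hypothetical names. The only thing it assumes is the
memory-type encoding documented for MTRRs and PAT entries (UC=0, WC=1, WT=4,
WP=5, WB=6, plus the PAT-only UC-=7).

```c
#include <stdint.h>

/* Memory-type encodings shared by MTRRs and PAT entries. */
enum mem_type {
	MEM_UC       = 0x00,
	MEM_WC       = 0x01,
	MEM_WT       = 0x04,
	MEM_WP       = 0x05,
	MEM_WB       = 0x06,
	MEM_UC_MINUS = 0x07,	/* valid in PAT only, not in an MTRR */
};

/* Hypothetical helper: map a /proc/mtrr type to a PAT type, -1 if invalid. */
static int mtrr_type_to_pat_type(unsigned int mtrr_type)
{
	switch (mtrr_type) {
	case MEM_UC:
	case MEM_WC:
	case MEM_WT:
	case MEM_WP:
	case MEM_WB:
		return (int)mtrr_type;	/* same encoding on both sides */
	default:
		return -1;		/* reserved or PAT-only value */
	}
}

/* Pack eight 8-bit PAT entries into the layout of the PAT MSR. */
static uint64_t pat_msr_pack(const uint8_t entries[8])
{
	uint64_t value = 0;
	int i;

	for (i = 0; i < 8; i++)
		value |= (uint64_t)entries[i] << (i * 8);
	return value;
}
```

Because the five MTRR types share their encodings with the matching PAT
types, the translation is only a validity check. Packing {WB, WT, UC-, UC}
twice gives 0x0007040600070406, the documented power-on value of the PAT MSR.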


[PATCH] X86: disable X86_PAT really

2008-01-18 Thread Yinghai Lu
[PATCH] X86: disable X86_PAT really

When X86_PAT is not selected, reserve_mattr() and free_mattr() do not need
to do anything.

They also need to bail out if the CPU does not support PAT.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 1036134..b3cdee1 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -57,12 +57,9 @@ static int pat_known_cpu(void)
 
 void pat_init(void)
 {
+#ifdef CONFIG_X86_PAT
u64 pat;
 
-#ifndef CONFIG_X86_PAT
-   nopat(NULL);
-#endif
-
if (!smp_processor_id() && !pat_known_cpu())
return;
 
@@ -90,6 +87,7 @@ void pat_init(void)
wrmsrl(MSR_IA32_CR_PAT, pat);
printk(KERN_INFO "x86 PAT enabled: cpu %d, old 0x%Lx, new 0x%Lx\n",
smp_processor_id(), boot_pat_state, pat);
+#endif
 }
 
 #undef PAT
@@ -135,9 +133,13 @@ static DEFINE_SPINLOCK(mattr_lock);/* protects memattr list */
 
 int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr)
 {
+#ifdef CONFIG_X86_PAT
struct memattr *ma = NULL, *ml;
int err = 0;
 
+   if (!pat_wc_enabled)
+   return 0;
+
if (fattr)
*fattr = attr;
 
@@ -191,13 +193,20 @@ int reserve_mattr(u64 start, u64 end, unsigned long attr, unsigned long *fattr)
 
spin_unlock(&mattr_lock);
return err;
+#else
+   return 0;
+#endif
 }
 
 int free_mattr(u64 start, u64 end, unsigned long attr)
 {
+#ifdef CONFIG_X86_PAT
struct memattr *ml;
int err = attr ? -EBUSY : 0;
 
+   if (!pat_wc_enabled)
+   return 0;
+
if (is_memory_any_valid(start, end))
return 0;
 
@@ -221,6 +230,9 @@ int free_mattr(u64 start, u64 end, unsigned long attr)
current->comm, current->pid,
start, end, cattr_name(attr));
return err;
+#else
+   return 0;
+#endif
 }
 
 /* /dev/mem interface. Use the previous mapping */
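
The patch combines two guards: a compile-time stub when the option is off and
a runtime bail-out when PAT is unavailable. A stand-alone sketch of that
pattern follows. reserve_range_sketch() and tracked_ranges are hypothetical,
and the CONFIG_X86_PAT define only stands in for the Kconfig symbol.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: stands in for the Kconfig symbol; delete to build the stub. */
#define CONFIG_X86_PAT 1

/* Stand-ins for kernel state (hypothetical in this sketch). */
static bool pat_wc_enabled;	/* assumed false without PAT or with "nopat" */
static int tracked_ranges;	/* ranges recorded by the real code path */

static int reserve_range_sketch(unsigned long long start,
				unsigned long long end)
{
#ifdef CONFIG_X86_PAT
	if (!pat_wc_enabled)
		return 0;	/* runtime bail-out: nothing to track */
	if (start >= end)
		return -EINVAL;
	tracked_ranges++;	/* placeholder for the list bookkeeping */
	return 0;
#else
	(void)start;
	(void)end;
	return 0;		/* compile-time stub */
#endif
}
```

Callers see the same return value of 0 in all three "nothing to do" cases, so
they need no #ifdef of their own.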


Re: [PATCH] x86: fix unconditional arch/x86/kernel/pcspeaker.c compiling

2008-01-18 Thread Taral
On 1/18/08, Michael Opdenacker <[EMAIL PROTECTED]> wrote:
> Do you mean "almost nothing"? It still allocates and adds a platform
> device, and the corresponding function always gets called at boot time.

Nothing significant then. I don't see any added functionality from this file.

-- 
Taral <[EMAIL PROTECTED]>
"Please let me know if there's any further trouble I can give you."
-- Unknown


Re: [GEODE] Geode GX/LX watchdog timer (was 2.6.24-rc8 hangs at mfgpt-timer)

2008-01-18 Thread Willy Tarreau
On Fri, Jan 18, 2008 at 06:06:24PM -0700, Jordan Crouse wrote:
> I don't know how much of a hassle it would be for Andres to get a 2.6.24
> kernel running on the OLPC to make sure that this isn't a regression
> in the timer tick code (I suspect it isn't a regression, but you never
> know).  I also think that it would probably be in our best interest to
> default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
> out.  Since most BIOSen don't have timers available, that shouldn't affect
> too many people.

Well, I've successfully used an earlier version of this code with 2.6.22
on a PCEngines ALIX motherboard equipped with LX800/CS5536. It boots
on a TinyBIOS.

I will try 2.6.24 + this patch on these boards when I have some time.

Willy



[PATCH] [7/8] CPA: Implement GBpages support in change_page_attr()

2008-01-18 Thread Andi Kleen

Teach c_p_a() to split and unsplit GB pages.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/pageattr_64.c |  150 --
 1 file changed, 119 insertions(+), 31 deletions(-)

Index: linux/arch/x86/mm/pageattr_64.c
===
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -40,6 +40,9 @@ pte_t *lookup_address(unsigned long addr
pud = pud_offset(pgd, address);
if (!pud_present(*pud))
return NULL; 
+   *level = 2;
+   if (pud_large(*pud))
+   return (pte_t *)pud;
pmd = pmd_offset(pud, address);
if (!pmd_present(*pmd))
return NULL; 
@@ -53,30 +56,85 @@ pte_t *lookup_address(unsigned long addr
return pte;
 } 
 
-static struct page *split_large_page(unsigned long address, pgprot_t prot,
-pgprot_t ref_prot)
-{ 
-   int i; 
+static pte_t *alloc_split_page(struct page **base)
+{
+   struct page *p = alloc_page(GFP_KERNEL);
+   if (!p)
+   return NULL;
+   SetPagePrivate(p);
+   page_private(p) = 0;
+   *base = p;
+   return page_address(p);
+}
+
+static struct page *free_split_page(struct page *base)
+{
+   BUG_ON(!PagePrivate(base));
+   BUG_ON(page_private(base) != 0);
+   ClearPagePrivate(base);
+   __free_page(base);
+   return NULL;
+}
+
+static struct page *
+split_pmd(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+   int i;
unsigned long addr;
-   struct page *base = alloc_pages(GFP_KERNEL, 0);
-   pte_t *pbase;
-   if (!base) 
+   struct page *base;
+   pte_t *pbase = alloc_split_page(&base);
+   if (!pbase)
return NULL;
-   /*
-* page_private is used to track the number of entries in
-* the page table page have non standard attributes.
-*/
-   SetPagePrivate(base);
-   page_private(base) = 0;
 
-   address = __pa(address);
-   addr = address & PMD_PAGE_MASK;
-   pbase = (pte_t *)page_address(base);
-   for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
-   pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
-  addr == address ? prot : ref_prot);
+   addr = paddr & PMD_PAGE_MASK;
+   for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
+   pbase[i] = pfn_pte(addr >> PAGE_SHIFT,
+  addr == paddr ? prot : ref_prot);
+
+   return base;
+}
+
+static struct page *
+split_gb(unsigned long paddr, pgprot_t prot, pgprot_t ref_prot)
+{
+   unsigned long addr;
+   int i;
+   struct page *base;
+   pte_t *pbase = alloc_split_page(&base);
+
+   if (!pbase)
+   return NULL;
+   addr = paddr & PUD_PAGE_MASK;
+   for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_PAGE_SIZE) {
+   if (paddr >= addr && paddr < addr + PMD_PAGE_SIZE) {
+   struct page *l3;
+   l3 = split_pmd(paddr, prot, ref_prot);
+   if (!l3)
+   return free_split_page(base);
+   page_private(l3)++;
+   pbase[i] = mk_pte(l3, ref_prot);
+   } else {
+   pbase[i] = pfn_pte(addr>>PAGE_SHIFT, ref_prot);
+   pbase[i] = pte_mkhuge(pbase[i]);
+   }
}
return base;
+}
+
+static struct page *split_large_page(unsigned long address, pgprot_t prot,
+pgprot_t ref_prot, int level)
+{
+   unsigned long paddr = __pa(address);
+   if (level == 2)
+   return split_gb(paddr, prot, ref_prot);
+   else if (level == 3)
+   return split_pmd(paddr, prot, ref_prot);
+   else {
+   printk("address %lx\n", address);
+   dump_pagetable(address);
+   BUG();
+   }
+   return NULL;
 } 
 
 struct flush_arg {
@@ -132,17 +190,40 @@ static inline void save_page(struct page
list_add(>lru, _pages);
 }
 
+static void reset_large_pte(pte_t *pte, unsigned long addr, pgprot_t prot)
+{
+   unsigned long pfn = __pa(addr) >> PAGE_SHIFT;
+   set_pte(pte, pte_mkhuge(pfn_pte(pfn, prot)));
+}
+
+static void
+revert_gb(unsigned long address, pud_t *pud, pmd_t *pmd, pgprot_t ref_prot)
+{
+   struct page *p = virt_to_page(pmd);
+
+   /* Reserved pages have been already set up at boot. Don't touch those. */
+   if (PageReserved(p))
+   return;
+
+   --page_private(p);
+   BUG_ON(page_private(p) < 0);
+   if (page_private(p) == 0) {
+   save_page(p);
+   reset_large_pte((pte_t *)pud, address & PUD_PAGE_MASK,
+   ref_prot);
+   }
+}
+
 /* 
  * No more special protections in this 2MB area - revert to a
- 
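
The index arithmetic behind split_gb() and split_pmd() can be tried in user
space. The sketch below mirrors the loop in split_gb() and only reports which
slots would be affected. plan_gb_split() and struct split_plan are
hypothetical names, and the shift values assume x86-64 with 4K base pages.

```c
#include <stdint.h>

#define PAGE_SHIFT     12
#define PMD_SHIFT      21
#define PUD_SHIFT      30
#define PMD_PAGE_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PUD_PAGE_SIZE  (UINT64_C(1) << PUD_SHIFT)
#define PMD_PAGE_MASK  (~(PMD_PAGE_SIZE - 1))
#define PUD_PAGE_MASK  (~(PUD_PAGE_SIZE - 1))
#define PTRS_PER_PMD   512

struct split_plan {
	unsigned int pmd_index;	/* 2MB slot that gets its own 4K table */
	unsigned int pte_index;	/* 4K slot inside it that gets the new prot */
	unsigned int huge_kept;	/* 2MB slots that stay large mappings */
};

static struct split_plan plan_gb_split(uint64_t paddr)
{
	struct split_plan plan = { 0, 0, 0 };
	uint64_t addr = paddr & PUD_PAGE_MASK;	/* start of the 1GB page */
	unsigned int i;

	for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_PAGE_SIZE) {
		if (paddr >= addr && paddr < addr + PMD_PAGE_SIZE) {
			/* this 2MB range is split further into 4K pages */
			plan.pmd_index = i;
			plan.pte_index = (unsigned int)
				((paddr & ~PMD_PAGE_MASK) >> PAGE_SHIFT);
		} else {
			/* every other range stays a 2MB mapping */
			plan.huge_kept++;
		}
	}
	return plan;
}
```

For paddr 0x40605000 the plan is 2MB slot 3, 4K slot 5, with the other 511
slots kept as 2MB mappings.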

[PATCH] [8/8] GBPAGES: Do kernel direct mapping at boot using GB pages

2008-01-18 Thread Andi Kleen

This should decrease TLB pressure because the kernel will take
fewer TLB misses for its own data accesses.

Only done for 64bit because i386 does not support GB page tables.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends 
against using GB mappings for code.

Can be disabled with direct_gbpages=off

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/init_64.c |   63 ++
 1 file changed, 54 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -268,13 +268,20 @@ void early_iounmap(void *addr, unsigned 
__flush_tlb();
 }
 
+static unsigned long direct_entry(unsigned long paddr)
+{
+   unsigned long entry;
+   entry = __PAGE_KERNEL_LARGE|paddr;
+   entry &= __supported_pte_mask;
+   return entry;
+}
+
 static void __meminit
 phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
 {
int i = pmd_index(address);
 
for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
-   unsigned long entry;
pmd_t *pmd = pmd_page + pmd_index(address);
 
if (address >= end) {
@@ -287,9 +294,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
if (pmd_val(*pmd))
continue;
 
-   entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
-   entry &= __supported_pte_mask;
-   set_pmd(pmd, __pmd(entry));
+   set_pmd(pmd, __pmd(direct_entry(address)));
}
 }
 
@@ -317,7 +322,13 @@ static void __meminit phys_pud_init(pud_
break;
 
if (pud_val(*pud)) {
-   phys_pmd_update(pud, addr, end);
+   if (!pud_large(*pud))
+   phys_pmd_update(pud, addr, end);
+   continue;
+   }
+
+   if (direct_gbpages > 0) {
+   set_pud(pud, __pud(direct_entry(addr)));
continue;
}
 
@@ -336,9 +347,11 @@ static void __init find_early_table_spac
unsigned long puds, pmds, tables, start;
 
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-   pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-   tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
-round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+   tables = round_up(puds * sizeof(pud_t), PAGE_SIZE);
+   if (!direct_gbpages) {
+   pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+   tables += round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+   }
 
/* RED-PEN putting page tables only on node 0 could
   cause a hotspot and fill up ZONE_DMA. The page tables
@@ -373,8 +386,15 @@ void __init_refok init_memory_mapping(un
 * mapped.  Unfortunately this is done currently before the nodes are 
 * discovered.
 */
-   if (!after_bootmem)
+   if (!after_bootmem) {
+   if (direct_gbpages >= 0 && cpu_has_gbpages) {
+   printk(KERN_INFO "Using GB pages for direct mapping\n");
+   direct_gbpages = 1;
+   } else
+   direct_gbpages = 0;
+
find_early_table_space(end);
+   }
 
start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
@@ -423,6 +443,27 @@ void __init paging_init(void)
 }
 #endif
 
+static void split_gb_page(pud_t *pud, unsigned long paddr)
+{
+   int i;
+   pmd_t *pmd;
+   struct page *p = alloc_page(GFP_KERNEL);
+   if (!p)
+   return;
+
+   Dprintk("split_gb_page %lx\n", paddr);
+
+   SetPagePrivate(p);
+   /* Set reference to 1 so that c_p_a() does not undo it */
+   page_private(p) = 1;
+
+   paddr &= PUD_PAGE_MASK;
+   pmd = page_address(p);
+   for (i = 0; i < PTRS_PER_PTE; i++, paddr += PMD_PAGE_SIZE)
+   pmd[i] = __pmd(direct_entry(paddr));
+   pud_populate(NULL, pud, pmd);
+}
+
 /* Unmap a kernel mapping if it exists. This is useful to avoid prefetches
from the CPU leading to inconsistent cache lines. address and size
must be aligned to 2MB boundaries. 
@@ -434,6 +475,8 @@ __clear_kernel_mapping(unsigned long add
 
BUG_ON(address & ~PMD_PAGE_MASK);
BUG_ON(size & ~PMD_PAGE_MASK);
+
+   Dprintk("clear_kernel_mapping %lx-%lx\n", address, address+size);

for (; address < end; address += PMD_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
@@ -442,6 +485,8 @@ __clear_kernel_mapping(unsigned long add
if (pgd_none(*pgd))
continue;
pud = pud_offset(pgd, address);
+   if 
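
The saving in find_early_table_space() is easy to quantify. The sketch below
repeats the arithmetic of that hunk in user space. early_table_bytes() and
round_up_to() are hypothetical names, and the 8-byte entry size and 4K page
size are the usual x86-64 values, assumed here.

```c
#include <stdint.h>

#define PAGE_SIZE   UINT64_C(4096)
#define PMD_SHIFT   21
#define PUD_SHIFT   30
#define PMD_SIZE    (UINT64_C(1) << PMD_SHIFT)
#define PUD_SIZE    (UINT64_C(1) << PUD_SHIFT)
#define ENTRY_SIZE  UINT64_C(8)	/* sizeof(pud_t) == sizeof(pmd_t) assumed */

static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Bytes of early page tables needed to direct-map [0, end). */
static uint64_t early_table_bytes(uint64_t end, int direct_gbpages)
{
	uint64_t puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
	uint64_t tables = round_up_to(puds * ENTRY_SIZE, PAGE_SIZE);

	if (!direct_gbpages) {
		/* 2MB mappings need one PMD entry per 2MB on top of that */
		uint64_t pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;

		tables += round_up_to(pmds * ENTRY_SIZE, PAGE_SIZE);
	}
	return tables;
}
```

For 64GB of memory this gives 266240 bytes with 2MB pages and 4096 bytes with
GB pages.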

[PATCH] [6/8] Add an option to disable direct mapping gbpages and a global variable

2008-01-18 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 Documentation/x86_64/boot-options.txt |3 +++
 arch/x86/mm/init_64.c |   12 
 include/asm-x86/pgtable_64.h  |2 ++
 3 files changed, 17 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -57,6 +57,18 @@ static unsigned long dma_reserve __initd
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
+int direct_gbpages;
+
+static int __init parse_direct_gbpages(char *arg)
+{
+   if (!strcmp(arg, "off")) {
+   direct_gbpages = -1;
+   return 0;
+   }
+   return -1;
+}
+early_param("direct_gbpages", parse_direct_gbpages);
+
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
  * physical space so we can cache the place of the first one and move
Index: linux/include/asm-x86/pgtable_64.h
===
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -248,6 +248,8 @@ static inline int pud_large(pud_t pte)
 
 #define update_mmu_cache(vma,address,pte) do { } while (0)
 
+extern int direct_gbpages;
+
 /* Encode and de-code a swap entry */
 #define __swp_type(x)  (((x).val >> 1) & 0x3f)
 #define __swp_offset(x) ((x).val >> 8)
Index: linux/Documentation/x86_64/boot-options.txt
===
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -307,3 +307,6 @@ Debugging
stuck (default)
 
 Miscellaneous
+
+   direct_gbpages=off
+   Do not use GB pages for kernel direct mapping.
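
direct_gbpages works as a three-state value: 0 until decided, -1 when the
user passed direct_gbpages=off, and 1 once GB pages are in use. The sketch
below shows the parse step from this patch next to the resolution rule from
the init_memory_mapping() hunk in patch 8/8. resolve_direct_gbpages() is a
hypothetical helper; the kernel does this inline.

```c
#include <string.h>

/* 0 = not decided yet, -1 = disabled on the command line, 1 = in use */
static int direct_gbpages;

/* Same logic as the early_param handler: only "off" is accepted. */
static int parse_direct_gbpages(const char *arg)
{
	if (!strcmp(arg, "off")) {
		direct_gbpages = -1;
		return 0;
	}
	return -1;
}

/* Hypothetical helper: final setting given the request and CPU support. */
static int resolve_direct_gbpages(int requested, int cpu_has_gbpages)
{
	return (requested >= 0 && cpu_has_gbpages) ? 1 : 0;
}
```

Keeping "disabled by the user" separate from "not decided yet" is what lets
the later code enable GB pages by default while still honouring the boot
option.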


[PATCH] [5/8] GBPAGES: Support gbpages in pagetable dump

2008-01-18 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/fault_64.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/fault_64.c
===
--- linux.orig/arch/x86/mm/fault_64.c
+++ linux/arch/x86/mm/fault_64.c
@@ -200,7 +200,8 @@ void dump_pagetable(unsigned long addres
pud = pud_offset(pgd, address);
if (bad_address(pud)) goto bad;
printk("PUD %lx ", pud_val(*pud));
-   if (!pud_present(*pud)) goto ret;
+   if (!pud_present(*pud) || pud_large(*pud))
+   goto ret;
 
pmd = pmd_offset(pud, address);
if (bad_address(pmd)) goto bad;


[PATCH] [3/8] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE

2008-01-18 Thread Andi Kleen

Split the existing LARGE_PAGE_SIZE/MASK macro into two new macros
PUD_PAGE_SIZE/MASK and PMD_PAGE_SIZE/MASK. 

Fix up all callers to use the new names.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/boot/compressed/head_64.S |8 
 arch/x86/kernel/head_64.S  |4 ++--
 arch/x86/kernel/pci-gart_64.c  |2 +-
 arch/x86/mm/init_64.c  |6 +++---
 arch/x86/mm/pageattr_64.c  |4 ++--
 include/asm-x86/page.h |4 ++--
 include/asm-x86/page_32.h  |4 
 include/asm-x86/page_64.h  |3 +++
 8 files changed, 21 insertions(+), 14 deletions(-)

Index: linux/include/asm-x86/page_64.h
===
--- linux.orig/include/asm-x86/page_64.h
+++ linux/include/asm-x86/page_64.h
@@ -23,6 +23,9 @@
 #define MCE_STACK 5
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
+#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
+
 #define __PAGE_OFFSET  _AC(0x8100, UL)
 
 #define __PHYSICAL_START   CONFIG_PHYSICAL_START
Index: linux/arch/x86/boot/compressed/head_64.S
===
--- linux.orig/arch/x86/boot/compressed/head_64.S
+++ linux/arch/x86/boot/compressed/head_64.S
@@ -80,8 +80,8 @@ startup_32:
 
 #ifdef CONFIG_RELOCATABLE
movl    %ebp, %ebx
-   addl    $(LARGE_PAGE_SIZE -1), %ebx
-   andl    $LARGE_PAGE_MASK, %ebx
+   addl    $(PMD_PAGE_SIZE -1), %ebx
+   andl    $PMD_PAGE_MASK, %ebx
 #else
movl    $CONFIG_PHYSICAL_START, %ebx
 #endif
@@ -220,8 +220,8 @@ ENTRY(startup_64)
/* Start with the delta to where the kernel will run at. */
 #ifdef CONFIG_RELOCATABLE
leaq    startup_32(%rip) /* - $startup_32 */, %rbp
-   addq    $(LARGE_PAGE_SIZE - 1), %rbp
-   andq    $LARGE_PAGE_MASK, %rbp
+   addq    $(PMD_PAGE_SIZE - 1), %rbp
+   andq    $PMD_PAGE_MASK, %rbp
movq    %rbp, %rbx
 #else
movq    $CONFIG_PHYSICAL_START, %rbp
Index: linux/arch/x86/kernel/pci-gart_64.c
===
--- linux.orig/arch/x86/kernel/pci-gart_64.c
+++ linux/arch/x86/kernel/pci-gart_64.c
@@ -501,7 +501,7 @@ static __init unsigned long check_iommu_
}
 
a = aper + iommu_size;
-   iommu_size -= round_up(a, LARGE_PAGE_SIZE) - a;
+   iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;
 
if (iommu_size < 64*1024*1024) {
printk(KERN_WARNING
Index: linux/arch/x86/kernel/head_64.S
===
--- linux.orig/arch/x86/kernel/head_64.S
+++ linux/arch/x86/kernel/head_64.S
@@ -63,7 +63,7 @@ startup_64:
 
/* Is the address not 2M aligned? */
movq    %rbp, %rax
-   andl    $~LARGE_PAGE_MASK, %eax
+   andl    $~PMD_PAGE_MASK, %eax
testl   %eax, %eax
jnz bad_address
 
@@ -88,7 +88,7 @@ startup_64:
 
/* Add an Identity mapping if I am above 1G */
leaq    _text(%rip), %rdi
-   andq    $LARGE_PAGE_MASK, %rdi
+   andq    $PMD_PAGE_MASK, %rdi
 
movq    %rdi, %rax
shrq    $PUD_SHIFT, %rax
Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -420,10 +420,10 @@ __clear_kernel_mapping(unsigned long add
 {
unsigned long end = address + size;
 
-   BUG_ON(address & ~LARGE_PAGE_MASK);
-   BUG_ON(size & ~LARGE_PAGE_MASK); 
+   BUG_ON(address & ~PMD_PAGE_MASK);
+   BUG_ON(size & ~PMD_PAGE_MASK);

-   for (; address < end; address += LARGE_PAGE_SIZE) { 
+   for (; address < end; address += PMD_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;
Index: linux/arch/x86/mm/pageattr_64.c
===
--- linux.orig/arch/x86/mm/pageattr_64.c
+++ linux/arch/x86/mm/pageattr_64.c
@@ -70,7 +70,7 @@ static struct page *split_large_page(uns
page_private(base) = 0;
 
address = __pa(address);
-   addr = address & LARGE_PAGE_MASK; 
+   addr = address & PMD_PAGE_MASK;
pbase = (pte_t *)page_address(base);
for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE) {
pbase[i] = pfn_pte(addr >> PAGE_SHIFT, 
@@ -150,7 +150,7 @@ static void revert_page(unsigned long ad
BUG_ON(pud_none(*pud));
pmd = pmd_offset(pud, address);
BUG_ON(pmd_val(*pmd) & _PAGE_PSE);
-   pfn = (__pa(address) & LARGE_PAGE_MASK) >> PAGE_SHIFT;
+   pfn = (__pa(address) & PMD_PAGE_MASK) >> PAGE_SHIFT;
large_pte = pfn_pte(pfn, ref_prot);
large_pte = pte_mkhuge(large_pte);
set_pte((pte_t *)pmd, large_pte);
Index: linux/include/asm-x86/page_32.h
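
The renamed macros are used in two idioms in this patch: rounding an address
up to the next 2MB boundary (add size - 1, then mask) and testing alignment
(mask with the inverted mask). A user-space sketch, with hypothetical helper
names and the x86-64 shift values assumed:

```c
#include <stdint.h>

#define PMD_SHIFT      21
#define PUD_SHIFT      30
#define PMD_PAGE_SIZE  (UINT64_C(1) << PMD_SHIFT)	/* 2MB */
#define PMD_PAGE_MASK  (~(PMD_PAGE_SIZE - 1))
#define PUD_PAGE_SIZE  (UINT64_C(1) << PUD_SHIFT)	/* 1GB */
#define PUD_PAGE_MASK  (~(PUD_PAGE_SIZE - 1))

/* Round up to the next 2MB boundary, as the head_64.S hunks do. */
static uint64_t pmd_align_up(uint64_t addr)
{
	return (addr + PMD_PAGE_SIZE - 1) & PMD_PAGE_MASK;
}

/* Non-zero if addr sits on a 2MB boundary (the BUG_ON checks negate this). */
static int pmd_aligned(uint64_t addr)
{
	return (addr & ~PMD_PAGE_MASK) == 0;
}

/* Start of the 1GB page containing addr. */
static uint64_t pud_align_down(uint64_t addr)
{
	return addr & PUD_PAGE_MASK;
}
```

Having separate PMD and PUD names makes it explicit which of the two large
page sizes a caller means, which the single LARGE_PAGE_SIZE name could not
express.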

[PATCH] [4/8] Add pgtable accessor functions for GB pages

2008-01-18 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-x86/pgtable_64.h |6 ++
 1 file changed, 6 insertions(+)

Index: linux/include/asm-x86/pgtable_64.h
===
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -208,6 +208,12 @@ static inline unsigned long pmd_bad(pmd_
 #define pud_offset(pgd, address) ((pud_t *) pgd_page_vaddr(*(pgd)) + pud_index(address))
 #define pud_present(pud) (pud_val(pud) & _PAGE_PRESENT)
 
+static inline int pud_large(pud_t pte)
+{
+   return (pud_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
+   (_PAGE_PSE|_PAGE_PRESENT);
+}
+
 /* PMD  - Level 2 access */
 #define pmd_page_vaddr(pmd) ((unsigned long) __va(pmd_val(pmd) & PTE_MASK))
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
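
pud_large() requires both the present bit and the PSE bit, so an entry that
is not present is never reported as a GB page. A user-space sketch of the
same test on raw entry values; the bit values 0x001 and 0x080 are the usual
x86 ones and are restated here as an assumption, and pud_large_sketch() is a
hypothetical name.

```c
#include <stdint.h>

#define _PAGE_PRESENT  UINT64_C(0x001)	/* bit 0, assumed value */
#define _PAGE_PSE      UINT64_C(0x080)	/* bit 7, assumed value */

/* Non-zero only if the entry is present and maps a large page. */
static int pud_large_sketch(uint64_t pud_val)
{
	return (pud_val & (_PAGE_PSE | _PAGE_PRESENT)) ==
		(_PAGE_PSE | _PAGE_PRESENT);
}
```

Testing only the PSE bit would misreport a non-present entry that happens to
have bit 7 set.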


[PATCH] [0/8] GBpages support for x86-64, v2

2008-01-18 Thread Andi Kleen

This patch series supports using the new GB pages introduced with 
AMD Quad Cores for the kernel direct mapping.

I addressed all reasonable feedback for the previous version I believe.

Changes against previous version:
- Ported on top of latest git-x86 with PAT series
- Fixed some white space
- Clarify clear_kernel_mapping comments
- Minor cleanups

-Andi


[PATCH] [1/8] Handle kernel near memory hole in clear_kernel_mapping

2008-01-18 Thread Andi Kleen

This was a long standing obscure problem in the relocatable kernel. The
AMD GART driver needs to unmap part of the GART in the kernel direct mapping to 
prevent cache corruption. With the relocatable kernel it is in theory possible 
that the separate kernel text mapping straddles that area too. 

Normally it should not happen because GART tends to be >= 2GB, and the kernel 
is normally not loaded that high, but it is possible in theory. 

Teach clear_kernel_mapping() about this case.

This will become more important once the kernel mapping uses 1GB pages.

Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/init_64.c |   25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -415,7 +415,8 @@ void __init paging_init(void)
from the CPU leading to inconsistent cache lines. address and size
must be aligned to 2MB boundaries. 
Does nothing when the mapping doesn't exist. */
-void __init clear_kernel_mapping(unsigned long address, unsigned long size) 
+static void __init
+__clear_kernel_mapping(unsigned long address, unsigned long size)
 {
unsigned long end = address + size;
 
@@ -445,6 +446,28 @@ void __init clear_kernel_mapping(unsigne
__flush_tlb_all();
 } 
 
+#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))
+
+void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+{
+   int sh = PMD_SHIFT;
+   unsigned long kernel = __pa(__START_KERNEL_map);
+
+   /*
+* Note that we cannot unmap the kernel itself because the unmapped
+* holes here are always at least 2MB aligned.
+* This just applies to the trailing areas of the 40MB kernel mapping.
+*/
+   if (overlaps(kernel >> sh, (kernel + KERNEL_TEXT_SIZE) >> sh,
+   __pa(address) >> sh, __pa(address + size) >> sh)) {
+   printk(KERN_WARNING
+   "Kernel mapping at %lx within 2MB of memory hole\n",
+   kernel);
+   __clear_kernel_mapping(__START_KERNEL_map+__pa(address), size);
+   }
+   __clear_kernel_mapping(address, size);
+}
+
 /*
  * Memory hotplug specific functions
  */
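
The overlaps() check compares 2MB frame numbers, and it is inclusive at both
ends, so a hole that starts in the same 2MB frame where the kernel mapping
ends still counts. A user-space sketch: kernel_touches_hole() is a
hypothetical wrapper, and the 40MB KERNEL_TEXT_SIZE is taken from the comment
in the patch.

```c
#include <stdint.h>

#define PMD_SHIFT         21
#define KERNEL_TEXT_SIZE  (UINT64_C(40) << 20)	/* 40MB, per the comment */

#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))

/* Non-zero if the kernel text mapping and the hole share a 2MB frame. */
static int kernel_touches_hole(uint64_t kernel_pa, uint64_t hole_pa,
			       uint64_t hole_size)
{
	return overlaps(kernel_pa >> PMD_SHIFT,
			(kernel_pa + KERNEL_TEXT_SIZE) >> PMD_SHIFT,
			hole_pa >> PMD_SHIFT,
			(hole_pa + hole_size) >> PMD_SHIFT);
}
```

With the kernel at 2MB, the mapping covers frames 1 to 21. A hole starting at
42MB (frame 21) is reported as overlapping, one starting at 44MB (frame 22)
is not.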


[PATCH] [2/8] GBPAGES: Add feature macros for the gbpages cpuid bit

2008-01-18 Thread Andi Kleen

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-x86/cpufeature.h |2 ++
 1 file changed, 2 insertions(+)

Index: linux/include/asm-x86/cpufeature.h
===
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -49,6 +49,7 @@
 #define X86_FEATURE_MP (1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX (1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_GBPAGES (1*32+26) /* GB pages */
 #define X86_FEATURE_RDTSCP (1*32+27) /* RDTSCP */
 #define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT   (1*32+30) /* AMD 3DNow! extensions */
@@ -173,6 +174,7 @@
 #define cpu_has_bts boot_cpu_has(X86_FEATURE_BTS)
 #define cpu_has_pat boot_cpu_has(X86_FEATURE_PAT)
 #define cpu_has_ss boot_cpu_has(X86_FEATURE_SELFSNOOP)
+#define cpu_has_gbpages boot_cpu_has(X86_FEATURE_GBPAGES)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg 1
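
A feature number such as (1*32+26) encodes a 32-bit capability word and a bit
within it. The sketch below decodes that in user space. feature_word(),
feature_bit() and has_feature() are hypothetical helpers, not the kernel's
boot_cpu_has(). It assumes that word 1 holds the AMD-defined bits from CPUID
0x80000001, as the neighbouring entries suggest.

```c
#include <stdint.h>

#define X86_FEATURE_GBPAGES  (1*32+26)	/* word 1, bit 26 */
#define X86_FEATURE_RDTSCP   (1*32+27)	/* word 1, bit 27 */

/* Index of the 32-bit capability word that holds the feature. */
static int feature_word(int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside that word. */
static int feature_bit(int feature)
{
	return feature % 32;
}

/* Non-zero if the feature bit is set in a capability array. */
static int has_feature(const uint32_t *caps, int feature)
{
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1;
}
```

With a capability array whose word 1 has only bit 26 set,
has_feature(caps, X86_FEATURE_GBPAGES) is 1 and
has_feature(caps, X86_FEATURE_RDTSCP) is 0.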


Re: [PATCH 0/2] Relax restrictions on setting CONFIG_NUMA on x86

2008-01-18 Thread Andi Kleen
Mel Gorman <[EMAIL PROTECTED]> writes:

> A fix[1] was merged to the x86.git tree that allowed NUMA kernels to boot
> on normal x86 machines (and not just NUMA-Q, Summit etc.). I took a look
> at the restrictions on setting NUMA on x86 to see if they could be lifted.

The problem with i386 CONFIG_NUMA previously was not that it didn't
boot on normal non NUMA systems, but that it didn't boot on very
common NUMA systems: Opterons.  Have you tested if that is fixed now?

-Andi


" Change size of node ids from u8 to u16 fixup" causes early panic in __build_all_zonelists

2008-01-18 Thread Andi Kleen

One of my test systems didn't boot with latest git-x86.
I bisected it down to f1321f875910172bcc3e1f302fe145a9e4d3bdf7

With later patches the fault seemed to happen even earlier before
other initialization messages.

Config is available at http://halobates.de/config

-Andi

commit f1321f875910172bcc3e1f302fe145a9e4d3bdf7
Author: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date:   Fri Jan 18 23:05:33 2008 +0100

x86: Change size of node ids from u8 to u16 fixup

Change the size of node ids for X86_64 from 8 bits to 16 bits
to accommodate more than 256 nodes.

Introduce a "numanode_t" type for x86-generic usage.
...

swsusp: Registered nosave memory region: ff70 - 0001
Allocating PCI resources starting at e200 (gap: e000:1ec0)
SMP: Allowing 4 CPUs, 0 hotplug CPUs
PERCPU: Allocating 34912 bytes of per cpu data
PANIC: early exception 0e rip 10:802602cd error 0 cr2 6b0
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-g3c2d7552 #27

Call Trace:
 [] __build_all_zonelists+0x2a9/0x40a
 [] __build_all_zonelists+0xec/0x40a
 [] build_all_zonelists+0x1a0/0x244
 [] start_kernel+0x110/0x2bd
 [] _sinittext+0x1c6/0x1cd

RIP 0x10





Re: [PATCH] RUSAGE_THREAD

2008-01-18 Thread Ulrich Drepper
Roland McGrath wrote:
> +#define  RUSAGE_LWP  RUSAGE_THREAD   /* Solaris name for same */

No need to clutter the kernel header with this, it'll be in the libc header.

Aside from that:

Acked-by: Ulrich Drepper <[EMAIL PROTECTED]>

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Re: [PATCH 1/5] x86: Change size of node ids from u8 to u16 fixup

2008-01-18 Thread Yinghai Lu
On Jan 18, 2008 9:17 PM, David Rientjes <[EMAIL PROTECTED]> wrote:
> On Fri, 18 Jan 2008, Yinghai Lu wrote:
>
> > > > I got
> > > > SART: PXM 0 -> APIC 0 -> Node 255
> > > > SART: PXM 0 -> APIC 1 -> Node 255
> > > > SART: PXM 1 -> APIC 2 -> Node 255
> > > > SART: PXM 1 -> APIC 3 -> Node 255
> > > >
> > >
> > > I assume this is a typo and those proximity mappings are actually from the
> > > SRAT.
> >
> > SRAT for processor only have
> > PXM and APIC id. setup_node(pxm) will get node id for pxm, start from 0...
> >
>
> I was referring to "SART" in your log.

I should have copied it instead of typing it...

YH


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Andi Kleen
> syscall nr and pid at minimum then.

oprofile already supports logging the pid I believe. Otherwise
the pid filter in opreport could hardly work.

> Still doesn't work for modules either.

oprofile works fine for modules.

> 
> what it ends up doing is using an entirely different interface for 
> basically the
> same code / operation inside the kernel.

Well rather it uses an existing framework for something that fits
it well.

Also the way I proposed is very cheap and would be possible
to use in production kernels without special configs.

> The current interface code is maybe 80 lines of /proc code... and very 
> simple to
> use (unlike the oprofile interface)

The oprofile interface is per CPU (so you wouldn't need to reinvent
that to fix your locking) and if you add the syscall
logging feature to it it would apply to all profile events
e.g. you could then do things like "matching cache misses to syscalls" 

-andi



[PATCH] kernel/params.c: fix the module name length in param_sysfs_builtin

2008-01-18 Thread rae l
From: Denis Cheng <[EMAIL PROTECTED]>
Date: Sat, 19 Jan 2008 13:29:51 +0800
Subject: [PATCH] kernel/params.c: fix the module name length in
param_sysfs_builtin

the original code uses KOBJ_NAME_LEN for the built-in module name length,
which is defined as 20 in linux/kobject.h, but this is apparently not enough:
many module names are longer than this;
 #define KOBJ_NAME_LEN   20

another macro is MODULE_NAME_LEN defined in linux/module.h, I think this is
enough for module names:
 #define MODULE_NAME_LEN (64 - sizeof(unsigned long))

Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 kernel/params.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/params.c b/kernel/params.c
index 7686417..a085b40 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -376,8 +376,6 @@ int param_get_string(char *buffer, struct kernel_param *kp)

 extern struct kernel_param __start___param[], __stop___param[];

-#define MAX_KBUILD_MODNAME KOBJ_NAME_LEN
-
 struct param_attribute
 {
struct module_attribute mattr;
@@ -588,7 +586,7 @@ static void __init param_sysfs_builtin(void)
 {
struct kernel_param *kp, *kp_begin = NULL;
unsigned int i, name_len, count = 0;
-   char modname[MAX_KBUILD_MODNAME + 1] = "";
+   char modname[MODULE_NAME_LEN + 1] = "";

for (i=0; i < __stop___param - __start___param; i++) {
char *dot;
@@ -596,12 +594,12 @@ static void __init param_sysfs_builtin(void)

kp = &__start___param[i];
max_name_len =
-   min_t(size_t, MAX_KBUILD_MODNAME, strlen(kp->name));
+   min_t(size_t, MODULE_NAME_LEN, strlen(kp->name));

dot = memchr(kp->name, '.', max_name_len);
if (!dot) {
DEBUGP("couldn't find period in first %d characters "
-  "of %s\n", MAX_KBUILD_MODNAME, kp->name);
+  "of %s\n", MODULE_NAME_LEN, kp->name);
continue;
}
name_len = dot - kp->name;
-- 
1.5.3.5

-- 
Denis Cheng
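
The bound matters because built-in parameter names have the form
"module.param" and the loop in the patch above only looks for the '.' within
the first N characters of the name. What follows is a minimal userspace sketch
of that lookup, not the kernel code: the DEMO_* constants mirror the two
values quoted in the patch description, and demo_modname_len() is a
hypothetical helper introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two kernel constants quoted above. */
#define DEMO_KOBJ_NAME_LEN   20
#define DEMO_MODULE_NAME_LEN (64 - sizeof(unsigned long))

/*
 * Return the length of the module-name prefix of "module.param",
 * or -1 if no '.' is found within the first max_len characters.
 */
static int demo_modname_len(const char *name, size_t max_len)
{
    size_t n = strlen(name);
    const char *dot;

    if (n > max_len)
        n = max_len;            /* same clamp as the min_t() in the patch */

    dot = memchr(name, '.', n); /* search only the clamped prefix */
    return dot ? (int)(dot - name) : -1;
}
```

With a 28-character module name such as "a_rather_long_builtin_module.debug",
the 20-character bound never reaches the separator and the lookup fails, while
the larger bound finds it at offset 28.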


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Arjan van de Ven

Andi Kleen wrote:
>> another thing that the current profiling can't do, is to show what the
>> system is doing when it hits the latency.. so someone calling fsync()
>> will show up in the waiting for IO function, but not that it was due to
>> an fsync().
>
> Hmm so how about extending oprofile to always log the syscall number
> in the event logs (can be gotten from top of stack).

syscall nr and pid at minimum then.
Still doesn't work for modules either.

what it ends up doing is using an entirely different interface for basically the
same code / operation inside the kernel.
The current interface code is maybe 80 lines of /proc code... and very simple to
use (unlike the oprofile interface)


Re: [Announce] Development release 0.1 of the LatencyTOP tool II

2008-01-18 Thread Andi Kleen
On Sat, Jan 19, 2008 at 06:33:30AM +0100, Andi Kleen wrote:
> > another thing that the current profiling can't do, is to show what the
> > system is doing when it hits the latency.. so someone calling fsync()
> > will show up in the waiting for IO function, but not that it was due to
> > an fsync().
> 
> Hmm so how about extending oprofile to always log the syscall number
> in the event logs (can be gotten from top of stack). I think given

Ok to handle exceptions like page faults this way you would need to save
the vector somewhere on entry, but that shouldn't be very costly 
or difficult and could probably even be done unconditionally.

-Andi


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Andi Kleen
> another thing that the current profiling can't do, is to show what the
> system is doing when it hits the latency.. so someone calling fsync()
> will show up in the waiting for IO function, but not that it was due to
> an fsync().

Hmm so how about extending oprofile to always log the syscall number
in the event logs (can be gotten from top of stack). I think given
that you could reconstruct that data in the userland at least
for single threads (not for work done on behalf of them in other
threads; but I'm not sure you tried to solve that problem at all)

The advantage is that it would be a generic mechanism that would work
for all types of profiling.

-Andi


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Arjan van de Ven

Andi Kleen wrote:
>> yes indeed; I sort of use the same infrastructure inside the scheduler;
>> the biggest reason I felt I had to do something different was that I
>> wanted to do per process data collection, so that you can see for a
>> specific process what was going on.
>
> Wouldn't it have been easier then to just extend the sleep profiler to
> oprofile? oprofile already has pid filters and can do per process
> profiling.

it's more complex than that

> On the other hand I'm not fully sure only doing per pid profiling
> is that useful. After all often latencies come from asynchronous
> threads (like kblockd). So a system level view is probably better
> anyways.

another thing that the current profiling can't do, is to show what the system
is doing when it hits the latency.. so someone calling fsync() will show up in
the waiting for IO function, but not that it was due to an fsync().


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Andi Kleen
> yes indeed; I sort of use the same infrastructure inside the scheduler;
> the biggest reason I felt I had to do something different was that I
> wanted to do per process data collection, so that you can see for a
> specific process what was going on.

Wouldn't it have been easier then to just extend the sleep profiler to
oprofile? oprofile already has pid filters and can do per process 
profiling.

On the other hand I'm not fully sure only doing per pid profiling
is that useful. After all often latencies come from asynchronous
threads (like kblockd). So a system level view is probably better
anyways. 

-Andi


Re: [PATCH 1/5] x86: Change size of node ids from u8 to u16 fixup

2008-01-18 Thread David Rientjes
On Fri, 18 Jan 2008, Yinghai Lu wrote:

> > > I got
> > > SART: PXM 0 -> APIC 0 -> Node 255
> > > SART: PXM 0 -> APIC 1 -> Node 255
> > > SART: PXM 1 -> APIC 2 -> Node 255
> > > SART: PXM 1 -> APIC 3 -> Node 255
> > >
> >
> > I assume this is a typo and those proximity mappings are actually from the
> > SRAT.
> 
> SRAT for processor only have
> PXM and APIC id. setup_node(pxm) will get node id for pxm, start from 0...
> 

I was referring to "SART" in your log.


create a file in kernel mode. help please!

2008-01-18 Thread Rafael Sisto
Hi there,
obviously this is a newbie question, but I couldn't find any
documentation on how to do it. I tried several ways but couldn't get it
to work.
I designed a system call, so a user will call it, and a new file will
be created ('/tmp/filexx'). After that, I have another system call,
that will map the file into the maps of the user process. The idea is
the same as IPC...

I managed to create the file with this function (in the first system call):
fd = filp_open(path, O_CREAT | O_RDWR , 777);

After that, the user will call another system call, and it will map
this file to the process maps.
something like this:
 filp_open(route, O_RDWR,0 );
 do_mmap(fp, 0, tamano, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_SHARED, 0);

After I call the second system call, the user tries to access the
memory, but gets the message "Bus Error".
I tried to manually create a file with vi, and then use the second
system call, and worked perfectly. I could use the shared memory
without problems.

The problem seems to be in the first system call (with filp_open),
when I try to create a new file... Can somebody suggest how I could
fix this issue? It is very important because it is for a college
project.

Greetings, and thanks in advance for the answers.
Rafael Sisto - Uruguay.-
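
One likely cause, offered as an assumption rather than a confirmed diagnosis
of the code above: a file freshly created with O_CREAT has length zero, and
touching a mapped page that lies wholly beyond end-of-file raises SIGBUS. A
file written with vi already contains data, which would explain why that case
works. Note also that the mode argument 777 is decimal, not the octal 0777.
The sketch below is the userspace analogue, not kernel code, and
demo_map_new_file() is a hypothetical helper: it sizes the new file with
ftruncate() before mapping it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a new file, give it a non-zero length, map it shared and
 * touch the first and last byte. Returns 0 on success, -1 on failure.
 * The file is removed again before returning.
 */
static int demo_map_new_file(const char *path, size_t len)
{
    int fd, ok;
    char *p;

    if (len == 0)
        return -1;

    /* Mode is octal: 0600, not decimal 600. */
    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Without this step the file is empty and any access would SIGBUS. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    p[0] = 'x';
    p[len - 1] = 'y';
    ok = (p[0] == 'x' && p[len - 1] == 'y');

    munmap(p, len);
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Inside the kernel the equivalent step would be to extend the file to the
mapped size before the second system call maps it; which kernel helper to use
for that is left open here.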


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Arjan van de Ven

Andi Kleen wrote:
> Arjan van de Ven <[EMAIL PROTECTED]> writes:
>
>> The Intel Open Source Technology Center is pleased to announce the
>> release of version 0.1 of LatencyTOP, a tool for developers to visualize
>> system latencies.
>
> Just for completeness -- Linux already had a way to profile latencies
> since quite some time. It's little known unfortunately and doesn't
> work for modules since it's a special mode in the old non modular kernel
> profiler.
>
> You enable CONFIG_SCHEDSTATS and boot with profile=sleep and then you can
> use the readprofile command to read the data. Information can be reset with
> echo > /proc/profile
>
> There's also a profile=sched to profile the scheduler which works even
> without CONFIG_SCHEDSTATS

yes indeed; I sort of use the same infrastructure inside the scheduler; the
biggest reason I felt I had to do something different was that I wanted to do
per process data collection, so that you can see for a specific process what
was going on.


Re: [PATCH] x86: Unify printk strings in fault_32|64.c

2008-01-18 Thread Harvey Harrison
On Sat, 2008-01-19 at 06:08 +0100, Andi Kleen wrote:
> On Saturday 19 January 2008 05:22:29 Harvey Harrison wrote:
> > Adding the address of the faulting library missed removing a
> > line ending from X86_32.
> > 
> > Also update the shorter printk format for X86_32 in fault_64.c
> > to make it easier to see the remaining differences.
> 
> Thanks. I think it was correct initially, but one of the merge steps
> with the changing git-x86 caused some hunks to be dropped and the patch 
> never quite recovered from that.

No worries, hoping to get them unified this weekend, should make this
easier going forward.

Harvey



Re: [PATCH] x86: Unify printk strings in fault_32|64.c

2008-01-18 Thread Andi Kleen
On Saturday 19 January 2008 05:22:29 Harvey Harrison wrote:
> Adding the address of the faulting library missed removing a
> line ending from X86_32.
> 
> Also update the shorter printk format for X86_32 in fault_64.c
> to make it easier to see the remaining differences.

Thanks. I think it was correct initially, but one of the merge steps
with the changing git-x86 caused some hunks to be dropped and the patch 
never quite recovered from that.

-Andi


Re: [Announce] Development release 0.1 of the LatencyTOP tool

2008-01-18 Thread Andi Kleen
Arjan van de Ven <[EMAIL PROTECTED]> writes:

> The Intel Open Source Technology Center is pleased to announce the
> release of version 0.1 of LatencyTOP, a tool for developers to visualize
> system latencies.

Just for completeness -- Linux already had a way to profile latencies
since quite some time. It's little known unfortunately and doesn't
work for modules since it's a special mode in the old non modular kernel
profiler. 

You enable CONFIG_SCHEDSTATS and boot with profile=sleep and then you can
use the readprofile command to read the data. Information can be reset with 
echo > /proc/profile

There's also a profile=sched to profile the scheduler which works even
without CONFIG_SCHEDSTATS

LatencyTOP will probably be a little more user friendly though.

-Andi


[PATCH for mm] Remove iBCS support

2008-01-18 Thread Andi Kleen

Hi Andrew,

Can you please queue this patch in -mm for .25. It was posted earlier
and nobody complained.

Thanks,
-Andi




Remove ibcs2 support in ELF loader too

ibcs2 support has never been supported on 2.6 kernels as far as I know,
and if it has it must have been an external patch. Anyways, if anybody
applies an external patch they could as well readd the ibcs checking
code to the ELF loader in the same patch. But there is no reason
to keep this code running in all Linux kernels. This will save at least
two strcmps each ELF execution.

No deprecation period because it could not have been used anyways.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 fs/binfmt_elf.c |   15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

Index: linux/fs/binfmt_elf.c
===
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -530,7 +530,6 @@ static int load_elf_binary(struct linux_
unsigned long load_addr = 0, load_bias = 0;
int load_addr_set = 0;
char * elf_interpreter = NULL;
-   unsigned char ibcs2_interpreter = 0;
unsigned long error;
struct elf_phdr *elf_ppnt, *elf_phdata;
unsigned long elf_bss, elf_brk;
@@ -647,14 +646,6 @@ static int load_elf_binary(struct linux_
if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0')
goto out_free_interp;
 
-   /* If the program interpreter is one of these two,
-* then assume an iBCS2 image. Otherwise assume
-* a native linux image.
-*/
-   if (strcmp(elf_interpreter,"/usr/lib/libc.so.1") == 0 ||
-   strcmp(elf_interpreter,"/usr/lib/ld.so.1") == 0)
-   ibcs2_interpreter = 1;
-
/*
 * The early SET_PERSONALITY here is so that the lookup
 * for the interpreter happens in the namespace of the 
@@ -674,7 +665,7 @@ static int load_elf_binary(struct linux_
 * switch really is going to happen - do this in
 * flush_thread().  - akpm
 */
-   SET_PERSONALITY(loc->elf_ex, ibcs2_interpreter);
+   SET_PERSONALITY(loc->elf_ex, 0);
 
interpreter = open_exec(elf_interpreter);
retval = PTR_ERR(interpreter);
@@ -725,7 +716,7 @@ static int load_elf_binary(struct linux_
goto out_free_dentry;
} else {
/* Executables without an interpreter also need a personality  */
-   SET_PERSONALITY(loc->elf_ex, ibcs2_interpreter);
+   SET_PERSONALITY(loc->elf_ex, 0);
}
 
/* OK, we are done with that, now set up the arg stuff,
@@ -748,7 +739,7 @@ static int load_elf_binary(struct linux_
 
/* Do this immediately, since STACK_TOP as used in setup_arg_pages
   may depend on the personality.  */
-   SET_PERSONALITY(loc->elf_ex, ibcs2_interpreter);
+   SET_PERSONALITY(loc->elf_ex, 0);
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
current->personality |= READ_IMPLIES_EXEC;
 


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Thu, 17 Jan 2008, Olaf Hering wrote:

> On Thu, Jan 17, Olaf Hering wrote:
> 
> > Since -mm boots further, what patch should I try?
> 
> rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.

Sigh. It looks like we need alien cache structures in some cases for nodes 
that have no memory. We must allocate structures for all nodes regardless 
if they have allocatable memory or not.

 


[PATCH] Remove information leak in Linux CIFS client

2008-01-18 Thread Andi Kleen

Fix information leak in CIFS client lookup

Putting arbitrary file names on lookup failures into the system log is not
a good idea, because usually everybody can read dmesg, and that is thus
an information leak if a directory was read protected.

Also changed the error printout for this case to a signed number, because
it is normally negative and that makes it easier to read.

I'm not sure the message is all that useful anyways. Perhaps it
should just be removed completely? Or at least rate limited, because
it makes it easy to spam the kernel log.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/fs/cifs/dir.c
===
--- linux.orig/fs/cifs/dir.c
+++ linux/fs/cifs/dir.c
@@ -518,7 +518,7 @@ cifs_lookup(struct inode *parent_dir_ino
/*  if it was once a directory (but how can we tell?) we could do
shrink_dcache_parent(direntry); */
} else {
-   cERROR(1, ("Error 0x%x on cifs_get_inode_info in lookup of %s",
+   cERROR(1, ("Error %d on cifs_get_inode_info in lookup of file",
   rc, full_path));
/* BB special case check for Access Denied - watch security
exposure of returning dir info implicitly via different rc
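
The readability point about the signed printout can be illustrated in
userspace. This is a minimal sketch assuming a 32-bit int; demo_format_rc()
is a hypothetical helper and not part of the CIFS code. It formats the same
negative return code both ways.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a (normally negative) error code either as unsigned hex,
 * as the old message did, or as a signed decimal, as the patch does.
 * Returns the number of characters snprintf() would have written.
 */
static int demo_format_rc(char *buf, size_t len, int rc, int as_hex)
{
    if (as_hex)
        return snprintf(buf, len, "Error 0x%x", (unsigned int)rc);
    return snprintf(buf, len, "Error %d", rc);
}
```

For rc = -2 (the value of -ENOENT on Linux) the hex form reads
"Error 0xfffffffe", while the signed form reads "Error -2".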


Re: crash in kmem_cache_init

2008-01-18 Thread Christoph Lameter
On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c075bbd0] [c00f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c075bbe0] [c00f7f38] .cache_alloc_node+0x17c/0x1e8
> 
> calls cache_grow with nodeid 1
> > [c075bbe0] [c00f7d68] .fallback_alloc+0x1a0/0x1f4

Okay that makes sense. You have no node 0 with normal memory but the node 
assigned to the executing processor is zero (correct?). Thus it needs to 
fall back to node 1 and that is not possible during bootstrap. You need to 
run kmem_cache_init() on a cpu on a node with memory.

Or we need to revert the patch which would allocate control 
structures again for all online nodes regardless if they have memory or 
not.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the 
situation? (However, we tried this on the other thread without success).



Re: PROBLEM: Celeron Core

2008-01-18 Thread Andi Kleen
> It will relative to not throttling.

No it will not. Please reread Dominik's mail I linked to. It explains 
it clearly.

> You made a claim that is -physically impossible- as stated, a claim I've
> seen here before and I'm correcting it. If something reduces heat, it
> must save power *by the definition of heat and power*. And if you reduce
> power usage, you will make your battery last longer.

I think the misunderstanding on your side is about what the heat is
reduced relative to. Throttling essentially reduces temporary heat spikes on
the silicon, but does not make the system overall take less power
or generate less heat as measured over a longer time, because it will
be idle less.

-Andi


Re: non-choice related config entries within choice

2008-01-18 Thread Roman Zippel
Hi,

On Wed, 16 Jan 2008, Sam Ravnborg wrote:

> But one feature I really would like to see is named choices so we can do 
> stuff like:
> 
> choice X86_PROCESSOR
> 
> config GENERIC_PROCESSOR
>   bool "A generic X86 processor"
> endchoice
> 
> 
> ...
> 
> choice PPC_PROCESSOR
> 
> config GENERIC_PROCESSOR
>   bool "A generic PowerPC processor"
> 
> endchoice
> 
> The issue here is that we do not today allow the same config option
> to appear in more than one choice.

What I have in mind is slightly different, above choices would simply be 
called PROCESSOR, which would tell kconfig that all choices belong to the 
same group.

bye, Roman


Re: [PATCH 1/5] x86: Change size of node ids from u8 to u16 fixup

2008-01-18 Thread Yinghai Lu
On Jan 18, 2008 8:36 PM, David Rientjes <[EMAIL PROTECTED]> wrote:
> On Fri, 18 Jan 2008, Yinghai Lu wrote:
>
> > > +#if MAX_NUMNODES > 256
> > > +typedef u16 numanode_t;
> > > +#else
> > > +typedef u8 numanode_t;
> > > +#endif
> > > +
> > >  #endif /* _LINUX_NUMA_H */
> >
> > that is wrong, you can not change pxm_to_node_map from int to u8 or u16.
> >
>
> Yeah, NID_INVAL is negative so no unsigned type will work here,
> unfortunately.  And that reduces the intended savings of your change since
> the smaller type can only be used with a smaller CONFIG_NODES_SHIFT.
>
> > int acpi_map_pxm_to_node(int pxm)
> > {
> > int node = pxm_to_node_map[pxm];
> >
> > if (node < 0){
> > if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
> > return NID_INVAL;
> > node = first_unset_node(nodes_found_map);
> > __acpi_map_pxm_to_node(pxm, node);
> > node_set(node, nodes_found_map);
> > }
> >
> > return node;
> > }
> >
> > node will always be 255 or 65535
> >
>
> Right.
>
> > please keep that to int.
> >
> > I got
> > SART: PXM 0 -> APIC 0 -> Node 255
> > SART: PXM 0 -> APIC 1 -> Node 255
> > SART: PXM 1 -> APIC 2 -> Node 255
> > SART: PXM 1 -> APIC 3 -> Node 255
> >
>
> I assume this is a typo and those proximity mappings are actually from the
> SRAT.

The SRAT entries for processors only have a PXM and an APIC id.
setup_node(pxm) will get the node id for a pxm, starting from 0...

> > if (node < 0){
> > if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
> > return NID_INVAL;
> > node = first_unset_node(nodes_found_map);
> > __acpi_map_pxm_to_node(pxm, node);
> > node_set(node, nodes_found_map);
> > }

YH


Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall

On Sat, 2008-01-19 at 05:27 +0100, Andi Kleen wrote:
> > So while throttling may be less efficient in terms of watt seconds used
> > to compile something than running at full speed, it is incorrect to say
> > it uses less power. One machine running for an hour throttled to 50%
> > uses less power (and therefore less battery and cooling) than another
> > running at full speed for that same hour.
> 
> Not for the same unit of work. If you just run endless loops you 
> might be true, but most systems don't do that. 

Yes, most systems idle.

> In terms of laptops (or rather in most other systems too) you usually care 
> about battery life time while the system is mostly idling (waiting
> for your key strokes etc.). In this case enabling throttling
> as a cpufreq driver will not make your battery last longer.

It will relative to not throttling.

You made a claim that is -physically impossible- as stated, a claim I've
seen here before and I'm correcting it. If something reduces heat, it
must save power *by the definition of heat and power*. And if you reduce
power usage, you will make your battery last longer.

Make any other statement you want about the efficiency of throttling per
unit work or the effectiveness of throttling relative to other methods,
just stop repeating the claim that "throttling reduces heat but doesn't
save power". It goes against the law of conservation of energy.
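
The two positions in this thread can be put side by side with some
arithmetic. The sketch below uses a deliberately simple model that is an
assumption, not something measured: while throttled, the CPU draws busy power
for the duty-cycle fraction of the time and a "clock-gated" power for the
rest, and once the job is done it drops to a lower idle power. All names and
wattages are hypothetical.

```c
/* Average power while running at a given duty cycle (0 < duty <= 1). */
static double demo_avg_power_w(double duty, double p_busy_w, double p_gated_w)
{
    return duty * p_busy_w + (1.0 - duty) * p_gated_w;
}

/*
 * Energy in joules used over a fixed window: run a job that needs
 * job_s seconds at full speed, then sit idle for the rest of the window.
 * If the job does not fit, the CPU simply runs for the whole window.
 */
static double demo_energy_j(double job_s, double duty,
                            double p_busy_w, double p_gated_w,
                            double p_idle_w, double window_s)
{
    double run_s;

    if (duty <= 0.0)
        return -1.0;                    /* invalid duty cycle */

    run_s = job_s / duty;               /* throttling stretches the job */
    if (run_s > window_s)
        run_s = window_s;

    return run_s * demo_avg_power_w(duty, p_busy_w, p_gated_w)
           + (window_s - run_s) * p_idle_w;
}
```

With 20 W busy, 8 W clock-gated and 4 W idle, a 10 second job in a 100 second
window costs 560 J unthrottled and 600 J at 50% duty, so throttling does not
help for a fixed unit of work in this model. Under continuous load for the
same 100 seconds it is 2000 J against 1400 J, so throttling does lower power
for a fixed amount of time. Both statements in the thread can hold at once;
they answer different questions.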

-- 
Mathematics is the supreme nostalgia of our time.



Re: [PATCH 1/5] x86: Change size of node ids from u8 to u16 fixup

2008-01-18 Thread David Rientjes
On Fri, 18 Jan 2008, Yinghai Lu wrote:

> > +#if MAX_NUMNODES > 256
> > +typedef u16 numanode_t;
> > +#else
> > +typedef u8 numanode_t;
> > +#endif
> > +
> >  #endif /* _LINUX_NUMA_H */
> 
> that is wrong, you can not change pxm_to_node_map from int to u8 or u16.
> 

Yeah, NID_INVAL is negative so no unsigned type will work here, 
unfortunately.  And that reduces the intended savings of your change since 
the smaller type can only be used with a smaller CONFIG_NODES_SHIFT.

> int acpi_map_pxm_to_node(int pxm)
> {
> int node = pxm_to_node_map[pxm];
> 
> if (node < 0){
> if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
> return NID_INVAL;
> node = first_unset_node(nodes_found_map);
> __acpi_map_pxm_to_node(pxm, node);
> node_set(node, nodes_found_map);
> }
> 
> return node;
> }
> 
> node will always be 255 or 65535
> 

Right.

> please keep that to int.
> 
> I got
> SART: PXM 0 -> APIC 0 -> Node 255
> SART: PXM 0 -> APIC 1 -> Node 255
> SART: PXM 1 -> APIC 2 -> Node 255
> SART: PXM 1 -> APIC 3 -> Node 255
> 

I assume this is a typo and those proximity mappings are actually from the 
SRAT.

David
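
The wraparound discussed above is easy to reproduce in userspace. This is a
minimal sketch: DEMO_NID_INVAL stands in for the kernel's negative NID_INVAL,
and the demo_roundtrip_*() helpers are hypothetical. Each one stores a node id
in a map entry of the given type and reads it back as an int, which is what
the "node < 0" test in acpi_map_pxm_to_node() would then see.

```c
#include <stdint.h>

/* Stand-in for the kernel's "no node assigned" marker. */
#define DEMO_NID_INVAL (-1)

/* Store in an 8-bit unsigned entry: -1 wraps to 255. */
static int demo_roundtrip_u8(int nid)
{
    uint8_t entry = (uint8_t)nid;
    return entry;               /* promoted to int, never negative */
}

/* Store in a 16-bit unsigned entry: -1 wraps to 65535. */
static int demo_roundtrip_u16(int nid)
{
    uint16_t entry = (uint16_t)nid;
    return entry;
}

/* Store in a plain int entry: the marker survives. */
static int demo_roundtrip_int(int nid)
{
    int entry = nid;
    return entry;
}
```

Only the int version still satisfies "node < 0" after the round trip, which
matches the "Node 255" lines in the log quoted above.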


Re: non-choice related config entries within choice

2008-01-18 Thread Roman Zippel
Hi,

On Wed, 16 Jan 2008, Jan Beulich wrote:

> now that I finally found time to look into the problems that caused the
> patch changing boolean/tristate choice behavior to be reverted I find
> that due to the way things worked in the past there are a couple of
> cases where config options not really belonging to the choice are inside
> the choice scope (drivers/usb/gadget/Kconfig, arch/ppc/Kconfig, and
> arch/mips/Kconfig are where I found such cases, and I hope this is a
> complete list).
> 
> The question is: Is it intended for this to work the way it used to, or
> is it rather reasonable to change these scripts so that stuff dependent
> upon the choice selection is being dealt with outside the choice scope?

This is really a feature, try it with a visible option there which depends 
on a choice option.
First for the choice type I think it's simpler to just look at the first 
choice option, anything more complex simply has to specify the type 
explicitly.

The bigger problem is that menu_finalize() is a little complex which makes 
such changes more difficult, basically it does two things (updating the 
dependencies and generating the menu structure) in one pass and it depends 
on a specific order, which is nonobvious. I really should clean this up to 
make it easier to follow what's happening.
For now this means the dependency to the choice symbol has to be added a 
little later right before the call to menu_add_symbol().

bye, Roman


[PATCH] [1/2] Fix some inaccurate comments in MTRR checking code

2008-01-18 Thread Andi Kleen

- is_cpu(INTEL) actually refers only to the MTRR architecture
and all AMD CPUs since K7 use the Intel MTRR architecture so the
fixup code runs on AMD too. Remove a comment claiming otherwise.

[Perhaps is_cpu should be renamed, the name is clearly confusing]

- Clarify another incorrect comment.

Cc: [EMAIL PROTECTED]
Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/kernel/cpu/mtrr/main.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86/kernel/cpu/mtrr/main.c
===
--- linux.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux/arch/x86/kernel/cpu/mtrr/main.c
@@ -640,6 +640,8 @@ early_param("disable_mtrr_trim", disable
  * Some buggy BIOSes don't setup the MTRRs properly for systems with certain
  * memory configurations.  This routine checks to make sure the MTRRs having
  * a write back type cover all of the memory the kernel is intending to use.
+ * [AK: actually it doesn't check that. It just checks that the highest
+ * MTRR is matching the end of memory. That is not quite the same.]
  * If not, it'll trim any memory off the end by adjusting end_pfn, removing
  * it from the kernel's allocation pools, warning the user with an obnoxious
  * message.
@@ -649,7 +651,6 @@ void __init mtrr_trim_uncached_memory(vo
unsigned long i, base, size, highest_addr = 0, def, dummy;
mtrr_type type;
 
-   /* Make sure we only trim uncachable memory on Intel machines */
rdmsr(MTRRdefType_MSR, def, dummy);
def &= 0xff;
if (!is_cpu(INTEL) || disable_mtrr_trim || def != MTRR_TYPE_UNCACHABLE)


[PATCH] [2/2] Fix MTRR check on AMD systems with > 4GB RAM

2008-01-18 Thread Andi Kleen

Newer AMD systems (since K8RevF) have a magic SYSCFG MSR bit to force WB
on memory beyond 4GB. This is not reflected in the standard MTRR
MSRs, so the MTRR checking routine would get confused and disable
perfectly good RAM beyond 4GB. Implement code for checking that bit.


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/kernel/cpu/mtrr/main.c |   34 ++
 1 file changed, 34 insertions(+)

Index: linux/arch/x86/kernel/cpu/mtrr/main.c
===
--- linux.orig/arch/x86/kernel/cpu/mtrr/main.c
+++ linux/arch/x86/kernel/cpu/mtrr/main.c
@@ -634,6 +634,37 @@ static int __init disable_mtrr_trim_setu
 early_param("disable_mtrr_trim", disable_mtrr_trim_setup);
 
 #ifdef CONFIG_X86_64
+
+/*
+ * Newer AMD K8s and later CPUs have a special magic MSR way to force WB
+ * for memory >4GB. Check for that here.
+ * Note this won't check if the MTRRs < 4GB where the magic bit doesn't
+ * apply to are wrong, but so far we don't know of any such case in the wild.
+ */
+
+#define Tom2ForceMemTypeWB (1U << 22)
+static __init int amd_special_default_mtrr(void)
+{
+   u32 l, h;
+
+   /* Doesn't apply to memory < 4GB */
+   if (end_pfn <= (0xffffffff >> PAGE_SHIFT))
+   return 0;
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+   return 0;
+   if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
+   return 0;
+   /* In case some hypervisor doesn't pass SYSCFG through */
+   if (rdmsr_safe(MSR_K8_SYSCFG, &l, &h) < 0)
+   return 0;
+   /* Memory between 4GB and top of mem is forced WB by this magic bit.
+* Reserved before K8RevF, but should be zero there.
+*/
+   if (l & Tom2ForceMemTypeWB)
+   return 1;
+   return 0;
+}
+
 /**
  * mtrr_trim_uncached_memory - trim RAM not covered by MTRRs
  *
@@ -667,6 +698,9 @@ void __init mtrr_trim_uncached_memory(vo
highest_addr = base + size;
}
 
+   if (amd_special_default_mtrr())
+   return;
+
if ((highest_addr >> PAGE_SHIFT) < end_pfn) {
printk(KERN_WARNING "***\n");
printk(KERN_WARNING " WARNING: likely BIOS bug\n");
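
For readers who want to follow the decision made by amd_special_default_mtrr()
outside the kernel tree, here is a minimal user-space sketch of the same
checks. It is not the patch itself: struct cpu_snapshot, its fields and
amd_forces_wb_above_4g() are hypothetical names, the SYSCFG value is passed in
rather than read from the MSR, and the 4 GiB limit is an assumption based on
the "Doesn't apply to memory < 4GB" comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT         12          /* 4 KiB pages, as on x86 */
#define TOM2_FORCE_MEM_WB  (1U << 22)  /* bit position as defined in the patch */

/* Hypothetical snapshot of the state the kernel would look at during boot. */
struct cpu_snapshot {
	bool     is_amd;     /* vendor check */
	unsigned family;     /* CPU family, e.g. 0xf for K8 */
	uint64_t end_pfn;    /* one past the last RAM page frame */
	bool     syscfg_ok;  /* false if reading SYSCFG faulted (e.g. hypervisor) */
	uint32_t syscfg_lo;  /* low 32 bits of SYSCFG */
};

/* Returns true when RAM above 4 GiB should be treated as write-back. */
static bool amd_forces_wb_above_4g(const struct cpu_snapshot *c)
{
	/* Nothing to decide if all RAM sits below the 4 GiB boundary. */
	if (c->end_pfn <= (0xffffffffULL >> PAGE_SHIFT))
		return false;
	if (!c->is_amd)
		return false;
	/* Only the families covered by the patch: 0xf through 0x11. */
	if (c->family < 0xf || c->family > 0x11)
		return false;
	if (!c->syscfg_ok)
		return false;
	return (c->syscfg_lo & TOM2_FORCE_MEM_WB) != 0;
}
```

The early returns mirror the order used in the patch, so the cheap checks run
before anything that depends on the MSR value.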


Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-18 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 18, 2008 at 10:55:27PM -0500, Steven Rostedt wrote:
> [...]
> > All this complexity is to be justified by keeping the raw prev/next
> > pointers from being sent to a naive tracer?  It seems to me way out of
> > proportion.
> 
> Damn, and I just blew away all my marker code for something like this ;-)

Sorry! :-)

> [...]
> We have in sched.c the following marker:
>  trace_mark(kernel_sched_scheduler, "prev %p next %p", prev, next);

Fine so far!

> Then Mathieu can add in some code somewhere (or a module, or something)
>   ret = marker_probe_register("kernel_sched_scheduler",
>   "prev %p next %p",
>   pretty_print_sched_switch, NULL);

> static void pretty_print_sched_switch(const struct marker *mdata,
>   void *private_data,
>   const char *format, ...)
> {
>   [...]
>   trace_mark(kernel_pretty_print_sched_switch,
>   "prev_pid %d next_pid %d prev_state %ld",
>   prev->pid, next->pid, prev->state);
> }

That marker_probe_register call would need to be done only when the
embedded (k_p_p_s_s) marker is actually being used.  Otherwise we'd
lose all the savings of a dormant sched.c marker by always calling
into pretty_print_sched_switch(), whether or not the k_p_p_s_s marker
was active.
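
The idea of registering the probe only while the second marker is in use can
be sketched as a plain use count. This is an illustration only:
pretty_marker_get() and pretty_marker_put() are hypothetical names, the
boolean stands in for the real register/unregister calls, and a kernel
version would need locking around the counter.

```c
#include <stdbool.h>

static int  pretty_marker_users;  /* consumers currently wanting the pretty marker */
static bool probe_registered;     /* stand-in for "the probe is attached" */

/* Called when a tracer starts using the pretty-printed marker. */
static int pretty_marker_get(void)
{
	if (pretty_marker_users++ == 0)
		probe_registered = true;   /* first user: attach the probe */
	return pretty_marker_users;
}

/* Called when a tracer stops using the pretty-printed marker. */
static int pretty_marker_put(void)
{
	if (pretty_marker_users > 0 && --pretty_marker_users == 0)
		probe_registered = false;  /* last user gone: detach the probe */
	return pretty_marker_users;
}
```

With no users, the sched.c marker stays dormant and costs nothing extra.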

In any case, if the naive tracer agrees to become educated about some
of these markers in the form of intermediary functions like that, they
need not insist on a second hop through marker territory anyway:

 static void pretty_print_sched_switch(const struct marker *mdata,
void *private_data,
const char *format, ...)
 {
[...]
lttng_backend_trace(kernel_pretty_print_sched_switch,
"prev_pid %d next_pid %d prev_state %ld",
prev->pid, next->pid, prev->state);
 }


- FChE


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Rik van Riel
On Fri, 18 Jan 2008 18:50:03 -0600
Matt Mackall <[EMAIL PROTECTED]> wrote:
> On Fri, 2008-01-18 at 17:54 -0500, Rik van Riel wrote:

> > Backup programs not seeing an updated mtime is a really big deal.
> 
> And that's fixed with the 4-line approach.
> 
> Reminds me, I've got a patch here for addressing that problem with loop 
> mounts:
> 
> Writes to loop should update the mtime of the underlying file.
> 
> Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Acked-by: Rik van Riel <[EMAIL PROTECTED]>

-- 
All rights reversed.


Re: PROBLEM: Celeron Core

2008-01-18 Thread Andi Kleen
> So while throttling may be less efficient in terms of watt seconds used
> to compile something than running at full speed, it is incorrect to say
> it uses less power. One machine running for an hour throttled to 50%
> uses less power (and therefore less battery and cooling) than another
> running at full speed for that same hour.

Not for the same unit of work. If you just run endless loops you 
might be right, but most systems don't do that. 

In terms of laptops (or rather in most other systems too) you usually care 
about battery life time while the system is mostly idling (waiting
for your key strokes etc.). In this case enabling throttling
as a cpufreq driver will not make your battery last longer.

Also, skipping clocks does not actually save very much power
compared to what the other measures (C-states or SpeedStep) do, like dropping voltage.

This means enabling it will likely shorten your laptop's battery life.

-Andi


[PATCH] x86: Unify printk strings in fault_32|64.c

2008-01-18 Thread Harvey Harrison
Adding the address of the faulting library missed removing a
line ending from X86_32.

Also update the shorter printk format for X86_32 in fault_64.c
to make it easier to see the remaining differences.

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
Ingo, trivial printk update after Andi's patches.

 arch/x86/mm/fault_32.c |2 +-
 arch/x86/mm/fault_64.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/fault_32.c b/arch/x86/mm/fault_32.c
index 793e830..0bd2417 100644
--- a/arch/x86/mm/fault_32.c
+++ b/arch/x86/mm/fault_32.c
@@ -589,7 +589,7 @@ bad_area_nosemaphore:
printk_ratelimit()) {
printk(
 #ifdef CONFIG_X86_32
-   "%s%s[%d]: segfault at %lx ip %08lx sp %08lx error %lx\n",
+   "%s%s[%d]: segfault at %lx ip %08lx sp %08lx error %lx",
 #else
"%s%s[%d]: segfault at %lx ip %lx sp %lx error %lx",
 #endif
diff --git a/arch/x86/mm/fault_64.c b/arch/x86/mm/fault_64.c
index 9270a7d..9ac449e 100644
--- a/arch/x86/mm/fault_64.c
+++ b/arch/x86/mm/fault_64.c
@@ -591,7 +591,7 @@ bad_area_nosemaphore:
printk_ratelimit()) {
printk(
 #ifdef CONFIG_X86_32
-   "%s%s[%d]: segfault at %08lx ip %08lx sp %08lx error %lx\n",
+   "%s%s[%d]: segfault at %lx ip %08lx sp %08lx error %lx",
 #else
"%s%s[%d]: segfault at %lx ip %lx sp %lx error %lx",
 #endif
-- 
1.5.4.rc3.1118.gf6754c





Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall

On Sat, 2008-01-19 at 02:15 +0100, Andi Kleen wrote:
> On Fri, Jan 18, 2008 at 06:27:57PM -0600, Matt Mackall wrote:
> > 
> > On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> > > Chodorenko Michail <[EMAIL PROTECTED]> writes:
> > > 
> > > > I have a laptop "Extensa 5220", with the processor Celeron based on 
> > > > 'core'
> > > > technology.
> > > > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > > > source code
> > > > but there's no line identification of my CPU for apply frequency change
> > > > need to add a ID line 0x16
> > > 
> > > Note that driver will likely do clock throttling on your CPU.
> > > Using that is usually a bad idea because it does not actually
> > > save power. It's only intended to let the CPU cool down in some 
> > > situations.
> > 
> > Power consumption is more or less exactly equal to heat production
> > (that's where the power goes, after all!), so either clock throttling
> > DOES save power or it DOES NOT cool the CPU.
> 
> No actually the way it works on modern x86 CPUs is that the best
> strategy for saving power is to do things quickly and then
> idle longer. That means on anything that has reasonably
> deep sleep modi e.g. on older server/desktop systems things might
> be slightly different because they had very little power saving
> features enabled, but it's definitely true for all
> laptop systems from the last several years. But even
> on desktop/server throttling tends to be a bad idea.

Dominik is measuring energy expended (watts * seconds) vs work done (CPU
cycles). But your claim above is "clock throttling...does not save power
[but it lets] the CPU cool down", which talks about power (watts) and
heat (also watts, in fact the *very same* watts) and is physically
impossible. A CPU turns power into heat. Less heat out implies less
power in.

So while throttling may be less efficient in terms of watt seconds used
to compile something than running at full speed, it is incorrect to say
it uses less power. One machine running for an hour throttled to 50%
uses less power (and therefore less battery and cooling) than another
running at full speed for that same hour.

The first machine may take significantly longer to complete its task (or
it may not, if the task is reading email or watching video), but that's
another matter entirely. And whether it's more or less efficient than
other power-saving approaches is also another matter. Throttling does
save power.
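
To keep watts and watt-seconds apart, here is a small worked example. All
figures are invented for illustration (they are not measurements of any CPU),
and struct run and the helper names are hypothetical. It models a fixed piece
of work done inside a 60-second window, with the CPU idling for the rest of
the window.

```c
/* One way of getting a fixed job done inside a fixed time window. */
struct run {
	double busy_watts;      /* power drawn while working */
	double idle_watts;      /* power drawn while idle */
	double busy_seconds;    /* time needed to finish the job */
	double window_seconds;  /* total time being compared */
};

/* Energy (joules) = power (watts) * time (seconds), summed over both phases. */
static double run_energy_joules(const struct run *r)
{
	double idle_seconds = r->window_seconds - r->busy_seconds;

	return r->busy_watts * r->busy_seconds + r->idle_watts * idle_seconds;
}

/* Average power over the whole window. */
static double run_average_watts(const struct run *r)
{
	return run_energy_joules(r) / r->window_seconds;
}

/* Hypothetical numbers: the throttled run draws less while busy, but is busy twice as long. */
static const struct run full_speed = { 20.0, 2.0, 10.0, 60.0 };
static const struct run throttled  = { 12.0, 2.0, 20.0, 60.0 };
```

With these made-up figures the throttled CPU draws less power while it is busy
(12 W against 20 W), yet it spends more energy over the whole minute (320 J
against 300 J) because it reaches the idle state later. Whether real hardware
behaves like this depends entirely on the actual busy and idle wattages.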

-- 
Mathematics is the supreme nostalgia of our time.



Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Tejun Heo
Hello,

Rusty Russell wrote:
> There are three possibilities: (1) force everyone to use void *, (2) 
> force 
> everyone to be type-correct, (3) allow both with some tricks.  Currently 
> we're on (1).  For kthread, with only dozens of users, I chose (2) (very 
> simple, easy to understand).  I think for widespread things like timer and 
> interrupt handlers, I think (3) is the right way to go.

Yeah, during transition, we definitely want (3).

> I wanted to get this patch out there and see what the reaction was.  I 
> can 
> do timers next, if that's going to add fuel to the discussion.

I think you successfully got a very small sample of possible reactions.
 Jeff vetoing it (and for good reasons) and me a bit more positive but
not quite sold.  Yeah, I think we need a good flame war to determine our
heading and converting timer shouldn't take too much of your time, right?

Thanks.

-- 
tejun


Re: [PATCH 1/5] x86: Change size of node ids from u8 to u16 fixup

2008-01-18 Thread Yinghai Lu
On Jan 18, 2008 10:30 AM,  <[EMAIL PROTECTED]> wrote:
> Change the size of node ids for X86_64 from 8 bits to 16 bits
> to accommodate more than 256 nodes.
>
> Introduce a "numanode_t" type for x86-generic usage.
>
> Cc: Eric Dumazet <[EMAIL PROTECTED]>
> Signed-off-by: Mike Travis <[EMAIL PROTECTED]>
> Reviewed-by: Christoph Lameter <[EMAIL PROTECTED]>
> ---
> Fixup:
>
> Size of memnode.embedded_map needs to be changed to
> accommodate 16-bit node ids as suggested by Eric.
>
> V2->V3:
> - changed memnode.embedded_map from [64-16] to [64-8]
>   (and size comment to 128 bytes)
>
> V1->V2:
> - changed pxm_to_node_map to u16
> - changed memnode map entries to u16
> ---
>  arch/x86/mm/numa_64.c   |2 +-
>  drivers/acpi/numa.c |2 +-
>  include/asm-x86/mmzone_64.h |6 +++---
>  include/linux/numa.h|6 ++
>  4 files changed, 11 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/mm/numa_64.c
> +++ b/arch/x86/mm/numa_64.c
> @@ -88,7 +88,7 @@ static int __init allocate_cachealigned_
> unsigned long pad, pad_addr;
>
> memnodemap = memnode.embedded_map;
> -   if (memnodemapsize <= 48)
> +   if (memnodemapsize <= ARRAY_SIZE(memnode.embedded_map))
> return 0;
>
> pad = L1_CACHE_BYTES - 1;
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -38,7 +38,7 @@ ACPI_MODULE_NAME("numa");
>  static nodemask_t nodes_found_map = NODE_MASK_NONE;
>
>  /* maps to convert between proximity domain and logical node ID */
> -static int pxm_to_node_map[MAX_PXM_DOMAINS]
> +static numanode_t pxm_to_node_map[MAX_PXM_DOMAINS]
> = { [0 ... MAX_PXM_DOMAINS - 1] = NID_INVAL };
>  static int node_to_pxm_map[MAX_NUMNODES]
> = { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
...>
>  #define MAX_NUMNODES(1 << NODES_SHIFT)
>
> +#if MAX_NUMNODES > 256
> +typedef u16 numanode_t;
> +#else
> +typedef u8 numanode_t;
> +#endif
> +
>  #endif /* _LINUX_NUMA_H */

that is wrong, you can not change pxm_to_node_map from int to u8 or u16.

int acpi_map_pxm_to_node(int pxm)
{
int node = pxm_to_node_map[pxm];

if (node < 0){
if (nodes_weight(nodes_found_map) >= MAX_NUMNODES)
return NID_INVAL;
node = first_unset_node(nodes_found_map);
__acpi_map_pxm_to_node(pxm, node);
node_set(node, nodes_found_map);
}

return node;
}

node will always be 255 or 65535

please keep that to int.

I got
SART: PXM 0 -> APIC 0 -> Node 255
SART: PXM 0 -> APIC 1 -> Node 255
SART: PXM 1 -> APIC 2 -> Node 255
SART: PXM 1 -> APIC 3 -> Node 255
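
The effect can be reproduced in user space. The sketch below is only an
illustration of the integer conversion involved; the helper names are
hypothetical and NID_INVAL is assumed to be -1, which is what the "node < 0"
test in acpi_map_pxm_to_node() relies on.

```c
#include <stdint.h>

#define NID_INVAL (-1)  /* assumed value of the "no node yet" marker */

/* Map entry stored as int: the marker survives the round trip. */
static int lookup_as_int(void)
{
	int map[1] = { NID_INVAL };
	int node = map[0];

	return node;
}

/* Map entry stored as u8: -1 is truncated to 255 and read back as 255. */
static int lookup_as_u8(void)
{
	uint8_t map[1] = { (uint8_t)NID_INVAL };
	int node = map[0];

	return node;
}

/* Map entry stored as u16: -1 is truncated to 65535. */
static int lookup_as_u16(void)
{
	uint16_t map[1] = { (uint16_t)NID_INVAL };
	int node = map[0];

	return node;
}

/* The test used by acpi_map_pxm_to_node() to decide a node must be assigned. */
static int needs_assignment(int node)
{
	return node < 0;
}
```

Once the stored value is unsigned, needs_assignment() is never true, so no
node is ever allocated and the invalid marker leaks out as node 255.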

YH


Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Rusty Russell
On Saturday 19 January 2008 12:44:52 Tejun Heo wrote:
> Tejun Heo wrote:
> > so I think the question is "do we want to change all callbacks to
> > take native pointer type instead of void pointer?".
>
> Lemme clarify myself a bit.  I'm not saying that we should convert all
> at once or literally every callback should be converted.  What I'm
> saying is whether we're headed that way in general and converting big
> ones - timer for example - and getting the conversion agreed upon should
> be enough to set the norm.

Hi Tejun

There are three possibilities: (1) force everyone to use void *, (2) force 
everyone to be type-correct, (3) allow both with some tricks.  Currently 
we're on (1).  For kthread, with only dozens of users, I chose (2) (very 
simple, easy to understand).  For widespread things like timer and 
interrupt handlers, I think (3) is the right way to go.

I wanted to get this patch out there and see what the reaction was.  I can 
do timers next, if that's going to add fuel to the discussion.

Thanks!
Rusty.
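
For anyone skimming the thread, here is one well-defined way to keep a void *
registration interface while still writing type-correct handlers. It is not
the mechanism from the patch under discussion: every name below
(register_handler, fire, DEFINE_TYPED_HANDLER, struct device_state) is
hypothetical, and the macro simply generates a small void * wrapper around a
typed function.

```c
#include <stddef.h>

/* The existing style of interface: handler and data are both untyped. */
typedef void (*raw_handler_t)(void *data);

static raw_handler_t registered_fn;
static void *registered_data;

static void register_handler(raw_handler_t fn, void *data)
{
	registered_fn = fn;
	registered_data = data;
}

/* Simulates the event source invoking whatever was registered. */
static void fire(void)
{
	if (registered_fn != NULL)
		registered_fn(registered_data);
}

/* Generates a void * wrapper so that the real handler can take a typed pointer. */
#define DEFINE_TYPED_HANDLER(name, type, typed_fn) \
	static void name(void *data) { typed_fn((type *)data); }

struct device_state {
	int irq_count;
};

/* The handler the driver author writes: no cast inside, and the compiler checks the type. */
static void count_irq(struct device_state *dev)
{
	dev->irq_count++;
}

DEFINE_TYPED_HANDLER(count_irq_raw, struct device_state, count_irq)
```

Old-style void * handlers can still be passed to register_handler()
unchanged, which is the "allow both" property of option (3). The cost is one
extra wrapper function per handler.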


Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-18 Thread Steven Rostedt


On Fri, 18 Jan 2008, Frank Ch. Eigler wrote:
>
> All this complexity is to be justified by keeping the raw prev/next
> pointers from being sent to a naive tracer?  It seems to me way out of
> proportion.

Damn, and I just blew away all my marker code for something like this ;-)

Actually, you just gave me a great idea that I think can help all of us.
OK, Mathieu may not be in total agreement, but I think this is the
ultimate compromise.

We have in sched.c the following marker:

 trace_mark(kernel_sched_scheduler, "prev %p next %p", prev, next);


Then Mathieu can add in some code somewhere (or a module, or something)

ret = marker_probe_register("kernel_sched_scheduler",
"prev %p next %p",
pretty_print_sched_switch, NULL);

static void pretty_print_sched_switch(const struct marker *mdata,
void *private_data,
const char *format, ...)
{
va_list ap;
struct task_struct *prev;
struct task_struct *next;

va_start(ap, format);
prev = va_arg(ap, typeof(prev));
next = va_arg(ap, typeof(next));
va_end(ap);

trace_mark(kernel_pretty_print_sched_switch,
"prev_pid %d next_pid %d prev_state %ld",
prev->pid, next->pid, prev->state);
}


Then LTTng on startup could arm the normal kernel_sched_switch code and
have the user see the nice one. All without adding any more goo or
overhead to the non tracing case, and keeping a few critical markers with
enough information to be useful to other tracers!

Thoughts?

-- Steve



Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-18 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 18, 2008 at 06:19:29PM -0500, Mathieu Desnoyers wrote:
> [...]
> Almost.. I would add :
> 
> static int trace_switch_to_enabled;
> 
> > static inline trace_switch_to(struct task_struct *prev,
> > struct task_struct *next)
> > {
> if (likely(!trace_switch_to_enabled))
>   return;
> > trace_mark(kernel_schedudule,
> > "prev_pid %d next_pid %d prev_state %ld",
> > prev->pid, next->pid, prev->pid);
> > 
> > trace_context_switch(prev, next);
> > }
> 
> And some code to activate the trace_switch_to_enabled variable (ideally
> keeping a refcount). [...]

All this complexity is to be justified by keeping the raw prev/next
pointers from being sent to a naive tracer?  It seems to me way out of
proportion.

- FChE


Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-18 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 18, 2008 at 05:49:19PM -0500, Steven Rostedt wrote:
> [...]
> > But I have not seen a lot of situations where that kind of glue-code was
> > needed, so I think it makes sense to keep markers simple to use and
> > efficient for the common case.
> >
> > Then, in this glue-code, we can put trace_mark() and calls to in-kernel
> > tracers.
> 
> I'm almost done with the latency tracer work, and there are only a total
> of 6 hooks that I needed.
> [...]
> With the above, we could have this (if this is what I think you are
> recommending). [...]
> static inline trace_switch_to(struct task_struct *prev,
>   struct task_struct *next)
> {
>   trace_mark(kernel_schedudule,
>   "prev_pid %d next_pid %d prev_state %ld",
>   prev->pid, next->pid, prev->pid);
> 
>   trace_context_switch(prev, next);
> }

I'm afraid I don't see the point in this.  You could use one marker
for all that data (and force the more naive tracer callbacks to ignore
some of it).  You could even use two markers (and force the more
naive tracer to attach only to its favorite subset).  But to use a
second, different, less efficient, not more configurable tracing hook
mechanism in the same logical spot makes no sense to me.

- FChE


Re: [PATCH] X86: fix typo PAT to X86_PAT

2008-01-18 Thread Dave Jones
On Fri, Jan 18, 2008 at 10:02:10PM +0100, Ingo Molnar wrote:
 > 
 > * Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > >  > you mean modifies MTRRs? Which code is that? (besides the 
 > >  > /proc/mtrr userspace API)
 > > 
 > > This exclusion is going to be a real pain in the ass for distro 
 > > kernels. It's impossible for example to build a kernel that will now 
 > > support the MTRR-alike registers on the AMD K6/early Cyrix etc and 
 > > also support PAT.
 > > 
 > > Additionally, given people tend to update their kernels a lot more 
 > > often than they update to a whole new version of X, it means until 
 > > userspace has caught up, we can't ship a kernel with PAT supported, or 
 > > else X gets a lot slower due to the missing mtrr support.
 > 
 > there's no exclusion enforced right now, and if a CPU is PAT-incapable 
 > (or if the kernel is booted nopat) then the MTRR bits should be usable. 
 > But if we boot with PAT enabled, and Xorg gets /proc/mtrr wrong, we'll 
 > see nasty crashes. If it gets them right, it should all still work just 
 > fine. Is this ok? Then, in a year or two, distros can disable write 
 > support to /proc/mtrr. Hm?

A crazy idea just occurred to me..  We could make /proc/mtrr an interface
to set PAT on a range of memory.  This would make it transparently work
without any changes in X or anything else that sets them in userspace.

Dave

-- 
http://www.codemonkey.org.uk


Re: [BUG] 2.6.23-rc9 kernel panic - simple_map_write+0x4e/0x75

2008-01-18 Thread devzero
looks very similar to

http://marc.info/?l=linux-kernel&m=119759817332220&w=2
http://marc.info/?l=linux-kernel&m=119902059626408&w=2
http://marc.info/?l=linux-kernel&m=119259674826979&w=2
http://lkml.org/lkml/2006/6/14/59

so you should try without CONFIG_MTD_PNC2000.

That driver has been having problems for some time (the first report seems to
be from 2.6.17). I have found the same issue (independently) on 2.6.22 and
2.6.24, too.

I have some updated information regarding this driver that I will post here
very soon, but please confirm whether this is the issue here (I'm quite sure it is).

regards
roland


Subject:Re: [BUG] 2.6.23-rc9 kernel panic - simple_map_write+0x4e/0x75
From:   Kamalesh Babulal 
Date:   2007-10-17 7:23:33
Message-ID: 4715B5A5.9050005 () linux ! vnet ! ibm ! com
[Download message RAW]

Andrew Morton wrote:
> On Sat, 13 Oct 2007 12:10:44 +0530
> Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Kernel panic's with following oops message with 2.6.23-rc9 kernel 
>>
>> [  320.747257] ks0108: ERROR: parport didn't register new device
>> [  320.771314] cfag12864b: ERROR: ks0108 is not initialized
>> [  320.794308] cfag12864bfb: ERROR: cfag12864b is not initialized
>> [  320.820729] BUG: unable to handle kernel paging request at virtual address
>> bf00
>> [  320.857712]  printing eip:
>> [  320.872556] *pde = 
>> [  320.887577] Oops: 0002 [#1]
>> [  320.902383] SMP 
>> [  320.914174] Modules linked in:
>> [  320.929333] CPU:0
>> [  320.929334] EIP:0060:[]Not tainted VLI
>> [  320.929335] EFLAGS: 00010286   (2.6.23-rc9-1 #1)
>> [  320.982753] EIP is at simple_map_write+0x4e/0x75
>> [  321.001956] eax: f0f0f0f0   ebx: c1de3f00   ecx: c1de3f00   edx: c1de3f00
>> [  321.027701] esi: c3ca8d6c   edi: bf00   ebp: c3ca8d98   esp: c3ca8d6c
>> [  321.053322] ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
>> [  321.075981] Process swapper (pid: 1, ti=c3ca8000 task=f7f44000 
>> task.ti=c3ca8000)
>> [  321.103031] Stack: f0f0f0f0     
>>   
>> [  321.139446]c3ca8e20 0001 c3ca8e40 c3ca8e6c c0d692e6 f0f0f0f0
>>   
>> [  321.176495]     
>>  50e6 
>> [  321.214141] Call Trace:
>> [  321.233922]  [] show_trace_log_lvl+0x19/0x2e
>> [  321.255433]  [] show_stack_log_lvl+0x99/0xa1
>> [  321.276706]  [] show_registers+0x1b8/0x290
>> [  321.297254]  [] die+0x118/0x1fd
>> [  321.314920]  [] do_page_fault+0x51c/0x5f3
>> [  321.335291]  [] error_code+0x72/0x78
>> [  321.354413]  [] cfi_probe_chip+0x148/0x9e1
>> [  321.375202]  [] genprobe_new_chip+0x82/0x98
>> [  321.396298]  [] genprobe_ident_chips+0x26/0x205
>> [  321.418493]  [] mtd_do_chip_probe+0x10/0x97
>> [  321.439654]  [] cfi_probe+0xd/0xf
>> [  321.458157]  [] do_map_probe+0x40/0x53
>> [  321.477931]  [] init_pnc2000+0x3b/0x6d
>> [  321.497559]  [] do_initcalls+0x7a/0x1c2
>> [  321.517377]  [] do_basic_setup+0x1c/0x1e
>> [  321.537327]  [] kernel_init+0x69/0xaa
>> [  321.556311]  [] kernel_thread_helper+0x7/0x10
>> [  321.577207]  ===
>> [  321.592882] Code: 83 f8 01 75 0a 03 7b 10 8b 45 d4 88 07 eb 35 83 f8 02 
>> 75 0c
>> 0f b7 45 d4 03 7b 10 66 89 07 eb 24 83 f8 04 75 0a 03 7b 10 8b 45 d4 <89> 07 
>> eb
>> 15 7e 13 03 7b 10 89 c1 c1 e9 02 f3 a5 89 c1 83 e1 03 
>> [  321.668990] EIP: [] simple_map_write+0x4e/0x75 SS:ESP 
>> 0068:c3ca8d6c
>> [  321.695750] Kernel panic - not syncing: Attempted to kill init!
> 
> Would I be correct in assuming that the machine has no mtd devices, but
> you happened to link that driver into your vmlinux?
> 

Hi Andrew,

The machine does not have an mtd device, and mtd is compiled into the 
vmlinuz.
This configuration works fine for other kernels; the problem is reproducible
with 2.6.23-rc9 only.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.



Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-18 Thread David Chinner
On Fri, Jan 18, 2008 at 01:41:33PM +0800, Fengguang Wu wrote:
> > That is, think of large file writes like process scheduler batch
> > jobs - bulk throughput is what matters, so the larger the time slice
> > you give them the higher the throughput.
> > 
> > IMO, the sort of result we should be looking at is a
> > writeback design that results in cycling somewhat like:
> > 
> > slice 1: iterate over small files
> > slice 2: flush large file 1
> > slice 3: iterate over small files
> > slice 4: flush large file 2
> > ..
> > slice n-1: flush large file N
> > slice n: iterate over small files
> > slice n+1: flush large file N+1
> > 
> > So that we keep the disk busy with a relatively fair mix of
> > small and large I/Os while both are necessary.
> 
> If we can sync fast enough, the lower layer would be able to merge
> those 4MB requests.

No, not necessarily - think of a stripe with a chunk size of 512k.
That 4MB will be split into 8x512k chunks and sent to different
devices (and hence elevator queues). The only way you get elevator
merging in this sort of config is that if you send multiple stripe
*width* sized amounts to the device in a very short time period.
I see quite a few filesystems with stripe widths in the tens of MB
range.
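
The splitting described above can be sketched with a simple striping model.
This is an illustration under assumptions: the 512k chunk size comes from the
example, but the count of 8 devices, the round-robin layout and the function
names are hypothetical.

```c
#include <stdint.h>

#define CHUNK_BYTES  (512u * 1024u)  /* chunk size from the example above */
#define NUM_DEVICES  8u              /* assumed number of devices in the stripe */

/* Round-robin striping: consecutive chunks go to consecutive devices. */
static unsigned device_for_offset(uint64_t offset)
{
	return (unsigned)((offset / CHUNK_BYTES) % NUM_DEVICES);
}

/* Count how many chunks of the range [start, start + len) land on each device. */
static void chunks_per_device(uint64_t start, uint64_t len,
			      unsigned counts[NUM_DEVICES])
{
	uint64_t first, last, chunk;
	unsigned i;

	for (i = 0; i < NUM_DEVICES; i++)
		counts[i] = 0;
	if (len == 0)
		return;

	first = start / CHUNK_BYTES;
	last = (start + len - 1) / CHUNK_BYTES;
	for (chunk = first; chunk <= last; chunk++)
		counts[device_for_offset(chunk * CHUNK_BYTES)]++;
}
```

In this model a single 4MB write gives every device exactly one 512k chunk, so
there is nothing for a per-device elevator to merge. Only when several stripe
widths arrive close together does each device see more than one chunk.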

> > Put simply:
> > 
> > The higher the bandwidth of the device, the more frequently
> > we need to be servicing the inodes with large amounts of
> > dirty data to be written to maintain write throughput at a
> > significant percentage of the device capability.
> > 
> > The writeback algorithm needs to take this into account for it
> > to be able to scale effectively for high throughput devices.
> 
> Slow queues go full first. Currently the writeback code will skip
> _and_ congestion_wait() for congested filesystems. The better policy
> is to congestion_wait() _after_ all other writable pages have been
> synced.

Agreed.

The comments I've made are mainly concerned with getting a single
device flushed efficiently. Interactions between multiple
devices are a separable issue.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [patch 02/11] PAT x86: Map only usable memory in x86_64 identity map and kernel text

2008-01-18 Thread Andi Kleen
> > I think it should be enabled on AMD too though. If the reordering breaks
> > it then blacklisting won't help anyways.

Actually it is already enabled on AMD. You check for is_cpu(INTEL)
but that just checks the generic MTRR architecture and all AMD CPUs
since K7 use that one too.

That is ok imho.

Perhaps it would be good to fix the incorrect comment though.


> >
> > -Andi
> >
> > [1] but I checked the known errata and there was nothing related to MTRR.
> 
> Ah, ok, that explains your reticence earlier.  Thanks for testing again, I 
> guess the patch is good to go.

I see a failure here now on a (AMD) system where it trims a lot of memory, but
should probably not (or at least i haven't noticed any malfunction
before without it). Investigating. 

-Andi



something odd in emu10k1/emufx

2008-01-18 Thread Al Viro
In copy_tlv() we have
tlv = kmalloc(data[1] * 4 + sizeof(data), GFP_KERNEL);
if (!tlv)
return NULL;
memcpy(tlv, data, sizeof(data));
if (copy_from_user(tlv + 2, _tlv + 2, data[1])) {
kfree(tlv);
return NULL;
}
which looks rather odd, since either we kmalloc too much or copy too little...
Comments?
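
To make the mismatch concrete, the two sizes can be computed side by side.
This sketch assumes data is an array of two unsigned ints (a header whose
second word is the length field) and that tlv points to unsigned int, so that
"tlv + 2" skips exactly that header. Both are guesses from the snippet, and
the function names are hypothetical.

```c
#include <stddef.h>

#define TLV_HEADER_BYTES (2 * sizeof(unsigned int))  /* assumed sizeof(data) */

/* Bytes requested from kmalloc(): the length field is scaled by 4. */
static size_t tlv_alloc_bytes(unsigned int len_field)
{
	return (size_t)len_field * 4 + TLV_HEADER_BYTES;
}

/* Bytes actually filled: the header via memcpy() plus len_field bytes from user space. */
static size_t tlv_copied_bytes(unsigned int len_field)
{
	return TLV_HEADER_BYTES + (size_t)len_field;
}
```

For a length field of 16 this allocates 64 bytes of payload space but fills
only 16 of them. The sketch cannot say which of the two sizes is the intended
one; it only shows that they disagree by a factor of four.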


[PATCH 6/6] export __supported_pte_mask

2008-01-18 Thread Glauber de Oliveira Costa
export __supported_pte_mask variable as GPL symbol.
lguest is a user of it.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 arch/x86/kernel/setup64.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c
index 8fa0de8..5cc1339 100644
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -41,6 +41,8 @@ struct desc_ptr idt_descr = { 256 * 16 - 1, (unsigned long) idt_table };
 char boot_cpu_stack[IRQSTACKSIZE] __attribute__((section(".bss.page_aligned")));
 
 unsigned long __supported_pte_mask __read_mostly = ~0UL;
+EXPORT_SYMBOL_GPL(__supported_pte_mask);
+
 static int do_not_nx __cpuinitdata = 0;
 
 /* noexec=on|off
-- 
1.5.0.6



[PATCH 5/6] export check_tsc_unstable

2008-01-18 Thread Glauber de Oliveira Costa
Export the check_tsc_unstable function as a GPL symbol. lguest is
a user of it.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 arch/x86/kernel/tsc_64.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/tsc_64.c b/arch/x86/kernel/tsc_64.c
index c62f3b6..947554d 100644
--- a/arch/x86/kernel/tsc_64.c
+++ b/arch/x86/kernel/tsc_64.c
@@ -92,10 +92,12 @@ sched_clock(void) __attribute__((alias("native_sched_clock")));
 
 static int tsc_unstable;
 
-inline int check_tsc_unstable(void)
+int check_tsc_unstable(void)
 {
return tsc_unstable;
 }
+EXPORT_SYMBOL_GPL(check_tsc_unstable);
+
 #ifdef CONFIG_CPU_FREQ
 
 /* Frequency scaling support. Adjust the TSC based timer when the cpu frequency
-- 
1.5.0.6



[PATCH 4/6] use __PAGE_KERNEL instead of _PAGE_KERNEL

2008-01-18 Thread Glauber de Oliveira Costa
x86_64 doesn't expose the intermediate representation with one underscore,
_PAGE_KERNEL, just the double-underscored one.

Use it, to get a common ground between 32 and 64-bit.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 drivers/lguest/page_tables.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/lguest/page_tables.c b/drivers/lguest/page_tables.c
index 399c05d..fb5ebd0 100644
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -645,7 +645,7 @@ void map_switcher_in_guest(struct lg_cpu *cpu, struct lguest_pages *pages)
 
/* Make the last PGD entry for this Guest point to the Switcher's PTE
 * page for this CPU (with appropriate flags). */
-   switcher_pgd = __pgd(__pa(switcher_pte_page) | _PAGE_KERNEL);
+   switcher_pgd = __pgd(__pa(switcher_pte_page) | __PAGE_KERNEL);
 
cpu->lg->pgdirs[cpu->cpu_pgd].pgdir[SWITCHER_PGD_INDEX] = switcher_pgd;
 
@@ -657,7 +657,7 @@ void map_switcher_in_guest(struct lg_cpu *cpu, struct lguest_pages *pages)
 * page is already mapped there, we don't have to copy them out
 * again. */
pfn = __pa(cpu->regs_page) >> PAGE_SHIFT;
-   regs_pte = pfn_pte(pfn, __pgprot(_PAGE_KERNEL));
+   regs_pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL));
switcher_pte_page[(unsigned long)pages/PAGE_SIZE%PTRS_PER_PTE] = regs_pte;
 }
 /*:*/
-- 
1.5.0.6



[PATCH 3/6] explicitly use sched.h include

2008-01-18 Thread Glauber de Oliveira Costa
This patch adds the sched.h header explicitly to the lguest_user file,
and avoids depending on it being included somewhere else.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 drivers/lguest/lguest_user.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
index a87fca6..85d42d3 100644
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include <linux/sched.h>
 #include "lg.h"
 
 /*L:055 When something happens, the Waker process needs a way to stop the
-- 
1.5.0.6



[PATCH 2/6] explicitly use hrtimer.h include

2008-01-18 Thread Glauber de Oliveira Costa
This patch adds the hrtimer.h header explicitly to the lg.h file,
and avoids depending on it being included somewhere else.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 drivers/lguest/lg.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
index f9707cf..eb51fc2 100644
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <linux/hrtimer.h>
 #include 
 #include 
 
-- 
1.5.0.6



[PATCH 1/6] explicitly use ktime.h include

2008-01-18 Thread Glauber de Oliveira Costa
This patch adds the ktime.h header explicitly to the hypercalls file,
and avoids depending on it being included somewhere else.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 drivers/lguest/hypercalls.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/lguest/hypercalls.c b/drivers/lguest/hypercalls.c
index 32666d0..0f2cb4f 100644
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include <linux/ktime.h>
 #include 
 #include 
 #include "lg.h"
-- 
1.5.0.6



[PATCH 0/6] lguest patches for compiling x86_64

2008-01-18 Thread Glauber de Oliveira Costa
Right now, I have the in-tree lguest module compiling on x86_64.
It's not yet in a sendable state, since the module itself isn't loading.

However, this subset of the series is pretty straightforward, and I'm sending it
now aiming at reducing the delta size in the future ;-)

Have fun,




Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Tejun Heo
Tejun Heo wrote:
> so I think the question is "do we want to change all callbacks to
> take native pointer type instead of void pointer?".

Lemme clarify myself a bit.  I'm not saying that we should convert all
at once or that literally every callback should be converted.  What I'm
asking is whether we're headed that way in general; converting big
ones - timer for example - and getting the conversion agreed upon should
be enough to set the norm.

Thanks.

-- 
tejun


Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Tejun Heo
Hello, Rusty.

Rusty Russell wrote:
> On Saturday 19 January 2008 10:12:33 Tejun Heo wrote:
>> Type safety is good but I doubt this would be worth the complexity.  It
>> has some benefits but there's much larger benefit in keeping things in
>> straight C.  People know that functions take fixed types and are also
>> familiar with the convention of passing void * for callback arguments.
>> IMHO, staying in line with those common knowledges easily trumps having
>> type checking on interrupt handler.
> 
> I sympathise with this argument, but I think just because people are familiar 
> with existing hacks shouldn't prevent improvement.  I think the resulting 
> code is clearer and more readable.
> 
> Even in the implementation, the tricky part is the check_either_type() macro: 
> the rest is straight-forward.

The change is a small one and both the cost and benefit aren't big.

>> Also, how often do we see a bug where things go wrong because interrupt
>> handler is given the wrong type of argument?  Even when such bug
>> happens, I doubt it can escape the developer's workstation if he/she is
>> paying any attention to testing.
> 
> I agree this one is unlikely.  But I am trying to spread type-safety more 
> widely (see previous kthread patches).
> 
> I like changing the kernel to make life simpler for developers.  We don't do 
> enough of it.

I'm in full agreement here but the cost / benefit equation doesn't seem
quite right to me.  If we're gonna convert all callbacks to take native
pointers, I'm fine with the irq handler part too.  If not, it just adds
confusion which is much worse than any benefit it can bring, so I think
the question is "do we want to change all callbacks to take native
pointer type instead of void pointer?".

-- 
tejun


patch driver-core-constify-the-name-passed-to-platform_device_register_simple.patch added to gregkh-2.6 tree

2008-01-18 Thread gregkh

This is a note to let you know that I've just added the patch titled

 Subject: Driver Core: constify the name passed to platform_device_register_simple

to my gregkh-2.6 tree.  Its filename is

 
driver-core-constify-the-name-passed-to-platform_device_register_simple.patch

This tree can be found at 
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From [EMAIL PROTECTED]  Fri Jan 18 17:28:36 2008
From: Stephen Rothwell <[EMAIL PROTECTED]>
Date: Fri, 11 Jan 2008 17:24:53 +1100
Subject: Driver Core: constify the name passed to platform_device_register_simple
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], LKML 
Message-ID: <[EMAIL PROTECTED]>


This name is just passed to platform_device_alloc which has its parameter
declared const.

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/base/platform.c |2 +-
 include/linux/platform_device.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -360,7 +360,7 @@ EXPORT_SYMBOL_GPL(platform_device_unregi
  * the Linux driver model.  In particular, when such drivers are built
  * as modules, they can't be "hotplugged".
  */
-struct platform_device *platform_device_register_simple(char *name, int id,
+struct platform_device *platform_device_register_simple(const char *name, int id,
struct resource *res, 
unsigned int num)
 {
struct platform_device *pdev;
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -35,7 +35,7 @@ extern struct resource *platform_get_res
 extern int platform_get_irq_byname(struct platform_device *, char *);
 extern int platform_add_devices(struct platform_device **, int);
 
-extern struct platform_device *platform_device_register_simple(char *, int id,
+extern struct platform_device *platform_device_register_simple(const char *, int id,
struct resource *, unsigned int);
 
 extern struct platform_device *platform_device_alloc(const char *name, int id);


Patches currently in gregkh-2.6 which might be from [EMAIL PROTECTED] are

bad/battery-class-driver.patch
driver/driver-core-constify-the-name-passed-to-platform_device_register_simple.patch


Re: [PATCH 1/3] Improve type handling in interrupt handlers

2008-01-18 Thread Rusty Russell
On Saturday 19 January 2008 07:41:41 Jeff Garzik wrote:
> FWIW, I have been working in this area extensively.

Excellent...

> Check out the 'irq-cleanups' and 'irq-remove' branches of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6.git

Your irq-cleanups branch is nice work!  But AFAICT these patches are not 
included in your irq-cleanups branch.  Did you want me to switch my patch 
over to irqreturn_t and send them for you to roll in?

Cheers,
Rusty.


Re: [PATCH 3/4] introduce __devinitconst

2008-01-18 Thread Greg KH
On Fri, Jan 11, 2008 at 01:57:27AM -0700, Jan Beulich wrote:
> The drivers picked just serve as examples (which I routinely build and
> hence am able to easily verify), i.e. as before he patch doesn't change
> all instances where 'const' could have been added as a result of the
> base change, only where the change has a real effect (the module loader
> doesn't enforce read-only section attributes at present, so only
> built-in files make a real difference).

What does this buy us?

> --- 2.6.24-rc7-initconst.orig/include/linux/init.h
> +++ 2.6.24-rc7-initconst/include/linux/init.h
> @@ -257,11 +257,13 @@ void __init parse_early_param(void);
>  #ifdef CONFIG_HOTPLUG
>  #define __devinit
>  #define __devinitdata
> +#define __devinitconst const
>  #define __devexit
>  #define __devexitdata
>  #else
>  #define __devinit __init
>  #define __devinitdata __initdata
> +#define __devinitconst __initdata

Shouldn't that be "__initdata const" or something like that?

thanks,

greg k-h


Re: [PATCH 1/3] constify struct attribute_group uses

2008-01-18 Thread Greg KH
On Fri, Jan 11, 2008 at 08:37:55AM +, Jan Beulich wrote:
> .. as all consumers of it don't require it to be modifiable.
> 
> Unfortunately, due to the two-level constifications, this required
> touching quite many files, not all of which I am able to test - please
> bare with eventual mistakes or oversights.
> 
> The patch doesn't change all instances where 'const' could have been
> added as a result of the base structure changes, only where either the
> change has a real effect (the module loader doesn't enforce read-only
> section attributes at present, so only built-in files matter) or where
> compiler warnings would result otherwise.

Hm, code in these areas has changed a lot in -mm, can you respin this
against that tree to catch all of the different attribute changes that
have happened?

thanks,

greg k-h


Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Rusty Russell
On Saturday 19 January 2008 10:12:33 Tejun Heo wrote:
> Type safety is good but I doubt this would be worth the complexity.  It
> has some benefits but there's much larger benefit in keeping things in
> straight C.  People know that functions take fixed types and are also
> familiar with the convention of passing void * for callback arguments.
> IMHO, staying in line with those common knowledges easily trumps having
> type checking on interrupt handler.

I sympathise with this argument, but I think just because people are familiar 
with existing hacks shouldn't prevent improvement.  I think the resulting 
code is clearer and more readable.

Even in the implementation, the tricky part is the check_either_type() macro: 
the rest is straight-forward.

> Also, how often do we see a bug where things go wrong because interrupt
> handler is given the wrong type of argument?  Even when such bug
> happens, I doubt it can escape the developer's workstation if he/she is
> paying any attention to testing.

I agree this one is unlikely.  But I am trying to spread type-safety more 
widely (see previous kthread patches).

I like changing the kernel to make life simpler for developers.  We don't do 
enough of it.

Cheers,
Rusty.


Re: [PATCH 0/10] Tree fixes for PARAVIRT

2008-01-18 Thread Glauber de Oliveira Costa
On Jan 18, 2008 8:02 PM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Zachary Amsden <[EMAIL PROTECTED]> wrote:
>
> > > but in exchange you broke all of 32-bit with CONFIG_PARAVIRT=y.
> > > Which means you did not even build-test it on 32-bit, let alone boot
> > > test it...
> >
> > Why are we rushing so much to do 64-bit paravirt that we are breaking
> > working configurations?  If the developement is going to be this
> > chaotic, it should be done and tested out of tree until it can
> > stabilize.
>
> what you see is a open feedback cycle conducted on lkml. People send
> patches for arch/x86, and we tell them if it breaks something. The bug
> was found before i pushed out the x86.git devel tree (and the fix is
> below - but this shouldnt matter to you because the bug never hit a
> public x86.git tree).
>
> Ingo
>
Other than this, it seems to build and boot fine.

Do you want me to resend ?
-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


Known prob: MAX_LOCK_DEPTH too low?

2008-01-18 Thread Linda Walsh

On my x86_64 machine, I got the following message
in log (kern = 2.6.23.14)

Jan 16 04:08:38 Astara kernel: BUG: MAX_LOCK_DEPTH too low!
Jan 16 04:08:38 Astara kernel: turning off the locking correctness validator.


Have no idea what caused it as I found the message on my console
somewhat after the fact.  The system had been up over 24 hours and
is still running.  System still seems 'fine' (been up 3 days now),
so you can treat this as a "data point".








Re: [PATCH 1/10] add missing parameter for lookup_address

2008-01-18 Thread Andi Kleen
On Fri, Jan 18, 2008 at 12:26:13PM -0800, Chris Wright wrote:
> * Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
> > lookup_address() receives two parameters, but efi_64.c call
> > is passing only one. It's actually preventing the tree from compiling
> > 
> > Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> 
> Good catch, I know I don't test with CONFIG_EFI=y

Ah, that probably came from the CPA patchset which added the parameter.
Sorry for that.

-Andi


[GEODE] Geode GX/LX watchdog timer (was 2.6.24-rc8 hangs at mfgpt-timer)

2008-01-18 Thread Jordan Crouse
On 17/01/08 23:52 +0100, Arnd Hannemann wrote:
> >> Watchdog for the new API would be great :-)
> > 
> > Coming soon.

As promised, a watchdog driver for the Geode GX/LX processors is attached.
I basically just ported the previous patch forward to 2.6.24.

I also have good news or bad news depending on your perspective.  I wanted
to test this against 2.6.24, and OLPC is stuck at an older kernel version,
so I had to test this with coreboot (LinuxBIOS) on another Geode 
platform.  Like all BIOSen except for the OLPC firmware, coreboot uses
VSA (SMM handler) which consumes all the timers.

So I used the magical MSR and surprise! - the timer tick hung.  
I compiled out the timer tick, and tested the watchdog timer instead,
and it worked fine on timer 0.  So I don't think the MFGPTs themselves
have anything to do with this problem, but I do think it might be 
related to VSA and possibly interrupts too.  I'm going to invoke the
strong BIOS fu of our LinuxBIOS / BIOS expert Marc Jones, and see what
he comes up with.

I don't know how much of a hassle it would be for Andres to get a 2.6.24
kernel running on the OLPC to make sure that this isn't a regression
in the timer tick code (I suspect it isn't a regression, but you never
know).  I also think that it would probably be in our best interest to
default CONFIG_GEODE_MFGPT_TIMER to 'n' until we get this figured
out.  Since most BIOSen don't have timers available, that shouldn't affect
too many people.

So, anyway, enjoy the watchdog timer - I hope it meets everybody's
expectations for the 2.6.25 kernel.

Jordan
-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.
[GEODE] Add a watchdog driver based on the CS5535/CS5536 MFGPT timers

From: Jordan Crouse <[EMAIL PROTECTED]>

Add a watchdog timer based on the MFGPT timers in the CS5535/CS5536 
companion chips to the AMD Geode GX and LX processors.  The only caveat
is that the BIOS must provide at least one free timer, and most
do not.

Signed-off-by: Jordan Crouse <[EMAIL PROTECTED]>
---

 drivers/watchdog/Kconfig|   13 ++
 drivers/watchdog/Makefile   |1 
 drivers/watchdog/geodewdt.c |  321 +++
 3 files changed, 335 insertions(+), 0 deletions(-)

Index: git/drivers/watchdog/Kconfig
===
--- git.orig/drivers/watchdog/Kconfig   2008-01-18 15:06:44.0 -0700
+++ git/drivers/watchdog/Kconfig2008-01-18 17:50:25.0 -0700
@@ -295,6 +295,20 @@
 
  Most people will say N.
 
+config GEODE_WDT
+   tristate "AMD Geode CS5535/CS5536 Watchdog"
+   depends on MGEODE_LX
+   default n
+   help
+ This driver enables a watchdog capability built into the
+CS5535/CS5536 companion chips for the AMD Geode GX and LX
+processors.  This watchdog watches your kernel to make sure
+it doesn't freeze, and if it does, it reboots your computer after
+a certain amount of time.
+
+You can compile this driver directly into the kernel, or use
+it as a module.  The module will be called geodewdt.
+
 config SC520_WDT
tristate "AMD Elan SC520 processor Watchdog"
depends on X86
Index: git/drivers/watchdog/Makefile
===
--- git.orig/drivers/watchdog/Makefile  2008-01-18 15:06:44.0 -0700
+++ git/drivers/watchdog/Makefile   2008-01-18 16:32:15.0 -0700
@@ -59,6 +59,7 @@
 obj-$(CONFIG_ADVANTECH_WDT) += advantechwdt.o
 obj-$(CONFIG_ALIM1535_WDT) += alim1535_wdt.o
 obj-$(CONFIG_ALIM7101_WDT) += alim7101_wdt.o
+obj-$(CONFIG_GEODE_WDT) += geodewdt.o
 obj-$(CONFIG_SC520_WDT) += sc520_wdt.o
 obj-$(CONFIG_EUROTECH_WDT) += eurotechwdt.o
 obj-$(CONFIG_IB700_WDT) += ib700wdt.o
Index: git/drivers/watchdog/geodewdt.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ git/drivers/watchdog/geodewdt.c 2008-01-18 17:47:39.0 -0700
@@ -0,0 +1,308 @@
+/* Watchdog timer for the Geode GX/LX with the CS5535/CS5536 companion chip
+ *
+ * Copyright (C) 2006-2007, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define GEODEWDT_HZ 500
+#define GEODEWDT_SCALE 6
+#define GEODEWDT_MAX_SECONDS 131
+
+#define WDT_FLAGS_OPEN 1
+#define WDT_FLAGS_ORPHAN 2
+
+#define DRV_NAME "geodewdt"
+#define WATCHDOG_NAME "Geode GX/LX WDT"
+#define WATCHDOG_TIMEOUT 60
+
+static int timeout = WATCHDOG_TIMEOUT;
+module_param(timeout, int, 0);
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds. 1<= timeout <=131, 
default=" 

Re: [RFC] Per-thread getrusage

2008-01-18 Thread Roland McGrath
I agree that RUSAGE_THREAD is fine.  (In fact, if you'd pressed me to
remember without looking, I would have assumed we put it in already.)
However, in the implementation, I would keep it cleaner by moving the
identical code from inside the loop under case RUSAGE_SELF into a shared
subfunction, rather than duplicating it.  In fact, here you go (next posting).

As to getting arbitrary other threads' data, there are several problems
there.  Adding a syscall is often more trouble than it's worth.  Ulrich
cited the issues with that as the API.  You also didn't handle compat for
it correctly.  To warrant the code necessary to make this available by
whatever API, I think you need to say some more about what it's needed for.

Off hand, it seems most in keeping with other things to expose this via a
/proc file, i.e. /proc/tgid/task/tid/rusage (and /proc/tgid/rusage for the
RUSAGE_SELF behavior on a foreign process).  There we already have the
infrastructure for dealing with the security issues uniformly with how we
control other similar information.  Personally I tend to prefer a binary
interface, i.e. a virtual file whose contents are struct rusage; for that
you still need to do the extra compat work, since a 32-bit process should
have the 32-bit struct rusage layout in its /proc files.  If you put the
numbers into ascii text as some /proc interfaces do, you don't need any
special considerations for CONFIG_COMPAT.


Thanks,
Roland


[PATCH] RUSAGE_THREAD

2008-01-18 Thread Roland McGrath

This adds the RUSAGE_THREAD option for the getrusage system call.
Solaris calls this RUSAGE_LWP and uses the same value (1).
That name is not a natural one for Linux, but we keep it as an alias.

Signed-off-by: Roland McGrath <[EMAIL PROTECTED]>
---
 include/linux/resource.h |2 ++
 kernel/sys.c |   31 ++-
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/include/linux/resource.h b/include/linux/resource.h
index ae13db7..02b3377 100644
--- a/include/linux/resource.h
+++ b/include/linux/resource.h
@@ -19,6 +19,8 @@ struct task_struct;
 #defineRUSAGE_SELF 0
 #defineRUSAGE_CHILDREN (-1)
 #define RUSAGE_BOTH(-2)/* sys_wait4() uses this */
+#defineRUSAGE_THREAD   1   /* only the calling thread */
+#defineRUSAGE_LWP  RUSAGE_THREAD   /* Solaris name for same */
 
 struct rusage {
struct timeval ru_utime;/* user time used */
diff --git a/kernel/sys.c b/kernel/sys.c
index d1fe71e..6a62bc4 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1554,6 +1554,19 @@ out:
  *
  */
 
+static void accumulate_thread_rusage(struct task_struct *t, struct rusage *r,
+cputime_t *utimep, cputime_t *stimep)
+{
+   *utimep = cputime_add(*utimep, t->utime);
+   *stimep = cputime_add(*stimep, t->stime);
+   r->ru_nvcsw += t->nvcsw;
+   r->ru_nivcsw += t->nivcsw;
+   r->ru_minflt += t->min_flt;
+   r->ru_majflt += t->maj_flt;
+   r->ru_inblock += task_io_get_inblock(t);
+   r->ru_oublock += task_io_get_oublock(t);
+}
+
 static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
 {
struct task_struct *t;
@@ -1563,6 +1576,11 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;
 
+   if (who == RUSAGE_THREAD) {
+   accumulate_thread_rusage(p, r, &utime, &stime);
+   goto out;
+   }
+
rcu_read_lock();
if (!lock_task_sighand(p, )) {
rcu_read_unlock();
@@ -1595,14 +1613,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += p->signal->oublock;
t = p;
do {
-   utime = cputime_add(utime, t->utime);
-   stime = cputime_add(stime, t->stime);
-   r->ru_nvcsw += t->nvcsw;
-   r->ru_nivcsw += t->nivcsw;
-   r->ru_minflt += t->min_flt;
-   r->ru_majflt += t->maj_flt;
-   r->ru_inblock += task_io_get_inblock(t);
-   r->ru_oublock += task_io_get_oublock(t);
+   accumulate_thread_rusage(t, r, &utime, &stime);
t = next_thread(t);
} while (t != p);
break;
@@ -1614,6 +1625,7 @@ static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
unlock_task_sighand(p, );
rcu_read_unlock();
 
+out:
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
 }
@@ -1627,7 +1639,8 @@ int getrusage(struct task_struct *p, int who, struct rusage __user *ru)
 
 asmlinkage long sys_getrusage(int who, struct rusage __user *ru)
 {
-   if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN)
+   if (who != RUSAGE_SELF && who != RUSAGE_CHILDREN &&
+   who != RUSAGE_THREAD)
return -EINVAL;
return getrusage(current, who, ru);
 }
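
From userspace, the new option would be used roughly as below.  This is
a minimal sketch, assuming a kernel with this patch applied;
thread_cpu_usec() is a hypothetical helper name, and the fallback
definition of RUSAGE_THREAD uses the value 1 given in the patch
description.

```c
#define _GNU_SOURCE            /* glibc only exposes RUSAGE_THREAD with this */
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

#ifndef RUSAGE_THREAD
#define RUSAGE_THREAD 1        /* value stated in the patch description */
#endif

/*
 * Hypothetical helper: store the CPU time (user + system) consumed by the
 * calling thread, in microseconds, in *out.  Returns 0 on success and -1
 * if getrusage() fails, e.g. on a kernel without RUSAGE_THREAD.
 */
int thread_cpu_usec(long long *out)
{
	struct rusage ru;

	assert(out != NULL);
	if (getrusage(RUSAGE_THREAD, &ru) != 0)
		return -1;

	*out = ((long long)ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
	     + ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
	return 0;
}
```

Passing RUSAGE_SELF instead would sum over every thread in the process,
which is the loop the patch factors out into accumulate_thread_rusage().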


Re: PROBLEM: Celeron Core

2008-01-18 Thread Andi Kleen
On Fri, Jan 18, 2008 at 06:27:57PM -0600, Matt Mackall wrote:
> 
> On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> > Chodorenko Michail <[EMAIL PROTECTED]> writes:
> > 
> > > I have a laptop "Extensa 5220", with the processor Celeron based on 'core'
> > > technology.
> > > There is ~ / arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > > source code
> > > but there's no line identification of my CPU for apply freqency change
> > > need to add a ID line 0х16
> > 
> > Note that driver will likely do clock throttling on your CPU.
> > Using that is usually a bad idea because it does not actually
> > safe power. It's only intended to let the CPU cool down in some situations.
> 
> Power consumption is more or less exactly equal to heat production
> (that's where the power goes, after all!), so either clock throttling
> DOES save power or it DOES NOT cool the CPU.

No, actually the way it works on modern x86 CPUs is that the best
strategy for saving power is to do things quickly and then
idle longer. That holds on anything that has reasonably
deep sleep modes. On older server/desktop systems things might
be slightly different because they had very few power saving
features enabled, but it's definitely true for all
laptop systems from the last several years. But even
on desktop/server throttling tends to be a bad idea.

Intel style throttling makes the CPU skip cycles so the maximum built
up heat for a time unit is less, but it will run active for longer, which
makes it take more power overall for a given work unit.

Here's a better description from Dominik:

http://article.gmane.org/gmane.linux.kernel.cpufreq/3497

Note the conditions he describes are quite common. Also the OP's
CPU likely has C2 and even deeper sleep modes.
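
To put rough numbers on the race-to-idle argument, here is a toy energy
model in C.  The struct, the function name and all wattages are made up
for illustration; they are not measurements of any real CPU.  The model
assumes that a throttled CPU still pays its full static power for as long
as it is out of deep sleep, and that only the dynamic part scales with
the duty cycle.

```c
#include <assert.h>

/* Toy model; every value fed into it below is hypothetical. */
struct cpu_model {
	double p_static;   /* W drawn whenever the CPU is out of deep sleep */
	double p_dynamic;  /* extra W while executing at the full clock */
	double p_sleep;    /* W drawn in a deep sleep state */
};

/*
 * Energy in joules spent over a window of window_s seconds in which the
 * CPU must complete work_s seconds' worth of full-speed work.  duty is
 * the fraction of cycles actually executed (1.0 = no throttling); the
 * CPU sleeps for whatever is left of the window once the work is done.
 */
double energy_joules(const struct cpu_model *m, double work_s,
		     double window_s, double duty)
{
	assert(duty > 0.0 && duty <= 1.0);

	double busy_s = work_s / duty;          /* throttling stretches the job */
	assert(busy_s <= window_s);

	double busy_w = m->p_static + duty * m->p_dynamic;
	return busy_w * busy_s + m->p_sleep * (window_s - busy_s);
}
```

With 5 W static, 10 W dynamic and 1 W sleep power, one second of work in
a four second window costs 18 J unthrottled and 22 J at a 50% duty cycle.
The gap is (p_static - p_sleep) times the extra second spent awake, so in
this model it only disappears when sleeping saves nothing over idling.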

Another problem with throttling / p4-clockmod is that on
at least some CPUs (not necessarily P-M, but we saw this on
some P4s) it can create quite long user visible
latencies. You might actually get "hanging mouse pointers" 
from it if you use it with an aggressive governor like ondemand.

The normal use case for Intel throttling is to just do 
an emergency cool down in case the CPU fails (down to thermal
shutdown). And that is done transparently behind Linux's back.

-Andi




Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Matt Mackall

On Fri, 2008-01-18 at 17:54 -0500, Rik van Riel wrote:
> On Fri, 18 Jan 2008 14:47:33 -0800 (PST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> >  - keep it simple. Let's face it, Linux has never ever given those 
> >guarantees before, and it's not is if anybody has really cared. Even 
> >now, the issue seems to be more about paper standards conformance than 
> >anything else.
> 
> There is one issue which is way more than just standards conformance.
> 
> When a program changes file data through mmap(), at some point the
> mtime needs to be update so that backup programs know to back up the
> new version of the file.
> 
> Backup programs not seeing an updated mtime is a really big deal.

And that's fixed with the 4-line approach.
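
For illustration, the check a backup tool typically makes can be sketched
in a few lines of C.  needs_backup() is a hypothetical helper, not taken
from any particular backup program; it only compares st_mtime with the
time of the last backup, which is why a write through mmap() that never
updates mtime goes unnoticed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 if path was modified after last_backup,
 * 0 if it was not, and -1 if stat() fails (for example, no such file).
 */
int needs_backup(const char *path, time_t last_backup)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return -1;

	/* The only evidence of a change consulted here is the mtime. */
	return st.st_mtime > last_backup ? 1 : 0;
}
```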

Reminds me, I've got a patch here for addressing that problem with loop mounts:

Writes to loop should update the mtime of the underlying file.

Signed-off-by: Matt Mackall <[EMAIL PROTECTED]>

Index: l/drivers/block/loop.c
===
--- l.orig/drivers/block/loop.c 2007-11-05 17:50:07.0 -0600
+++ l/drivers/block/loop.c  2007-11-05 19:03:51.0 -0600
@@ -221,6 +221,7 @@ static int do_lo_send_aops(struct loop_d
offset = pos & ((pgoff_t)PAGE_CACHE_SIZE - 1);
bv_offs = bvec->bv_offset;
len = bvec->bv_len;
+   file_update_time(file);
while (len > 0) {
sector_t IV;
unsigned size;
@@ -299,6 +300,7 @@ static int __do_lo_send_write(struct fil
 
set_fs(get_ds());
bw = file->f_op->write(file, buf, len, );
+   file_update_time(file);
set_fs(old_fs);
if (likely(bw == len))
return 0;


-- 
Mathematics is the supreme nostalgia of our time.



Re: [PATCH] usb-serial: Sierra driver - add devices and update dtr

2008-01-18 Thread Greg KH
On Thu, Jan 17, 2008 at 03:15:23PM -0800, Kevin Lloyd wrote:
> > > Correct, the 0x0023 is the only newly added device that requires the
> new
> > > features.
> >
> > Does that mean things will not work for this device if it is added to
> > the device table, without the code updates?
> Adding the device will not break the driver (assuming you remove the
> tag).

Which "tag"?  The device id?

> > And is this device even public yet?
> 
> No, but we are trying to add native support for devices into kernels
> well before they are released in an effort to give better native support
> to end-users.

Ok, that's great to do, and is what needs to be done, just can't add new
features during the "bug-fix-only" cycle of development :)

thanks,

greg k-h


patch pm-acquire-device-locks-on-suspend.patch added to gregkh-2.6 tree

2008-01-18 Thread gregkh

This is a note to let you know that I've just added the patch titled

 Subject: PM: Acquire device locks on suspend

to my gregkh-2.6 tree.  Its filename is

 pm-acquire-device-locks-on-suspend.patch

This tree can be found at 
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From [EMAIL PROTECTED]  Fri Jan 18 16:29:07 2008
From: "Rafael J. Wysocki" <[EMAIL PROTECTED]>
Date: Sat, 12 Jan 2008 20:40:46 +0100
Subject: PM: Acquire device locks on suspend
To: Greg KH <[EMAIL PROTECTED]>
Cc: Alan Stern <[EMAIL PROTECTED]>, Len Brown <[EMAIL PROTECTED]>, Ingo Molnar 
<[EMAIL PROTECTED]>, ACPI Devel Maling List <[EMAIL PROTECTED]>, pm list 
<[EMAIL PROTECTED]>, LKML , Johannes Berg <[EMAIL 
PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>
Message-ID: <[EMAIL PROTECTED]>
Content-Disposition: inline


From: Rafael J. Wysocki <[EMAIL PROTECTED]>


This patch reorganizes the way suspend and resume notifications are
sent to drivers.  The major changes are that now the PM core acquires
every device semaphore before calling the methods, and calls to
device_add() during suspends will fail, while calls to device_del()
during suspends will block.

It also provides a way to safely remove a suspended device with the
help of the PM core, by using the device_pm_schedule_removal() callback
introduced specifically for this purpose, and updates two drivers (msr
and cpuid) that need to use it.
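
For illustration only, and not part of the patch: a small user-space model of
the rules described above. All demo_* names are hypothetical, -EBUSY is an
assumed error code, and the deferred removal is assumed to be carried out at
resume time. The blocking behaviour of device_del() is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_DEVS 4

static bool demo_suspending;                     /* true between suspend and resume */
static bool demo_registered[DEMO_MAX_DEVS];      /* which devices currently exist */
static bool demo_removal_pending[DEMO_MAX_DEVS]; /* removals deferred until resume */

void demo_suspend(void)
{
	demo_suspending = true;
}

/* Registration is refused while a suspend is in progress. */
int demo_device_add(int id)
{
	assert(id >= 0 && id < DEMO_MAX_DEVS);
	if (demo_suspending)
		return -EBUSY;	/* assumed error code, not taken from the patch */
	demo_registered[id] = true;
	return 0;
}

/* Safe to call during suspend: only records that the device should go away. */
void demo_schedule_removal(int id)
{
	assert(id >= 0 && id < DEMO_MAX_DEVS);
	demo_removal_pending[id] = true;
}

/* On resume, carry out every removal that was scheduled in the meantime. */
void demo_resume(void)
{
	int id;

	demo_suspending = false;
	for (id = 0; id < DEMO_MAX_DEVS; id++) {
		if (demo_removal_pending[id]) {
			demo_registered[id] = false;
			demo_removal_pending[id] = false;
		}
	}
}

bool demo_is_registered(int id)
{
	assert(id >= 0 && id < DEMO_MAX_DEVS);
	return demo_registered[id];
}
```

In this model a device scheduled for removal stays registered until
demo_resume() runs, which is the point of deferring the work.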

Signed-off-by: Alan Stern <[EMAIL PROTECTED]>
Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 arch/x86/kernel/cpuid.c|6 
 arch/x86/kernel/msr.c  |6 
 drivers/base/core.c|   65 +
 drivers/base/power/main.c  |  504 +
 drivers/base/power/power.h |   12 +
 include/linux/device.h |8 
 6 files changed, 414 insertions(+), 187 deletions(-)

--- a/arch/x86/kernel/cpuid.c
+++ b/arch/x86/kernel/cpuid.c
@@ -157,15 +157,15 @@ static int __cpuinit cpuid_class_cpu_cal
 
switch (action) {
case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
err = cpuid_device_create(cpu);
break;
case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
cpuid_device_destroy(cpu);
break;
+   case CPU_UP_CANCELED_FROZEN:
+   destroy_suspended_device(cpuid_class, MKDEV(CPUID_MAJOR, cpu));
+   break;
}
return err ? NOTIFY_BAD : NOTIFY_OK;
 }
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -155,15 +155,15 @@ static int __cpuinit msr_class_cpu_callb
 
switch (action) {
case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
err = msr_device_create(cpu);
break;
case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
msr_device_destroy(cpu);
break;
+   case CPU_UP_CANCELED_FROZEN:
+   destroy_suspended_device(msr_class, MKDEV(MSR_MAJOR, cpu));
+   break;
}
return err ? NOTIFY_BAD : NOTIFY_OK;
 }
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -726,11 +726,20 @@ int device_add(struct device *dev)
 {
struct device *parent = NULL;
struct class_interface *class_intf;
-   int error = -EINVAL;
+   int error;
+
+   error = pm_sleep_lock();
+   if (error) {
+   dev_warn(dev, "Suspicious %s during suspend\n", __FUNCTION__);
+   dump_stack();
+   return error;
+   }
 
dev = get_device(dev);
-   if (!dev || !strlen(dev->bus_id))
+   if (!dev || !strlen(dev->bus_id)) {
+   error = -EINVAL;
goto Error;
+   }
 
pr_debug("DEV: registering device: ID = '%s'\n", dev->bus_id);
 
@@ -795,6 +804,7 @@ int device_add(struct device *dev)
}
  Done:
put_device(dev);
+   pm_sleep_unlock();
return error;
  BusError:
device_pm_remove(dev);
@@ -905,6 +915,7 @@ void device_del(struct device * dev)
struct device * parent = dev->parent;
struct class_interface *class_intf;
 
+   device_pm_remove(dev);
if (parent)
klist_del(&dev->knode_parent);
if (MAJOR(dev->devt))
@@ -981,7 +992,6 @@ void device_del(struct device * dev)
if (dev->bus)
blocking_notifier_call_chain(&dev->bus->bus_notifier,
 BUS_NOTIFY_DEL_DEVICE, dev);
-   device_pm_remove(dev);
kobject_uevent(&dev->kobj, KOBJ_REMOVE);
kobject_del(&dev->kobj);
if (parent)
@@ -1156,14 +1166,11 @@ error:
 EXPORT_SYMBOL_GPL(device_create);
 
 /**
- * device_destroy - removes a device that was created with device_create()
+ * find_device - finds a device that was created with device_create()
  * @class: pointer to 

Re: PROBLEM: Celeron Core

2008-01-18 Thread Matt Mackall

On Fri, 2008-01-18 at 22:11 +0100, Andi Kleen wrote:
> Chodorenko Michail <[EMAIL PROTECTED]> writes:
> 
> > I have a laptop "Extensa 5220", with the processor Celeron based on 'core'
> > technology.
> > There is ~/arch/i386/kernel/cpu/cpufreq/p4-clockmod.c in the kernel
> > source code
> > but there's no line identifying my CPU, so frequency changes are not applied;
> > an ID line for 0x16 needs to be added
> 
> Note that driver will likely do clock throttling on your CPU.
> Using that is usually a bad idea because it does not actually
> save power. It's only intended to let the CPU cool down in some situations.

Power consumption is more or less exactly equal to heat production
(that's where the power goes, after all!), so either clock throttling
DOES save power or it DOES NOT cool the CPU.

-- 
Mathematics is the supreme nostalgia of our time.



Re: x86: kdump failure

2008-01-18 Thread Roland McGrath
Oops, I overlooked the use of elf_core_copy_regs in kernel/kexec.c.  It
is certainly safe and fine to reintroduce the old macro.  Everything
removed in the "x86 user_regset cleanup" patch is purely removing code
and it doesn't hurt to have it back (it's just all unused except for this
kexec nit).

Unfortunately it really doesn't fit to have kexec call into the new
user_regset code that replaced this macro for user core dump purposes.
Those new interfaces are really purely for user-mode state, derived only
from task_struct (i.e. uses task_pt_regs), not from a struct pt_regs
pointer passed in.  (There is the minority case where it really is using
user-mode state.  That part could be done via the user_regset interface,
if that saved any trouble.)

Things like crash_fixup_ss_esp point to the poor fit of the code intended
for user core dumps with what kexec needs.  IMHO it would be cleaner for
kexec's arch interfaces to fill in elf_gregset_t directly, replacing some
of the places a struct pt_regs is passed around now.
crash_setup_regs already has to know the name of every register anyway.
A particular arch's definition can share code with its core dump or
user_regset code when that fits.


Thanks,
Roland


Re: [git patches] net driver updates for 2.6.25

2008-01-18 Thread David Miller
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Fri, 18 Jan 2008 15:17:21 -0500

> 
> Please pull from the 'upstream' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream
> 
> to receive my 2.6.25 net driver queue into davem/net-2.6.25.git:

Pulled and pushed back out, thanks Jeff.


Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread Tejun Heo
James Bottomley wrote:
>> I'm just a bit reluctant to touch these drivers, since they're all
>> incredibly ancient.  We don't have good luck with simple transformation
>> patches on the older drivers ... and it seems to take months before
>> anyone notices there's a problem.
> 
> This is the patch that will return them to their original behaviour.
> 
> James
> 
> ---
> diff --git a/drivers/scsi/pcmcia/Kconfig b/drivers/scsi/pcmcia/Kconfig
> index fa481b5..53857c6 100644
> --- a/drivers/scsi/pcmcia/Kconfig
> +++ b/drivers/scsi/pcmcia/Kconfig
> @@ -6,7 +6,8 @@ menuconfig SCSI_LOWLEVEL_PCMCIA
>   bool "PCMCIA SCSI adapter support"
>   depends on SCSI!=n && PCMCIA!=n
>  
> -if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA
> +# drivers have problems when build in, so require modules
> +if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA && m
>  
>  config PCMCIA_AHA152X
>   tristate "Adaptec AHA152X PCMCIA support"
> 
> 

Looks good to me.

-- 
tejun


Re: [PATCH] Shrink ext3_inode_info by 8 bytes for !POSIX_ACL.

2008-01-18 Thread Indan Zupancic
On Fri, January 18, 2008 20:16, Mingming Cao wrote:
> On Sat, 2008-01-12 at 21:35 +0100, Indan Zupancic wrote:
>> i_file_acl and i_dir_acl aren't always needed.
>>
>> With certain configs this makes 10 ext3_inode_cache objects fit in
>> one slab instead of the current 9, as the size shrinks from 416 to
>> 408 bytes for 32 bit, !POSIX_ACL and !EXT3_FS_XATTR configs.
>>
>> Signed-off-by: Indan Zupancic <[EMAIL PROTECTED]>
>> ---
>>  fs/ext3/ialloc.c  |2 ++
>>  fs/ext3/inode.c   |   29 +++--
>>  include/linux/ext3_fs_i.h |2 ++
>>  3 files changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
>> index 1bc8cd8..01745bc 100644
>> --- a/fs/ext3/ialloc.c
>> +++ b/fs/ext3/ialloc.c
>> @@ -574,8 +574,10 @@ got:
>>  ei->i_frag_no = 0;
>>  ei->i_frag_size = 0;
>>  #endif
>> +#ifdef CONFIG_EXT3_FS_POSIX_ACL
>>  ei->i_file_acl = 0;
>>  ei->i_dir_acl = 0;
>> +#endif
>
> For regular file, i_dir_acl is being reused as i_size_high to support
> large file.

Only the i_dir_acl of struct ext3_inode, not the one from ext3_inode_info.
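
The slab figures in the quoted changelog can be checked with simple division.
This is a sketch that assumes a 4096-byte slab with no per-slab overhead; the
real allocator layout may differ, and demo_objs_per_slab() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole objects that fit in one slab, ignoring any slab overhead. */
size_t demo_objs_per_slab(size_t slab_bytes, size_t obj_bytes)
{
	assert(obj_bytes > 0);
	return slab_bytes / obj_bytes;	/* integer division drops the partial object */
}
```

Under that assumption, 416-byte objects give 9 per slab (9 * 416 = 3744) and
408-byte objects give 10 (10 * 408 = 4080), matching the numbers in the
changelog.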

Thanks,

Indan




Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread James Bottomley

On Fri, 2008-01-18 at 17:32 -0600, James Bottomley wrote:
> On Sat, 2008-01-19 at 08:27 +0900, Tejun Heo wrote:
> > James Bottomley wrote:
> > > On Fri, 2008-01-18 at 16:20 +0900, Tejun Heo wrote:
> > >> aha152x.c and fdomain are built twice - once for the isa driver and
> > >> once for the PCMCIA one.  Through #ifdefs, the compiled codes are
> > >> slightly different; thus, global symbols need to be given different
> > >> names depending on which flavor is being built.  This patch adds
> > >> GLOBAL() macro to aha152x.h and fdomain.h which change the symbol
> > >> depending on PCMCIA.
> > >>
> > >> This bug has always existed but has been masked by the fact the
> > >> drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m) which made
> > >> drivers/scsi/pcmcia/built_in.o not linked into the kernel and thus
> > >> avoided the duplicate symbols during compilation.
> > >>
> > >> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> > >> ---
> > >> Ah... missed that one.  Here's the updated version.
> > > 
> > > Actually, isn't the better fix just to return to the original behaviour?
> > > 
> > > As you pointed out, using the subdir instead of obj meant that although
> > > the modules were built, the drivers were never linked into the main
> > > kernel.  According to the records, this has been the default forever, so
> > > there can be no-one anywhere relying on these drivers being built in.
> > > Actually, as old style pcmcia drivers, I'm not sure there's much value
> > > building them into the kernel anyway.
> > > 
> > > So just modify scsi/pcmcia/Kconfig to make them all depend on m.
> > 
> > Yeap, there is no problem if you don't allow them to be linked into the
> > kernel.  If that's how you want it, please go ahead.
> > 
> > I personally think it's a bit odd to disallow building into kernel
> > because of the peculiarity of the implementation (including c files and
> > compiling them slightly differently) and also no one reporting doesn't
> > necessarily mean no one has attempted it and failed.
> 
> Heh ... I'll make you a deal.  Find just one user of one of these
> drivers who can make use of them built in, and I'll apply the patch.  
> 
> I'm just a bit reluctant to touch these drivers, since they're all
> incredibly ancient.  We don't have good luck with simple transformation
> patches on the older drivers ... and it seems to take months before
> anyone notices there's a problem.

This is the patch that will return them to their original behaviour.

James

---
diff --git a/drivers/scsi/pcmcia/Kconfig b/drivers/scsi/pcmcia/Kconfig
index fa481b5..53857c6 100644
--- a/drivers/scsi/pcmcia/Kconfig
+++ b/drivers/scsi/pcmcia/Kconfig
@@ -6,7 +6,8 @@ menuconfig SCSI_LOWLEVEL_PCMCIA
bool "PCMCIA SCSI adapter support"
depends on SCSI!=n && PCMCIA!=n
 
-if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA
+# drivers have problems when build in, so require modules
+if SCSI_LOWLEVEL_PCMCIA && SCSI && PCMCIA && m
 
 config PCMCIA_AHA152X
tristate "Adaptec AHA152X PCMCIA support"




Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread Tejun Heo
James Bottomley wrote:
>> I personally think it's a bit odd to disallow building into kernel
>> because of the peculiarity of the implementation (including c files and
>> compiling them slightly differently) and also no one reporting doesn't
>> necessarily mean no one has attempted it and failed.
> 
> Heh ... I'll make you a deal.  Find just one user of one of these
> drivers who can make use of them built in, and I'll apply the patch.  

I don't think I can.  I didn't even know they were isa ones before
actually looking at the code.

> I'm just a bit reluctant to touch these drivers, since they're all
> incredibly ancient.  We don't have good luck with simple transformation
> patches on the older drivers ... and it seems to take months before
> anyone notices there's a problem.

Alright then, please go ahead and disallow built-in.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why is the kfree() argument const?

2008-01-18 Thread Krzysztof Halasa
"J.A. Magallón" <[EMAIL PROTECTED]> writes:

> That's what __attribute__ ((pure)) is for, but if none of the
> functions is pure, the compiler can not be sure about side effects
> and can not reorder things. Don't forget that functions can do
> anything apart from mangling with their arguments.

Though it seems it could legally transform:

void kfree(const int *x);

{
int v, *ptr = malloc(sizeof(int));
*ptr = 51;
v = *ptr;
kfree(ptr);
printf("%d", v);
}

into:

{
int v, *ptr = malloc(sizeof(int));
*ptr = 51;
kfree(ptr);
v = *ptr;
printf("%d", v);
}

if it knows that malloc generates unaliased pointers, which seems
reasonable in general.
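
A user-space sketch of the first ordering, with a hypothetical demo_free()
standing in for kfree(). It only shows that a const-qualified parameter does
not stop the callee from ending the object's lifetime, so the read has to
come before the call. It does not settle whether a compiler may actually
reorder the two.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for kfree(): the const in the parameter type only
 * says the callee will not write through this pointer type.
 */
void demo_free(const void *p)
{
	free((void *)p);	/* cast away const; the object's lifetime ends here */
}

/* Reads the value before freeing, so the read is well defined. */
int read_then_free(void)
{
	int *ptr = malloc(sizeof(*ptr));
	int v;

	assert(ptr != NULL);
	*ptr = 51;
	v = *ptr;		/* must happen before demo_free(): afterwards *ptr is gone */
	demo_free(ptr);
	return v;
}
```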
-- 
Krzysztof Halasa


Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread James Bottomley

On Sat, 2008-01-19 at 08:27 +0900, Tejun Heo wrote:
> James Bottomley wrote:
> > On Fri, 2008-01-18 at 16:20 +0900, Tejun Heo wrote:
> >> aha152x.c and fdomain are built twice - once for the isa driver and
> >> once for the PCMCIA one.  Through #ifdefs, the compiled codes are
> >> slightly different; thus, global symbols need to be given different
> >> names depending on which flavor is being built.  This patch adds
> >> GLOBAL() macro to aha152x.h and fdomain.h which change the symbol
> >> depending on PCMCIA.
> >>
> >> This bug has always existed but has been masked by the fact the
> >> drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m) which made
> >> drivers/scsi/pcmcia/built_in.o not linked into the kernel and thus
> >> avoided the duplicate symbols during compilation.
> >>
> >> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> >> ---
> >> Ah... missed that one.  Here's the updated version.
> > 
> > Actually, isn't the better fix just to return to the original behaviour?
> > 
> > As you pointed out, using the subdir instead of obj meant that although
> > the modules were built, the drivers were never linked into the main
> > kernel.  According to the records, this has been the default forever, so
> > there can be no-one anywhere relying on these drivers being built in.
> > Actually, as old style pcmcia drivers, I'm not sure there's much value
> > building them into the kernel anyway.
> > 
> > So just modify scsi/pcmcia/Kconfig to make them all depend on m.
> 
> Yeap, there is no problem if you don't allow them to be linked into the
> kernel.  If that's how you want it, please go ahead.
> 
> I personally think it's a bit odd to disallow building into kernel
> because of the peculiarity of the implementation (including c files and
> compiling them slightly differently) and also no one reporting doesn't
> necessarily mean no one has attempted it and failed.

Heh ... I'll make you a deal.  Find just one user of one of these
drivers who can make use of them built in, and I'll apply the patch.  

I'm just a bit reluctant to touch these drivers, since they're all
incredibly ancient.  We don't have good luck with simple transformation
patches on the older drivers ... and it seems to take months before
anyone notices there's a problem.

James




Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread Tejun Heo
Tejun Heo wrote:
> James Bottomley wrote:
>> On Fri, 2008-01-18 at 16:20 +0900, Tejun Heo wrote:
>>> aha152x.c and fdomain are built twice - once for the isa driver and
>>> once for the PCMCIA one.  Through #ifdefs, the compiled codes are
>>> slightly different; thus, global symbols need to be given different
>>> names depending on which flavor is being built.  This patch adds
>>> GLOBAL() macro to aha152x.h and fdomain.h which change the symbol
>>> depending on PCMCIA.
>>>
>>> This bug has always existed but has been masked by the fact the
>>> drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m) which made
>>> drivers/scsi/pcmcia/built_in.o not linked into the kernel and thus
>>> avoided the duplicate symbols during compilation.
>>>
>>> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
>>> ---
>>> Ah... missed that one.  Here's the updated version.
>> Actually, isn't the better fix just to return to the original behaviour?
>>
>> As you pointed out, using the subdir instead of obj meant that although
>> the modules were built, the drivers were never linked into the main
>> kernel.  According to the records, this has been the default forever, so
>> there can be no-one anywhere relying on these drivers being built in.
>> Actually, as old style pcmcia drivers, I'm not sure there's much value
>> building them into the kernel anyway.
>>
>> So just modify scsi/pcmcia/Kconfig to make them all depend on m.
> 
> Yeap, there is no problem if you don't allow them to be linked into the
> kernel.  If that's how you want it, please go ahead.
> 
> I personally think it's a bit odd to disallow building into kernel
> because of the peculiarity of the implementation (including c files and
> compiling them slightly differently) and also no one reporting doesn't
> necessarily mean no one has attempted it and failed.

Actually what's better would be to make all symbols static and include
the c file directly into the stub file.  How about that?
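
For comparison, the renaming approach from the original patch can be sketched
with a token-pasting macro. This is only an illustration: the "_cs" suffix and
the demo_* names are assumptions, not what aha152x.h or fdomain.h actually use.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical version of the GLOBAL() idea: when the file is compiled for
 * the PCMCIA flavor, every global symbol gets a distinct suffix so the two
 * builds of the same source can be linked into one image.
 */
#ifdef PCMCIA
#define GLOBAL(sym) sym##_cs
#else
#define GLOBAL(sym) sym
#endif

/* Two-step stringification so GLOBAL() is expanded before being quoted. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* Becomes demo_probe() or demo_probe_cs() depending on the flavor. */
int GLOBAL(demo_probe)(void)
{
	return 42;
}

/* Name the linker would see for the function above. */
const char *demo_symbol_name(void)
{
	return DEMO_STR(GLOBAL(demo_probe));
}

/* Nonzero when this translation unit was built as the PCMCIA flavor. */
int demo_is_pcmcia_build(void)
{
	return strcmp(demo_symbol_name(), "demo_probe") != 0;
}
```

Making every symbol static and including the .c file from the stub avoids the
renaming altogether, at the cost of touching each definition.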

-- 
tejun


Re: [PATCH] SCSI: fix isa/pcmcia compile problem

2008-01-18 Thread Tejun Heo
James Bottomley wrote:
> On Fri, 2008-01-18 at 16:20 +0900, Tejun Heo wrote:
>> aha152x.c and fdomain are built twice - once for the isa driver and
>> once for the PCMCIA one.  Through #ifdefs, the compiled codes are
>> slightly different; thus, global symbols need to be given different
>> names depending on which flavor is being built.  This patch adds
>> GLOBAL() macro to aha152x.h and fdomain.h which change the symbol
>> depending on PCMCIA.
>>
>> This bug has always existed but has been masked by the fact the
>> drivers/scsi/pcmcia used subdir-(y|m) instead of obj-(y|m) which made
>> drivers/scsi/pcmcia/built_in.o not linked into the kernel and thus
>> avoided the duplicate symbols during compilation.
>>
>> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
>> ---
>> Ah... missed that one.  Here's the updated version.
> 
> Actually, isn't the better fix just to return to the original behaviour?
> 
> As you pointed out, using the subdir instead of obj meant that although
> the modules were built, the drivers were never linked into the main
> kernel.  According to the records, this has been the default forever, so
> there can be no-one anywhere relying on these drivers being built in.
> Actually, as old style pcmcia drivers, I'm not sure there's much value
> building them into the kernel anyway.
> 
> So just modify scsi/pcmcia/Kconfig to make them all depend on m.

Yeap, there is no problem if you don't allow them to be linked into the
kernel.  If that's how you want it, please go ahead.

I personally think it's a bit odd to disallow building into kernel
because of the peculiarity of the implementation (including c files and
compiling them slightly differently) and also no one reporting doesn't
necessarily mean no one has attempted it and failed.

Thanks.

-- 
tejun


Re: [patch 2/3] Latencytop instrumentations part 1

2008-01-18 Thread Arjan van de Ven

Frank Ch. Eigler wrote:
> Hi -
>
> On Fri, Jan 18, 2008 at 02:33:34PM -0800, Arjan van de Ven wrote:
> > [...]
> > > Can you suggest of some reason why all this instrumentation could
> > > not be in the form of standard markers (perhaps conditionally
> > > compiled out if necessary)?
> >
> > sure. Every instrumentation you see is of the nested kind (since the lowest
> > level
> > of nesting is already automatic via wchan).
> > If markers can provide me the following semantics, I'd be MORE than happy to
> > use markers:
> > [...]
> > If markers can provide that semantics ... you sold me.
>
> Further to what acme said, markers are semantics-free.  Callback
> functions that implement your entry & exit semantics can be attached
> at run time, at your pleasure.  (So can systemtap probes, for that
> matter.)  The main difference would be that these callback functions
> would have to manage the per-thread LIFO data structures themselves,
> instead of allocating backpointers on the kernel stack.  (Bonus marks
> for not modifying task_struct. :-)

modifying task struct to have storage space is no big deal...


Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-18 Thread Mathieu Desnoyers
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> On Fri, 18 Jan 2008, Mathieu Desnoyers wrote:
> >
> > But I have not seen a lot of situations where that kind of glue-code was
> > needed, so I think it makes sense to keep markers simple to use and
> > efficient for the common case.
> >
> > Then, in this glue-code, we can put trace_mark() and calls to in-kernel
> > tracers.
> 
> I'm almost done with the latency tracer work, and there are only a total
> of 6 hooks that I needed.
> 
>  - schedule context switch
>  - try_to_wake_up
>  - hard_irqs_off (which is already there for lockdep)
>  - hard irqs on (also for lockdep)
>  - lock_contention (already in for the lock contention code)
>  - lock acquire (also in there for contention code)
> 
> With the above, we could have this (if this is what I think you are
> recommending). For example in the context_switch case:
> 
>   trace_switch_to(prev, next);
>   switch_to(prev, next, prev);
> 
> and in sched.h I could have:
> 

Almost.. I would add :

static int trace_switch_to_enabled;

> static inline void trace_switch_to(struct task_struct *prev,
>   struct task_struct *next)
> {
	if (likely(!trace_switch_to_enabled))
		return;
>   trace_mark(kernel_schedule,
>   "prev_pid %d next_pid %d prev_state %ld",
>   prev->pid, next->pid, prev->state);
> 
>   trace_context_switch(prev, next);
> }
> 
> 

And some code to activate the trace_switch_to_enabled variable (ideally
keeping a refcount).

By doing this, we would have the minimum impact on the scheduler when
disabled.

But remember that this trace_switch_to_enabled could be enabled for both
markers and your tracer, so you might need to put a branch at the
beginning of trace_context_switch() too.
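
A minimal user-space sketch of the refcounted enable flag described above. The
get/put helper names and the event counter are hypothetical, and the plain int
is not safe against concurrent updates; kernel code would need an atomic type
or a lock.

```c
#include <assert.h>

static int trace_switch_to_enabled;	/* refcount: number of active users */
static unsigned long demo_trace_events;	/* stand-in for the real tracing work */

/* Each tracer (markers, latency tracer, ...) takes one reference. */
void trace_switch_to_get(void)
{
	trace_switch_to_enabled++;
}

void trace_switch_to_put(void)
{
	assert(trace_switch_to_enabled > 0);
	trace_switch_to_enabled--;
}

/* Hook called on every context switch; costs one branch when disabled. */
void trace_switch_to(int prev_pid, int next_pid)
{
	if (!trace_switch_to_enabled)
		return;
	(void)prev_pid;
	(void)next_pid;
	demo_trace_events++;	/* a real hook would record the switch here */
}

unsigned long demo_trace_event_count(void)
{
	return demo_trace_events;
}
```

With a refcount, the hook stays active until the last user drops its
reference, so one tracer shutting down does not silence the other.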

Mathieu

> and have the trace_context_switch code be something that is turned on with
> the latency tracing utility (config option). That way production code can
> keep it off.
> 
> -- Steve
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Re: [patch 2/3] Latencytop instrumentations part 1

2008-01-18 Thread Frank Ch. Eigler
Hi -

On Fri, Jan 18, 2008 at 02:33:34PM -0800, Arjan van de Ven wrote:
> [...]
> > Can you suggest of some reason why all this instrumentation could
> > not be in the form of standard markers (perhaps conditionally
> > compiled out if necessary)?
> 
> sure. Every instrumentation you see is of the nested kind (since the lowest 
> level
> of nesting is already automatic via wchan).
> If markers can provide me the following semantics, I'd be MORE than happy to 
> use markers:
> [...]
> If markers can provide that semantics ... you sold me.

Further to what acme said, markers are semantics-free.  Callback
functions that implement your entry & exit semantics can be attached
at run time, at your pleasure.  (So can systemtap probes, for that
matter.)  The main difference would be that these callback functions
would have to manage the per-thread LIFO data structures themselves,
instead of allocating backpointers on the kernel stack.  (Bonus marks
for not modifying task_struct. :-)
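
A rough user-space sketch of what such callbacks might look like, assuming C11
thread-local storage in place of anything in task_struct. All demo_* names and
the fixed depth are hypothetical; this is not the marker API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 8

struct demo_frame {
	const char *reason;	/* why this thread may be about to wait */
};

/* One LIFO per thread, kept outside any task structure. */
static _Thread_local struct demo_frame demo_stack[DEMO_MAX_DEPTH];
static _Thread_local size_t demo_depth;

/* Entry callback: push a nesting level. Returns 0, or -1 if the stack is full. */
int demo_enter(const char *reason)
{
	if (demo_depth == DEMO_MAX_DEPTH)
		return -1;
	demo_stack[demo_depth++].reason = reason;
	return 0;
}

/* Exit callback: pop the innermost level. Returns NULL if nothing is open. */
const char *demo_exit(void)
{
	if (demo_depth == 0)
		return NULL;
	return demo_stack[--demo_depth].reason;
}

/* Innermost reason currently in effect for this thread, or NULL. */
const char *demo_current(void)
{
	return demo_depth ? demo_stack[demo_depth - 1].reason : NULL;
}
```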

- FChE


Re: [PATCH 3/3] Makes lguest's irq handler typesafe

2008-01-18 Thread Tejun Heo
Rusty Russell wrote:
> Just a trivial example.
> ---
>  drivers/lguest/lguest_device.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff -r 00ab7672f658 drivers/lguest/lguest_device.c
> --- a/drivers/lguest/lguest_device.c  Thu Jan 17 16:54:00 2008 +1100
> +++ b/drivers/lguest/lguest_device.c  Thu Jan 17 16:59:46 2008 +1100
> @@ -179,9 +179,8 @@ static void lg_notify(struct virtqueue *
>   hcall(LHCALL_NOTIFY, lvq->config.pfn << PAGE_SHIFT, 0, 0);
>  }
>  
> -static irqreturn_t lguest_interrupt(int irq, void *_vq)
> +static irqreturn_t lguest_interrupt(int irq, struct virtqueue *vq)
>  {
> - struct virtqueue *vq = _vq;
>   struct lguest_device_desc *desc = to_lgdev(vq->vdev)->desc;
>  
>   if (unlikely(desc->config_change)) {

Type safety is good but I doubt this would be worth the complexity.  It
has some benefits but there's much larger benefit in keeping things in
straight C.  People know that functions take fixed types and are also
familiar with the convention of passing void * for callback arguments.
IMHO, staying in line with that common knowledge easily trumps having
type checking on interrupt handler.

Also, how often do we see a bug where things go wrong because interrupt
handler is given the wrong type of argument?  Even when such bug
happens, I doubt it can escape the developer's workstation if he/she is
paying any attention to testing.

Thanks.

-- 
tejun


Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-18 Thread Tejun Heo
Matt Mackall wrote:
> On Wed, 2008-01-16 at 10:00 +0900, Tejun Heo wrote:
>> And mprintk the following.
>>
>>  code:
>>   DEFINE_MPRINTK(mp, 2 * 80);
>>
>>   mprintk_set_header(&mp, KERN_INFO "ata%u.%2u: ", 1, 0);
>>   mprintk_push(&mp, "ATA %d", 7);
>>   mprintk_push(&mp, ", %u sectors\n", 1024);
>>   mprintk(&mp, "everything seems dandy\n");
> 
> I prefer Matthew Wilcox's stringbuf approach which does proper memory
> management and isn't specific to printk:
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0710.3/0517.html

Yeap, that's generic and nice but I think both 'generic' and 'proper
memory management' are weaknesses if what you're trying to do is to
support collecting messages in pieces and putting them out via printk.
Please consider the following scenario.

You're in an interrupt handler and detected a severe error condition
which should be notified to the user but the information is rather
complex and best built in pieces, so you create a stringbuf and do
sb_printf() to it w/ GFP_ATOMIC but alas memory allocation failed and
you end up printing "out of memory" unless you detect the failure and go
back and printk messages piece-by-piece manually.  I would rather
assemble the message manually from the get-go into an on-stack buffer.

By being specifically 'printk' and letting the user supply the buffer, which in
most cases can be on-stack (messages shouldn't be too long anyway),
mprintk either can avoid those problems from the beginning or can
automatically work around when problem arises (initialized with NULL
buffer from allocation failure) without losing any message, so it's
essentially as simple as using printk.  There is no error handling (both
mprintk and kfree can handle NULL pointer) and the message is guaranteed
to go out no matter what.
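
To make the fallback behaviour concrete, here is a rough user-space
sketch of the idea, not the code from the patchset.  The names (struct
mbuf, mbuf_init, mbuf_push, mbuf_flush) are made up for illustration and
a FILE * stands in for printk().  The point is only that a NULL or
too-small buffer degrades to piece-by-piece output instead of an error.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the mprintk idea; NOT the patchset's API. */
struct mbuf {
	char  *buf;	/* caller-supplied storage, may be NULL */
	size_t size;	/* capacity of buf in bytes (0 if buf is NULL) */
	size_t len;	/* bytes collected so far */
	FILE  *out;	/* stand-in for printk() */
};

static void mbuf_init(struct mbuf *mb, char *buf, size_t size, FILE *out)
{
	assert(mb && out);
	mb->buf = buf;
	mb->size = buf ? size : 0;
	mb->len = 0;
	mb->out = out;
}

/* Put out whatever has been collected so far and empty the buffer. */
static void mbuf_flush(struct mbuf *mb)
{
	if (mb->len) {
		fwrite(mb->buf, 1, mb->len, mb->out);
		mb->len = 0;
	}
}

/* Collect one formatted piece; never fails and never drops text. */
static void mbuf_push(struct mbuf *mb, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (mb->size > mb->len) {
		size_t room = mb->size - mb->len;

		va_start(ap, fmt);
		n = vsnprintf(mb->buf + mb->len, room, fmt, ap);
		va_end(ap);
		if (n >= 0 && (size_t)n < room) {
			mb->len += (size_t)n;	/* piece fit, keep it */
			return;
		}
	}

	/*
	 * No buffer, or the piece does not fit: put out the earlier
	 * pieces first to keep ordering, then print this one directly.
	 */
	mbuf_flush(mb);
	va_start(ap, fmt);
	vfprintf(mb->out, fmt, ap);
	va_end(ap);
}
```

A caller would do mbuf_init() with an on-stack array (or NULL), a few
mbuf_push() calls, and one mbuf_flush() at the end.  The same text comes
out either way; the only thing lost without a buffer is the merging.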

Auto-expanding string buffer is nice but I don't think it fits the bill
here.

Thanks.

-- 
tejun


Re: crash in kmem_cache_init

2008-01-18 Thread Olaf Hering
On Fri, Jan 18, Christoph Lameter wrote:

> Could you try this patch?

Does not help, same crash.


Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files

2008-01-18 Thread Rik van Riel
On Fri, 18 Jan 2008 14:47:33 -0800 (PST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

>  - keep it simple. Let's face it, Linux has never ever given those 
>guarantees before, and it's not as if anybody has really cared. Even 
>now, the issue seems to be more about paper standards conformance than 
>anything else.

There is one issue which is way more than just standards conformance.

When a program changes file data through mmap(), at some point the
mtime needs to be updated so that backup programs know to back up the
new version of the file.

Backup programs not seeing an updated mtime is a really big deal.

-- 
All rights reversed.

